[Beowulf] Re: High Performance for Large Database
Many of your questions may have already been answered in earlier discussions or in the FAQ. The search results page will indicate current discussions as well as past list serves, articles, and papers.
Andrew Piskorski atp at piskorski.comWed Oct 27 10:48:19 PDT 2004
- Previous message: [Beowulf] High Performance for Large Database
- Next message: [Beowulf] High Performance for Large Database
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Tue, Oct 26, 2004 at 01:08:00PM -0600, Joshua Marsh wrote: > Hi all, > > I'm currently working on a project that will require fast access to > data stored in a postgreSQL database server. I've been told that a > Beowulf cluster may help increase performance. Unlikely in general, although possible in certain cases. For example, look into Clusgres, memcached, and Backplane. I've previously given links and discussion here: http://openacs.org/forums/message-view?message_id=128060 http://openacs.org/forums/message-view?message_id=179348 > Since I'm not very familar with Beowulf clusters, I was hoping that It is more important that you are extensively familiar with RDBMSs in general, and PostgreSQL in particular. Are you? > you might have some advice or information on whether a cluster would > increase performance for a PostgreSQL database. The major tables > accessed are around 150-200 million records. On a stand alone > server, it can take several minutes to perform a simple select > query. 200 million rows is not that big. What's the approximate total size of your database on disk? Your "several minutes for a simple select" query performance is abysmal, and this is unlikely to be because of your hardware. Most likely, your queries just suck, and you need to do some serious SQL tuning work before even considering big huge fancy hardware. Once you have tables with tens or hundreds of millions of rows, doing ANY full table scans of that table at all sucks really badly, so you MUST profile and tune your queries. And eliminating full table scans of large tables is just the first and most obvious step, it is not unusual to still have very sucky queries after that. > It seems like once we start pricing for servers with 16+ processors > and 64+ GB of RAM, the prices sky rocket. If I can acheive high > performance with a cluster, using 15-20 dual processor machines, that > would be great. If you are even thinking about buying an 8-way or larger box, then you are certainly a candidate for several 2-way boxes with an (expensive) SCI interconnect, so see Clusgres in my links above. Or, if want want to spend a lot less money, your access is read-mostly, and you DON'T need full ACID transactional support for your read-only queries, look into using memcached to cache query results in other machines' RAM. However, VERY few people need such large RDBMS boxes. What makes you think you do? What exactly is your application doing, and what sort of load do you need it to sustain? Have you profiled and tuned all your SQL? Tuned your PostgreSQL and Linux kernel settings? Have you read and worked through all the PostgreSQL docs on tuning? (You didn't install PostgreSQL with its DEFAULT settings, did you? Those are intended to just get it up and running on ALL the platforms PostgreSQL supports, not to give good performance.) Investing hundreds of thousands of dollars in fancy server hardware without first doing your basic RDBMS homework makes no sense at all. If your database is dog slow because of poor data modeling or grossly untuned queries, throwing $300k of hardware at the problem may not help much at all. -- Andrew Piskorski <atp at piskorski.com> http://www.piskorski.com/
- Previous message: [Beowulf] High Performance for Large Database
- Next message: [Beowulf] High Performance for Large Database
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Beowulf mailing list
