[Beowulf] High Performance for Large Database
Jakob Oestergaard jakob at unthought.net
Thu Oct 28 03:03:56 PDT 2004
- Previous message: [Beowulf] Re: High Performance for Large Database
- Next message: [Beowulf] High Performance for Large Database
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Tue, Oct 26, 2004 at 01:08:00PM -0600, Joshua Marsh wrote:
> Hi all,
>
> I'm currently working on a project that will require fast access to
> data stored in a postgreSQL database server. I've been told that a
> Beowulf cluster may help increase performance. Since I'm not very
> familiar with Beowulf clusters, I was hoping that you might have some
> advice or information on whether a cluster would increase performance
> for a PostgreSQL database. The major tables accessed are around
> 150-200 million records. On a stand alone server, it can take several
> minutes to perform a simple select query.
>
> It seems like once we start pricing for servers with 16+ processors
> and 64+ GB of RAM, the prices sky rocket. If I can achieve high
> performance with a cluster, using 15-20 dual processor machines, that
> would be great.

It depends.

I was involved in one project where we had some hosts doing a *massive*
number of queries against postgres, but no or few updates.

This parallelizes very well. A single query would not run faster, but
when you run thousands of queries, running them against a cluster of
postgresql databases will even out the load just nicely, giving you
linear scaling (sustained queries per second versus machines in the
cluster).

I don't think you'll have any luck finding off-the-shelf
production-quality database software that will parallelize a single
query on a number of nodes.

If you just want throughput, large numbers of queries on a large number
of databases, and you are doing mostly selects with very few (if any)
updates/inserts/deletes, then PostgreSQL comes with software that can
help you mirror your database.

What you do is, you have a 'master' database - you will perform all
updates/deletes/inserts against this master. The master will relay
updates to a number of slave databases. You perform all selects against
the slaves.

Simple, stable, and works perfectly within the limits inherent in such a
setup (e.g. a single query won't parallelize, the master cannot scale to
more updates than what is possible on a single system, etc.)

--
/ jakob
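
The master/slave split described above could be sketched on the client side roughly like this. This is only a minimal illustration, not part of any PostgreSQL tooling: the class name ReadWriteRouter and the host names are hypothetical, and the router only decides which server a statement should go to, leaving the actual connection handling to whatever driver you use. Selects rotate over the slaves; everything else is sent to the master to be on the safe side.

```python
import itertools


class ReadWriteRouter:
    """Choose a target database for each SQL statement.

    Hypothetical helper: selects are spread round-robin over the
    slaves, every other statement goes to the single master.
    """

    def __init__(self, master, slaves):
        if not slaves:
            raise ValueError("need at least one slave database")
        self.master = master
        # cycle() yields the slaves in order, forever.
        self._slaves = itertools.cycle(slaves)

    def route(self, sql):
        words = sql.split()
        verb = words[0].lower() if words else ""
        if verb == "select":
            return next(self._slaves)
        # Updates, inserts, deletes, DDL and anything unrecognised
        # must hit the master so the slaves stay consistent.
        return self.master


# Assumed host names, for illustration only.
router = ReadWriteRouter("db-master", ["db-slave1", "db-slave2"])
print(router.route("SELECT count(*) FROM records"))
print(router.route("UPDATE records SET seen = true"))
```

Adding more slaves raises the number of selects served per second, but each individual select still runs on one machine, and all writes still land on the one master.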
