[Beowulf] High Performance for Large Database
jakob at unthought.net
Thu Oct 28 03:03:56 PDT 2004
On Tue, Oct 26, 2004 at 01:08:00PM -0600, Joshua Marsh wrote:
> Hi all,
> I'm currently working on a project that will require fast access to
> data stored in a postgreSQL database server. I've been told that a
> Beowulf cluster may help increase performance. Since I'm not very
> familiar with Beowulf clusters, I was hoping that you might have some
> advice or information on whether a cluster would increase performance
> for a PostgreSQL database. The major tables accessed are around
> 150-200 million records. On a standalone server, it can take several
> minutes to perform a simple select query.
> It seems like once we start pricing for servers with 16+ processors
> and 64+ GB of RAM, the prices skyrocket. If I can achieve high
> performance with a cluster, using 15-20 dual processor machines, that
> would be great.
It depends. I was involved in one project where we had some hosts doing
a *massive* number of queries against postgres, but no or few updates.
This parallelizes very well. A single query would not run faster, but
when you run thousands of queries, running them against a cluster of
PostgreSQL databases will even out the load just nicely, giving you
linear scaling (sustained queries per second versus machines in the
cluster).
I don't think you'll have any luck finding off-the-shelf
production-quality database software that will parallelize a single
query on a number of nodes.
If you just want throughput, large numbers of queries on a large number
of databases, and you are doing mostly selects with very few (if any)
updates/inserts/deletes, then PostgreSQL comes with software that can
help you mirror your database.
What you do is, you have a 'master' database - you will perform all
updates/deletes/inserts against this master.
The master will relay updates to a number of slave databases.
You perform all selects against the slaves.
Simple, stable, and works perfectly within the limits inherent in such a
setup (e.g. a single query won't parallelize, the master cannot scale to
more updates than a single system can handle, etc.)
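To make the master/slave split concrete, here is a minimal sketch of the routing logic on the client side. It is only an illustration under my own assumptions: the class name QueryRouter, the route() method and the host names are hypothetical, it opens no real database connections, and it says nothing about how the master actually relays updates to the slaves. It only shows the rule "writes go to the master, selects are spread over the slaves".

```python
import itertools


class QueryRouter:
    """Pick a target host: writes go to the master, reads rotate over slaves.

    Hypothetical helper for illustration; hosts are plain strings here,
    in a real setup they would be connection objects or DSNs.
    """

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE")

    def __init__(self, master, slaves):
        if not slaves:
            raise ValueError("need at least one slave to serve selects")
        self.master = master
        # Round-robin iterator that cycles over the slaves forever.
        self._slaves = itertools.cycle(slaves)

    def route(self, sql):
        """Return the host that should run this SQL statement."""
        parts = sql.split(None, 1)
        if not parts:
            raise ValueError("empty statement")
        verb = parts[0].upper()
        if verb in self.WRITE_VERBS:
            return self.master
        # Everything else (in practice: selects) goes to the next slave.
        return next(self._slaves)


if __name__ == "__main__":
    router = QueryRouter("master", ["slave1", "slave2", "slave3"])
    for stmt in ("SELECT * FROM big_table WHERE id = 1",
                 "UPDATE big_table SET x = 2 WHERE id = 1",
                 "SELECT count(*) FROM big_table"):
        print(router.route(stmt), "<-", stmt)
```

Plain round-robin is the simplest choice here. It evens out the load when the queries cost roughly the same, which matches the "thousands of similar selects" case described above; it does nothing to make any single query faster.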