[Beowulf] High Performance for Large Database

Jakob Oestergaard jakob at unthought.net
Thu Oct 28 03:03:56 PDT 2004


On Tue, Oct 26, 2004 at 01:08:00PM -0600, Joshua Marsh wrote:
> Hi all,
> 
> I'm currently working on a project that will require fast access to
> data stored in a PostgreSQL database server.  I've been told that a
> Beowulf cluster may help increase performance.  Since I'm not very
> familiar with Beowulf clusters, I was hoping that you might have some
> advice or information on whether a cluster would increase performance
> for a PostgreSQL database.  The major tables accessed are around
> 150-200 million records.  On a stand alone server, it can take several
> minutes to perform a simple select query.
> 
> It seems like once we start pricing for servers with 16+ processors
> and 64+ GB of RAM, the prices skyrocket.  If I can achieve high
> performance with a cluster, using 15-20 dual processor machines, that
> would be great.

It depends.  I was involved in one project where we had some hosts doing
a *massive* number of queries against postgres, but no or few updates.

This parallelizes very well.  A single query would not run faster, but
when you run thousands of queries, running them against a cluster of
postgresql databases will even out the load just nicely, giving you
linear scaling (sustained queries per second versus machines in the
cluster).
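
To make the distinction between throughput and single-query latency
concrete, here is a rough back-of-the-envelope sketch.  It assumes
identical read-only nodes and perfectly even load; the function names
and all the numbers are made up for illustration and are not
measurements from the project mentioned above.

```python
import math


def cluster_throughput(nodes, per_node_qps):
    """Sustained queries/second, assuming load is spread evenly over
    identical read-only database nodes (idealized linear scaling)."""
    return nodes * per_node_qps


def batch_wall_time(num_queries, seconds_per_query, nodes):
    """Wall-clock seconds to finish a batch when each node runs one
    query at a time.  A single query is never split across nodes, so
    the result can never drop below seconds_per_query."""
    rounds = math.ceil(num_queries / nodes)
    return rounds * seconds_per_query


# Hypothetical numbers: 20 nodes, each sustaining 50 queries/second.
print(cluster_throughput(20, 50))

# One slow 120-second query takes 120 seconds on 1 node or on 20.
print(batch_wall_time(1, 120, 1), batch_wall_time(1, 120, 20))

# A thousand such queries, however, finish 20 times sooner on 20 nodes.
print(batch_wall_time(1000, 120, 1), batch_wall_time(1000, 120, 20))
```

In other words, adding nodes helps the batch but does nothing for the
one query that takes several minutes on its own.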

I don't think you'll have any luck finding off-the-shelf
production-quality database software that will parallelize a single
query on a number of nodes.

If you just want throughput, large numbers of queries on a large number
of databases, and you are doing mostly selects with very few (if any)
updates/inserts/deletes, then PostgreSQL comes with software that can
help you mirror your database.

What you do is, you have a 'master' database - you will perform all
updates/deletes/inserts against this master.

The master will relay updates to a number of slave databases.

You perform all selects against the slaves.

Simple, stable, and works perfectly within the limits inherent in such a
setup (e.g. a single query won't parallelize, the master cannot scale to
more updates than what is possible on a single system, etc.).
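
On the application side, the master/slave split described above might
be sketched roughly like this.  This is only an illustration: the
ReadWriteRouter class and the host names are hypothetical, the
round-robin choice of slave is my assumption, and the replication
itself (master relaying updates to the slaves) is handled by the
database tooling, not by this code.

```python
import itertools


class ReadWriteRouter:
    """Pick a database for each statement: SELECTs rotate over the
    slaves, everything else goes to the single master."""

    def __init__(self, master, slaves):
        if not slaves:
            raise ValueError("need at least one slave database")
        self.master = master
        # Round-robin over the slaves to even out the read load.
        self._slaves = itertools.cycle(slaves)

    def route(self, sql):
        words = sql.split()
        if not words:
            raise ValueError("empty statement")
        # Only plain SELECTs are treated as reads; INSERT, UPDATE,
        # DELETE and anything unrecognized are sent to the master.
        if words[0].upper() == "SELECT":
            return next(self._slaves)
        return self.master


# Hypothetical host names, for illustration only.
router = ReadWriteRouter("db-master", ["db-slave1", "db-slave2"])
print(router.route("SELECT count(*) FROM records"))
print(router.route("UPDATE records SET seen = true WHERE id = 42"))
print(router.route("SELECT * FROM records WHERE id = 42"))
```

Sending anything that is not clearly a SELECT to the master is the
conservative choice here, since a write that lands on a slave would
not be relayed to the other databases.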

-- 

 / jakob
