[Beowulf] High Performance for Large Database

hanzl at noel.feld.cvut.cz hanzl at noel.feld.cvut.cz
Wed Oct 27 02:42:15 PDT 2004


> > I'm currently working on a project that will require fast access to
> > data stored in a postgreSQL database server.  I've been told that a
> > ...
> > and 64+ GB of RAM, the prices skyrocket.  If I can achieve high
> > performance with a cluster, using 15-20 dual processor machines, that
> > would be great.
> 
> This sort of cluster isn't a "beowulf" cluster; rather it is a variant
> of a high availability cluster.  It's Extreme Linux, just not beowulf.
> The beowulf design (and focus of this list) is "high performance
> computing" clusters, aka supercomputing clusters.

I think that while this is true in many particular cases, it is far
from being true in general. There are applications which involve
databases and could be as beowulfish as it gets.

I know researchers who work with extremely large and complex graphs and
use a database for this. If they had, say, an MPI-based database
with all data in RAM, they could get tremendous speedups. They would be
happy to copy the database to the distributed cluster RAM, do a few
zillion operations on it and then copy some results back.
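
To make the "copy in, compute, copy back" pattern concrete, here is a
rough single-process sketch in plain Python. It is not MPI and not any
real database API; the function names (scatter_edges, local_out_degrees,
gather), the modulo partitioning and the out-degree query are all
assumptions made up for illustration. On a real cluster each shard would
live in the RAM of a different node and the scatter/gather steps would
be message-passing calls.

```python
from collections import defaultdict

def scatter_edges(edges, n_workers):
    """Partition edges by source node (a stand-in for an MPI scatter)."""
    shards = [defaultdict(list) for _ in range(n_workers)]
    for src, dst in edges:
        # Hypothetical placement rule: node id modulo number of workers.
        shards[src % n_workers][src].append(dst)
    return shards

def local_out_degrees(shard):
    """The work one node would do purely on its own in-RAM shard."""
    return {node: len(neighbours) for node, neighbours in shard.items()}

def gather(partials):
    """Merge per-worker results (a stand-in for an MPI gather)."""
    merged = {}
    for part in partials:
        merged.update(part)
    return merged

# Toy graph as (source, destination) pairs with integer node ids.
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 0), (3, 1), (3, 2)]

shards = scatter_edges(edges, n_workers=2)               # copy data out
result = gather(local_out_degrees(s) for s in shards)    # compute, copy back
```

The point of the sketch is only that the expensive middle step touches
nothing but local memory; only the small result dictionaries travel back.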

I do agree that a database might not be the best tool for their job
and that a complete rewrite of all the code they have might help :-)

However, I consider programming against a db API to be an important
form of knowledge reuse and a nice split of their problem into two
parts. Together the two parts take more computer time than one monolith
would, but one of them (the db searches) is a problem with commodity
solutions.

(And I might even argue that high availability databases may very
well use The True Beowulf as a component doing searches on mostly
read-only data cached in cluster RAM or even on local hard disks.)

The only difference I can see is the application (which is not CFD,
galactic evolution or similar). From the point of view of
interconnects, OS types, parallel libraries used, RAM, processors,
cluster management etc., I see no reason why databases and beowulf
could not overlap.

Best Regards

Vaclav Hanzl




