[Beowulf] PostgreSQL

Laurence Liew laurence at scalablesystems.com
Thu Aug 19 19:42:58 PDT 2004


Hi,

PostgreSQL is not cluster aware, so installing it on a Beowulf cluster 
does not give you any advantage by itself. There are several alternatives 
you may wish to explore:

1) Run a cluster file system (GFS is now open source) and make PostgreSQL 
write to it. This will not parallelize your queries, but it should make 
reads and writes faster, assuming your I/Os are large. However, I am not 
sure how well the ACID guarantees will hold up on top of it; I have never 
tested or tried this.

2) Look at Oracle 10g or Ingres. Ingres seems to have some clustering 
capabilities similar to 10g, and since it is open source now, it could be 
an option to explore if you need clustering.

3) Pay the PostgreSQL developers to create a parallel version. I know it is 
on their TODO list, but way down unless the development can be funded.

Hope this helps!

Cheers!

Laurence
Scalable Systems
Singapore

Michael Will wrote:
> Postgres and other databases are inherently designed around the SMP model 
> rather than message passing. This means that even though they might be 
> heavily multithreaded, they still need shared memory to coordinate things efficiently.
> 
> 
> There are several approaches to running it on a Beowulf-style distributed cluster, though:
> 
> 1. Run the database on one node and the apps on other nodes.
> I successfully installed tomcat+jsp+postgresql to serve web applications on a 
> Scyld Beowulf cluster, but that just means that some nodes run the application server 
> container (tomcat) and one node runs the database that the web apps connect to. 
> 
> This works especially well if you have a node with Quad-Opterons (SMP) designated as the
> database server.
> 
> 
> 2. Split the data between independent PostgreSQL nodes.
> Some people split the data that they process, maybe by hashing on the primary key,
> and run separate Postgres databases on their nodes, each managing its own set of data.
> 
> I know somebody that has 30 nodes running his real-time data analysis with postgresql
> databases, splitting up the data into 30 subsets by the primary key.
> 
> 
> 3. Implement shared memory on a cluster (yuck) and pretend to be SMP.
> This requires a high-speed, low-latency interconnect in order to perform well,
> and definitely is no longer pure commodity hardware.
> 
> 4. There are database middleware solutions that distribute the data
> out to several nodes. That still only gives you single-node updates but allows for
> multi-node reads and failover scenarios.
> 
> 5. MySQL.com has an alternative, which is an in-RAM clustered database that was
> designed from the ground up with the message passing model in mind. It should be
> the most efficient way to do this.
> 
> Michael Will
> 
> On Thursday 19 August 2004 08:46 am, Roberto Melo Cavalcante wrote:
> 
>>Hi everybody!
>>
>>I'm a newbie in cluster-related topics. I really tried to search your 
>>archives, but there are too many to search them all. My question is not 
>>exactly about clusters, but about the availability of PostgreSQL 
>>on one. Sorry for that. I did not ask this question on a PostgreSQL 
>>mailing list, but I have searched its site for many days without 
>>finding clear confirmation either way. So please be patient.
>>
>>I'd like to know whether anyone has successfully installed PostgreSQL on 
>>a Beowulf cluster.
>>If so, does PostgreSQL really take advantage of the cluster? I mean, 
>>does it see the entire cluster as one machine?
>>
>>Thanks.
>>
>>Roberto Melo Cavalcante
>>_______________________________________________
>>Beowulf mailing list, Beowulf at beowulf.org
>>To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>>
> 
> 
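
Approach 2 in Michael Will's message above (splitting the data between
independent PostgreSQL nodes by hashing on the primary key) could be sketched
roughly as below. This is only an illustration, not anything from the thread:
the names NUM_NODES, node_for_key and partition are hypothetical, the node
count of 30 is taken from the example mentioned in the message, and SHA-256 is
my own choice of hash (Python's built-in hash() is randomized per process for
strings, so it would not give a stable mapping). Opening a connection to the
PostgreSQL instance on the chosen node is left out.

```python
import hashlib

# Assumption: 30 independent database nodes, as in the example from the thread.
NUM_NODES = 30


def node_for_key(primary_key, num_nodes=NUM_NODES):
    """Map a primary key to a node index in the range [0, num_nodes).

    A cryptographic digest is used only because it is stable across
    processes and machines; any stable hash function would do.
    """
    digest = hashlib.sha256(str(primary_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(rows, key_field, num_nodes=NUM_NODES):
    """Group rows (dicts) by the node that would store them."""
    shards = {n: [] for n in range(num_nodes)}
    for row in rows:
        shards[node_for_key(row[key_field], num_nodes)].append(row)
    return shards


if __name__ == "__main__":
    # Hypothetical sample data: 1000 rows keyed by an integer id.
    rows = [{"id": i, "value": i * i} for i in range(1000)]
    shards = partition(rows, "id")
    sizes = [len(shard) for shard in shards.values()]
    print("rows per node: min", min(sizes), "max", max(sizes))
```

Every client has to use the same function and the same node count, otherwise
a key will be looked up on the wrong node. Changing the number of nodes moves
most keys to a different node with this simple modulo scheme, and any query
that spans keys on several nodes has to be combined by the application itself.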



