[Beowulf] High Performance for Large Database

Felix Rauch Valenti felix.rauch.valenti at gmail.com
Mon Nov 8 20:41:52 PST 2004


On Wed, 27 Oct 2004 09:29:58 +0800, Laurence Liew
<laurenceliew at yahoo.com.sg> wrote:
[...]
> 3. Try running Postgresql on a cluster filesystem like PVFS - it is not
> guaranteed as it probably fails the ACID test for a SQL compliant
> database. The basic idea is that if we cannot parallelise the database -
> we make the underlying IO parallel and hence boost the IO performance of
> the system.. and any applications that run on them.. and this includes
> Postgresql.

I tried this as part of my dissertation (I'm not a database person though).

We basically compared the performance of three different
configurations: A single-node Oracle, Oracle on top of PVFS, and
Oracle on top of a distributed-devices system.

More specifically, we tried:
- Oracle running on a single node with a single SCSI disk.
- Oracle running on a single node, accessing its data files on a PVFS
with 6 servers interconnected by Gigabit Ethernet.
- Oracle running on a single node, accessing its data files on a
RAID0 whose 3 constituent partitions were accessed by a special
protocol (similar in its idea to network block devices) over Gigabit
Ethernet.
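
To illustrate the third configuration, here is a minimal sketch of how
RAID0 striping maps a logical byte offset to one of the 3 remote
partitions. It is not taken from our system: the function name
raid0_locate and the 64-KByte stripe size are assumptions made purely
for the example.

```python
def raid0_locate(offset, num_devices=3, stripe_size=64 * 1024):
    """Map a logical byte offset to (device index, byte offset on that device).

    num_devices=3 matches the setup described above; stripe_size is a
    hypothetical value chosen only for illustration.
    """
    stripe_index = offset // stripe_size      # which stripe unit the offset falls in
    within_stripe = offset % stripe_size      # position inside that stripe unit
    device = stripe_index % num_devices       # stripe units rotate over the devices
    device_offset = (stripe_index // num_devices) * stripe_size + within_stripe
    return device, device_offset


if __name__ == "__main__":
    # Print where the first few 4-KByte-aligned stripe units end up.
    for logical in (0, 64 * 1024, 128 * 1024, 192 * 1024):
        print(logical, raid0_locate(logical))
```

With any stripe size of 4 KBytes or more, an aligned 4-KByte read is
served by a single device, so striping helps only when several such
reads are outstanding at once.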

We ran the experiments (TPC-D benchmarks) a few years ago. In a
nutshell, the results were: The performance of the above PVFS
configuration was very low, most likely because the database's 4-KByte
reads were too small. While the configuration with distributed devices
was much better, it was not significantly faster than the single-node
configuration.
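
As a back-of-the-envelope illustration of why small reads can hurt over
a network, consider synchronous requests that each pay a fixed
per-request latency. This is a simplified model, not our measurement
method, and the numbers below (0.5 ms per request, 100 MByte/s link
bandwidth) are hypothetical values, not results from the experiments.

```python
def effective_throughput(request_bytes, latency_s, bandwidth_bytes_per_s):
    """Bytes per second achieved by back-to-back synchronous requests.

    Each request costs a fixed latency plus the time to transfer its payload.
    """
    transfer_time = request_bytes / bandwidth_bytes_per_s
    return request_bytes / (latency_s + transfer_time)


if __name__ == "__main__":
    LATENCY_S = 0.0005     # hypothetical per-request overhead
    BANDWIDTH = 100e6      # hypothetical usable link bandwidth in bytes/s
    for size in (4 * 1024, 64 * 1024, 1024 * 1024):
        mb_per_s = effective_throughput(size, LATENCY_S, BANDWIDTH) / 1e6
        print(f"{size:>8} bytes per request: {mb_per_s:.1f} MByte/s")
```

Under these assumptions the 4-KByte requests are dominated by the fixed
per-request cost, while large requests come close to the link
bandwidth.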

To compare, we also tried the TP-Lite query-distribution middleware
(which distributes the queries to 3 Oracle servers over Gigabit
Ethernet), and the performance was best for most cases.

If you are interested in more details (please forgive me the
advertisement), you might want to have a look at chapter 8 of my
thesis [1] or an upcoming paper titled "OS Support for a Commodity
Database on PC Clusters -- Distributed Devices vs. Distributed File
Systems" to be published at the 16th Australasian Database Conference
(the final version is unfortunately not yet ready).

- Felix


