[Beowulf] High Performance for Large Database
laurence at scalablesystems.com
Tue Nov 16 01:11:36 PST 2004
Yes... you are right. Database I/O usually comes in short bursts and in
smaller packets compared to HPC-type I/O.
However... Oracle 10g does present an interesting architecture... and
could prove a valuable door-opener for many of us on this mailing list to
work on HPC-style systems in enterprises.
10g currently requires a SAN backend... and I believe it can scale to a
"large" number of nodes - of course it will be much smaller than what
the list is used to... but for enterprises... running 10g on commodity
hardware using Linux clusters (not HPC in this case) is nevertheless an
important milestone for the Beowulf community.
Kumaran Rajaram wrote:
> Imho, the I/O workload in databases is dominated by random, small
> block-sized requests. To cater to such an I/O pattern, nodes
> hosting databases tend to have large caches (RAM) and are SMP-based.
> The database software implements proprietary storage/access policies for
> high performance. In this sense, databases mostly require a block-device
> interface from the storage system rather than a file-system interface. A
> file-system interface should also work, although performance-wise you add
> an additional layer and are restricted by the file system's storage
> policies. The advantage is that a file system aggregates the storage and
> provides a single namespace, making it easier to manage and back up the data.
> In terms of block devices, a SAN provides low latency, high bandwidth,
> and high availability, ideal for a database environment. For moderate
> prices, an iSCSI SAN may be used instead of an FC SAN. A SAN also makes
> management of block devices easier. The only caveat is that the maximum
> size of a block device is 2TB in the 2.4 kernel. The 2.6 kernel extends
> this to 16TB.
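The two limits quoted above are consistent with simple index arithmetic. The sketch below is one plausible reading, not something stated in the post: it assumes a 32-bit count of 512-byte sectors for the 2.4 figure and a 32-bit count of 4 KiB pages for the 2.6 figure (as on a 32-bit machine). The constant and function names are illustrative only.

```python
# Assumptions (not from the original post): 512-byte sectors, 4 KiB pages,
# and a 32-bit index in both cases.
SECTOR_BYTES = 512
PAGE_BYTES = 4096
INDEX_BITS = 32

TIB = 2 ** 40


def max_device_bytes(unit_bytes, index_bits=INDEX_BITS):
    """Largest size addressable with an index_bits-wide count of units."""
    return (2 ** index_bits) * unit_bytes


limit_24 = max_device_bytes(SECTOR_BYTES)  # sector-indexed limit
limit_26 = max_device_bytes(PAGE_BYTES)    # page-indexed limit

print(limit_24 // TIB, "TiB")
print(limit_26 // TIB, "TiB")
```

Under these assumptions the two results match the 2TB and 16TB figures; on 64-bit systems the limits would differ.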
> PVFS/Lustre are currently tuned for HPC-style applications, which are
> dominated by large, contiguous I/O requests, and the file-system striping
> policies help to provide higher bandwidth. However, for small-sized
> requests, striping may not prove beneficial. Also, most file systems use
> TCP/IP, hence the network-layer latency can affect database performance.
> The MPI-IO interface may be used to optimize non-contiguous, smaller
> requests through its datatype and file-view features. Newer PVFS/Lustre
> versions offer native implementations for low-latency interconnects like
> Myrinet, IB, or Quadrics; however, the stability of the file-system needs to be
> Consistency, integrity, and availability of data cannot be compromised
> in databases. Current PVFS/Lustre versions stripe files across their I/O
> nodes in a RAID-0 pattern. Going down another level, hardware or
> software RAID 1/5 can be performed at the disk level, resulting in the
> file system providing RAID 10/50. However, the failure of a single I/O
> node might lead to temporary loss (data in cache) or unavailability of
> file data until the node is revived. RAID 1/5 across I/O nodes is planned in future
> Price, Performance, Availability, Manageability, and Consistency of
> file-data need to be weighed when architecting the Database solution.
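To illustrate the point above that striping helps large contiguous requests but not small ones, here is a minimal sketch of round-robin (RAID-0 style) striping. It is not the actual PVFS or Lustre layout code; the function name, the 64 KiB stripe size, and the 8 I/O nodes are all assumptions chosen for the example.

```python
def nodes_touched(offset, length, stripe_size, num_nodes):
    """Return the set of I/O node indices a byte range maps to under
    simple round-robin striping (hypothetical layout, for illustration)."""
    if length <= 0:
        return set()
    first = offset // stripe_size                 # first stripe unit hit
    last = (offset + length - 1) // stripe_size   # last stripe unit hit
    # After num_nodes consecutive stripe units every node has been visited.
    count = min(last - first + 1, num_nodes)
    return {(first + i) % num_nodes for i in range(count)}


STRIPE = 64 * 1024   # assumed stripe unit
NODES = 8            # assumed number of I/O nodes

# A 4 KiB database-style read falls inside one stripe unit.
small = nodes_touched(offset=10 * 4096, length=4096,
                      stripe_size=STRIPE, num_nodes=NODES)

# A 1 MiB HPC-style read spans 16 stripe units and hits every node.
large = nodes_touched(offset=0, length=1 << 20,
                      stripe_size=STRIPE, num_nodes=NODES)

print(len(small), len(large))
```

With these numbers the small request is served by a single node, so it gains no bandwidth from striping and still pays the network latency, while the large request is spread across all eight.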
> On Mon, 15 Nov 2004, Laurence Liew wrote:
>>The current version of GFS has a 64-node limit... something to do with
>>the maximum number of connections through a SAN switch.
>>I believe the limit could be removed in RHEL v4.
>>BTW, GFS was built for enterprise and not specifically for HPC... the
>>use of SAN (all nodes need to be connected to a single SAN storage)..
>>may be a bottleneck...
>>I would still prefer the model of PVFS1/2 and Lustre, where the data is
>>distributed amongst the compute nodes.
>>I suspect GFS could prove useful, however, for enterprise clusters of, say,
>>32 - 128 nodes, where the number of IO nodes (GFS nodes with exported NFS)
>>can be small (fewer than 8 nodes)... it could work well.
>>Chris Samuel wrote:
>>>On Wed, 10 Nov 2004 12:08 pm, Laurence Liew wrote:
>>>>You may wish to try GFS (open sourced by Red Hat after buying
>>>>Sistina)... it may give better performance.
>>>Anyone here using the GPL'd version of GFS on large clusters ?
>>>Be really interested to hear how folks find that..
>>>Beowulf mailing list, Beowulf at beowulf.org
>>>To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>>Laurence Liew, CTO Email: laurence at scalablesystems.com
>>Scalable Systems Pte Ltd Web : http://www.scalablesystems.com
>>(Reg. No: 200310328D)
>>7 Bedok South Road Tel : 65 6827 3953
>>Singapore 469272 Fax : 65 6827 3922