[Beowulf] High performance storage with GbE?
bill at cse.ucdavis.edu
Wed Dec 13 11:39:10 PST 2006
Steve Cousins wrote:
> We are currently looking to upgrade storage on a 256 node cluster for
> 1.2 to 2.5 GB/sec when it gets to the storage. Of course, this is only
What do you expect the I/Os to look like? Large file reads/writes? Zillions
of small reads/writes? To one file or directory, or maybe to a file or
directory per compute node?
My approach so far has been to buy N dual Opterons with 16 disks in each
(using the Areca or 3ware controllers) and use NFS. Higher end 48 port
switches come with 2-4 10G uplinks. Numerous disk setups these days
can sustain 800MB/sec (Dell MD-1000 external array, Areca 1261ML, and the
3ware 9650SE) all of which can be had in a 15/16 disk configuration for
$8-$14k depending on the size of your 16 disks (400-500GB towards the lower
end, 750GB towards the higher end).
NFS would be easy, but any collection of clients (including all) would be
performance limited by a single server.
PVFS2 or Lustre would allow you to use N of the above file servers and
get not too much less than N times the bandwidth (assuming large sequential
reads and writes).
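To put rough numbers on that scaling claim, here's a back-of-the-envelope sketch. The per-server rate and the efficiency derating are assumptions (800MB/sec is the vendors' claimed figure; the 50% efficiency factor is my guess at network and filesystem overhead), not measurements:

```python
def aggregate_bw(servers, per_server_mb=800, efficiency=0.5):
    """Estimate usable MB/sec for large sequential I/O striped across
    `servers` file servers with PVFS2 or Lustre, derated by `efficiency`
    for network and filesystem overhead (assumed, not measured)."""
    return servers * per_server_mb * efficiency

# 9 servers at a claimed 800MB/sec each, assuming 50% efficiency:
print(aggregate_bw(9))   # 3600.0 MB/sec, i.e. in the 3-4GB/sec ballpark
```

With the 50% derating, 9 servers land around 3.6GB/sec, which matches the 3-4GB/sec guess below; real efficiency will depend heavily on the access pattern.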
In particular the Dell MD-1000 is interesting in that it allows for two 12Gbit
connections (via SAS); the docs I've found show you can access all 15
disks via a single connection, or 7 disks on one and 8 disks on the other.
I've yet to find out if you can access all 15 disks via both interfaces
to allow failover in case one of your fileservers dies. As previously
mentioned, both PVFS2 and Lustre can be configured to handle this situation.
So you could buy a pair of dual Opterons + SAS card (with 2 external
connections), then connect each port to each array (both servers to
both connections); then if a single server fails the other can take
over the dead server's disks.
A recent quote showed that a config like this (2 servers, 2 arrays) would
cost around $24k. Assuming one spare disk per chassis and a 12+2 RAID6 array,
that provides 12TB usable (not including ~5% for filesystem overhead).
So 9 of the above = $216k and 108TB usable. Dell claims each of the arrays
can manage 800MB/sec; things don't scale perfectly, but I wouldn't be surprised
to see 3-4GB/sec using PVFS2 or Lustre. Actual data points appreciated; we
are interested in a 1.5-2.0GB/sec setup.
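The cost/capacity arithmetic above works out like this (figures are the rough numbers from the quote, and the 500GB disk size is an assumption from the 400-750GB range mentioned earlier):

```python
# Per 15-disk chassis: 12 data disks + 2 parity (RAID6) + 1 hot spare.
disk_gb = 500          # assumed per-disk capacity
data_disks = 12        # usable data disks per array after RAID6 + spare
arrays_per_pair = 2    # 2 servers + 2 arrays per failover pair
pair_cost = 24_000     # quoted cost per pair, in dollars
pairs = 9

usable_tb = pairs * arrays_per_pair * data_disks * disk_gb / 1000
total_cost = pairs * pair_cost
print(usable_tb, total_cost)   # 108.0 TB for $216,000
```

Note this is raw RAID capacity; the ~5% filesystem overhead mentioned above still comes off the top.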
Are any of the solutions you are considering cheaper than this? Any of the
dual Opterons in a 16-disk chassis could manage the same bandwidth (both 3ware
and Areca claim 800MB/sec or so), but could not survive a file server death.