[Beowulf] Storage

Robert G. Brown rgb at phy.duke.edu
Wed Oct 6 07:42:46 PDT 2004

Dear List,

I'm turning to you for some top-quality advice, as I have so often in
the past.

I'm helping assemble a grant proposal that involves a grid-style cluster
with very large scale storage requirements.  Specifically, it needs to
be able to scale into the 100's of TB in "central disk store" (whatever
that means:-) in addition to commensurate amounts of tape backup.  The
tape backup is relatively straightforward -- there is a 100 TB library
available to the project already that will hold 200 TB after an
LTO1->LTO2 upgrade, and while tapes aren't exactly cheap, they are
vastly cheaper than disk in these quantities.

The disk is a real problem.  Raw disk these days is less than $1/GB for
SATA in 200-300 GB sizes, a bit more for 400 GB sizes, so a TB of disk
per se costs in the ballpark of $1000.  However, HOUSING the disk in
reliable (dual power, hot swap) enclosures is not cheap, adding RAID is
not cheap, and building a scalable arrangement of servers to provide
access with some controllable degree of latency and bandwidth for access
is also not cheap.  Management requirements include 3 year onsite
service for the primary server array -- same day for critical
components, next day at the latest for e.g. disks or power supplies that
we can shelve and deal with ourselves in the short run.  The solution we
adopt will also need to be scalable as far as administration is
concerned.  We are not interested in "DIY" solutions where we just buy
an enclosure, hang it on an over-the-counter server, and run MD RAID --
not because this isn't reliable and workable for a departmental or even
a cluster RAID in the 1-8 TB range (a couple of servers), but because it
isn't at all clear how that would scale to the 10-80 TB range, where
tens of servers would be required.
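For concreteness, a back-of-envelope sketch of that arithmetic.  The
~$1000/TB raw figure is from the numbers above; the 3x overhead factor
for enclosures/RAID/servers and the 8 TB-per-server ceiling are my own
assumptions for illustration, not quotes:

```python
# Rough cost/scaling arithmetic for a central disk store.
# RAW_COST_PER_TB comes from the ~$1/GB SATA figure in the text;
# OVERHEAD_FACTOR and TB_PER_SERVER are assumed, not quoted prices.
RAW_COST_PER_TB = 1000   # ~$1/GB for 200-300 GB SATA drives (2004)
OVERHEAD_FACTOR = 3      # assumed multiplier: enclosures, RAID, servers
TB_PER_SERVER = 8        # upper end of the 1-8 TB per-server range

def estimate(total_tb):
    """Return (raw disk cost, housed cost, servers needed) for total_tb."""
    raw = total_tb * RAW_COST_PER_TB
    housed = raw * OVERHEAD_FACTOR
    servers = -(-total_tb // TB_PER_SERVER)  # ceiling division
    return raw, housed, servers

# 100 TB: ~$100K raw disk, ~$300K housed, and on the order of 13 servers.
print(estimate(100))
```

Even with a generous factor fudged in, the point stands: the housing
and serving dominate the raw disk cost, and the server count grows
linearly into the tens.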

Management of the actual spaces thus provided is not trivial.  There
are certain TB-scale limits in Linux to cope with (likely to be
resolved soon if they aren't already in the latest kernels, but still
present in many of the working versions of Linux in use), and with an
array of partitions and servers to deal with, just being able to index,
store, and retrieve files generated by the compute component of the
grid will be a major issue.
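To make the index/store/retrieve problem concrete, here is a minimal
sketch of a central catalog mapping logical file names to a (server,
path) location.  All names here are hypothetical; a real deployment
would back this with a database or a parallel filesystem's metadata
service rather than an in-memory dict:

```python
# Toy catalog for locating files spread across many servers/partitions.
# Server and path names below are invented for illustration only.
class Catalog:
    def __init__(self):
        self._entries = {}   # logical name -> (server, path)

    def store(self, name, server, path):
        """Record where a logical file actually lives."""
        self._entries[name] = (server, path)

    def locate(self, name):
        """Return (server, path), or raise KeyError if unknown."""
        return self._entries[name]

cat = Catalog()
cat.store("run0042/output.dat", "fileserver03",
          "/export/raid2/run0042/output.dat")
print(cat.locate("run0042/output.dat"))
```

The point of the sketch is only that *some* such metadata layer has to
exist once the store spans tens of servers, and it has to scale and be
administered along with the disk itself.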

SO, what I want to know is:

  a) What are listvolken who have 10+ TB requirements doing to satisfy
them?

  b) What did their solution(s) cost to set up as a base system (in the
case of e.g. a network appliance)?

  c) What are the incremental costs (e.g. filled racks)?

  d) How does their solution scale, both costwise (partly answered in b
and c) and in terms of management and performance?

  e) What software tools are required to make their solution work, and
are they open source or proprietary?

  f) Along the same lines, to what extent is the hardware base of their
solution commodity (defined here as having a choice of multiple vendors
for a component at a point of standardized attachment, such as a Fibre
Channel port or SCSI port) or proprietary (defined as: if you buy this
solution, THIS part will always need to be purchased from the original
vendor at a price "above market" as the solution is scaled up)?

Rules:  Vendors reply directly to me only, not the list.  I'm in the
market for this, most of the list is not.  Note also that I've already
gotten a decent picture of at least two or three solutions offered by
tier 1 cluster vendors or dedicated network storage vendors although I'm
happy to get more.

However, I think that beowulf administrators, engineers, and users
should likely answer on list as the real-world experiences are likely to
be of interest to lots of people and therefore would be of value in the
archives.  I'm hoping that some of you bioinformatics people have
experience here, as well as maybe even people like movie makers.

FWIW, the actual application is likely to be Monte Carlo used to
generate huge data sets (per node) and cook them down to smaller (but
still multi-GB) data sets, and hand them back to the central disk store
for aggregation and indexed/retrievable intermediate term storage, with
migration to the tape store on some as yet undetermined criterion for
frequency of access and so forth.  Other uses will likely emerge, but
this is what we know for now.  I'd guess that bioinformatics and movie
generation (especially the latter) are VERY similar in the actual data
flow and also require multi-TB central stores, so I'm hoping that you
have useful information to share.
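Since the disk-to-tape migration criterion is still undetermined, here
is one trivial candidate sketched out: migrate anything not accessed
within some cutoff window.  The 90-day cutoff is an invented example,
not a number from any plan:

```python
import time

# Hedged sketch of a recency-of-access migration policy.
# MIGRATE_AFTER_DAYS is an assumed example value, not a project figure.
MIGRATE_AFTER_DAYS = 90

def should_migrate(last_access_epoch, now=None):
    """True if the file hasn't been accessed within the cutoff window."""
    now = time.time() if now is None else now
    return (now - last_access_epoch) > MIGRATE_AFTER_DAYS * 86400

# A file last touched 120 days ago is a migration candidate;
# one touched 10 days ago is not.
now = 1_000_000_000
print(should_migrate(now - 120 * 86400, now=now))  # -> True
print(should_migrate(now - 10 * 86400, now=now))   # -> False
```

Real policies would presumably weigh access frequency and data-set
grouping as well, but even this simplest version needs the access
metadata tracked somewhere central.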

Thanks in advance,


Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email: rgb at phy.duke.edu