[Beowulf] HPC and SAN
landman at scalableinformatics.com
Sat Dec 18 17:39:46 PST 2004
Michael Will wrote:
>Veritas has something called VxFS that could be used for that, and there are also special cluster filesystems
Hmmm... Last I heard VxFS was limited to 4 or 8 hosts. Not very HPC like...
>like gfs and lustre that are supposed to solve that problem. In that case, you can also have just some
>compute nodes act as storage nodes, and so you don't need fibre channel cards in all of them. The
>storage nodes then act similar to redundant nfs servers.
I remain skeptical on the value proposition for a SAN in a cluster.
In short, you need to avoid single points of information flow within
clusters. The absolute best aggregate bandwidth you are going to get
will be local storage. At 50+ MB/s, a SATA drive in a compute node
multiplied by N compute nodes rapidly outdistances all (save one)
hardware storage designs that I am aware of. And it does it at a tiny
fraction of the cost. Unfortunately you have N namespaces for your
files (think of the file URI as file://node/path/to/filename.ext, and
the value of "node" varies). Most code designs assume a single shared
store, or a common namespace for the files. This is where the file
systems folks earn their money (well, one does anyway, IMO).
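
A rough sketch of the arithmetic and of the namespace problem, in Python. The 50 MB/s per-drive figure comes from the paragraph above; the node counts, node names, and the path are hypothetical, and the calculation assumes the ideal case where every node reads only its own local disk.

```python
# Back-of-the-envelope aggregate bandwidth for node-local disks.
# The 50 MB/s per-drive figure is from the text; node counts are hypothetical.

PER_DRIVE_MB_S = 50  # sustained rate of one SATA drive, per the post


def aggregate_local_bandwidth(nodes: int, per_drive_mb_s: float = PER_DRIVE_MB_S) -> float:
    """Ideal aggregate MB/s when every node reads only its own local disk."""
    return nodes * per_drive_mb_s


def node_file_uri(node: str, path: str) -> str:
    """Per-node name for a file, following the file://node/path form."""
    return f"file://{node}/{path.lstrip('/')}"


if __name__ == "__main__":
    for n in (16, 64, 256):  # hypothetical cluster sizes
        print(f"{n:4d} nodes -> {aggregate_local_bandwidth(n):8.0f} MB/s aggregate")
    # The same path names a different file on each node (hypothetical names):
    for node in ("node01", "node02"):
        print(node_file_uri(node, "/scratch/data.ext"))
```

The bandwidth scales linearly with N, but so does the number of distinct names for "the same" path, which is the part a shared or parallel file system has to hide.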
>Another interesting case is PVFS (and hopefully soon PVFS2) that accumulates local storage of the
>nodes into a parallel virtual filesystem allowing distributed storage and access. In case of PVFS
Having used PVFS (or at least tried to use PVFS) for a project, I
discovered rather quickly that some of the missing functionality (soft
links, etc.) resulted in large chunks of wrapper code not working (and
no, it made no sense to change the wrapper code to suit this file
system), and that at least 2 MPI codes that I played with did not like
it. I don't want to knock all the hard work that went into it, but I am
not sure I would try PVFS2 without a very convincing argument that it
implements the full Unix file system (POSIX) interfaces, and that
things work transparently.
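
One cheap way to catch this sort of gap before committing a project to a file system is to probe the mount point for the features your wrapper code depends on. A minimal sketch in Python, checking only soft links; the function name is made up for illustration, and this is in no way a full POSIX conformance test.

```python
import os
import tempfile


def supports_symlinks(directory: str) -> bool:
    """Return True if a symlink can be created and read back inside `directory`."""
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        target = os.path.join(scratch, "target")
        link = os.path.join(scratch, "link")
        with open(target, "w") as fh:
            fh.write("probe\n")
        try:
            os.symlink(target, link)
        except (OSError, NotImplementedError):
            # The file system (or platform) refused to create the link.
            return False
        return os.path.islink(link) and os.path.realpath(link) == os.path.realpath(target)


if __name__ == "__main__":
    import sys
    # Pass the mount point to test; defaults to the system temp directory.
    mount = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    print(f"{mount}: symlinks {'ok' if supports_symlinks(mount) else 'NOT supported'}")
```

The same pattern extends to whatever else the wrappers assume (hard links, file locking, and so on), one small probe per feature.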
>the data is not distributed redundantly, which means that one node going down means part of
>your filesystem data disappears - so unless you have rock solid nodes connected to a UPS, this
>might be good only for a large fast /tmp.
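
To put a rough shape on the point quoted above: if data is striped across N nodes with no redundancy, the whole file system is intact only while all N nodes are up. A minimal sketch in Python, assuming node failures are independent; the per-node availability used in the example is a hypothetical number, not a measurement.

```python
def all_nodes_up_probability(node_availability: float, nodes: int) -> float:
    """Probability that every one of `nodes` independent nodes is up at once."""
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    return node_availability ** nodes


if __name__ == "__main__":
    availability = 0.99  # hypothetical per-node availability
    for n in (8, 64, 256):  # hypothetical cluster sizes
        p = all_nodes_up_probability(availability, n)
        print(f"{n:4d} nodes: P(all data reachable) = {p:.3f}")
```

Under that independence assumption, the chance of a fully reachable file system falls off geometrically as nodes are added.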
There are alternatives to this that work today.
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 612 4615