the need for storage area networks [was: Shared diskspace between nodes]
okeefe at brule.borg.umn.edu
Sun Jan 13 11:08:45 PST 2002
Others have provided good suggestions, but I think the ultimate
solution to your problem is a storage area network (SAN) between
your Beowulf nodes and a pool of shared storage devices. This
approach allows efficient partitioning and sharing of storage
between the Beowulf nodes.
A cluster file system like GFS can be used to map a shared
file system (one that all nodes can mount directly) onto the
shared storage devices. This approach completely removes
your problem: trying to map your data evenly across many nodes,
when the data needs on each node can grow or shrink in
unexpected ways. It also allows you to manage 1 file system,
instead of 40.
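
To make the point about uneven data growth concrete, here is a minimal sketch in Python. It uses the figures from the original post (40 nodes, roughly 15GB usable on each); the workload numbers and the function names are made up for illustration only. It compares fitting each node's data on its own disk against fitting the same data into one shared pool of the same total size.

```python
NODES = 40
LOCAL_GB = 15  # usable space per node, taken from the original post


def overflowing_nodes(demands_gb, local_gb=LOCAL_GB):
    """Return the indices of nodes whose data exceeds their local disk."""
    return [i for i, demand in enumerate(demands_gb) if demand > local_gb]


def fits_in_pool(demands_gb, nodes=NODES, local_gb=LOCAL_GB):
    """True if the total demand fits in one shared pool of the same size."""
    return sum(demands_gb) <= nodes * local_gb


# Hypothetical workload: 36 nodes need 5GB each, 4 nodes need 40GB each.
demands = [5] * 36 + [40] * 4

print("total demand (GB):", sum(demands))
print("pool capacity (GB):", NODES * LOCAL_GB)
print("nodes that overflow their local disk:", overflowing_nodes(demands))
print("fits in a shared pool:", fits_in_pool(demands))
```

In this made-up case the total demand is well under the pooled capacity, yet four nodes cannot hold their own data locally, which is the situation that forces data to be moved around by hand.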
Some may object that SANs are expensive, but that is changing.
IP-based SANs are now becoming available, and a cluster of
NFS servers with shared storage and a cluster file system can
also be used to share data across a Beowulf without the full
expense of a SAN. For details, see the white paper I wrote
on "Accelerating Technical Computing..." at the Sistina web site.
When running complex parallel applications in production
on a Beowulf cluster (for example, Oracle Real Application Clusters),
a storage area network and cluster file system greatly
simplify your life.
On Mon, Jan 07, 2002 at 02:36:42PM -0500, Jon E. Mitchiner wrote:
> I presently run a 40-node cluster, Dual 1GHz with 20GB hard drive on each
> system. This gives me roughly 15GB (safe estimate) after the OS, installed
> programs, some data, etc on each machine. This gives me roughly 600GB of
> space that I am not currently utilizing on 40 nodes.
> Right now, we are saving data on various nodes, and moving it around when
> space gets tight on a machine. This is getting time-consuming, as some of us
> have to look on different nodes to find out where our data currently
> resides. I am considering saving all directory names in a database and
> then making a web-based GUI so it's easy to find the location of
> data directories, rather than looking for them (especially if someone moved my
> directory to another machine without letting me know).
> I am curious if there is a program out there that might be able to utilize
> the space that we are not utilizing -- such as linking the file space
> between nodes so that I can set up a "large" data partition sharable by
> all nodes. Some redundancy would be nice. I'm curious if there is a software
> solution (either GPL licensed, or commercial) to utilize the space better.
> Optimally, it would be nice to see all "shared" drives as one large
> partition to be mounted to all nodes and all the data is handled by a daemon
> or something like that.
> Does anyone have any ideas, suggestions, or programs that might be able to
> do something similar?
> Jon E. Mitchiner
> Minotaur Technologies
> AOL IM [http://www.aol.com/aim] MinotaurT