Any comments on GFS as applies to a Beowulf ?
Matthew O'Keefe
okeefe at borg.umn.edu
Sat Feb 17 10:15:04 PST 2001
Hi,
We wrote GFS originally for a problem I had in dealing
with the large-scale, parallel fluid dynamics and electromagnetics
calculations I was working on. The problem was very simple:
getting the simulation data off the parallel machine over to
the graphics machines so we could visualize and make sense of
it all. Doing this across Ethernet and TCP/IP was not feasible,
and since new shared storage networking technologies were appearing,
having the graphics machine and supercomputer share the data
gave us speed and efficiency (no data replication!).
There is a white paper on how GFS lets you develop Storage Clusters,
and how GFS applies to supercomputing, which you can find at
http://www.sistina.com/Pages/publications.html
(look for the paper "Storage Clusters for Linux").
I think Beowulfs work very well for parallel applications
(like Monte Carlo) that are not IO-intensive, of which
there are many. However, IO is currently a weakness,
in particular when one tries to migrate
data off the Beowulf. GFS allows Beowulf nodes to share and
pool disks to significantly improve IO scalability by increasing:
* extendibility: you can add more nodes and more disks to your
Beowulf to increase its storage and computational capacity
* availability: by decoupling storage devices from compute nodes, you
don't lose access to storage when a compute node dies; in addition,
you can add more compute nodes and storage while the Beowulf is
running, and *on-line* resize both the volume manager and GFS, again,
while the Beowulf is running.
* manageability: GFS allows you to create a single pool
of storage that is more efficient than server-centric storage,
and that can be much more easily managed. In addition, in
combination with server virtualization technologies like
bproc from Scyld, load balancing across Beowulf nodes becomes
much more efficient.
* affordability: GFS runs on Linux and PCs and is media-independent:
you can use Fibre Channel,
Myrinet, or whatever as your shared media for storage. You can
use different kinds of storage devices and networking equipment from
a variety of vendors to build low-cost GFS clusters.
* efficiency: GFS allows you to efficiently load-balance applications,
consolidate storage, and quickly transfer data from your Beowulf
cluster.
GFS is a 64-bit, production-ready, journaled cluster file system
that allows fast recovery
from node failures and supports large files, directories, and
file systems. It is GPL'ed code, currently available on Linux 2.2, and will
soon be available on 2.4 (in less than 2 weeks). Within 6 months, it will be
integrated with a new cluster version of the Linux Logical Volume
Manager to provide integrated file and volume cluster services.
GFS will change the way you compute and run your servers. It allows
Linux to leapfrog nearly every other UNIX by providing a cluster
file system that scales and makes Beowulfs and Linux HA clusters
*manageable*. It is being used at leading NASA labs, web sites, and
feature film shops. In 2001, it will be ported to FreeBSD as well.
So....
go get it! www.sistina.com/gfs/
Matt O'Keefe
On Thu, Feb 08, 2001 at 08:41:26AM -0800, JParker at coinstar.com wrote:
> G'Day !
>
> http://news.linuxprogramming.com/news_story.php3?ltsn=2001-02-08-002-05-CD
>
> cheers,
> Jim Parker
>
> Sailboat racing is not a matter of life and death .... It is far more
> important than that !!!