Any comments on GFS as applies to a Beowulf?

Matthew O'Keefe okeefe at
Sat Feb 17 10:15:04 PST 2001


We wrote GFS originally for a problem I had in dealing
with the large-scale, parallel fluid dynamics and electromagnetics
calculations I was working on.  The problem was very simple:
getting the simulation data off the parallel machine over to
the graphics machines so we could visualize and make sense of 
it all.  Doing this across Ethernet and TCP/IP was not feasible,
and since new shared storage networking technologies were appearing,
having the graphics machine and supercomputer share the data
gave us speed and efficiency (no data replication!).

There is a white paper on how GFS lets you develop Storage Clusters,
and how GFS applies to supercomputing which you can find at

and check out the paper "Storage Clusters for Linux".

I think Beowulfs work very well for parallel applications 
(like Monte Carlo) that are not IO-intensive, of which 
there are many.  However, IO is
currently a weakness, in particular when one tries to migrate
data off the Beowulf.  GFS allows Beowulf nodes to share and
pool disks to significantly improve IO scalability by increasing:

  * extendibility:  you can add more nodes and more disks to your
    Beowulf to increase its storage and computational capacity

  * availability:  by decoupling storage devices from compute nodes, you
    don't lose access to storage when a compute node dies; in addition,
    you can add more compute nodes and storage while the Beowulf is
    running, and *on-line* resize both the volume manager and GFS, again,
    while the Beowulf is running. 

  * manageability:  GFS allows you to create a single pool
    of storage that is more efficient than server-centric storage,
    and that can be much more easily managed.  In addition, in 
    combination with server virtualization technologies like
    bproc from Scyld, load balancing across Beowulf nodes becomes
    much more efficient.

  * affordability:  GFS runs on Linux and PCs and is media-independent:  
    you can use Fibre Channel,
    Myrinet, or whatever as your shared media for storage.  You can
    use different kinds of storage devices and networking equipment from
    a variety of vendors to build low-cost GFS clusters. 

  * efficiency:  GFS allows you to efficiently load-balance applications,
    consolidate storage, and quickly transfer data from your Beowulf.

GFS is a 64-bit, production-ready, journaled cluster file system
that allows fast recovery
from node failures and supports large files, directories, and
file systems.  It is GPL'ed code, available on Linux 2.2 currently, and will
soon be available on 2.4 ( < 2 weeks). Within 6 months, it will be
integrated with a new cluster version of the Linux Logical Volume 
Manager to provide integrated file and volume cluster services.

GFS will change the way you compute and run your servers.  It allows
Linux to leapfrog nearly every other UNIX by providing a cluster
file system that scales and makes Beowulfs and Linux HA clusters
*manageable*.  It is being used at leading NASA labs, web sites, and 
feature film shops.  In 2001, it will be ported to FreeBSD as well.


Go get it!

Matt O'Keefe

On Thu, Feb 08, 2001 at 08:41:26AM -0800, JParker at wrote:
> G'Day !
> cheers,
> Jim Parker
> Sailboat racing is not a matter of life and death ....  It is far more 
> important than that !!!
