[Beowulf] scaling / file serving

Joe Landman landman at scalableinformatics.com
Thu Jun 10 14:55:44 PDT 2004


On Wed, 2004-06-09 at 20:02, Patrice Seyed wrote:
> Hi,
> 
> A current cluster with 15 nodes/ 30 processors mainly used for batch
> computing has one head/management node that maintains scheduling services as
> well as home directories for users. The cluster is due for an upgrade that
> will increase the number of compute nodes to about 100. I'm considering
> breaking out one of the compute nodes, adding disks, and making it a
> storage/file server node.

Hmm...  If these compute nodes rarely access the disk, this might be
OK.  If they are going to pound on the disk and really exercise the
network pipe, the network is going to be your bottleneck.

Remember, the network pipe into the NFS server is a shared resource
that is contended for, which means it is going to be a bottleneck
under load.  You could see the nice "1/N" performance issue: N
requesters for a fixed-size resource (network pipe, file system
bandwidth), so on average each requester gets 1/Nth of that resource.
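To put rough numbers on it (a back-of-the-envelope sketch in Python;
the ~125 MB/s figure is the theoretical gigabit wire speed, and real
NFS throughput will be lower):

    # Per-requester share of a single gigabit pipe under the 1/N effect.
    # Wire-speed ceiling is theoretical; real NFS throughput is lower.
    GIGE_MB_S = 1000 / 8.0   # ~125 MB/s theoretical ceiling

    for n in (1, 16, 32, 100):
        print("%3d requesters -> ~%.1f MB/s each" % (n, GIGE_MB_S / n))

At 100 requesters that works out to about 1.2 MB/s apiece, which is
why the single-server, single-NIC design falls over under load.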

Building a disk system is easy, building one that scales well is hard.  

The other issue is local disk.  There are some folks absolutely
horrified at the prospect of a cluster node having a local disk, since
it makes management harder.  Then again, with a reasonably modern disk
on each IDE channel, you can get 40-50 MB/s read and about 33 MB/s
write performance per channel.  So if you have a nice RAID0 stripe
across 2 *different* IDE channels (cluster vendors, take note:
*different* channels), you can pull 80+ MB/s reads and 60+ MB/s writes
per node (one recent IBM 325 Opteron based system I put together hit
120 MB/s sustained reads on a large Abaqus job, and about 90 MB/s
sustained writes).  So if you can set your scratch to run off of the
local RAID0, you can get some serious performance versus a network
based file system.  Of course some folks would prefer to spend an
extra $500 per node on 15k RPM SCSI to get ... 60 MB/s on writes and
80 MB/s on reads.
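If you want to sanity-check what a local scratch stripe actually
delivers before betting a design on it, a crude streaming-write test
will do (just a sketch: the path, file size, and chunk size below are
arbitrary, and the page cache will flatter the number unless you fsync
or write much more than RAM):

    import os, time

    # Crude streaming-write throughput test for local scratch.
    # The path below is hypothetical; point it at your RAID0 mount.
    path = "/scratch/throughput.tmp"
    chunk = b"\0" * (1 << 20)        # 1 MB buffer of zeros
    total_mb = 1024                  # write 1 GB total

    start = time.time()
    f = open(path, "wb")
    for _ in range(total_mb):
        f.write(chunk)
    f.flush()
    os.fsync(f.fileno())             # force data to disk before stopping the clock
    f.close()
    elapsed = time.time() - start
    os.remove(path)

    print("wrote %d MB in %.1f s -> %.1f MB/s"
          % (total_mb, elapsed, total_mb / elapsed))

Run the same test against the NFS mount and compare the two numbers.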

Some folks absolutely insist upon running over NFS or similar
network-mounted file systems.  In that case, you need a proper system
design for it.

> At what point in scaling up would you folks recommend adding a node
> dedicated to file serving of home dirs, etc.? I have a separate cluster with
> 134 nodes purchased with the storage node in mind.

My usual rule of thumb (read as "kneejerk response before hearing any
more info") is one file-server gigabit NIC per 32 CPUs (16 dual-CPU
nodes).  Since you are talking about more than that, you should look
at other designs.
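Expressed as a trivial sizing helper (a sketch of that kneejerk rule,
nothing more; the function name and the 32-CPUs-per-NIC constant just
restate the rule of thumb above):

    import math

    # Rule-of-thumb NIC count for the file server: one gigabit NIC
    # per 32 CPUs, rounded up.  Purely illustrative.
    def fileserver_nics(cpus, cpus_per_nic=32):
        return int(math.ceil(float(cpus) / cpus_per_nic))

    print(fileserver_nics(30))    # current 15-node/30-CPU cluster -> 1
    print(fileserver_nics(200))   # ~100 dual-CPU nodes -> 7

Past one or two NICs on a single box, you are better served by a
different design entirely.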

That rule of thumb assumes low-to-moderate file system load across
the nodes.  You can make a file server very unhappy by having all your
compute nodes beat on it at once.  Lots of folks who build clusters
create this same design bottleneck for whatever reason: one large file
system in the middle of the cluster, hosted off a single machine, with
a single network connection.  Think NAS when you look at those.

Joe



> 
> Regards,
>  
> Patrice Seyed
> Linux System Administrator - SIG
> RHCE, SCSA
> Boston University School of Medicine


-- 
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615



