[Beowulf] recommendations for a good ethernet switch for connecting ~300 compute nodes
Joshua Baker-LePain
jlb17 at duke.edu
Wed Sep 2 20:54:17 PDT 2009
On Wed, 2 Sep 2009 at 10:29pm, Rahul Nabar wrote
> That brings me to another important question. Any hints on speccing
> the head-node? Especially the kind of storage to put in the head
> node. I need around 1 terabyte of storage. In the past I've used
> RAID5 on SAS drives in the server, mostly for running jobs that do
> their I/O via files stored centrally.
>
> For muscle I was thinking of a Nehalem E5520 with 16 GB of RAM.
> Should I boost the RAM? Any other comments? It is tricky to spec the
> central node.
>
> Or is it more advisable to go for a storage box external to the
> server for the NFS stores and then figure out a fast way of
> connecting it to the server? Fibre Channel, perhaps?
Speccing storage for a 300 node cluster is a non-trivial task and is
heavily dependent on your expected access patterns. Unless you anticipate
vanishingly little concurrent access, you'll be very hard pressed to
service a cluster that large with a basic Linux NFS server. About a year
ago I had ~300 nodes pointed at a NetApp FAS3020 with 84 spindles of 10K
RPM FC-AL disks. A single user could *easily* flatten the NetApp (read:
100% CPU and multi-second to multi-minute latencies for everybody else)
without even using the whole cluster.
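To put numbers on that, a back-of-the-envelope sketch helps (the
per-spindle and per-node figures below are rule-of-thumb assumptions on
my part, not measurements from that system):

  # Rough aggregate-IOPS estimate for a spindle-backed NFS server.
  # All inputs are rule-of-thumb assumptions, not measured values.
  spindles = 84            # FC-AL disks behind the filer
  iops_per_spindle = 125   # typical random IOPS for a 10K RPM drive
  nodes = 300
  cores_per_node = 8       # assumed dual-socket quad-core nodes

  total_iops = spindles * iops_per_spindle   # ~10500
  clients = nodes * cores_per_node           # 2400 processes

  print("aggregate backend: ~%d IOPS" % total_iops)
  print("per-process share: ~%.1f IOPS across %d processes"
        % (total_iops / float(clients), clients))

A few thousand IOPS split 2400 ways is roughly 4 random I/Os per second
per process; a handful of badly behaved jobs can eat that many times
over, which is exactly how one user starves everybody else.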
Whatever you end up with for storage, you'll need to be vigilant regarding
user education. Jobs should store as much in-process data as they can on
the nodes (assuming you're not running diskless nodes) and large jobs
should stagger their access to the central storage as best they can.
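As a concrete example of that pattern, here's a minimal staging wrapper
a job could use: it pauses a random interval so an array of jobs doesn't
hit the server in lockstep, copies its input to node-local scratch, runs
there, and only touches the central store again to copy results back.
The paths, the delay window, and the solver command are all hypothetical
placeholders:

  #!/usr/bin/env python
  # Minimal staging sketch: do the heavy I/O on node-local disk and
  # touch the central NFS store only at the start and end of the job.
  # /nfs/home/... and /scratch are placeholder paths; adjust to taste.
  import os, random, shutil, subprocess, time

  CENTRAL = "/nfs/home/user/job42"   # shared NFS store (assumed path)
  LOCAL = os.path.join("/scratch", "job-%d" % os.getpid())

  # Stagger startup so hundreds of copies don't hammer NFS at once.
  time.sleep(random.uniform(0, 60))

  shutil.copytree(CENTRAL, LOCAL)    # stage in
  try:
      # Run the job against local disk ("./solver" is a placeholder).
      subprocess.check_call(["./solver", "input.dat"], cwd=LOCAL)
  finally:
      # Stage results back out, then clean up local scratch.
      shutil.copy(os.path.join(LOCAL, "output.dat"), CENTRAL)
      shutil.rmtree(LOCAL)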
--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF