[Beowulf] recommendations for a good ethernet switch for connecting ~300 compute nodes

Wed Sep 2 20:29:07 PDT 2009

On Wed, Sep 2, 2009 at 5:41 PM, Mark Hahn<hahn at mcmaster.ca> wrote:
>> allows global cross mounts from ~300 compute nodes) There is a variety
>> of codes we run; some latency sensitive and others bandwidth
>> sensitive.
>
> if you're sensitive either way, you're going to be unhappy with Gb.

I am still testing, but I suspect our codes are sensitive to both
latency and bandwidth.

> IMO, you'd be best to configure your scheduler to never spread an MPI
> job across switches,

Good idea. I was thinking about it. Might need to tweak my PBS scheduler.
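
A rough sketch of the placement rule being suggested (keep each MPI job
on one switch). This is not PBS configuration or any scheduler's API;
the function name, switch labels, and node names below are all
hypothetical, and the best-fit choice is only one possible policy.

```python
def pick_single_switch(free_nodes_by_switch, nodes_needed):
    """Return the name of a switch that can host the whole job, or None.

    free_nodes_by_switch: dict mapping a switch label to its list of idle nodes.
    nodes_needed: number of nodes the MPI job asks for.
    """
    candidates = [
        (len(nodes), switch)
        for switch, nodes in free_nodes_by_switch.items()
        if len(nodes) >= nodes_needed
    ]
    if not candidates:
        # No single switch has room, so the job stays queued
        # rather than being spread across switches.
        return None
    # Best fit: the switch with the fewest idle nodes that still fits the job,
    # which leaves the larger free blocks for larger jobs.
    return min(candidates)[1]


# Hypothetical cluster state, for illustration only.
free = {
    "sw1": ["n001", "n002", "n003"],
    "sw2": ["n101", "n102", "n103", "n104", "n105"],
}
print(pick_single_switch(free, 4))
```

With a real scheduler the same effect would presumably come from
grouping nodes by switch in the scheduler's own configuration rather
than from code like this.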

> and then just match the backbone to the aggregate IO
> bandwidth your NFS storage can support.
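
The sizing arithmetic behind that advice can be sketched as below. The
400 MB/s figure for NFS throughput is purely an assumed placeholder,
not a measurement; only the ~300 node count comes from this thread. The
function names are made up for illustration.

```python
def backbone_gbit_needed(nfs_mb_per_s):
    """Backbone bandwidth (Gbit/s) that matches the NFS server's throughput.

    Uses decimal units: 1 MB/s = 8 Mbit/s, 1000 Mbit/s = 1 Gbit/s.
    """
    return nfs_mb_per_s * 8 / 1000


def per_node_share_mb(nfs_mb_per_s, active_nodes):
    """Throughput (MB/s) each node gets if all active nodes hit NFS at once."""
    return nfs_mb_per_s / active_nodes


# ASSUMPTION: 400 MB/s sustained from the NFS storage (placeholder value).
nfs_throughput = 400
nodes = 300  # approximate compute node count from the thread

print("backbone needed (Gbit/s):", backbone_gbit_needed(nfs_throughput))
print("per-node share (MB/s):", round(per_node_share_mb(nfs_throughput, nodes), 2))
```

Under that assumption the storage, not the switch fabric, is the limit:
a backbone faster than the NFS server can feed buys nothing for file
I/O.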

That brings me to another important question. Any hints on speccing
the head node? Especially the kind of storage I put in the head
node. I need around 1 terabyte of storage. In the past I've used
RAID5+SAS in the server, mostly for jobs that do their I/O through
files stored centrally.

For muscle I was thinking of a Nehalem E5520 with 16 GB RAM. Should I
boost the RAM? Any other comments? It is tricky to spec the
central node.

Or is it more advisable to go for a storage box external to the server
for the NFS stores, and then figure out a fast way of connecting it to
the server? Fiber, perhaps?

-- 
Rahul