diskless nodes? (was Re: Xbox clusters?)

Fri Dec 7 09:58:27 PST 2001

On Thu, Dec 06, 2001 at 02:00:34PM -0500, Mark Hahn's all...
> > impossible to keep up with IO demands? We have an extremely modest setup and
> > yet we get 58MB/s (megaBYTES per second) as per bonnie with our RAID 0 setup
> > on our LVD drives. For 30 nodes this rate of writing is fine. If our jobs in
> 
> a single modern ide disk will sustain around 45 MB/s.  it's just 
> impossible to scale a single server (not to mention its net)
> to handle 100 diskless clients trying to stream at 40 MB/s.

If you need that much disk bandwidth, that is. I agree that would be very hard
to set up, and there'd be no point in finding a way to do so over NFS.  In that
case local disk makes a lot of sense.

> > G98 were writing that much scratch we might want to reconsider what kinds of
> > jobs were running. Each node doesn't write to scratch all the time; in fact we
> > find it scratches 5-10% max of its computation time at 11-12MB/s write speed.
> > 5-10% * 30 nodes = equiv of 1.5 to 3 nodes.  is 58MB/s enough for full speed
> > writing of 1.5 to 3 nodes?  yes.
> 
> so your bottleneck is 100bT, no surprise.  anyone with large jobs
> is definitely going to want more than 10 MB/s.
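
To put numbers on the estimate quoted above, here is a minimal sketch of the
averaged load. It assumes the figures from this thread (30 nodes, 5-10% of
time spent scratching, 11-12MB/s per node, 58MB/s from bonnie) and that the
writes are spread evenly rather than all landing at once; the function name
is just for illustration.

```python
# Averaged scratch demand on the server versus what bonnie measured.
NODES = 30
SERVER_MB_S = 58.0  # bonnie figure for the RAID 0 server


def aggregate_demand(nodes, duty_cycle, per_node_mb_s):
    """Average MB/s hitting the server if writes are evenly spread."""
    return nodes * duty_cycle * per_node_mb_s


low = aggregate_demand(NODES, 0.05, 11.0)
high = aggregate_demand(NODES, 0.10, 12.0)

print(f"average demand: {low:.1f} to {high:.1f} MB/s")
print(f"fits under {SERVER_MB_S:.0f} MB/s: {high < SERVER_MB_S}")
```

That works out to roughly 16.5 to 36MB/s on average. It says nothing about
the worst case where many nodes scratch at the same moment.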

There is a bottleneck if they write at 12.5MB/s (the 100bT wire rate) for any
amount of time, as obviously they want to write faster. This means I'm wasting
cycles.  But what am I willing to pay to get rid of that bottleneck and
recapture 5% of computation time?  If I spend more than 5% more on the cost of
the cluster to capture 5% of CPU back, then my price/performance is lower.
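
As a rough sketch of that trade-off: treat price/performance as useful CPU
fraction divided by relative cost. The function names and the 95%/5% inputs
are illustrative assumptions, not measurements.

```python
def price_performance(useful_fraction, relative_cost):
    """Useful CPU fraction delivered per unit of (relative) cluster cost."""
    return useful_fraction / relative_cost


def break_even_extra_cost(useful_now, recovered):
    """Largest fractional cost increase that does not lower price/performance."""
    return recovered / useful_now


# Assumed case: 95% useful CPU today, an upgrade that recovers the other 5%.
limit = break_even_extra_cost(0.95, 0.05)
print(f"break-even extra cost: {limit:.2%}")

baseline = price_performance(0.95, 1.00)
print("10% more cost pays off:", price_performance(1.00, 1.10) > baseline)
print(" 2% more cost pays off:", price_performance(1.00, 1.02) > baseline)
```

Strictly the break-even comes out a little above 5% (0.05/0.95), so "5% more
cost for 5% more CPU" is a close rule of thumb rather than an exact line.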

> > Distributing your load is what the clustering concept is all about, why
> > not distribute your disk accesses?
> 
> exactly: a disk per node.
> 
> > Does NO ONE use diskless clusters?
> 
> it's quite common, but mainly for reliability reasons
> (moving parts, mtbf, etc).  doing diskless with any nontrivial IO
> requires expensive interconnect, though (myrinet, quadrics, etc)

Agreed. Again, however, for 'trivial' IO, which I guess our applications fit
into, it can still make a lot of sense. Remember, with the types of
calculations we're doing, we're seeing under 50-60Mbps (i.e. 6-7MB/s) of traffic
for all 30 nodes on G98 (i.e. about 2Mbps per node on average). The jobs are
capturing 97-98% of available CPU time. It's not worth recapturing that 2-3% by
adding even cheap $100 drives to each node (i.e. $100 >> 3% of the cost of a
node). If $100 is x% of a node, then it's only worth buying local disk when I
get less than (100-x)% CPU usage in a diskless setup because I'm waiting for
the slow network instead of using a local disk.
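
A minimal sketch of that rule, for anyone who wants to plug in their own
numbers. The $1000 node cost below is purely hypothetical (I haven't given our
real figure here), and the code assumes a local disk would bring CPU usage to
essentially 100%.

```python
def disk_worth_buying(disk_cost, node_cost, diskless_utilization):
    """True if adding a local disk improves price/performance.

    Assumes the disk removes all IO waiting (utilization -> 1.0).
    """
    x = disk_cost / node_cost
    # Exact break-even is 1/(1+x); for small x this is close to 1 - x,
    # i.e. the (100-x)% rule of thumb.
    threshold = 1.0 / (1.0 + x)
    return diskless_utilization < threshold


NODE_COST = 1000.0  # hypothetical node price in dollars
DISK_COST = 100.0   # the cheap drive mentioned above

for util in (0.975, 0.85):
    verdict = disk_worth_buying(DISK_COST, NODE_COST, util)
    print(f"diskless CPU usage {util:.1%}: buy disk? {verdict}")
```

With those assumed prices the disk only pays for itself once diskless CPU usage
drops below about 90%, well under the 97-98% we see.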

/kc
-- 
Ken Chase, math at velocet.ca  *  Velocet Communications Inc.  *  Toronto, CANADA