diskless nodes? (was Re: Xbox clusters?)

Thu Dec 6 06:52:52 PST 2001

On Thu, Dec 06, 2001 at 09:22:32AM +0200, Eray Ozkural (exa)'s all...
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On Thursday 06 December 2001 06:10, Velocet wrote:
> >
> > Wondering why everyone gets local drives here - do you people have
> > computational software that needs to write scratch or swap faster than
> > 100Mbps (12.5MB/s) or even GBE (125MB/s theoretical, 35-50MB/s actual)?
> > Isn't that fast enough?
> >
> 
> Say you've got 16 nodes.
> 
> Can your disk server attain 16*12.5 MB/sec throughput reliably?

Why do you need that kind of speed? For the stuff we're doing we're not
accessing disk that much. If you are, then the software has to be written
quite well to thrash the disk that much and not have the CPU sitting there
waiting for data. That requires careful design.
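
For a rough sense of the numbers in that question, here is a back-of-envelope
sketch in Python. The 16 nodes and 12.5MB/s per node come from this thread,
and the server rates are the 35-50MB/s "actual" GBE figures quoted above. The
function names are made up for illustration, and the even split between nodes
is an assumption.

```python
# Back-of-envelope: what each node gets if all nodes hit one disk server at once.
# Function names are hypothetical; the figures are the ones quoted in this thread.

def aggregate_demand_mb_s(nodes, per_node_mb_s):
    """Worst-case demand if every node saturates its own link simultaneously."""
    return nodes * per_node_mb_s

def per_node_share_mb_s(nodes, server_mb_s, per_node_mb_s):
    """Even share of the server's rate per node, capped by the node's own link."""
    return min(per_node_mb_s, server_mb_s / nodes)

if __name__ == "__main__":
    nodes = 16
    link = 12.5  # MB/s, i.e. 100Mbps
    print("worst-case demand: %.1f MB/s" % aggregate_demand_mb_s(nodes, link))
    for server in (35.0, 50.0):  # "actual" GBE rates quoted above
        share = per_node_share_mb_s(nodes, server, link)
        print("server at %.0f MB/s -> %.2f MB/s per node" % (server, share))
```

That worst case only bites if all 16 nodes hit the disk at the same moment,
which is why it comes down to the workload.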

G98 is a pig on disk IMHO. Then again it's using a 'small data set',
I would guess, even when scratching out 50 gigs of data. So I'm doing 'small
time calculations' I s'pose, and have designed my cluster for exactly that.
No one in their right mind will argue that you can't save money by designing
a cluster for EXACTLY what's required.

I realise 12.5MB/s isn't all that much data I/O, but even at that rate most
software that needs data that fast is sitting waiting for huge blocks to
be loaded. G98 does this when it writes scratch files, and I've seen *100
GIG* scratch files from it (before we ran out of disk): it SITS AND WAITS for
the disk writing to finish. That's a good argument for running two jobs per
node there, until... you get TWO jobs scratching out to disk at the same time,
sigh. Where does it end? 5 jobs?  Not to mention hopefully having really
efficient UDMA xfers to do all this writing. Suddenly you're beyond the
rate of cheap IDE drives and you've got a 60MB/s RAID 0 array for EVERY
node to handle the data rates.

Suddenly your nodes are costing a hell of a lot.
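
To put the waiting in numbers, here is a quick Python sketch of how long one
scratch file takes to write at the rates mentioned in this thread. It assumes
decimal units (1 GB = 1000 MB) and a steady sustained rate with no seek or
protocol overhead, so real numbers will be worse. The function name and the
labels are only illustrative.

```python
# Time to write a scratch file at a sustained rate. Idealised: no seeks,
# no NFS overhead, decimal units (1 GB = 1000 MB). Names are hypothetical.

def write_time_seconds(size_gb, rate_mb_s):
    """Seconds to write size_gb gigabytes at a sustained rate_mb_s."""
    return size_gb * 1000.0 / rate_mb_s

if __name__ == "__main__":
    rates = (("100Mbps link", 12.5), ("GBE low end", 35.0), ("RAID 0", 60.0))
    for label, rate in rates:
        for size in (50, 100):  # scratch sizes mentioned in this thread, in GB
            minutes = write_time_seconds(size, rate) / 60.0
            print("%3d GB at %-12s (%.1f MB/s): %6.1f min" % (size, label, rate, minutes))
```

If two jobs scratch to the same disk at once and split the rate evenly (an
optimistic assumption), each one waits roughly twice as long.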

> Diskless clusters have their own uses, especially in applications which don't 
> do parallel I/O.

Exactly. For gromacs and G98 (the former known as a non disk hog, the latter
considered somewhat of a pig) I have no need for local 30MB/s+ disk for
each node.

> It's fair to say that beowulf systems are capable of doing parallel I/O, thus 
> they need a local disk.

Really depends on the data you have to sift through. There are a large
number of types of application that can do fine with 12.5MB/s of disk, which
is about the speed of an average IDE drive these days (or perhaps a bit
faster for writing). Even so, you can use NFS on multiple servers instead,
as long as you don't need every node to have access to any file
written by any other node. If you only need to group your accesses (node n's
files are read only by nodes n-3 thru n+3) then you can also group your
NFS servers similarly.
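
A small Python sketch of that grouping idea, under assumptions that are mine
and not from any real setup: nodes are numbered from 0, each NFS server holds
the files of one contiguous block of group_size nodes, and node n reads the
files of nodes n-radius thru n+radius. All names here are hypothetical.

```python
# Which NFS servers does a node need to mount if accesses are grouped?
# Assumes contiguous blocks of nodes per server; all names are hypothetical.

def server_for(node, group_size):
    """Index of the NFS server holding this node's files."""
    return node // group_size

def servers_needed(node, radius, group_size, total_nodes):
    """Servers a node must mount to read files of nodes node-radius..node+radius."""
    lo = max(0, node - radius)
    hi = min(total_nodes - 1, node + radius)
    return sorted({server_for(n, group_size) for n in range(lo, hi + 1)})

if __name__ == "__main__":
    total, radius, group = 16, 3, 8
    for node in range(total):
        mounts = servers_needed(node, radius, group, total)
        print("node %2d mounts servers %s" % (node, mounts))
```

With blocks of 8 and a window of 3, only the nodes within 3 of a block
boundary need a second mount; the rest talk to one server.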

Besides, even with local disk - what are you gonna do for sharing files between
nodes? Let each node talk to every other node over NFS to get at that
node's local files? (I don't want to imagine what the fstabs look like on each
machine.  Machine generated for sure!)  Shared SCSI bus might do better here!
:)
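
For the curious, a sketch of what "machine generated" might look like in
Python. The host names (node01 and so on), the /scratch export, the
/net/<host> mount points and the mount options are all made up for
illustration; none of it describes a real cluster.

```python
# Generate the NFS lines of /etc/fstab for one node in an all-to-all setup.
# Host names, export path, mount points and options are hypothetical.

def fstab_lines(this_node, total_nodes, export="/scratch", options="rw,hard,intr"):
    """fstab entries mounting every other node's export on this_node via NFS."""
    lines = []
    for n in range(1, total_nodes + 1):
        if n == this_node:
            continue  # a node's own disk is local, not an NFS mount
        host = "node%02d" % n
        # fields: device, mount point, fs type, options, dump, fsck pass
        lines.append("%s:%s /net/%s nfs %s 0 0" % (host, export, host, options))
    return lines

if __name__ == "__main__":
    print("\n".join(fstab_lines(this_node=1, total_nodes=16)))
```

That is 15 lines per node and 240 mounts across a 16-node cluster, which is
rather the point.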

That makes me think - Anyone doing IP over SCSI?

/kc

> 
> Thanks,
> 
> - -- 
> Eray Ozkural (exa) <erayo at cs.bilkent.edu.tr>
> Comp. Sci. Dept., Bilkent University, Ankara
> www: http://www.cs.bilkent.edu.tr/~erayo
> GPG public key fingerprint: 360C 852F 88B0 A745 F31B  EA0F 7C07 AE16 874D 539C
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.0.6 (GNU/Linux)
> Comment: For info see http://www.gnupg.org
> 
> iD8DBQE8Dxy4fAeuFodNU5wRAuifAJ9Tus6tgMM+Gnh6OH6YbIrS0Z/PlACgmRPp
> 97Ozs2BjBMk+Vzlm4zvkP94=
> =mHmA
> -----END PGP SIGNATURE-----

-- 
Ken Chase, math at velocet.ca  *  Velocet Communications Inc.  *  Toronto, CANADA