[Beowulf] shared memory versus MPI and bootless boot

Thu Jun 29 09:58:25 PDT 2006

Subject: Re: [Beowulf] shared memory versus MPI and bootless boot

> > Mark Hahn wrote:
> >
> > > > btw does that 'boot over network means i need a 16 node hub for
> > > > 100 mbit and connect all the machines besides the quadrics network
> > > > also to 100 mbit?
> > > sure, you need some sort of ethernet.  doesn't have to be a 16pt
> > > switch (please don't say that you actually have a hub!)
> > Heh, heh, heh..
> > I have a box of Artisoft 2Mbps NICs out in the garage.
> > Or, maybe, some of those NE1000 coax adapters.  I have lots of old
> > coax, a bag full of connectors, a crimper, and I'm not afraid to use
> > them.
> >
> > Hey, it's only to boot.
> assuming that you boot up to 4 at a time... for power distribution and
> that, how much does booting over the network require...
> if your ram disk or local storage is 128MB, over a 2Mbps connect, i make
> it 8 mins, nearer 9 to load which is... urm slow.
> again this for a 100Mbps hub you would get 10.24 seconds per node.

Ah thanks for noticing this, hadn't realized this timing problem yet!

But yes i have 100mbit hubs here, not switches :)
Intel InBusiness hubs actually.

Note over my 100Mbit hubs i've *never* ever managed to upload faster than 
to other machines when the rest of the network is *completely* idle.

As soon as there is even a little other traffic, it drops to about 1 to 2 MB/s.

That means that, effectively, loading 128MB of bootless scratch will take
a minute or two per node.
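
A rough sketch of the arithmetic behind these numbers, in Python. It uses
the same simplification as the quoted estimate (1 MB = 8 Mbit, no protocol
overhead); the function name and the table of rates are only illustrative.

```python
def transfer_seconds(size_mb: float, rate_mbit_per_s: float) -> float:
    """Seconds to move size_mb megabytes at rate_mbit_per_s megabits/s.

    Assumes 1 MB = 8 Mbit and ignores protocol overhead.
    """
    return size_mb * 8 / rate_mbit_per_s


IMAGE_MB = 128  # ram disk / scratch size used in the thread

# Link rates in Mbit/s. The last two are the 1 to 2 MB/s seen on a busy hub.
rates = {
    "2 Mbit/s NIC": 2,
    "100 Mbit/s hub, ideal": 100,
    "hub at 2 MB/s": 2 * 8,
    "hub at 1 MB/s": 1 * 8,
}

for name, rate in rates.items():
    secs = transfer_seconds(IMAGE_MB, rate)
    print(f"{name:24s} {secs:8.2f} s  ({secs / 60:.1f} min)")
```

This gives 512 s for the 2 Mbit/s case, 10.24 s for an ideal 100 Mbit/s
link, and 64 to 128 s at 1 to 2 MB/s. Since a hub is a shared medium, nodes
booting at the same time split that bandwidth between them.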

This is a rather good tip to take into account, though.

So i should move to a gigabit switch for diskless boot, and for all nodes
which only have 100mbit ports, put a tiny disk with power saving in the
machine.

This is a rather important point to take into account.

The advantage of a gigabit network is of course that you can equip the
master node with a capable RAID10 array and use the gigabit network to
fulfill i/o requests.

It's time to measure, with a crossover UTP cable, the bandwidth i can push
over tcp/ip between the gigabit nics in the machines here.
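
A minimal sketch of such a measurement, in Python with plain sockets. The
names (sink, measure_mb_per_s) and the chunk size are made up for
illustration, and a dedicated benchmark tool would give more careful
numbers. The sender waits for a one-byte acknowledgement so that the timing
covers delivery to the receiver and not just filling the local send buffer.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per send/recv call (arbitrary choice)


def sink(server: socket.socket) -> None:
    """Accept one connection, discard all data, then acknowledge with one byte."""
    conn, _ = server.accept()
    with conn:
        while conn.recv(CHUNK):
            pass  # read until the sender closes its write side
        conn.sendall(b"k")


def measure_mb_per_s(host: str, port: int, total_bytes: int) -> float:
    """Send total_bytes to host:port; return the rate in MB/s (1 MB = 2**20 bytes)."""
    payload = b"\0" * CHUNK
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as s:
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            s.sendall(payload[:n])
            sent += n
        s.shutdown(socket.SHUT_WR)  # signal end of data
        s.recv(1)                   # wait until the receiver has read everything
    elapsed = time.perf_counter() - start
    return sent / elapsed / 2**20


if __name__ == "__main__":
    # Loopback demo only; this exercises the code, not the network hardware.
    server = socket.create_server(("127.0.0.1", 0))  # port 0: OS picks a free port
    port = server.getsockname()[1]
    threading.Thread(target=sink, args=(server,), daemon=True).start()
    print(f"{measure_mb_per_s('127.0.0.1', port, 32 * 2**20):.1f} MB/s")
```

For the real test, sink would run on one machine, bound to its gigabit
address, and measure_mb_per_s on the other, pointed at that address.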

> so assuming that you dont have any more than one node failing every 11
> seconds.. a 100Mbps hub would do adequately.
