[Beowulf] Fault tolerance & scaling up clusters (was Re: Bright Cluster Manager)

Lux, Jim (337K) james.p.lux at jpl.nasa.gov
Fri May 18 09:49:28 PDT 2018

On May 17, 2018, at 06:01, Roland Fehrenbacher <rf at q-leap.de> wrote:

>>>>>> "J" == Lux, Jim (337K) <james.p.lux at jpl.nasa.gov> writes:
>    J> The reason I hadn't looked at "diskless boot from a
>    J> server" is the size of the image - assume you don't have a high
>    J> bandwidth or reliable link.
> This is not something to worry about with Qlustar. A (compressed)
> Qlustar 10.0 image containing e.g. the core OS + slurm + OFED + Lustre is
> just a mere 165MB to be transferred (eating 420MB of RAM 

165 MB x 8 = 1.32 Gbit
At 64 kbps that's about 20,600 s, i.e. roughly 5.7 hrs, before any protocol overhead or retransmits.
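
For anyone who wants to redo the arithmetic for a different image size or link rate, here is a minimal sketch. It assumes decimal units (1 MB = 10^6 bytes, 1 kbps = 1000 bit/s), a link that is fully available for the whole transfer, and no protocol overhead or retransmissions, so it gives a lower bound rather than a real estimate. The function name is made up for illustration.

```python
def transfer_time_seconds(size_bytes, link_bps):
    """Ideal transfer time: payload bits divided by link rate.

    Assumes zero protocol overhead and no retransmissions,
    so real transfers over a lossy link will take longer.
    """
    return size_bytes * 8 / link_bps


image_bytes = 165 * 10**6   # 165 MB compressed image (decimal megabytes assumed)
link_bps = 64_000           # 64 kbps link

seconds = transfer_time_seconds(image_bytes, link_bps)
gigabits = image_bytes * 8 / 1e9

print(f"{gigabits:.2f} Gbit, {seconds:.0f} s, {seconds / 3600:.1f} h")
# prints: 1.32 Gbit, 20625 s, 5.7 h
```

Swapping in a different link_bps shows how quickly the picture changes: at 1 Mbps the same image takes about 22 minutes under the same idealized assumptions.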

