[Beowulf] Fault tolerance & scaling up clusters (was Re: Bright Cluster Manager)
Roland Fehrenbacher
rf at q-leap.de
Fri May 18 01:36:29 PDT 2018
>>>>> "JH" == John Hearns via Beowulf <beowulf at beowulf.org> writes:
JH> Roland, the OpenHPC integration IS interesting. I am on the
JH> OpenHPC list and look forward to the announcement there.
Yes, we'll post there when ready.
JH> On 17 May 2018 at 15:00, "R" = Roland Fehrenbacher <rf at q-leap.de>
JH> wrote:
>>>>> "J" == Lux, Jim (337K) <james.p.lux at jpl.nasa.gov> writes:
J> The reason I hadn't looked at "diskless boot from a
J> server" is the size of the image - assuming you don't have a
J> high-bandwidth or reliable link.
R> This is not something to worry about with Qlustar. A compressed
R> Qlustar 10.0 image containing, e.g., the core OS + Slurm + OFED +
R> Lustre is only 165MB to transfer from the head to a node (it
R> occupies 420MB of RAM once unpacked as the OS on the node).
R> Qlustar (and its non-public ancestors) have never used anything
R> but RAM disks (with real disks for scratch). The first cluster
R> running this, at the end of 2001, was on Athlons ... and RAM
R> consumption in the range of 100MB still mattered a lot at that
R> time :)
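To put the 165MB figure in the context of a slow link, here is a rough back-of-the-envelope sketch. The link speeds are hypothetical examples, not numbers from this thread, and the calculation assumes 1MB = 10^6 bytes, an otherwise idle link, and no protocol overhead unless an efficiency factor is passed in.

```python
IMAGE_MB = 165  # compressed image size quoted in the message above


def transfer_seconds(size_mb: float, link_mbit_per_s: float,
                     efficiency: float = 1.0) -> float:
    """Seconds needed to move size_mb megabytes over a link.

    efficiency is an assumed fraction (0 < efficiency <= 1) of the nominal
    bandwidth that is actually usable; 1.0 means an ideal link.
    """
    if link_mbit_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be > 0 and efficiency in (0, 1]")
    return size_mb * 8 / (link_mbit_per_s * efficiency)


# Hypothetical link speeds, chosen only for illustration.
for name, mbit in [("10 Mbit/s", 10), ("100 Mbit/s", 100), ("1 Gbit/s", 1000)]:
    print(f"{name:>10}: {transfer_seconds(IMAGE_MB, mbit):7.1f} s")
```

On an unreliable link the real figure would be higher because of retransmissions; passing an efficiency below 1.0 is one crude way to model that.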
R> So over the years, we perfected our image build mechanism to
R> achieve a close to minimal (size-wise) OS, minimal in the sense
R> of: given the required functionality (wanted kernel modules,
R> services, binaries/scripts, libs), generate an image (module) of
R> minimal size providing it. That is as lightweight as it gets, by
R> definition.
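The idea of "only what the required functionality needs" can be illustrated with a dependency closure. This is a generic sketch and not Qlustar's actual build mechanism, which the message does not describe; the function name, the dependency map, and all file names below are hypothetical.

```python
from collections import deque


def minimal_file_set(required, deps):
    """Return everything reachable from `required` via `deps`.

    required: iterable of wanted items (binaries, services, modules ...)
    deps:     dict mapping an item to the items it directly depends on
    Anything not reachable from `required` is left out of the image.
    """
    needed = set()
    queue = deque(required)
    while queue:
        item = queue.popleft()
        if item in needed:
            continue  # already collected, avoids loops on cyclic deps
        needed.add(item)
        queue.extend(deps.get(item, ()))
    return needed


# Hypothetical dependency map, for illustration only.
DEPS = {
    "slurmd": ["libslurm.so", "libc.so"],
    "libslurm.so": ["libc.so"],
    "sshd": ["libcrypto.so", "libc.so"],
    "gcc": ["libc.so", "cc1"],  # not requested, so it stays out
}

image = minimal_file_set(["slurmd", "sshd"], DEPS)
print(sorted(image))
```

Because the compiler is not part of the requested functionality in this toy example, neither it nor its private dependency ends up in the resulting set.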
R> Yes, I know, you'll probably say "well, but it's just Ubuntu
R> ...". Not for much longer though: CentOS support (incl. OpenHPC
R> integration) is coming very soon ... And all open source and free.
R> Best,
R> Roland
R> ------- https://www.q-leap.com / https://qlustar.com
R> --- HPC / Storage / Cloud Linux Cluster OS ---