[Beowulf] 512 node Myrinet cluster Challenges
Mark Hahn
hahn at physics.mcmaster.ca
Sun Apr 30 09:42:57 PDT 2006
> > By the way, the idea of rolling-your-own hardware on a large cluster, and
> > planning on having a small technical team, makes me shiver in horror. If
> > you go that route, you better have *lots* of experience in clusters. and
> > make very good decisions about cluster components and management methods.
> > If you don't, your users will suffer mightily, which means you will suffer
> > mightily too.
I believe that overstates the case significantly.
some clusters are just plain easy. it's entirely possible to buy a
significant number of conservative compute nodes, toss them onto a generic
switch or two, and run the whole thing for a couple years without any real
effort. I did it, and while I have a lot of experience, I didn't apply any
deep voodoo for the cluster I'm thinking of. it started out with a good
solid login/file/boot server (4U, 6x scsi, dual-xeon 2.4, 1G ram), a single
48pt 100bt (1G up) switch, and 48 dual-xeon nodes (diskful but not
disk-booting). it was a delight to install, maintain and manage.
I originally built it with APC controllable PDUs, but in the process of
moving it, stripped them out as I didn't need them. (I _do_ always require
net-IPMI on anything newly purchased.) I've added more nodes to the cluster
since then - dual-opteron nodes and a couple GE switches.
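to give a flavour of what net-IPMI buys you day to day, here's a minimal
sketch of polling chassis power state across the nodes with ipmitool over
its lanplus interface (the hostnames and credentials below are invented
purely for illustration):

    # minimal sketch: poll chassis power state over net-IPMI with ipmitool.
    # hostnames, username and password are purely illustrative.
    import subprocess

    NODES = ["node%03d" % i for i in range(1, 49)]   # hypothetical BMC hostnames

    def power_status(bmc_host, user="admin", password="changeme"):
        result = subprocess.run(
            ["ipmitool", "-I", "lanplus", "-H", bmc_host,
             "-U", user, "-P", password, "chassis", "power", "status"],
            capture_output=True, text=True)
        return (result.stdout or result.stderr).strip()

    for node in NODES:
        print(node, "->", power_status(node))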
> For clusters with more than perhaps 16 nodes, or EVEN 32 if you're
> feeling masochistic and inclined to heartache:
with all respect to rgb, I don't think size is a primary factor in cluster
building/maintaining/etc effort. certainly it does eventually become a
concern, but that's primarily a statistical result of the cluster-wide MTBF
scaling as (node MTBF)/nnodes. it's quite possible to choose hardware that
maximizes MTBF and minimizes configuration risk.
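to make the statistics concrete, a minimal sketch (assuming independent node
failures; the 50,000-hour per-node MTBF is just an illustrative number):

    # expected time between *any* node failure, assuming independent failures
    def cluster_mtbf(node_mtbf_hours, nnodes):
        return node_mtbf_hours / nnodes

    for n in (32, 48, 512, 768):
        print("%4d nodes -> one failure roughly every %7.1f hours"
              % (n, cluster_mtbf(50_000, n)))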
in the cluster above, I chose a chassis (AIC) which has a large centrifugal
blower rather than a bunch of 40mm axial/muffin fans. a much larger cluster
I'm working on now (768 nodes) has 14 40mm muffin fans in each node! while
I know I can rely on the vendor (HP) to replace failures promptly and without
complaint, there's an interesting side-effect: power dissipation. 12 of the
fans point at the CPUs and are actually paired inline, and each pair is rated
to dissipate up to 20W. so a node that idles at 210W and draws 265W under full
load can easily consume 340W if the fans are ramped up. ouch!
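putting numbers on that, using only the figures above (a back-of-the-envelope
sketch):

    # how much of the node's power budget the fans can eat
    idle_w, loaded_w, fans_ramped_w = 210, 265, 340
    fan_pairs, watts_per_pair = 6, 20      # 12 CPU fans, paired inline

    print("observed fan overhead under load:", fans_ramped_w - loaded_w, "W")   # 75 W
    print("rated maximum for the six pairs: ", fan_pairs * watts_per_pair, "W") # 120 W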
this is probably the most significant size-dependent factor for me. if
you're doing your own 32-node cluster, it's pretty easy to manage the
cooling: the difference between dissipating 300 and 400W per node adds up to
less than a ton of chiller capacity. at 768 nodes, scraping up the 10-20
additional tons of capacity that same difference implies is quite a different
proposition.
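the arithmetic behind those tons, as a sketch (taking 1 ton of refrigeration
as roughly 3.517 kW):

    # extra chiller tonnage needed for an extra 100 W per node
    KW_PER_TON = 3.517     # 1 ton of refrigeration ~ 3.517 kW

    def extra_tons(extra_watts_per_node, nnodes):
        return extra_watts_per_node * nnodes / 1000.0 / KW_PER_TON

    print("32 nodes:  %.2f tons" % extra_tons(100, 32))    # ~0.9 ton
    print("768 nodes: %.1f tons" % extra_tons(100, 768))   # ~21.8 tons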
regards, mark hahn.