[Beowulf] 512-node Myrinet cluster Challenges
Mark Hahn
hahn at physics.mcmaster.ca
Tue May 2 14:20:26 PDT 2006
> > moving it, stripped them out as I didn't need them. (I _do_ always require
> > net-IPMI on anything newly purchased.) I've added more nodes to the cluster
>
> Net-IPMI on all hardware? Why? Running a second (or 3rd) network isn't
> a trivial amount of additional complexity, cables, or cost.
I really like being able to reset nodes remotely, as well as power them
up/down and fetch temperatures and fan speeds, etc.
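for what it's worth, here's a rough sketch of the sort of thing I mean,
assuming ipmitool is installed and each node's BMC answers on a hostname
like "node023-ipmi" (the naming and credentials below are placeholders,
not what we actually run):

    import subprocess

    def ipmi(node, *args):
        # talk to the node's BMC over the dedicated IPMI LAN interface
        cmd = ["ipmitool", "-I", "lanplus", "-H", node + "-ipmi",
               "-U", "admin", "-P", "secret", *args]
        return subprocess.run(cmd, capture_output=True, text=True).stdout

    print(ipmi("node023", "chassis", "power", "status"))  # is it on?
    print(ipmi("node023", "sdr", "type", "Temperature"))  # temps (fans via "Fan")
    ipmi("node023", "chassis", "power", "on")             # remote power-up

none of that needs the node's OS to be alive, which is rather the point.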
> What do you figure you pay extra on the nodes (many vendors charge to
> add IPMI: Sun, Tyan, Supermicro, etc.), cables, switches, etc.? As a
> data point, on an x2100 I bought recently the IPMI card was $150.
the IPMI add-in for many Tyan boards costs a lot less than that ($50?),
and quite a few servers (such as the HP DL145 G2) already have it built in.
and it's not really another network, since each rack's worth of IPMI
ports can just go to an in-rack switch. if you have 32-40 nodes/rack
with a better-than-ethernet interconnect, then you've probably already
got a gigabit switch in the rack anyway, so all the extra gear stays in-rack.
> Collecting fan speeds and temperatures in-band seems reasonable; after
> all, much of the data you want to collect isn't available via IPMI
> anyway (cpu utilization, memory, disk I/O, etc.).
true. though it's not clear to me how important those extras are for
the kind of HPC cluster I run. a job gets complete ownership of its
CPUs (and usually multiple whole nodes), so it's quite unlike a
load-balancing cluster, where you actually want realtime info on
cpu or memory utilization. load-balanced clusters are not
unreasonable with more cores per node, or perhaps for strictly
serial workloads. but for anything that's nontrivially parallel, the job
_must_ completely own all its resources, so there's really no reason
to worry about unused memory on an already-occupied node...
> Upgrading a 208V 3-phase PDU to a switched PDU seems to cost on the
> order of $30 per node (list). As a side benefit you get easy-to-query
> load per phase.
that's nice. but it only lets you power up/down: you can't do a
warm reset, only hard power cycles, which shorten hardware life.
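to make the distinction concrete, here's a small sketch (same assumptions
as the earlier one: ipmitool, a reachable BMC, placeholder hostname and
credentials):

    import subprocess

    BMC = ["-I", "lanplus", "-H", "node023-ipmi", "-U", "admin", "-P", "secret"]

    def chassis(action):
        # send an IPMI chassis-control command to the node's BMC
        subprocess.run(["ipmitool", *BMC, "chassis", "power", action], check=True)

    chassis("soft")    # ask the OS for a clean ACPI shutdown
    chassis("reset")   # reset the board without dropping power to the PSU
    # a switched PDU can only do the crude equivalent of:
    chassis("cycle")   # cut and restore power, like yanking the cord

the first two are exactly what a switched PDU can't give you.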
> After dealing with a few clusters whose PDUs blocked airflow and
> physical access to parts of the node, I now specify the zero-U
> variety that sit outside the airflow.
that's nice. HP's PDUs have a breaker section which consumes about
1U each, plus a set of outlet bars which mount zero-U (but which
have far too many, or too low-power, outlets).
interestingly, our racks are bayed together, which means there's
enough space for some airflow between racks. unfortunately, Quadrics
switches are fairly narrow, so there's enough room for a noticeable
counter-circulation.