[Beowulf] hpl size problems
Mark Hahn
hahn at physics.mcmaster.ca
Mon Sep 26 12:20:00 PDT 2005
> Warewulf by default creates the virtual node file system to be extremely
> minimal yet fully functional and tuned for the job at hand (which exists
> in a hybrid RAM/NFS file system).
but HPL does very little I/O and runs few commands.
> The nodes are lightweight in both file
> system and process load (context switching and cache management can be
> expensive, especially on non-NUMA SMP systems with lots of cache). The
> more daemons and extra processes that are running, the higher the
> process load and context switching that must occur.
it's hard to guess, since we don't know what you were running before.
the only way I can imagine this (random procs) mattering is if you
were running a full desktop install before, with some polling daemons
running (magicdev, artsd, etc.).
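(if anyone wants a quick check for that sort of thing, something like
"ssh node70 ps axo pid,comm,time --sort=-time | head" shows which
processes have actually burned CPU on a node - the flags assume a
procps-style ps, and node70 is just one of my nodes.)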
on my favorite cluster, I use the obvious kind of initrd+tmpfs+NFS
and don't run any extra daemons. on a randomly chosen node running
two MPI workers (out of 64 in the job), "vmstat 10" looks like this:
[hahn at node1 hahn]$ ssh node70 vmstat 10
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
2 0 2106876 58240 11504 1309148 0 0 0 0 1035 54 99 1 0 0
2 0 2106876 58312 11504 1309148 0 0 0 0 1037 59 99 1 0 0
2 0 2106876 58312 11504 1309148 0 0 0 0 1034 55 99 1 0 0
2 0 2106876 58312 11504 1309148 0 0 0 0 1033 56 99 1 0 0
2 0 2106876 58312 11504 1309148 0 0 0 0 1034 44 99 1 0 0
2 0 2106876 58312 11504 1309148 0 0 0 0 1031 41 99 1 0 0
2 0 2106876 58312 11504 1309148 0 0 0 0 1033 39 99 1 0 0
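(the columns to watch are "in" - interrupts/sec - and "cs" - context
switches/sec; the steady ~1035 in/sec is essentially just the timer
tick of a HZ=1000 kernel, plus a few device interrupts.)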
I haven't updated the kernel to a lower HZ yet, but will soon. I assert
without the faintest wisp of proof that 50 cs/sec is inconsequential.
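(if you want to check your own kernel's HZ - assuming a recent 2.6
kernel that ships its build config in /boot - something like
"grep CONFIG_HZ /boot/config-$(uname -r)" will tell you; on older
kernels HZ is a compile-time #define in include/asm/param.h.)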
the gigabit on these nodes is certainly not sterile either - plenty of NFS
traffic, even some NTP broadcasts. actually, I just tcpdumped it a bit,
and the basal net rate is an ARP plus 4-ish NFS access/getattr calls
every 60 seconds.
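(that was nothing fancier than something like
"tcpdump -n -i eth0 not port 22" left running for a couple of minutes:
-n keeps DNS lookups from generating their own traffic, and "not port 22"
filters out my ssh session; substitute your actual interface for eth0.)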
> It reminds me of chapter 1 of sysadmin 101: Only install what you *need*
sure, but that's not inherent to your system, and unless you had some pretty
godawful stuff installed before, it's hard to see that explanation...
> If someone else also has thoughts as to what would have caused the
> speedup, I would be very interested.
a full-fledged desktop install doesn't cause *that* much extraneous load -
yes, there are interrupts and the like, but you have to remember that
modern machines have massive memory bandwidth and big, associative caches,
so such stuff doesn't matter much.
especially for HPL - it's not exactly tightly-coupled, is it? if it were
(i.e., MANY global collectives per second), then I could easily buy the
explanation that removing random daemons would help a lot. after all,
this OS-noise effect has been known for a long time (though generally
only on very large clusters).
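(to put a rough number on "tightly-coupled", here's a trivial probe -
just a sketch, not anything from HPL itself - that times a loop of tiny
allreduces; the iteration count is arbitrary:

#include <mpi.h>
#include <stdio.h>

/* time a loop of small global collectives; if the per-collective
   latency jumps when daemons are running, the job is noise-sensitive */
int main(int argc, char **argv)
{
    int rank, i, iters = 10000;
    double x = 1.0, sum, t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);        /* start everyone together */
    t0 = MPI_Wtime();
    for (i = 0; i < iters; i++)
        MPI_Allreduce(&x, &sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("%.1f usec per allreduce\n", 1e6 * (t1 - t0) / iters);
    MPI_Finalize();
    return 0;
}

if the per-allreduce time blows up when the daemons come back, *then*
I'd buy the noise explanation for your code too.)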
> > > hours) running on CentOS 3.5 and saw a pretty amazing speedup of the
> > > scientific code (*over* 30% faster runtimes) than with the previous
> > > RedHat/Rocks build. Warewulf also makes the cluster rather trivial to
> >
> > such a speedup is indeed impressive; what changed?
>
> Actually, we used the same kernel (recompiled from RHEL), and exactly the
> same compilers, mpi and IB (literally the same RPMS). The only thing
> that changed was the cluster management paradigm. The tests were done
> back to back with no hardware changes.
afaik, recompiling a distro kernel generally does not get you
the same binary as what the distro distributes ;)
regards, mark hahn.