[Beowulf] hpl size problems
Joe Landman
landman at scalableinformatics.com
Tue Sep 27 07:50:29 PDT 2005
Greg M. Kurtzer wrote:
> On Sat, Sep 24, 2005 at 12:10:46PM -0400, Mark Hahn wrote:
[...]
>>> hours) running on Centos-3.5 and saw a pretty amazing speedup of the
>>> scientific code (*over* 30% faster runtimes) than with the previous
>>> RedHat/Rocks build. Warewulf also makes the cluster rather trivial to
>> such a speedup is indeed impressive; what changed?
>
> Actually, we used the same kernel (recompiled from RHEL), and exactly the
> same compilers, MPI, and IB stack (literally the same RPMs). The only thing
> that changed was the cluster management paradigm. The tests were done
> back to back with no hardware changes.
If these were NUMA machines (Opterons specifically), you need to
watch for processor affinity issues. You can get STREAM-like
programs hopping from CPU to CPU, which results in using the
HyperTransport path and the memory controller on the remote CPU as
well as the memory controller on the local CPU. We have seen roughly
30% performance differences between the two cases (on memory
latency/bandwidth-bound codes running multiple threads on a NUMA
machine).
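As a quick illustration, here is a minimal sketch of pinning a
process to a single CPU on Linux so the scheduler cannot migrate it
away from its local memory controller. It assumes glibc's
sched_setaffinity(); the choice of CPU 0 is arbitrary, just for the
example.

/* Minimal sketch: bind the calling process to one CPU so it
 * cannot hop away from its local memory controller.  Assumes
 * Linux + glibc; CPU 0 is an arbitrary illustrative choice. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    cpu_set_t mask;

    CPU_ZERO(&mask);
    CPU_SET(0, &mask);          /* allow CPU 0 only */

    /* pid 0 means "the calling process" */
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_setaffinity");
        return EXIT_FAILURE;
    }

    /* ... memory-bound work now stays on CPU 0 and its
     * local memory controller ... */
    return EXIT_SUCCESS;
}

The same effect is available from the command line via taskset,
without touching the code at all.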
We have also seen benchmark cases where the memory system was
improperly set up or configured. Most of these are due to a lack of
readily available information. If your goal is to compare the
realistic performance of real codes the way people will run them,
you don't start out with a misconfigured system, say with all of the
memory on an Opteron system tied to one CPU ... we have seen that
one quite a bit ...
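A quick way to catch that particular misconfiguration is to ask the
kernel how much memory each node owns. A minimal sketch using
libnuma (link with -lnuma); numa_available(), numa_max_node(), and
numa_node_size() are the libnuma calls, and the MB conversion is
just for readability:

/* Minimal sketch: report per-node memory so a lopsided layout
 * (e.g. all DIMMs hanging off one CPU) shows up immediately. */
#include <numa.h>
#include <stdio.h>

int main(void)
{
    int node;
    long total, free_mem;

    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this system\n");
        return 1;
    }

    for (node = 0; node <= numa_max_node(); node++) {
        /* total size in bytes; free bytes returned via pointer */
        total = numa_node_size(node, &free_mem);
        printf("node %d: %ld MB total, %ld MB free\n",
               node, total >> 20, free_mem >> 20);
    }
    return 0;
}

(numactl --hardware reports much the same thing if you would rather
not write code.)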
--
Joseph Landman, Ph.D.
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 734 786 8452
cell : +1 734 612 4615