[Beowulf] hints for benchmarking on-chip communication
Marcel Meyer
meyerm at in.tum.de
Sun Jul 12 10:41:55 PDT 2009
Hello list,
I want to benchmark the on-chip performance of message passing between two
processes running on different cores. The test setup would be OpenMPI 1.3.2
on a 4-socket Dunnington machine, with each process pinned to a specific
core. I do know that there are other, more suitable programming models for
such a shared-memory system; I really just want to have a look at MPI.
I'm a beginner when it comes to benchmarking at that level, though, and
wanted to ask whether you could point me to some "first steps" docs: for
example, how to keep hardware prefetching from getting in the way when
measuring the worst-case performance of sending big arrays (force fetching
of random locations?), how to recognize TLB hits/misses in the results, etc.
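To make the "random locations" idea concrete, here is a minimal sketch of
what I have in mind. It is only an assumption on my side, not something taken
from OpenMPI. The buffer is walked in a random order that forms one single
cycle, so every slot is touched exactly once per pass and the next address
depends on the previous load. The function names are hypothetical and Python
is used only to show the access order; the real timing loop would have to be
written in C.

```python
import random

def single_cycle_permutation(n, seed=0):
    """Return a list nxt such that following i -> nxt[i] from any start
    visits all n slots exactly once before returning to the start."""
    rng = random.Random(seed)
    nxt = list(range(n))
    # Sattolo's algorithm: swap each position with a strictly earlier one,
    # which yields a permutation consisting of a single cycle of length n.
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt

def chase(nxt, steps, start=0):
    """Follow the chain for `steps` hops and return the final index.
    Each hop depends on the result of the previous one."""
    idx = start
    for _ in range(steps):
        idx = nxt[idx]
    return idx
```

In a C version each slot would presumably be padded to a cache line (or a
page, to provoke TLB misses), and the time for one full pass divided by the
number of slots would give an average per-access latency.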
Currently I'm looking over the source code of the SM BTL in OpenMPI and will
try to find a block diagram of the Dunnington to better understand its
architecture (still searching ;-) ).
Thank you very much,
Marcel