[Beowulf] hints for benchmarking on-chip communication

Marcel Meyer meyerm at in.tum.de
Sun Jul 12 10:41:55 PDT 2009

Hello list,

I want to benchmark the on-chip performance of message passing from one 
process running on one core to another process running on another core (the 
test setup would be OpenMPI 1.3.2 on a 4-socket Dunnington, with each process 
pinned to a specific core). I do know that there are other, more suitable 
programming models for such a shared-memory system; I really just want to 
have a look at 

But I'm a beginner when it comes to benchmarking at that level and wanted to 
ask whether you could point me to some "first steps" docs: for example, how 
to prevent hardware prefetching from getting in the way of measuring the 
worst-case performance when sending big arrays (force fetching random 
locations?), how to recognize TLB hits/misses in the results, etc.
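
To make the "force fetching random locations" idea concrete, here is a 
minimal sketch of a pointer-chasing access pattern in plain C. It is only an 
illustration, not something taken from OpenMPI: the names build_random_cycle, 
chase and xorshift64 are hypothetical, and the small xorshift generator is 
just a stand-in for any decent random number source. Every load depends on 
the result of the previous one and the order is a single random cycle over 
the array, so a stride-based prefetcher should have little to latch on to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small xorshift PRNG (illustrative helper); the state must be nonzero. */
uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Fill next[0..n-1] so that following next[] from any index visits every
 * slot exactly once before returning (Sattolo's algorithm: one n-cycle). */
void build_random_cycle(size_t *next, size_t n, uint64_t seed)
{
    assert(n >= 2);
    uint64_t state = seed ? seed : 0x9E3779B97F4A7C15ULL;
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        /* j is drawn from [0, i-1], never i itself; that is what makes
         * the result a single cycle instead of an arbitrary permutation.
         * (The modulo introduces a slight bias, ignored in this sketch.) */
        size_t j = (size_t)(xorshift64(&state) % i);
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Perform `steps` dependent loads starting at `start`. The final index is
 * returned so the compiler cannot optimize the loop away. */
size_t chase(const size_t *next, size_t start, size_t steps)
{
    size_t i = start;
    while (steps--)
        i = next[i];
    return i;
}
```

To use it, one would allocate an array much larger than the last-level 
cache, call build_random_cycle() once, and then time a long chase() call 
(for example with clock_gettime()) and divide by the number of steps. Note 
that the entries here are sizeof(size_t) bytes each, so several share one 
cache line; padding each entry to a full cache line would be a natural 
refinement if per-line latency is what should be measured.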

Currently I'm looking over the source code of the SM-BTL in OpenMPI and will 
try to find a block diagram of the Dunnington to better understand its 
architecture (still searching ;-) ).

Thank you very much,
