[Beowulf] Single threaded memory bandwidth compared between Nehalem-EP and Westmere-EX

Christopher Samuel samuel at unimelb.edu.au
Mon Oct 21 19:55:04 PDT 2013

Hiya Joe,

On 22/10/13 12:39, Joe Landman wrote:

> Maybe some 'perf stat' output might help as well.  The numbers sound a 
> great deal like something running over a single QPI link, so I thought 
> an affinity issue.

We've been doing more digging, and the published comparisons of
Nehalem-EP and Nehalem-EX that we've been reading report the same sort
of numbers that we're seeing, so it could well be that this is just how
it is and relates to the compromises that Intel made to get the large
memory capacity of these systems.

This puts it best:


# It is interesting to note that single threaded bandwidth is mediocre
# at best: we got only 5GB/s with DDR3-1066. Even the six-core Opteron
# with DDR2-800 can reach over 8GB/s, while the newest Opteron DDR3
# memory controller achieves 9.5GB/s with DDR3-1333, almost twice as
# much as the Xeon 7500 series. The best single-threaded performance
# comes out of the Xeon 5600 memory controller: 12GB/s with DDR3-1333.
# Intel clearly had to sacrifice some bandwidth too to achieve the
# enormous memory capacity (64 slots and 1TB without "extensions"). 

> Hows the ram clocked?

DDR3-1066 (same as in the review quoted above, and we measure about the
same bandwidth).
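
For anyone wanting a quick sanity check of this kind of figure, below is
a rough sketch of a pinned, single-threaded copy test. It is not what we
ran and it is no substitute for STREAM; the function names pin_to_cpu and
copy_bandwidth_gb_s and the 128MB default buffer size are made up for
illustration, and interpreter overhead means it will tend to under-report.
The pinning call is Linux-only.

```python
import os
import time


def pin_to_cpu(cpu):
    """Pin this process to one CPU so the test cannot migrate between sockets.

    Returns True if pinning succeeded, False if it is unsupported or refused.
    """
    if not hasattr(os, "sched_setaffinity"):
        return False  # not available on this platform (Linux-only call)
    try:
        os.sched_setaffinity(0, {cpu})
        return True
    except OSError:
        return False


def copy_bandwidth_gb_s(size_bytes=128 * 1024 * 1024, repeats=5):
    """Time a large buffer-to-buffer copy and return the best rate in GB/s.

    Assumption: size_bytes is much larger than the last-level cache, so
    the copy is served from main memory rather than cache.
    """
    # Fill the source with non-zero data so its pages are really allocated
    # (and, after pinning, first-touched from the pinned CPU).
    src = bytearray(b"\xa5") * size_bytes
    dst = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # same-length slice assignment: one bulk copy
        best = min(best, time.perf_counter() - start)
    # Count bytes read plus bytes written, as STREAM's Copy kernel does.
    return 2 * size_bytes / best / 1e9
```

Calling pin_to_cpu(0) before copy_bandwidth_gb_s() keeps the buffers and
the thread on one socket under the default first-touch policy, which is
the affinity case Joe was asking about.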

> Is this a very big cache coherent machine, or  nodes of a cluster
> built from NUMA nodes?

Two single machines: a dual socket IBM x3690 X5 with a MAX5 expansion
unit and a quad socket SGI UV 10, both with E7-8837s.  Same single
threaded bandwidth on both.

I'm pretty happy now that this is a fundamental limit rather than
something we're doing wrong, and given this was the only way for
them to get a 1.5TB node in the first place there's not much they
(or we) can do about it. :-)

 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci

