[Beowulf] bizarre scaling behavior on a Nehalem

Bruno Coutinho coutinho at dcc.ufmg.br
Tue Aug 11 16:27:55 PDT 2009


2009/8/11 Rahul Nabar <rpnabar at gmail.com>

> On Tue, Aug 11, 2009 at 5:57 PM, Bruno Coutinho <coutinho at dcc.ufmg.br> wrote:
> > Nehalem and Barcelona have the following cache architecture:
> >
> > L1 cache: Barcelona: 128KB (64KB data, 64KB instruction), Nehalem: 64KB
> > (32KB data, 32KB instruction), per core
> > L2 cache: Barcelona: 512KB, Nehalem: 256KB, per core
> > L3 cache: Barcelona: 2MB, Nehalem: 8MB, shared among all cores.
> >
> >
> > In both Barcelona and Nehalem, the "uncore" (everything outside a core,
> > such as the L3 and the memory controllers) runs at a lower speed than the
> > cores, and all cores communicate through L3, so it must handle some
> > coherence signals too. This makes it impossible for L3 to feed all cores
> > at full speed if the L2 caches have high miss ratios.
> >
> > So, what is happening with your program is something like:
> >
> > The working set fits in Barcelona's 512KB L2 cache, so it has a 10% miss
> > rate, but it doesn't fit in Nehalem's 256KB L2 cache, so it has a 50%
> > miss rate. So on Nehalem the shared L3 cache has to handle many more
> > requests from all cores than on Barcelona, and becomes a big bottleneck.
>
> Thanks Bruno! That makes a lot of sense now. Assuming that is what is
> happening, is there any way of still using the Nehalems fruitfully for
> this code? Any smart tricks / hacks?


You can use profilers that monitor hardware performance counters, such as
OProfile or PAPI, to measure the miss ratios and verify whether that is what
is happening. But solving it is a much larger problem. :)
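
As a rough sketch of how counter readings would translate into pressure on
the shared L3: the snippet below turns raw miss/access totals into a miss
ratio and compares the aggregate L3 request rate for the two cases. It
assumes every L2 miss becomes an L3 request, which is a simplification. The
counter totals, the core count, and the per-core access rate are all made-up
placeholders, not measurements, and the function names are only illustrative.

```python
def miss_ratio(misses, accesses):
    """Fraction of accesses that missed; both arguments are raw counter totals."""
    if accesses <= 0:
        raise ValueError("accesses must be positive")
    return misses / accesses


def l3_requests_per_second(cores, l2_accesses_per_core, l2_miss_ratio):
    """Aggregate request rate at the shared L3.

    Assumption: every L2 miss on every core turns into one L3 request.
    """
    return cores * l2_accesses_per_core * l2_miss_ratio


# Hypothetical counter totals standing in for what a profiler would report.
fits_in_l2 = miss_ratio(misses=1_000_000, accesses=10_000_000)      # ~10% case
spills_out_of_l2 = miss_ratio(misses=5_000_000, accesses=10_000_000)  # ~50% case

CORES = 4                       # assumption: one quad-core socket
L2_ACCESSES_PER_CORE = 100e6    # assumption: placeholder rate, accesses/second

low = l3_requests_per_second(CORES, L2_ACCESSES_PER_CORE, fits_in_l2)
high = l3_requests_per_second(CORES, L2_ACCESSES_PER_CORE, spills_out_of_l2)

print(f"L2 miss ratios: {fits_in_l2:.0%} vs {spills_out_of_l2:.0%}")
print(f"L3 request load ratio: {high / low:.1f}x")
```

With these placeholder numbers the L3 sees five times as many requests in the
50% case, from every core at once, which is the kind of gap worth looking for
in the real counter data.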



>
>
> The reason is that the Nehalems seem to scale and perform beautifully
> for my other codes.
>
> The only other option is to fall back to the AMDs. I believe the
> Shanghai or the Istanbul would be a choice. I assume the cache
> structure there is as good as the Barcelona's, if not better! Any
> experiences with these chips on the group?
>
> Funnily, I haven't heard of any such Nehalem (-ive) stories anywhere
> else. Am I the first one to hit this cache bottleneck? I doubt it. Any
> other cache heavy users?
>
> --
> Rahul
>