[Beowulf] Woodcrest - Shared L2 cache

Renato S. Silva rssr at lncc.br
Wed Aug 16 08:39:31 PDT 2006


Hi Folks

Does anyone have information about how the L2 cache is shared between the two cores?
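
A minimal sketch of one way to check this from Linux, assuming a kernel
recent enough to export the cache topology through sysfs (the index
numbering below is an assumption and varies between CPUs):

    /* print level, size and sharing mask for each cache of cpu0 */
    #include <stdio.h>

    static void show(const char *path)
    {
        char buf[256];
        FILE *f = fopen(path, "r");
        if (!f)
            return;
        if (fgets(buf, sizeof(buf), f))
            printf("%s: %s", path, buf);
        fclose(f);
    }

    int main(void)
    {
        const char *files[] = { "level", "size", "shared_cpu_map" };
        char path[128];
        int i, j;

        /* index0..index3 typically cover L1d, L1i, L2 (and L3 if present) */
        for (i = 0; i < 4; i++) {
            for (j = 0; j < 3; j++) {
                snprintf(path, sizeof(path),
                         "/sys/devices/system/cpu/cpu0/cache/index%d/%s",
                         i, files[j]);
                show(path);
            }
        }
        return 0;
    }

If the L2 is really shared, its shared_cpu_map entry should list both
cores of the socket.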

Thanks
Renato Silva

Richard Walsh wrote:

> Mark Hahn wrote:
>
>>>> Good point which makes perfect sense to me.
>>>> Given that the theoretical maximum is actually 21.3 GB/s
>>>> the real maximum Triad number must be 21.3/3 = 7.1 GB/s.
>>>
>>
>> I don't get this - triad does two reads and one write.
>> if you don't use store-through ('nt' versions of mov),
>> then the write also implies a read for write-allocate
>> (filling the cache line).
>> without store-through, the peak theoretical number reported by
>> stream should be 3*peak/4.  the 4 is because the hardware really
>> moves 3 reads + 1 write per iteration (the extra read being the
>> write-allocate), and the 3 because stream only credits the 2 reads
>> and 1 write that it counts.
>
> That looks right.  So, one socket, with write allocate, >>should<< show:
>
>      10.5 GB/sec * 0.75 = 7.875 GB/sec
>
> and two sockets 15.75 GB/sec.  The problem could be related to
> competitive/ineffective use of the shared L2 cache or a bottleneck
> in the North bridge.  It would seem that a look at how the performance
> grows as you add cores within versus across sockets should reveal this.
>
> Two cores on separate sockets should show higher numbers if it's
> an L2 cache issue.  If they are the same as those for 2 cores on one
> socket then you have a problem with the North bridge or getting
> full bandwidth from the FB-DIMMs.
> A complication in this test could be that in the one core per socket case
> the whole L2 cache is allocated to a single core.  Watching performance
> change as the array sizes grow should reveal this.
> rbw
>
>
>>
>>> Then how do you explain a dual opteron with two 6.4 GB/sec (peak)
>>> memory systems, 12.8 GB/sec total per node, managing 9-10 GB/sec?
>>>
>>> 12.8/3=4.26GB/sec.  People are seeing well over twice that.
>>
>>
>> since pathscale does store-through ('nt' stores), the peak really should be 12.8,
>> so achieving 9-10 is decent but not paradoxical.  (the peak would 
>> correspond to 1.07 Gflops, significantly below the peak theoretical
>> pipeline rate of 2*clock flops...)
>
>
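
To put numbers on Mark's write-allocate point above, here is a minimal
sketch of the triad loop (array names as in the standard stream.c) with
the byte accounting in the comments:

    /* STREAM triad: a[j] = b[j] + scalar*c[j]
     *
     * STREAM credits 24 bytes per iteration: read b[j], read c[j],
     * write a[j].  With ordinary stores the write of a[j] first pulls
     * the target line into cache (write-allocate), so the memory system
     * really moves 32 bytes per iteration.  The reported number is
     * therefore capped at 24/32 = 3/4 of hardware peak; non-temporal
     * (movnt) stores skip the allocate and put the full peak in play.
     */
    for (j = 0; j < N; j++)
        a[j] = b[j] + scalar * c[j];

For the dual-socket Woodcrest above (10.5 GB/sec per socket) that works
out to Richard's 7.875 GB/sec per socket; for the 12.8 GB/sec dual
Opteron with non-temporal stores the cap stays at the full 12.8, which
is why 9-10 GB/sec is respectable rather than impossible.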

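For the within-socket versus across-socket test Richard suggests, the
key step is pinning each STREAM copy to a chosen core.  A minimal
taskset-style launcher using sched_setaffinity (the mapping of logical
CPU numbers to sockets is an assumption -- check it against the cache
topology, e.g. with the sysfs snippet near the top, before trusting any
particular numbering):

    /* usage: ./pin <cpu> <command> [args...] */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        cpu_set_t set;
        int cpu;

        if (argc < 3) {
            fprintf(stderr, "usage: %s <cpu> <command> [args...]\n", argv[0]);
            return 1;
        }
        cpu = atoi(argv[1]);
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return 1;
        }
        execvp(argv[2], &argv[2]);   /* the exec'd program inherits the mask */
        perror("execvp");
        return 1;
    }

Running two pinned copies of stream on two cores of the same die, and
then on one core of each socket, should separate the two explanations:
if only the same-die pair falls short, suspect the shared L2; if both
pairings stall at the same number, look at the FB-DIMM/North bridge
path instead.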


