[Beowulf] Re: dual core (latency)
Vincent Diepeveen
diep at xs4all.nl
Mon Jul 18 19:50:09 PDT 2005
Hello Stuart,
Thanks for your answer regarding numactl tools.
Your answer doesn't necessarily explain why the dual core latency (with or
without numactl) is far worse, yes 30%+ worse, than that of single cpu
opterons of the same speed, when benchmarking just 1 core (with the others
sitting idle).
Any thoughts on that?
Thanks,
Vincent
At 08:17 AM 7/19/2005 +0800, Stuart Midgley wrote:
>The numactl tools won't generally help latency. Latency isn't the
>issue with Opteron based systems (or any system with multiply
>connected distributed memory controllers).
>
>The real issue is page locality (which is the case with most numa
>based systems).
>
>If you run 2 processes on a dual cpu (single core) system and they
>both happen to allocate their pages on the same memory controller,
>they will each only see 1/2 the memory bandwidth and 1 controller
>sits idle. That's the real issue (and the extreme pathological case).
>
>Linux 2.6 generally does a good job of putting the pages on the memory
>controller attached to the cpu that the process is running on. However,
>it can't get it perfect. There is always more than 1 process/cpu on
>a system, so there is always a little noise... so there is always the
>chance that some pages can be spread around. Also, the system buffer
>cache will get spread around, affecting everyone.
>
>Add into the mix the possibility of suspending processes and you can
>end up with a process's pages all over the place. Since Linux
>doesn't yet have page migration, once a page is allocated it won't be
>moved to a different memory controller unless it is swapped out.
>
>With numactl tools you will force the pages to be allocated on the
>right memory/cpu. The process's buffer cache will also be locked
>down (which is another VERY important issue)...
>
>I have used numa tools to double the performance of some codes (or
>perhaps it's more correct to say to get back to the correct performance).
>
>Stu.
>
>
>On 18/07/2005, at 22:38, Vincent Diepeveen wrote:
>
>> I've been toying some with the numactl at dual core and it doesn't
>> really seem to help much. It helps 0.00
>>
>> System: Ubuntu on a quad opteron dual core 1.8GHz, 2.6.10-5 smp
>> kernel.
>>
>> Latencies as measured by my own program (TLB thrashing read of 8 bytes,
>> each cpu 250MB buffer):
>>
>> #cpu   latency
>> 1      144-147 ns
>> 2      174 ns
>> 4      206 ns
>> 8      234 ns
>>
>> That single cpu figure is pretty ugly bad if i may say so.
>>
>> All kind of numa calls just didn't help a thing. I've tried for
>> example:
>>
>> if (numa_available() < 0) {
>>     setitnuma = 0;   /* no NUMA support: skip all numa calls */
>> }
>> else {
>>     int i, back;
>>     nodemask_t nt, rnm;
>>     maxnodes = numa_max_node() + 1; // numa_max_node() returns 3 when 4 controllers
>>     printf("numa=%i maxnodes=%i\n", setitnuma, maxnodes);
>>
>>     /* nodemask_t is a bitmask (one bit per node), so test and clear
>>        bits with the nodemask_*() helpers rather than indexing n[] */
>>     nt = numa_get_interleave_mask();
>>     for (i = 0; i < maxnodes; i++) {
>>         printf("node = %i mask = %i\n", i, nodemask_isset(&nt, i));
>>         nodemask_clr(&nt, i);
>>     }
>>     numa_set_interleave_mask(&nt);   /* empty mask: interleaving off */
>>     nt = numa_get_interleave_mask();
>>     for (i = 0; i < maxnodes; i++)
>>         printf("checking memory interleave node = %i mask = %i\n",
>>                i, nodemask_isset(&nt, i));
>>
>>     rnm = numa_get_run_node_mask();
>>     /* a struct cannot be printed with %i; show its first word in hex */
>>     printf("numa get run node mask = %lx\n", rnm.n[0]);
>>     back = numa_run_on_node(0);   /* returns 0 on success */
>>     if (!back)
>>         printf("set to run on node 0\n");
>>     else
>>         printf("failed to set run on node 0\n");
>>
>> }
>>
>> Whatever i try, single cpu latency stays at 144-147 ns.
>>
>> A dual opteron dual core with 2.2GHz dual core controllers shows similar
>> latencies. 200 ns for example when running 4 processes with the same
>> test program.
>>
>> This single cpu latency behaviour of dual core opteron is ugly bad
>> compared to other dual opterons which are not dual core.
>>
>> A nearly identical Tyan mainboard with dual opteron 2.2GHz gives, for a
>> single cpu with the SAME kernel and the SAME program, 115 ns latency.
>> When turning off ECC on that dual opteron it even gets down to 113 ns.
>>
>> The frustrating thing is, the dual opteron 2.2GHz has pc2700,
>> whereas the quad opteron dual core has all banks filled
>> with pc3200 registered ram, a-brand.
>>
>> Vincent
>
>
>--
>Dr Stuart Midgley
>sdm900 at gmail.com