[Beowulf] [gorelsky at stanford.edu: CCL:dual-core Opteron 275 performance]

Wed Jul 13 09:56:09 PDT 2005

Hi Mikhail:

   If you use numactl, you should have control over processor affinity 
for a particular process.  I am not sure how this ties in to MPI though, 
so there may need to be some work there.
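
The same kind of pinning can also be requested from inside the process
instead of through numactl. A minimal sketch in Python, assuming a Linux
host where os.sched_setaffinity is available. The helper name
pin_current_process is made up for illustration, and it only sets scheduler
affinity; it does not touch NUMA memory policy, which numactl can also
control. An MPI program could call something like this early in each rank,
with the core chosen from the rank number, but that mapping is an
assumption and has not been tried with MPICH.

```python
import os


def pin_current_process(cpus):
    """Restrict the calling process to the given CPU numbers (Linux only).

    `cpus` is an iterable of logical CPU ids. Ids the process is not
    currently allowed to use are rejected up front instead of being
    passed to the kernel.
    """
    allowed = os.sched_getaffinity(0)      # pid 0 means "this process"
    wanted = set(cpus)
    if not wanted or not wanted <= allowed:
        raise ValueError(f"requested {sorted(wanted)}, allowed {sorted(allowed)}")
    os.sched_setaffinity(0, wanted)
    return os.sched_getaffinity(0)         # report what the kernel accepted


if __name__ == "__main__":
    first = min(os.sched_getaffinity(0))   # any CPU we are allowed on
    print("now running on:", sorted(pin_current_process([first])))
```

Child processes inherit the affinity mask, so pinning a launcher before it
starts the real job would likely have the same effect.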

Joe

Mikhail Kuzminsky wrote:
> In message from Alan Louis Scheinine <scheinin at crs4.it> (Tue, 12 Jul 2005 12:24:27 +0200):
> 
>>  1) Gerry Creager wrote "Hoowa!"
>>     Since the results seem useful, I would like to add the following.
>>     On dual-CPU boards with Athlon32 CPUs, the program "bolam" was slow
>>     if both CPUs on the board were used; it was better to have one MPICH
>>     process per compute node.  This problem did not appear in another
>>     cluster that had Opteron dual-CPU boards (single-core), that is, two
>>     processes for each node did not cause a slowdown.  This is an
>>     indication that "bolam" is at a threshold for memory access being a
>>     bottleneck.
> 
> The original post by S.Gorelsky (re-sent by E.Leitl) was about the good
> scalability of a 4-core/dual-CPU Opteron 275 server on the Gaussian 03
> DFT/test397 test. I'm currently testing a similar Supermicro server
> w/2*Opteron 275, but w/DDR333 instead of the DDR400 used by S.Gorelsky.
> I used SuSE 9.0 w/2.4.21 kernel.
> 
> As I understand it, the original results of S.Gorelsky were probably
> obtained with shared-memory parallelization! If I use G03 w/Linda (which
> is the main parallelization tool for G03; the shared-memory model of G03
> is available only for a more restricted subset of quantum-chemical
> methods), then the results are much worse.
> 
> On 4 cores I obtained a speedup of only 2.95 for Linda vs 3.6 for
> shared memory. The difference is, as I understand it, simply due to
> the data exchanges through RAM in the case of Linda; in the shared-memory
> model such memory traffic is absent.
> FYI: the speedup obtained by S.Gorelsky for 4 CPUs is 3.4 (hope that I
> calculated properly :-)).
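
To put the three speedups quoted above on a common scale, they can be
turned into parallel efficiency (speedup divided by number of cores). A
small sketch; the function name is made up for illustration and the
numbers are only those given in the message:

```python
def parallel_efficiency(speedup, n_cores):
    """Fraction of ideal linear speedup achieved on n_cores."""
    return speedup / n_cores


if __name__ == "__main__":
    # Speedups on 4 cores as quoted in the message above.
    for label, s in [("G03 w/Linda", 2.95),
                     ("G03 shared memory", 3.6),
                     ("S.Gorelsky's result", 3.4)]:
        print(f"{label}: {parallel_efficiency(s, 4):.0%}")
```

That works out to about 74% for Linda against 90% for shared memory on
the same machine, and 85% for the originally posted result.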
> 
> I also obtained similar results for other quantum-chemical methods, which
> show that using Linda/G03 may give bad scalability on dual-core Opteron.
> We also have a quantum-chemical application (developed by us) which
> is bandwidth-limited under parallelization, and using 1 CPU (1 MPI
> process) per dual-Xeon node with Myrinet/MPICH is strongly preferred. In
> the case of Opteron nodes (dual single-core CPUs) the situation is better.
> 
> But now, for Opteron nodes with 4 cores/2 CPUs, to force the use of
> only 2 cores (out of 4), 1 on each chip, we'll need to have
> CPU affinity support in Linux.
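
Choosing "one core on each chip" needs a map from logical core number to
physical chip. A sketch of that selection step in Python; the function name
and the example layout are assumptions for illustration only. The real
numbering depends on the kernel and BIOS and would have to be read from the
system (for example the "physical id" field in /proc/cpuinfo, where the
kernel reports it).

```python
def one_core_per_chip(core_to_chip):
    """Pick the lowest-numbered core on each chip.

    `core_to_chip` maps logical core id -> physical chip (socket) id.
    Returns a sorted list with exactly one core id per chip.
    """
    chosen = {}
    for core in sorted(core_to_chip):
        # The first (lowest) core seen on a chip wins; later ones are skipped.
        chosen.setdefault(core_to_chip[core], core)
    return sorted(chosen.values())


if __name__ == "__main__":
    # Assumed numbering for a 2 x dual-core node: cores 0,1 on chip 0 and
    # cores 2,3 on chip 1. This layout is hypothetical.
    layout = {0: 0, 1: 0, 2: 1, 3: 1}
    print(one_core_per_chip(layout))  # prints [0, 2]
```

The resulting core list is what would then be handed to whatever affinity
mechanism is available.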
> 
> Yours
> Mikhail
> 
>> A complication for this
>>     interpretation is that the Athlon32 nodes use Linux kernel 2.4.21.
>>  2) Mikhail Kuzminsky asked: do you have "node interleave memory"
>>     switched off?
>>     Reading the BIOS:
>>     Bank interleaving "Auto"; there are two memory modules per CPU, so
>>        there should be bank interleaving.
>>     Node interleaving "Disable"
>>  3) In an email Guy Coates asked
>>     > Did you need to use numa-tools to specify the CPU placement, or
>>     > did the kernel "do the right thing" by itself?
>>     The kernel did the right thing by itself.
>>     I have a question: what are numa-tools?
>>     On the computer I find
>>     man -k numa
>>        numa   (3)  - NUMA policy library
>>        numactl(8)  - Control NUMA policy for processes or shared memory
>>     rpm -qa | grep -i numa
>>        numactl-0.6.4-1.13
>>     Is numactl the "numa-tools"?  Is there another package to consider
>>     installing?
>>     I see that numactl has many "man" pages.
>>
>> Reference, previous message:
>> >In all cases, 4 MPI processes on a machine with 4 cores (two dual-core CPUs).
>> >Meteorology program 1, "bolam"    CPU time, real time (in seconds)
>> >      Linux kernel 2.6.9-11.ELsmp     122        128
>> >      Linux kernel 2.6.12.2            64         77
>> >
>> >Meteorology program 2, "non-hydrostatic"
>> >      Linux kernel 2.6.9-11.ELsmp     598        544
>> >      Linux kernel 2.6.12.2           430        476
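
The kernel-to-kernel difference in the table above is easier to compare as
a ratio of real times. A short sketch; the dictionary layout and function
name are made up for illustration, and the numbers are copied from the
table:

```python
# (CPU seconds, real seconds) copied from the table above.
timings = {
    "bolam": {"2.6.9-11.ELsmp": (122, 128), "2.6.12.2": (64, 77)},
    "non-hydrostatic": {"2.6.9-11.ELsmp": (598, 544), "2.6.12.2": (430, 476)},
}


def real_time_ratio(program, old="2.6.9-11.ELsmp", new="2.6.12.2"):
    """Real time under the old kernel divided by real time under the new one."""
    return timings[program][old][1] / timings[program][new][1]


if __name__ == "__main__":
    for name in timings:
        print(f"{name}: {real_time_ratio(name):.2f}x")
```

By real time, 2.6.12.2 comes out roughly 1.66x faster for "bolam" and
about 1.14x faster for "non-hydrostatic".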
>>
>>
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit 
> http://www.beowulf.org/mailman/listinfo/beowulf