[Beowulf] Woodcrest Memory bandwidth

Kozin, I (Igor) i.kozin at dl.ac.uk
Tue Aug 15 10:57:46 PDT 2006


Interesting...
Given that Add and Triad are virtually the same,
it's surprising that Copy and Scale are so different.
Both read one array and write one, so IMHO Scale should be
much closer to Copy. Could this be a compiler effect?
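
The expectation that Scale should track Copy comes from the traffic each
kernel is counted as generating. Below is a minimal pure-Python sketch of
the four STREAM kernels and that byte accounting. The function names are
illustrative only (the real benchmark is C/Fortran), and the array length
of 20,000,000 doubles is inferred from the reported "Total memory required
= 457.8 MB" (3 arrays x 8 bytes x N), not stated in the post.

```python
# Sketch of the STREAM kernels and the bytes each one is counted as moving.
# Illustrative only: names are hypothetical, real STREAM is written in C/Fortran.

BYTES_PER_DOUBLE = 8

def copy(a, b, c, q):
    for j in range(len(a)):
        c[j] = a[j]              # 1 read, 1 write

def scale(a, b, c, q):
    for j in range(len(a)):
        b[j] = q * c[j]          # 1 read, 1 write, 1 multiply

def add(a, b, c, q):
    for j in range(len(a)):
        c[j] = a[j] + b[j]       # 2 reads, 1 write

def triad(a, b, c, q):
    for j in range(len(a)):
        a[j] = b[j] + q * c[j]   # 2 reads, 1 write, multiply and add

# Arrays touched per element (reads + writes), as STREAM counts them.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

def counted_bytes(kernel, n):
    """Bytes STREAM attributes to one pass of `kernel` over n elements."""
    return ARRAYS_TOUCHED[kernel] * BYTES_PER_DOUBLE * n

def rate_mb_s(kernel, n, seconds):
    """STREAM-style rate: counted bytes / time, with 1 MB = 1e6 bytes."""
    return 1.0e-6 * counted_bytes(kernel, n) / seconds

if __name__ == "__main__":
    N = 20_000_000  # assumption: inferred from "457.8 MB", see lead-in
    print(rate_mb_s("Copy", N, 0.0811))   # single-thread Copy, min time
    print(rate_mb_s("Scale", N, 0.1098))  # single-thread Scale, min time
```

With the single-thread minimum times from the quoted run, this reproduces
the reported rates to within rounding of the printed times. Copy and Scale
are counted as moving the same 16 bytes per element, so on this accounting
alone they should come out close to each other.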


> Here you go (Dell 2950 with 8 modules, STREAM compiled with icc-9.1 -O3):
>
> [root@tbox3 streamd]# hostname ; date ; for i in 1 2 3 4 5 ; do export OMP_NUM_THREADS=$i ; ./streamd | egrep "Total memory re|Number of Th|Function |Copy:|Scale:|Add:|Triad:"; done
> tbox3
> Fri Aug 11 17:59:22 CEST 2006
> Total memory required = 457.8 MB.
> Number of Threads requested = 1
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Copy:        3945.5494       0.0812       0.0811       0.0813
> Scale:       2914.9758       0.1098       0.1098       0.1099
> Add:         3227.5618       0.1488       0.1487       0.1489
> Triad:       3219.5307       0.1492       0.1491       0.1493
> Total memory required = 457.8 MB.
> Number of Threads requested = 2
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Copy:        4324.2058       0.0741       0.0740       0.0742
> Scale:       2999.9626       0.1068       0.1067       0.1069
> Add:         3309.2733       0.1451       0.1450       0.1452
> Triad:       3309.7031       0.1451       0.1450       0.1452
> Total memory required = 457.8 MB.
> Number of Threads requested = 3
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Copy:        5422.5441       0.0590       0.0590       0.0590
> Scale:       4102.8364       0.0780       0.0780       0.0781
> Add:         4487.2464       0.1070       0.1070       0.1070
> Triad:       4487.7465       0.1070       0.1070       0.1070
> Total memory required = 457.8 MB.
> Number of Threads requested = 4
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Copy:        6023.2969       0.0532       0.0531       0.0533
> Scale:       4862.4855       0.0658       0.0658       0.0659
> Add:         5264.1973       0.0912       0.0912       0.0913
> Triad:       5268.1782       0.0911       0.0911       0.0911
> Total memory required = 457.8 MB.
> Number of Threads requested = 5
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Copy:        5504.9004       0.0582       0.0581       0.0582
> Scale:       4318.9044       0.0786       0.0741       0.1147
> Add:         4705.1016       0.1042       0.1020       0.1216
> Triad:       4705.2885       0.1038       0.1020       0.1184
> 
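
The relations in the numbers above are easier to see as ratios. This is a
minimal Python sketch that uses only the rates quoted above; the function
names are illustrative and nothing here is measured anew.

```python
# Rates in MB/s copied from the quoted STREAM output, keyed by thread count.
RATES = {
    1: {"Copy": 3945.5494, "Scale": 2914.9758, "Add": 3227.5618, "Triad": 3219.5307},
    2: {"Copy": 4324.2058, "Scale": 2999.9626, "Add": 3309.2733, "Triad": 3309.7031},
    3: {"Copy": 5422.5441, "Scale": 4102.8364, "Add": 4487.2464, "Triad": 4487.7465},
    4: {"Copy": 6023.2969, "Scale": 4862.4855, "Add": 5264.1973, "Triad": 5268.1782},
    5: {"Copy": 5504.9004, "Scale": 4318.9044, "Add": 4705.1016, "Triad": 4705.2885},
}

def scale_to_copy(rates):
    """Scale rate as a fraction of Copy rate, per thread count."""
    return {t: r["Scale"] / r["Copy"] for t, r in rates.items()}

def speedup(rates, kernel, base=1):
    """Rate of `kernel` relative to the run with `base` threads."""
    return {t: r[kernel] / rates[base][kernel] for t, r in rates.items()}

if __name__ == "__main__":
    ratios = scale_to_copy(RATES)
    copy_speedup = speedup(RATES, "Copy")
    for t in sorted(RATES):
        print(f"{t} threads: Scale/Copy = {ratios[t]:.2f}, "
              f"Copy vs 1 thread = {copy_speedup[t]:.2f}x")
```

On these figures Scale stays at roughly 0.69 to 0.81 of Copy at every
thread count, while four threads deliver only about 1.5x the single-thread
Copy rate, and the fifth thread lowers all four rates.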
> > Two cores on separate sockets should show higher numbers if it's
> > an L2 cache issue.  If they are the same as those for 2 cores on one
> > socket then you have a problem with the North bridge or getting
> > full bandwidth from the FB-DIMMs.
> >
> > A complication in this test could be that in the one core per socket case
> > the whole L2 cache is allocated to a single core.  Watching performance
> > change as the array sizes grow should reveal this.
> >
> > rbw
> 
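
To plan the array-size sweep suggested above, it helps to know where the
STREAM working set crosses the L2 capacity. This is a minimal Python
sketch, assuming a 4 MB shared L2 per socket (an assumption, not a figure
from this thread) and the usual three arrays of 8-byte doubles. It only
computes sizes; the actual measurement still means rebuilding and
rerunning the benchmark at each length.

```python
# Working-set planner for a STREAM array-size sweep.
# Assumption: 4 MB shared L2 per socket; adjust L2_BYTES for other parts.

BYTES_PER_DOUBLE = 8
STREAM_ARRAYS = 3                 # a, b and c
L2_BYTES = 4 * 1024 * 1024        # assumed L2 capacity in bytes

def working_set_bytes(n):
    """Total bytes held by the three STREAM arrays of length n."""
    return STREAM_ARRAYS * BYTES_PER_DOUBLE * n

def largest_n_fitting(cache_bytes=L2_BYTES):
    """Largest array length whose three arrays still fit in the cache."""
    return cache_bytes // (STREAM_ARRAYS * BYTES_PER_DOUBLE)

def sweep(sizes, cache_bytes=L2_BYTES):
    """Return (n, working set in bytes, fits in cache?) for each length."""
    return [(n, working_set_bytes(n), working_set_bytes(n) <= cache_bytes)
            for n in sizes]

if __name__ == "__main__":
    # Powers of two that straddle the assumed L2 size, plus the quoted run.
    sizes = [2 ** k for k in range(14, 25)] + [20_000_000]
    for n, ws, fits in sweep(sizes):
        label = "fits in L2" if fits else "exceeds L2"
        print(f"N = {n:>10}  working set = {ws / 1048576:8.2f} MB  {label}")
```

Under that assumption the quoted run (about 457.8 MB) is far outside the
cache, so the reported rates should reflect memory rather than L2; array
lengths below roughly 175,000 elements would be the ones to watch for a
cache effect.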



