[Beowulf] Woodcrest Memory bandwidth
Kozin, I (Igor)
i.kozin at dl.ac.uk
Tue Aug 15 10:57:46 PDT 2006
Interesting...
Given that Add and Triad are virtually the same,
it's surprising that Copy and Scale are so different.
IMHO Scale should be more like Copy. Compiler effect?
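
For what it's worth, here is a rough sketch of how STREAM arrives at its rates, to show that the Copy/Scale gap is not a counting artifact. STREAM counts two arrays of traffic for Copy and Scale and three for Add and Triad, and divides by the best time. The array length N = 20,000,000 is my assumption, inferred from the "457.8 MB" total in the quoted output below; it is not stated in the post. The helper names are made up for illustration.

```python
# Assumption: N is inferred from "Total memory required = 457.8 MB"
# (3 arrays of 8-byte doubles, 1 MB = 2**20 bytes); the post does not state it.
N = 20_000_000
BYTES_PER_DOUBLE = 8

# Arrays counted per loop iteration under STREAM's convention
# (write-allocate traffic is not counted).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# Min times in seconds, copied from the single-thread run quoted in this thread.
MIN_TIME_1T = {"Copy": 0.0811, "Scale": 0.1098, "Add": 0.1487, "Triad": 0.1491}


def stream_rate_mb_s(kernel, min_time, n=N):
    """Rate in MB/s (1 MB = 1e6 bytes): bytes counted divided by best time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_DOUBLE * n
    return 1.0e-6 * bytes_moved / min_time


def total_memory_mb(n=N):
    """Footprint of the three arrays, in units of 2**20 bytes."""
    return 3 * BYTES_PER_DOUBLE * n / 2**20


if __name__ == "__main__":
    print(f"Total memory: {total_memory_mb():.1f} MB")
    for kernel, t in MIN_TIME_1T.items():
        print(f"{kernel:6s} {stream_rate_mb_s(kernel, t):9.1f} MB/s")
```

Under that assumption the computed rates agree with the quoted ones to within the rounding of the printed times. Copy and Scale are credited with the same number of bytes, so the difference in their rates comes entirely from the measured time.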
> Here you go (Dell 2950 with 8 modules, STREAM compiled with icc-9.1 -O3):
>
> [root@tbox3 streamd]# hostname ; date ; for i in 1 2 3 4 5 ; do export OMP_NUM_THREADS=$i ; ./streamd | egrep "Total memory re|Number of Th|Function |Copy:|Scale:|Add:|Triad:"; done
> tbox3
> Fri Aug 11 17:59:22 CEST 2006
> Total memory required = 457.8 MB.
> Number of Threads requested = 1
> Function   Rate (MB/s)   Avg time   Min time   Max time
> Copy:        3945.5494     0.0812     0.0811     0.0813
> Scale:       2914.9758     0.1098     0.1098     0.1099
> Add:         3227.5618     0.1488     0.1487     0.1489
> Triad:       3219.5307     0.1492     0.1491     0.1493
>
> Total memory required = 457.8 MB.
> Number of Threads requested = 2
> Function   Rate (MB/s)   Avg time   Min time   Max time
> Copy:        4324.2058     0.0741     0.0740     0.0742
> Scale:       2999.9626     0.1068     0.1067     0.1069
> Add:         3309.2733     0.1451     0.1450     0.1452
> Triad:       3309.7031     0.1451     0.1450     0.1452
>
> Total memory required = 457.8 MB.
> Number of Threads requested = 3
> Function   Rate (MB/s)   Avg time   Min time   Max time
> Copy:        5422.5441     0.0590     0.0590     0.0590
> Scale:       4102.8364     0.0780     0.0780     0.0781
> Add:         4487.2464     0.1070     0.1070     0.1070
> Triad:       4487.7465     0.1070     0.1070     0.1070
>
> Total memory required = 457.8 MB.
> Number of Threads requested = 4
> Function   Rate (MB/s)   Avg time   Min time   Max time
> Copy:        6023.2969     0.0532     0.0531     0.0533
> Scale:       4862.4855     0.0658     0.0658     0.0659
> Add:         5264.1973     0.0912     0.0912     0.0913
> Triad:       5268.1782     0.0911     0.0911     0.0911
>
> Total memory required = 457.8 MB.
> Number of Threads requested = 5
> Function   Rate (MB/s)   Avg time   Min time   Max time
> Copy:        5504.9004     0.0582     0.0581     0.0582
> Scale:       4318.9044     0.0786     0.0741     0.1147
> Add:         4705.1016     0.1042     0.1020     0.1216
> Triad:       4705.2885     0.1038     0.1020     0.1184
>
> > Two cores on separate sockets should show higher numbers if it's
> > an L2 cache issue. If they are the same as those for 2 cores on one
> > socket then you have a problem with the North bridge or getting
> > full bandwidth from the FB-DIMMs.
> >
> > A complication in this test could be that in the one-core-per-socket case
> > the whole L2 cache is allocated to a single core. Watching performance
> > change as the array sizes grow should reveal this.
> >
> > rbw
>
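
On the array-size suggestion quoted above: a small sketch of how one might lay out such a sweep, marking which array lengths keep all three STREAM arrays inside L2. The 4 MiB L2 size is my assumption (the thread does not give a figure), and the function names are invented for illustration. It only plans the sizes; it does not run the benchmark.

```python
BYTES_PER_DOUBLE = 8
# Assumption: 4 MiB of shared L2 per socket; the thread does not state the size.
L2_BYTES = 4 * 2**20


def working_set_bytes(n):
    """Bytes occupied by the three STREAM arrays of n doubles each."""
    return 3 * BYTES_PER_DOUBLE * n


def sweep(n_min=2**14, n_max=2**25, l2_bytes=L2_BYTES):
    """Doubling sweep of array lengths; each row is (n, bytes, fits_in_l2)."""
    rows = []
    n = n_min
    while n <= n_max:
        ws = working_set_bytes(n)
        rows.append((n, ws, ws <= l2_bytes))
        n *= 2
    return rows


if __name__ == "__main__":
    for n, ws, fits in sweep():
        label = "fits in L2" if fits else "spills to memory"
        print(f"N={n:>10d}  working set={ws / 2**20:8.2f} MiB  {label}")
```

If the one-core-per-socket numbers fall off near the point where the working set stops fitting, that would point at the cache effect described above rather than at the memory path.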