[Beowulf] Nehalem memory configs
Tom Elken
tom.elken at qlogic.com
Mon Apr 13 12:02:28 PDT 2009
> On Behalf Of Joe Landman
>
> Since the part is released, I can report a stream test :)
And so can I :-) (below)
>
> richard.walsh at comcast.net wrote:
>
> > 64 GB/sec is the right dual-socket theoretical number for this
> > situation, and Intel presents the value of 33 GB/sec for the stream
> > triad for the dual socket boards, so 35 GB/sec could be a copy
> > perhaps, but nothing was mentioned about any benchmark in the memory
> > piece.
The STREAM benchmark was mentioned in the delltechcenter piece, but which sub-benchmark (Triad or Copy, etc.) was not.
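For reference, the four STREAM sub-benchmarks are simple vector kernels. The C below is a minimal sketch written from the kernel definitions in the STREAM documentation, not the stream.c source itself; the function names are hypothetical. The OpenMP pragmas split each loop across threads when built with -openmp (icc) or -fopenmp (gcc) and are ignored otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the four STREAM kernels; function names are hypothetical.
 * Each loop is split across threads when compiled with OpenMP. */

/* Copy: c[j] = a[j] (one array read, one written) */
void stream_copy(double *c, const double *a, size_t n)
{
    assert(c != NULL && a != NULL);
#pragma omp parallel for
    for (long j = 0; j < (long)n; j++)
        c[j] = a[j];
}

/* Scale: b[j] = scalar * c[j] (one array read, one written) */
void stream_scale(double *b, const double *c, double scalar, size_t n)
{
    assert(b != NULL && c != NULL);
#pragma omp parallel for
    for (long j = 0; j < (long)n; j++)
        b[j] = scalar * c[j];
}

/* Add: c[j] = a[j] + b[j] (two arrays read, one written) */
void stream_add(double *c, const double *a, const double *b, size_t n)
{
    assert(c != NULL && a != NULL && b != NULL);
#pragma omp parallel for
    for (long j = 0; j < (long)n; j++)
        c[j] = a[j] + b[j];
}

/* Triad: a[j] = b[j] + scalar * c[j] (two arrays read, one written) */
void stream_triad(double *a, const double *b, const double *c,
                  double scalar, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
#pragma omp parallel for
    for (long j = 0; j < (long)n; j++)
        a[j] = b[j] + scalar * c[j];
}
```

Copy and Scale move 16 bytes per element while Add and Triad move 24, so at the same bandwidth Add and Triad take roughly 1.5 times as long per pass, which matches the Avg time columns in the results.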
Here are some results we got on a Nehalem system with dual Intel Xeon W5580
CPUs @ 3.20 GHz, 6x 2GB DDR3-1333 DIMMs (one per memory channel), and SMT
turned off. All four STREAM components are over 37 GB/s when run on 8 threads
over two CPUs:
------------------
OpenMP (8 threads)
Intel 11.0, icc -O3 -openmp -static
Array size = 32000000, Offset = 0
-------------------------------------------------------------
Function Rate (MB/s) Avg time Min time Max time
Copy: 38705.2547 0.0134 0.0132 0.0135
Scale: 37735.3959 0.0137 0.0136 0.0138
Add: 37293.9249 0.0207 0.0206 0.0209
Triad: 37388.7235 0.0207 0.0205 0.0209
Serial
Intel 11.0, icc -O3 -static
-------------------------------------------------------------
Function Rate (MB/s) Avg time Min time Max time
Copy: 10781.6770 0.0475 0.0475 0.0475
Scale: 10080.7104 0.0508 0.0508 0.0508
Add: 12646.7882 0.0608 0.0607 0.0608
Triad: 12628.8395 0.0608 0.0608 0.0608
-------------------
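As a sanity check on the table above, STREAM's MB/s figure is the number of bytes it counts for a kernel divided by the best (Min) time, with 1 MB = 10^6 bytes. A small C sketch of that arithmetic, assuming 8-byte doubles as in these runs; the enum and function names are hypothetical:

```c
#include <assert.h>

/* The four STREAM kernels (hypothetical enum names) */
enum stream_kernel { STREAM_COPY, STREAM_SCALE, STREAM_ADD, STREAM_TRIAD };

/* Bytes counted per array element: Copy and Scale touch two arrays,
 * Add and Triad touch three, each element being one double. */
static double bytes_per_element(enum stream_kernel k)
{
    int arrays = (k == STREAM_COPY || k == STREAM_SCALE) ? 2 : 3;
    return arrays * (double)sizeof(double);
}

/* Rate in MB/s (1 MB = 10^6 bytes) for n elements and the best time */
double stream_rate_mbs(enum stream_kernel k, long n, double min_time_s)
{
    assert(n > 0 && min_time_s > 0.0);
    return 1.0e-6 * bytes_per_element(k) * (double)n / min_time_s;
}
```

With Array size = 32000000, stream_rate_mbs(STREAM_COPY, 32000000L, 0.0132) gives roughly 38788 MB/s and the Triad case (0.0205 s) roughly 37463 MB/s. Both sit slightly above the reported 38705 and 37389 MB/s because the Min time column is rounded to four decimals.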
The 3.2 GHz W5580 part is for workstations. We'll remeasure when we get some servers with somewhat slower CPUs, but I would not expect a big difference from the above.
-Tom Elken
> > In any case, I think we have the right theoretical
> > and probable real-world numbers expressed here, if people were
> > wondering.
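The 64 GB/s theoretical figure follows from the memory configuration: two sockets, three DDR3 channels per socket (six DIMMs, one per channel, as in the W5580 system above), 1333 MT/s, and 8 bytes per transfer on a 64-bit channel. A sketch of that arithmetic, with hypothetical function names, that also expresses a measured STREAM rate as a fraction of peak:

```c
#include <assert.h>

/* Theoretical peak DRAM bandwidth in GB/s (10^9 bytes/s).
 * Assumes every channel is 64 bits (8 bytes) wide, as DDR3 DIMMs are. */
double peak_bw_gbs(int sockets, int channels_per_socket,
                   double mega_transfers_per_s)
{
    assert(sockets > 0 && channels_per_socket > 0);
    assert(mega_transfers_per_s > 0.0);
    const double bytes_per_transfer = 8.0; /* 64-bit channel */
    return sockets * channels_per_socket * mega_transfers_per_s * 1.0e6
           * bytes_per_transfer / 1.0e9;
}

/* Fraction of peak achieved by a STREAM rate reported in MB/s */
double fraction_of_peak(double measured_mbs, double peak_gbs)
{
    assert(peak_gbs > 0.0);
    return (measured_mbs / 1000.0) / peak_gbs;
}
```

peak_bw_gbs(2, 3, 1333.0) comes to about 64.0 GB/s, so the 37388.7 MB/s OpenMP Triad above is roughly 58% of peak and Intel's 33 GB/s Triad figure roughly 52%.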
>
> 2-socket Intel MB with 2 dual core (not quad core) Nehalem E5502 1.8 GHz
> processors, running stream omp (I bumped N way up to get a reasonable
> measurement).
>
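Bumping N matters because the three STREAM arrays have to be much larger than the caches for the benchmark to measure main memory; the comments in stream.c ask for each array to be at least four times the available cache. A sketch of the footprint arithmetic behind the "Total memory required" line (three arrays of 8-byte doubles, reported in units of 2^20 bytes), plus a hypothetical check against an assumed cache size:

```c
#include <assert.h>

/* Footprint STREAM prints as "Total memory required": three arrays
 * of n doubles, in units of 2^20 bytes. */
double stream_footprint_mib(long n)
{
    assert(n > 0);
    return 3.0 * (double)sizeof(double) * (double)n / 1048576.0;
}

/* Hypothetical check: is each array at least `factor` times the given
 * total cache size in bytes? stream.c's comments suggest factor = 4. */
int arrays_exceed_cache(long n, double cache_bytes, double factor)
{
    assert(n > 0 && cache_bytes > 0.0 && factor > 0.0);
    return (double)n * (double)sizeof(double) >= factor * cache_bytes;
}
```

stream_footprint_mib(200000000L) gives 4577.6, matching the output, and Tom's Array size of 32000000 works out to about 732.4 MB. The cache size passed to arrays_exceed_cache (16 MB in the checks here) is an illustrative assumption; look up the actual last-level cache of your own CPUs.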
> landman at velocibunny:~/stream$ ./stream_c_omp.exe
> -------------------------------------------------------------
> STREAM version $Revision: 5.8 $
> -------------------------------------------------------------
> This system uses 8 bytes per DOUBLE PRECISION word.
> -------------------------------------------------------------
> Array size = 200000000, Offset = 0
> Total memory required = 4577.6 MB.
> Each test is run 10 times, but only
> the *best* time for each is used.
> -------------------------------------------------------------
> Number of Threads requested = 4
> -------------------------------------------------------------
> Printing one line per active thread....
> Printing one line per active thread....
> Printing one line per active thread....
> Printing one line per active thread....
> -------------------------------------------------------------
> Your clock granularity/precision appears to be 1 microseconds.
> Each test below will take on the order of 130623 microseconds.
> (= 130623 clock ticks)
> Increase the size of the arrays if this shows that
> you are not getting at least 20 clock ticks per test.
> -------------------------------------------------------------
> WARNING -- The above is only a rough guideline.
> For best results, please be sure you know the
> precision of your system timer.
> -------------------------------------------------------------
> Function Rate (MB/s) Avg time Min time Max time
> Copy: 16545.0680 0.1942 0.1934 0.1958
> Scale: 16098.2714 0.1996 0.1988 0.2019
> Add: 17929.8514 0.2684 0.2677 0.2697
> Triad: 17682.8117 0.2719 0.2715 0.2722
> -------------------------------------------------------------
> Solution Validates
> -------------------------------------------------------------
>
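The clock-tick lines in the output come from a timer check: STREAM estimates its timer granularity and warns if a test spans fewer than about 20 ticks. A rough sketch of the same idea using POSIX clock_gettime (STREAM uses its own timer routine, so this is not its code); the function names are hypothetical:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime / CLOCK_MONOTONIC */
#endif
#include <assert.h>
#include <time.h>

/* Current monotonic time in nanoseconds */
static long long now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Smallest observed nonzero clock step, in microseconds, over `trials`
 * attempts: spin until the clock value changes and record the step. */
double timer_granularity_us(int trials)
{
    assert(trials > 0);
    long long best = -1;
    for (int i = 0; i < trials; i++) {
        long long t0 = now_ns(), t1;
        do {
            t1 = now_ns();
        } while (t1 == t0); /* busy-wait for the next tick */
        long long step = t1 - t0;
        if (best < 0 || step < best)
            best = step;
    }
    return (double)best / 1000.0;
}

/* How many timer ticks a test of the given duration spans */
double ticks_per_test(double test_time_us, double granularity_us)
{
    assert(granularity_us > 0.0);
    return test_time_us / granularity_us;
}
```

With the 1 microsecond granularity STREAM reported, the ~130623 microsecond tests span about 130623 ticks, far above the suggested minimum of 20.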
> and for laughs, same test run (with same binary) on Shanghai 2.3 GHz
> (2376) with OMP_NUM_THREADS=4
>
>
> landman at pegasus-a3g:~/stream$ ./stream_c_omp.exe
> -------------------------------------------------------------
> STREAM version $Revision: 5.8 $
> -------------------------------------------------------------
> This system uses 8 bytes per DOUBLE PRECISION word.
> -------------------------------------------------------------
> Array size = 200000000, Offset = 0
> Total memory required = 4577.6 MB.
> Each test is run 10 times, but only
> the *best* time for each is used.
> -------------------------------------------------------------
> Number of Threads requested = 4
> -------------------------------------------------------------
> Printing one line per active thread....
> Printing one line per active thread....
> Printing one line per active thread....
> Printing one line per active thread....
> -------------------------------------------------------------
> Your clock granularity/precision appears to be 1 microseconds.
> Each test below will take on the order of 210029 microseconds.
> (= 210029 clock ticks)
> Increase the size of the arrays if this shows that
> you are not getting at least 20 clock ticks per test.
> -------------------------------------------------------------
> WARNING -- The above is only a rough guideline.
> For best results, please be sure you know the
> precision of your system timer.
> -------------------------------------------------------------
> Function Rate (MB/s) Avg time Min time Max time
> Copy: 10885.6547 0.2943 0.2940 0.2946
> Scale: 10966.1188 0.2923 0.2918 0.2929
> Add: 12019.7420 0.4002 0.3993 0.4012
> Triad: 12127.1875 0.3965 0.3958 0.3968
> -------------------------------------------------------------
> Solution Validates
> -------------------------------------------------------------
>
> I suspect we have the pegasus memory in a non-optimal config, will look
> later on next week.
>
> Assuming we can get a pair of quad core Nehalem units into our test
> machine, it appears that 32 GB/s on stream is quite possible. Right now
> it looks like ~4 GB/s per thread.
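The ~4 GB/s per thread estimate is the 4-thread aggregate divided by the thread count, and doubling the threads under an assumption of linear scaling gives the kind of number quoted here. A sketch with hypothetical names; the cap stands in for whatever limit the memory controllers impose, which this naive model does not predict:

```c
#include <assert.h>

/* Per-thread bandwidth from a measured aggregate rate (MB/s) */
double per_thread_mbs(double aggregate_mbs, int threads)
{
    assert(threads > 0);
    return aggregate_mbs / threads;
}

/* Naive linear extrapolation to more threads, capped at a supplied
 * ceiling (e.g. a realistic fraction of theoretical peak). Real
 * scaling flattens out as the memory controllers saturate, so treat
 * the result as an optimistic upper bound. */
double extrapolate_mbs(double per_thread, int threads, double ceiling_mbs)
{
    assert(threads > 0 && ceiling_mbs > 0.0);
    double linear = per_thread * threads;
    return linear < ceiling_mbs ? linear : ceiling_mbs;
}
```

per_thread_mbs(17682.8117, 4) is about 4420.7 MB/s, and extrapolate_mbs of that value to 8 threads with a 64000 MB/s cap is about 35366 MB/s, in line with the 32 GB/s estimate here and the 37.4 GB/s Tom measured on 8 threads on the W5580 system.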
>
> --
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics LLC,
> email: landman at scalableinformatics.com
> web : http://www.scalableinformatics.com
> http://jackrabbit.scalableinformatics.com
> phone: +1 734 786 8423 x121
> fax : +1 866 888 3112
> cell : +1 734 612 4615
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
> Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf