[Beowulf] bring back 2012?

Wed Aug 17 07:10:57 PDT 2016

Correct me if I'm wrong, but it looks like all these benchmarks are for
single-threaded applications. I don't see any references to MPI, OpenMP,
or any other parallelization method in the compiler, optimization, and
invocation notes. The only parallelism I see is the use of AVX2 in the
2016 results, and some references to SIMD in the 2012 results.

So regardless of the number of cores in each test system, what the
benchmarks are really comparing is single-core performance between a
2.3 GHz AMD Opteron and a 2.4 GHz Intel Xeon. Is that correct?

I'm assuming the starting source code is exactly the same in each case, 
too.

If so, those results aren't surprising. Since systems started going
multi-core, the performance gains have really come from adding
parallelism to your programs using threads or message passing, or from
taking advantage of the larger vector processing capabilities that get
added with each successive generation of processors. If these benchmarks
were rewritten to optimize them for data parallelism and to make sure
the data was properly aligned for the vector registers, I'm sure the
newer processor would show better performance.
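
To make "data parallelism" and "properly aligned" a bit more concrete,
here is a minimal sketch in C. It is not taken from the SPEC sources.
The names (VEC_ALIGN, alloc_aligned_floats, saxpy) are made up for
illustration, and the 32-byte alignment assumes 256-bit AVX2 registers.
The pragma is an OpenMP 4.0 SIMD hint; a compiler without OpenMP
support will simply ignore it.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumption: 32 bytes matches one 256-bit AVX2 register. */
#define VEC_ALIGN 32

/* Allocate n floats on a VEC_ALIGN-byte boundary (NULL on failure).
 * C11 aligned_alloc wants the size to be a multiple of the alignment,
 * so the byte count is rounded up. */
float *alloc_aligned_floats(size_t n)
{
    size_t bytes = n * sizeof(float);
    size_t padded = (bytes + VEC_ALIGN - 1) / VEC_ALIGN * VEC_ALIGN;
    if (padded == 0)
        padded = VEC_ALIGN;
    return aligned_alloc(VEC_ALIGN, padded);
}

/* y[i] = a * x[i] + y[i]
 * Every iteration is independent of the others, "restrict" promises
 * the compiler that x and y do not overlap, and the aligned() clause
 * promises both pointers sit on VEC_ALIGN-byte boundaries. Callers
 * must actually honor those promises. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    #pragma omp simd aligned(x, y : VEC_ALIGN)
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

With GCC, something like "gcc -O2 -fopenmp-simd -mavx2" should let the
compiler vectorize the loop, but check the vectorization report for
your own compiler rather than taking my word for it.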

> So glad that we have thousands of Phi's...

I wouldn't be so glad. You're still going to have to rewrite your code 
as mentioned above to get any meaningful performance.

When Intel first started marketing the Xeon Phi, they emphasized that
you wouldn't need to rewrite your code to use it. This was a marketing
move to differentiate the Xeon Phi from the NVIDIA CUDA processors. That
may have been a true statement, but it didn't mention anything about the
performance of that existing code, and was, frankly, very misleading.
The truth is, if you don't rewrite your code, you're not going to see
much (relatively speaking) of a performance improvement, and when you do
rewrite your code to optimize it for the Xeon Phi, you'll also see
amazing speedups on regular Xeon processors.

I've seen several presentations showing speedups of 5x, 10x, etc., on
regular Xeons just from optimizing the code to be more thread- and
vector-friendly. Some improvements were so significant that they make
you ask whether the Xeon Phi was even needed. I'm talking about the
first-generation Phis here; I imagine KNL will make a more compelling
argument for the Phi.
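
As one illustration of what a "thread- and vector-friendly" rewrite can
look like, here is a hedged sketch of a common restructuring: changing
an array of structures into a structure of arrays so that the values a
loop touches are contiguous in memory. This is my own example, not code
from any of those presentations, and all of the names (particle_aos,
particles_soa, shift_x_aos, shift_x_soa) are hypothetical.

```c
#include <stddef.h>

/* Array-of-structures layout: x, y and z of one particle are adjacent,
 * so a loop that only touches x strides through memory 3 floats at a
 * time, which is awkward for vector loads. */
struct particle_aos { float x, y, z; };

/* Structure-of-arrays layout: all x values are contiguous. */
struct particles_soa { float *x, *y, *z; size_t n; };

/* Original style: serial loop over the strided layout. */
void shift_x_aos(struct particle_aos *p, size_t n, float dx)
{
    for (size_t i = 0; i < n; i++)
        p[i].x += dx;
}

/* Rewritten style: unit-stride, independent iterations. The combined
 * OpenMP 4.0 construct asks for the iterations to be split across
 * threads and each thread's chunk to be vectorized. Built without
 * OpenMP, the pragma is ignored and the loop runs serially. */
void shift_x_soa(struct particles_soa *p, float dx)
{
    float *restrict x = p->x;
    const size_t n = p->n;

    #pragma omp parallel for simd
    for (size_t i = 0; i < n; i++)
        x[i] += dx;
}
```

Both functions compute the same result. The point is that the second
form helps on a regular Xeon just as much as on a Phi, which is why the
rewrite pays off on both.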

If you have paid attention to Intel's marketing and the industry news
over the past couple of years, you will have noticed that Intel has been
promoting "code modernization" efforts, saying all codes need to be
modernized to take advantage of newer processors. While that is
certainly true, "code modernization" is just a euphemism for "rewrite
your code". This is Intel backpedaling on its earlier statements that
you don't need to rewrite your code to take advantage of a Xeon Phi,
without actually admitting it.

Prentice

On 08/16/2016 04:35 AM, Stu Midgley wrote:
> https://www.spec.org/cpu2006/results/res2016q2/cpu2006-20160308-39354.html
> https://www.spec.org/cpu2006/results/res2012q4/cpu2006-20121108-25077.html
>
> It's like no progress has been made.  So glad that we have thousands of
> Phi's...
>
> -- 
> Dr Stuart Midgley
> sdm900 at sdm900.com
