[Beowulf] bring back 2012?
pbisbal at pppl.gov
Wed Aug 17 07:10:57 PDT 2016
Correct me if I'm wrong, but it looks like all these benchmarks are for
single-threaded applications. I don't see any references to MPI, OpenMP,
or any other threading method in the compiler, optimization and invocation
notes. The only parallelism I see is in the use of AVX2 in the 2016
results, and some references to SIMD in the 2012 results.
So regardless of the number of cores in each test system, what the
benchmarks are really comparing is single core performance between a 2.3
GHz AMD Opteron and a 2.4 GHz Intel Xeon. Is that correct?
I'm assuming the starting source code is exactly the same in each case.
If so, those results aren't surprising. Since systems started going
multi-core, the performance has really come from adding parallelism to
your programs using threads or message-passing, or taking advantage of
the larger vector processing capabilities that get added to each
successive generation of processors. If these benchmarks were rewritten
to optimize them for data parallelism and to make sure the data was
properly aligned for the vector registers, I'm sure the newer processor
would show better performance.
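
To make that concrete, here is a minimal sketch in C of the kind of rewrite I mean. It is not taken from the benchmarks under discussion. The names alloc_aligned_floats, is_vec_aligned and saxpy are made up for illustration, and the 32-byte alignment assumes 256-bit AVX2 registers. Build with OpenMP enabled (for example -fopenmp with GCC); without it the pragma is ignored and the loop simply runs serially.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 32 bytes, the width of one 256-bit AVX2 register. */
#define VEC_ALIGN 32

/* Allocate room for n floats on a VEC_ALIGN-byte boundary.
   C11 aligned_alloc wants the size to be a multiple of the alignment,
   so the byte count is rounded up. Release the memory with free(). */
float *alloc_aligned_floats(size_t n)
{
    size_t bytes = n * sizeof(float);
    size_t rounded = (bytes + VEC_ALIGN - 1) / VEC_ALIGN * VEC_ALIGN;
    if (rounded == 0)
        rounded = VEC_ALIGN;
    return aligned_alloc(VEC_ALIGN, rounded);
}

/* Return 1 if p sits on a VEC_ALIGN-byte boundary, 0 otherwise. */
int is_vec_aligned(const void *p)
{
    return ((uintptr_t)p % VEC_ALIGN) == 0;
}

/* y[i] = a * x[i] + y[i].
   "restrict" promises the compiler that x and y do not overlap, which
   removes one common obstacle to vectorization. The OpenMP directive
   splits the iterations across threads and asks for SIMD code within
   each thread's share of the loop. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(x != NULL && y != NULL);
    #pragma omp parallel for simd
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

The loop body is the same as the naive serial version. What changes is how the data is allocated and how much the compiler is told about it, which is most of what a rewrite like this comes down to.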
> So glad that we have thousands of Phi's...
I wouldn't be so glad. You're still going to have to rewrite your code
as mentioned above to get any meaningful performance.
When Intel first started marketing the Xeon Phi, they emphasized that
you wouldn't need to rewrite your code to use the Xeon Phi. This was a
marketing move to differentiate the Xeon Phi from the NVIDIA CUDA
processors. That may have been a true statement, but it didn't mention
anything about performance of that existing code, and was, frankly, very
misleading. The truth is, if you don't rewrite your code, you're not
going to see much (relatively speaking) of a performance improvement,
and when you do rewrite your code to optimize it for the Xeon Phi,
you'll also see amazing speed ups on regular Xeon processors.
I've seen several presentations reporting speedups of 5x, 10x, etc., on
regular Xeons just through optimizing the code to be more thread- and
vector-friendly. Some improvements were so significant, they make you
ask if the Xeon Phi was even needed. These are the first gens I'm
talking about. I imagine the KNL will make a more compelling argument
for the Phi.
If you pay attention to Intel's marketing and the industry news the past
couple of years, you will have noticed that Intel has been promoting
"code modernization" efforts, saying all codes need to be modernized to
take advantage of newer processors. While that is certainly true, "code
modernization" is just a euphemism for "rewrite your code". This is
Intel backpedaling on their earlier statements that you don't need to
rewrite your code to take advantage of a Xeon Phi, without actually
On 08/16/2016 04:35 AM, Stu Midgley wrote:
> Its like no progress has been made. So glad that we have thousands of
> Dr Stuart Midgley
> sdm900 at sdm900.com <mailto:sdm900 at sdm900.com>