[Beowulf] Intel Phi musings
rbwcnslt at gmail.com
Tue Feb 12 08:38:01 PST 2013
Thanks for your answer ...
That sounds compelling. May I ask a few more questions?
So should I assume that this was a threaded SMP-type application
(OpenMP, pthreads), or is it MPI based? Is the supporting CPU of the
multi-core Sandy Bridge vintage? Have you been able to compare
the hyper-threaded, multi-core scaling on the Sandy Bridge side of the
system with that on the Phi (fewer cores to compare, of course)? Using the
Intel compilers, I assume ... how well do your kernels vectorize? I am curious
about the observed benefits of hyper-threading, which generally offers
little to floating-point-intensive HPC computations where functional unit
collision is an issue. You said you have 2 Phis per node. Were you
running a single job across both? Were the Phis in separate PCIe
slots or on the same card? (Sorry, I should know this, but I have just
started looking at Phi.) If they are on separate cards in separate
slots, can I assume that I am limited to MPI parallel implementations
when using both?
Maybe that is more than a few questions ... ;-) ...
Thrashing River Consulting
On Tue, Feb 12, 2013 at 10:46 AM, Dr Stuart Midgley <sdm900 at gmail.com> wrote:
> It was simple really. Within 1 hr, I had recompiled a large number of our
> codes to run on the Phis and then ssh'ed to the Phi and ran them… Saw that
> a single Phi was faster than our current 4-socket AMD 6276 (64 cores) and
> then ordered machines with 2 Phis in them :)
> I didn't bother with any of the compiler directives etc… just treated them
> like a 240-core (hyper-threaded) computer… and saw great scaling.
> Dr Stuart Midgley
> sdm900 at sdm900.com
> On 12/02/2013, at 11:12 PM, Richard Walsh <rbwcnslt at gmail.com> wrote:
> > Hey Stuart,
> > I am interested in what sold you on the Phi. My cursory look
> > suggested that using the Phi in Intel's offload mode (which
> > preserves the scalar performance) was not much easier to
> > program than writing in CUDA ... and that using the Phi as
> > a standalone processor, while a programming convenience,
> > suffers on scalar code. Even that programming convenience
> > is limited by the fact that you have to think both in terms of
> > vectors and threads.
> > Also, the speedups I have seen generally seem modest,
> > understanding that GPU performance hype is exaggerated.
> > Hearing what you like would be interesting.
> > Thanks,
> > Richard Walsh
> > Thrashing River Consulting
> > On Tue, Feb 12, 2013 at 10:02 AM, Dr Stuart Midgley <sdm900 at gmail.com> wrote:
> > I've started a blog to document the process I'm going through to get our
> > Phis going.
> > http://phi-musings.blogspot.com.au
> > It's very sparse at the moment, but will get filled in a lot over the
> > next day or so… I've finally got them booting.
> > FYI we currently have 100 co-processors and should have the next 160 or
> > so in a few weeks.
> > --
> > Dr Stuart Midgley
> > sdm900 at sdm900.com