[Beowulf] Haswell as supercomputer microprocessors

Joe Landman landman at scalableinformatics.com
Mon Aug 3 06:37:19 PDT 2015


On 08/03/2015 05:06 AM, Mikhail Kuzminsky wrote:
> New special supercomputer microprocessors (like IBM Power BQC and
> Fujitsu SPARC64 XIfx) have 2**N +2 cores (N=4 for 1st, N=5 for 2nd),
> where 2 last cores are redundant, not for computations, but only for
> other work w/Linux or even for replacing of failed computational core.
>
> Current Intel Haswell E5 v3 may also have 18 = 2**4 +2 cores.  Is there
> some sense to try POWER BQC or SPARC64 XIfx ideas (not exactly), and use
> only 16 Haswell cores for parallel computations ? If the answer is
> "yes", then how to use this way under Linux ?

It's possible to do this with some taskset incantation plus cpuset
filesystem bits (burnt offerings generally not needed).  I don't think
there are "redundant" cores in the Intel product.

It's left as an exercise to the reader to implement though ...
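
For what it's worth, here is a minimal sketch of the pinning half of that
exercise, done from inside the process rather than with taskset.  It
assumes Linux and Python 3 (os.sched_getaffinity and os.sched_setaffinity
are Linux-only).  The names split_cores and pin_current_process are made up
for illustration, and reserving the two highest-numbered CPU ids is an
arbitrary choice: on a multi-socket or hyperthreaded box you would want to
check the topology first.

```python
import os


def split_cores(available, reserve=2):
    """Split a set of CPU ids into (compute, housekeeping).

    The `reserve` highest-numbered ids are set aside for the OS.
    Which ids to reserve is an assumption of this sketch, not a rule.
    """
    ordered = sorted(available)
    if reserve <= 0 or reserve >= len(ordered):
        # Nothing sensible to reserve: keep everything for compute.
        return set(ordered), set()
    return set(ordered[:-reserve]), set(ordered[-reserve:])


def pin_current_process(reserve=2):
    """Restrict the calling process (pid 0 means "self") to compute cores."""
    compute, _ = split_cores(os.sched_getaffinity(0), reserve)
    os.sched_setaffinity(0, compute)
    return compute
```

Calling pin_current_process() at the start of a job keeps that job (and
any children it forks afterwards) on the compute cores.  Note that this
only confines the job; it does nothing to keep the OS and other processes
off those cores.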

More seriously, you can also do some of this with cgroups
(https://en.wikipedia.org/wiki/Cgroups), which is actually what Docker
et al. do (in part).
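
A rough sketch of the cgroup route, assuming a cpuset-enabled cgroup
hierarchy is already mounted and you have permission to write to it
(normally root).  The file names cpuset.cpus, cpuset.mems and cgroup.procs
are the kernel's; the function names, the group directory you pass in, and
the default memory node "0" are placeholders for illustration.

```python
import os
from pathlib import Path


def format_cpu_list(cpus):
    """Render CPU ids in the kernel's list format, e.g. {0,1,2,5} -> '0-2,5'."""
    ranges = []
    start = prev = None
    for cpu in sorted(set(cpus)):
        if start is None:
            start = prev = cpu
        elif cpu == prev + 1:
            prev = cpu
        else:
            ranges.append((start, prev))
            start = prev = cpu
    if start is not None:
        ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def confine_to_cpuset(group_dir, cpus, mems="0", pid=None):
    """Create a cpuset group, set its CPUs and memory nodes, attach a pid.

    `group_dir` is a directory inside an already-mounted cgroup hierarchy;
    its location is system-specific and is not guessed here.
    """
    group = Path(group_dir)
    group.mkdir(parents=True, exist_ok=True)
    # CPUs and memory nodes are configured before any task is attached.
    (group / "cpuset.cpus").write_text(format_cpu_list(cpus))
    (group / "cpuset.mems").write_text(mems)
    (group / "cgroup.procs").write_text(str(os.getpid() if pid is None else pid))
```

For the 18-core case you would pass range(16) as cpus for the compute
group, and could put everything else in a second group holding the
remaining two cores.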

There are many ways to attack this problem.

If you are trying to isolate the OS from the computation, say to reduce
the impact of OS jitter on your processes, you might also want to work
on setting interrupt affinity, and start working with memory placement
directly (to minimize QPI usage).  The issue you will encounter is that
most HPC systems with a single HCA/NIC will require IO to/from a remote
(in a NUMA sense) node, which means going over QPI.  The exceptions are
if you have the Intel Infinipath (or Omnipath ... I am not as up on the
new naming as I should be), or a multi-rail config set up specifically
to put one NIC/HCA on each socket.
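
To make the interrupt-affinity and locality points a bit more concrete,
here is a small sketch that reads which NUMA node a network device hangs
off and steers one IRQ to a chosen set of cores.  It assumes the usual
Linux sysfs/procfs layout (/sys/class/net/<iface>/device/numa_node and
/proc/irq/<n>/smp_affinity_list); the function names and the root-path
parameters are illustrative only, and writing the affinity needs root.

```python
from pathlib import Path


def device_numa_node(iface, sysfs_root="/sys"):
    """Return the NUMA node a network interface is attached to, or -1.

    -1 is returned if the file is missing or unreadable; the kernel
    itself also reports -1 when it has no locality information.
    """
    path = Path(sysfs_root) / "class" / "net" / iface / "device" / "numa_node"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return -1


def set_irq_affinity(irq, cpus, proc_root="/proc"):
    """Steer one IRQ to the given CPU ids via smp_affinity_list."""
    path = Path(proc_root) / "irq" / str(irq) / "smp_affinity_list"
    value = ",".join(str(c) for c in sorted(set(cpus)))
    path.write_text(value)
    return value
```

If device_numa_node("ib0") comes back as 1, ranks running on socket 0 are
the ones paying the QPI cost for IO.  Be aware that a running irqbalance
daemon may overwrite affinities set this way.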

The point I am trying (subtly) to make is that you can spend a great
deal of time and effort on optimization here.  The question, as with
everything above, is the relative value of doing so.  For various codes,
OS jitter is very important, and you should seek to eliminate it.  For
others ... not so much.


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
e: landman at scalableinformatics.com
w: http://scalableinformatics.com
t: @scalableinfo
p: +1 734 786 8423 x121
c: +1 734 612 4615

