[Beowulf] multi-threading vs. MPI

Sat Dec 8 09:51:48 PST 2007

Don,

You asked:  
"...  AMD X2 5400+ in the Master node (dual core) and three AMD X2 4000+
dual core processors enclosed in inexpensive boxes. .... I believe, some
hyperthreading between the dual cores so what is the story about how the
dual cores can be addressed individually but still have hyperthreading
between the dual cores?"

There is no hyperthreading (hardware multithreading within a core) in
these AMD X2 CPUs.  Each core appears as a separate CPU to the operating
system and is treated as such by MPI libraries, so you can happily run
two MPI processes per node on your Microwulf with good performance.  The
thrust of the discussion is that the average user can ignore software
threading between the cores of a node and just use MPI to obtain good
parallel speed-ups.
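
A rough sketch of the one-process-per-core idea, written in Python with
the standard subprocess module as a stand-in for an MPI launcher (this is
not MPI itself; the names WORKER and run_ranks are made up for
illustration).  Each "rank" is an ordinary OS process that the scheduler
is free to place on its own core, which is how two MPI processes would
share a dual-core node:

```python
import os
import subprocess
import sys

# Program text that every "rank" runs. It receives its rank and the total
# number of ranks on the command line, much as an MPI program would query
# them at start-up. (Hypothetical stand-in, not an MPI API.)
WORKER = """
import sys
rank, size, n = (int(a) for a in sys.argv[1:4])
print(sum(range(rank, n, size)))  # this rank's share of 0..n-1
"""


def run_ranks(n, size):
    """Start `size` independent OS processes and add up their partial sums."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER, str(rank), str(size), str(n)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for rank in range(size)
    ]
    # All processes are already running; now collect each one's output.
    return sum(int(p.communicate()[0]) for p in procs)


if __name__ == "__main__":
    cores = os.cpu_count() or 1  # the OS reports each core as a separate CPU
    print(cores, run_ranks(1_000_000, size=cores))
```

No threads are shared between the processes; the only coupling is the
final collection of results, which is the role message passing plays in a
real MPI job.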

-Tom

________________________________

	From: beowulf-bounces at beowulf.org
[mailto:beowulf-bounces at beowulf.org] On Behalf Of Donald Shillady
	Sent: Friday, December 07, 2007 4:23 PM
	To: richard.walsh at comcast.net; Toon Knapen; BeowulfMailing List
	Subject: RE: [Beowulf] multi-threading vs. MPI

	This is a very interesting discussion to me.  I have started to
purchase components for an 8-core microWulf based on the Calvin College
microWulf constructed by Prof. Joel Adams and his student, except that I
will use slightly faster cores: an AMD X2 5400+ (dual core) in the Master
node and three AMD X2 4000+ dual-core processors enclosed in inexpensive
boxes.  The Master node has an MSI K9N SLI Platinum motherboard with two
Gigabit ports, so perhaps the initial configuration with three satellite
dual-core CPUs can be extended to a second set of boxes later.

	All these AM2-socket CPUs are dual core, and apparently Prof. Adams was
able to address them in the microWulf as individual cores, but there is,
I believe, some hyperthreading between the dual cores.  So what is the
story about how the dual cores can be addressed individually but still
have hyperthreading between them?  I am an experienced programmer for
von Neumann architectures and a total novice on parallel systems, but as
I build the microWulf I wonder whether MPI will decouple the
hyperthreading, or is it not there?  From what little I have learned so
far, the microWulf switch depends on the relatively slow Gigabit
Ethernet, so there is probably time within each dual-core CPU for
hyperthreading to occur, if indeed the AMD X2 dual cores provide for
it.  Sorry to ask such a dumb question, but I am trying to learn.

	Don Shillady
	Emeritus Professor of Chemistry, VCU
	Ashland Va (working at home)

________________________________

		From: richard.walsh at comcast.net
		To: toon.knapen at gmail.com; beowulf at beowulf.org
		Subject: Re: [Beowulf] multi-threading vs. MPI
		Date: Fri, 7 Dec 2007 22:15:25 +0000

			-------------- Original message -------------- 
			From: "Toon Knapen" <toon.knapen at gmail.com> 

			How come there is almost unanimous agreement in
the beowulf-community while the rest is almost unanimously convinced of
the opposite?  Are we just patting ourselves on the back, or is MP not
sufficiently disseminated, or ... ?

			Mmm ... I think the answer to this is that the
rest of the world (the non-HPC world) is in a time warp.  HPC went
through its SMP-threads phase in the 1990s with OpenMP and its
predecessors, and then we needed a more scalable approach (MPI).  Now
that multi-core and multi-socket systems have brought parallelism to the
rest of the Universe, SMP-based parallelism has had a resurgence ...
this has also naturally caused some in HPC to revisit the question as
nodes have fattened.

			The allure of a programming model that is
intuitive, expressive, symbolically light-weight, and provides a way to
manage the latency variance across memory partitions is irresistible.

			I kind of like the CAF (Co-Array Fortran)
extension to Fortran and the concept of co-arrays.  A co-array is an
array of identical normal arrays, one per active image/process.  They
are declared like this:

			          real, dimension (N) [*] ::  X, Y

			If the program is run on 8
cores/processors/images the * becomes 8, and eight 1D arrays of size N
are created for each of X and Y, one per processor.  In any reference
to the local component of the co-array (the image on the processor
referencing it), you can drop the []s ... all other (remote) references
must include them.  This is symbolically light, but the presence of the
[]s in an assignment or operation reminds the programmer of every
costly non-local reference.  There is much more to it than that, of
course, but as the performance gap between carefully constructed MPI
applications and CAF-compiled code shrinks I can see the latter gaining
some traction for reasons of pure programming elegance.  If you accept
the notion that most MPI programs are written at a B- level in terms of
efficiency, then the idea of the gap closing may not be so far-fetched.
CAF is supposed to be included in the Fortran 2008 standard.
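
The local-versus-remote distinction described above can be mimicked with
a toy model in Python.  This is only a sketch of the idea, not real CAF:
the class CoArray and the function shift_from_left_neighbor are
hypothetical names, and the images are simulated one after another in a
single process instead of running in parallel.

```python
class CoArray:
    """Toy model of a co-array: one ordinary array of length n per image."""

    def __init__(self, n, num_images):
        self.num_images = num_images
        # One independent local array per image, like X(1:N)[1..num_images].
        self.data = [[0.0] * n for _ in range(num_images)]

    def local(self, me):
        """Reference without brackets: the array owned by image `me`."""
        return self.data[me - 1]

    def remote(self, image):
        """Reference with [image]: another image's array (the costly kind)."""
        return self.data[image - 1]


def shift_from_left_neighbor(x, y):
    """On every image, perform the CAF-style assignment  Y(:) = X(:)[left]."""
    for me in range(1, x.num_images + 1):  # CAF numbers images from 1
        left = me - 1 if me > 1 else x.num_images  # wrap around at image 1
        y.local(me)[:] = x.remote(left)  # copy the neighbor's values


if __name__ == "__main__":
    x, y = CoArray(4, 8), CoArray(4, 8)
    for me in range(1, 9):
        x.local(me)[:] = [float(me)] * 4  # X(:) = me, a purely local write
    shift_from_left_neighbor(x, y)
    print(y.local(1), y.local(2))
```

Every call to remote() in this model marks a spot where real CAF code
would carry the []s, so the expensive references stay visible in the
source.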

			rbw

			-- 

			"Making predictions is hard, especially about
the future." 

			Niels Bohr 

			-- 

			Richard Walsh 
			Thrashing River Consulting
			5605 Alameda St. 
			Shoreview, MN 55126 

			--Forwarded Message Attachment--
			From: toon.knapen at gmail.com
			To: beowulf at beowulf.org
			Subject: [Beowulf] multi-threading vs. MPI
			Date: Fri, 7 Dec 2007 20:07:32 +0000

			_______________________________________________
			Beowulf mailing list, Beowulf at beowulf.org
			To change your subscription (digest mode or
unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf