[Beowulf] Confused over the term cluster

Sun Mar 30 11:59:55 PDT 2008

On Sun, 30 Mar 2008, Kozin, I (Igor) wrote:

>
>
>> Just to be awkward, there are of course machines like the SGI Altix.
>> Is it a cluster, or is it a node?  From the programmer's perspective
>> it's the latter, from the architectural perspective, the former.
>>
>> There's no real dividing line; there are machines across the entire
>> spectrum.  But I'm just being difficult, generally I agree with what
>> the others have said.
>
> I'd say definitely the former - a cluster where the nodes (two sockets)
> are connected using Numalink.
>
> I thought long and hard about those classification issues
> when I was designing a sufficiently generic but consistent machine
> description for our benchmarking database. The view that I took is that
> there is a "node" which is a reasonably independent entity and you take
> those nodes and glue them together using some sort of "interconnect". So
> far this architectural view has not failed me but it is not impossible
> that future machines might have a more intricate "node" or "interconnect"
> structure and therefore require a more sophisticated approach.
>
> Programmer's view can't be used for classification purposes because
> there can be several views on the same hardware at the same time. The
> Altix example is admittedly blurred but even a traditional cluster can
> be programmed a la SMP using Cluster OpenMP or other similar approaches
> which obviously do not make the underlying hardware any different.

This is really the crucial point.  There are machines that are clusters,
nodes that are clusters, clusters of clusters, clusters of nodes -- all
of that is nomenclature that in some sense "doesn't matter".  One can
fairly easily specify the architecture of any given pile of PCs (one of
the most primitive views of a cluster:-) in direct descriptive detail.
Processor core(s), packaging/socketing, low level bus and/or net,
motherboard design, network and other peripherals.  Naming something "a
node" or "a processor" or "a core" unaccompanied by an in-context
definition of what the terms mean for THIS cluster is not necessary.

The real point is that whatever you call it, that pile of PCs is
designed/intended to do parallel work.  The tools for writing programs
for doing that parallel work impose various hardware related constraints
on those programs that have to be accommodated by a programmer for each
particular cluster (where how much it MATTERS depends of course on the
nature of the parallel work being done and how it "fits" the hardware).
A 16-core single motherboard machine may or may not be "a cluster", but
you're very probably going to program your parallel applications on it
using the same tools you'd use to program them on four four-core machines,
or 16 uniprocessor machines.  You might write the code very differently;
the balance of processor speed to IPCs and bottlenecks to critical
resources might be quite different, but we have learned the hard way
that programming to particular architectures at the deep hardware level
is a good way to go broke fast as it is obsoleted by Moore's law, while
code written generically for a good COTS cluster architecture can ride
from one cluster to another (with tweaks to serious rewrites, sure, but
using the same toolset and mental picture of what's going on) as
hardware and networks evolve.

So I personally agree with the consensus view here -- the "best" use of
nomenclature is that nodes are connected to nodes by COTS networks, and
anything in a single box with a proprietary internal network
interconnect on a single motherboard is a node, even though its
ARCHITECTURE on the motherboard may for all practical purposes itself be
"a cluster".  I've taken to calling what USED to be a processor "a core"
because processor seems to refer to a single package (as in "multicore
processor").  Just how the cores are packaged into processors, put onto
motherboards and connected to memory and peripherals, and motherboards
interconnected into clusters using networks -- all of that is just part
of the generic (now more complex) cluster description.
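
To make that concrete, here is a minimal sketch of such a generic
description in Python.  Every class and field name in it (PEGroup, Node,
Machine, internal_link and so on) is an assumption made up for
illustration; none of it is taken from the benchmarking database
mentioned above or from any real tool.  It only shows that the 16-core
box, the four four-core machines and the 16 uniprocessor machines can
all be written down in the same node-plus-interconnect form.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PEGroup:
    """A group of identical processing elements inside one node (hypothetical)."""
    kind: str   # e.g. "core", "SPE", "GPGPU"
    count: int  # how many PEs of this kind one node holds


@dataclass(frozen=True)
class Node:
    """One box: sockets, PEs, and whatever links them on the motherboard."""
    sockets: int
    pe_groups: Tuple[PEGroup, ...]
    internal_link: str  # free-text label for the on-board interconnect

    def pe_count(self) -> int:
        return sum(group.count for group in self.pe_groups)


@dataclass(frozen=True)
class Machine:
    """Identical nodes glued together by some interconnect."""
    node: Node
    node_count: int
    interconnect: str  # free-text label; "none" for a single box

    def total_pes(self) -> int:
        return self.node.pe_count() * self.node_count


# Three illustrative layouts with the same number of cores.
single_box = Machine(
    node=Node(sockets=4, pe_groups=(PEGroup("core", 16),),
              internal_link="on-board"),
    node_count=1,
    interconnect="none",
)
four_quads = Machine(
    node=Node(sockets=1, pe_groups=(PEGroup("core", 4),),
              internal_link="on-board"),
    node_count=4,
    interconnect="ethernet",
)
sixteen_unis = Machine(
    node=Node(sockets=1, pe_groups=(PEGroup("core", 1),),
              internal_link="on-board"),
    node_count=16,
    interconnect="ethernet",
)

totals = [m.total_pes() for m in (single_box, four_quads, sixteen_unis)]
```

All three layouts report the same total; what differs is where the
boundary between "node" and "interconnect" falls, and that is exactly
the part the description has to spell out rather than leave to the
nomenclature.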

> As for the terms processing element (PE) and core, PE was clearly
> preferable for machine description rather than core because PE is more
> generic whereas the term "core" usually implies a fully functional
> processor that has been shrunk. Terminologically PE may include
> processor cores, SPEs in Cell BE, GPGPUs attached to the node or
> accelerating co-processors.

Fair enough.  Viewing this as a "proposal" or "RFC" rather than a de
facto truth, it seems reasonable to differentiate at this level, but
since cores abound and PEs that are not also cores do not, expect
most people to keep talking just about cores and processors.  I will try to
use PE (suitably defined) in future discussions that involve a wider
array of PE options than "just" cores.  But I'm guessing people will
still know what I'm talking about from context if I forget.

    rgb

-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977