[Beowulf] Difference between a Grid and a Cluster???

Mark Hahn hahn at physics.mcmaster.ca
Tue Sep 20 06:13:41 PDT 2005

> My definition: a cluster is a homogeneous set of nodes, while a grid is a
> heterogeneous set of clusters.

well, heterogeneous in some sense, at least.  *most* of our clusters are 
heterogeneous (different ncpus, memory, disk, interconnect, clocks).
but within a cluster, you probably only have to compile your program once.

> Grids are very young and, to what I see in the HPC world, most of the
> time, it is only a trendy word for saying flat big cluster.

trendy word for saying "I don't care about hardware specifics".

> > What are the main points for Grid and Cluster?
> Cluster=easy (up to 200 nodes, tricky at 500), Grids mean intricate
> problems and increased complexity.

I think that grids are actually pretty simple *because* they are 
by definition not interested in efficiency, only size.  yes, it may 
be problematic or complex to wrestle with globus/etc, but they are 
merely implementations.  the basic idea is "portable language, undemanding
codes, infinitesimally small IO/IPC, some sort of authentication (probably
public-key) infrastructure, and a LOT of trust, both ways."

I do often wonder about the motivation for grids.  obviously, if there are 
MANY clusters out there which are mostly underutilized, then grids would 
be an effective way of keeping the machines warm.  but you have to wonder
about why these underutilized clusters exist in the first place.  is the 
grid purely about cycle-vulturing for SETI@home-type applications?  actually,
I think that's the most effective kind of grid-app: embarrassingly parallel,
low-IO, and perhaps most importantly, search-based.  (trusting the integrity
of machines in a grid is an interesting topic - stickier-seeming than basic
authentication-type plumbing.)
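One common way to cope with untrusted machines, and roughly what volunteer-computing projects like SETI@home do, is to send the same work unit to several hosts and accept a result only when enough of them agree. The sketch below shows that quorum idea in isolation; the function name `validate_by_quorum` and the `quorum` parameter are hypothetical, and real systems add fuzzy comparison of floating-point output, host reputation, and so on.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def validate_by_quorum(results: Sequence[Hashable], quorum: int) -> Optional[Hashable]:
    """Accept a work unit's result only if enough untrusted hosts agree on it.

    `results` holds one returned value per host that ran the same work unit.
    Returns the single value reported by at least `quorum` hosts, or None if
    no value (or more than one value) reaches the quorum, in which case a
    scheduler would typically hand out more replicas.
    """
    if quorum < 1:
        raise ValueError("quorum must be at least 1")
    counts = Counter(results)
    winners = [value for value, n in counts.items() if n >= quorum]
    return winners[0] if len(winners) == 1 else None


# three hosts ran the same work unit; one of them returned garbage
print(validate_by_quorum(["result-A", "result-A", "noise"], quorum=2))  # result-A
```

The cost is obvious: every accepted answer consumes at least `quorum` hosts' worth of cycles, which is only tolerable when those cycles would otherwise sit idle.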

so, you've got an idle cluster, and don't mind if someone uses it rather 
than letting it sit.  presumably someone else picks up your power tab.
and presumably your internet connection is not metered or paid for by 
someone else.  and presumably the owners of the cluster will want prompt and
complete utilization of it when they get back from vacation.  actually, that
brings up another issue with grids - since they're opportunistic, there must
be some break-even point below which using a cluster is of no advantage.  for instance, if 
the grid app takes 5 seconds to authenticate, authorize, transmit, compile
and set to running, then you're happy getting even, say, 30 seconds of time.
but what if your grid job gets suspended 5 seconds in, and stays suspended
while the owner runs a 3-week, all-cluster job?  I guess it not only depends 
on being able to do SETI@home-style result verification, but doing it cheaply
enough that you can jump at any opportunity.  that means your grid-scheduler
needs to be pretty aggressive, as well.  and perhaps it also needs to be 
peer-to-peer, since otherwise, any large grid will have a serious hotspot
at the controller.
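The break-even arithmetic in the 5-second/30-second example can be made concrete with a small sketch. It assumes a fixed per-job overhead (authenticate, authorize, transmit, compile, start) and a slot length known in advance; `useful_fraction`, `worth_dispatching`, and the `min_fraction` threshold are hypothetical names for illustration, not part of any grid middleware.

```python
def useful_fraction(overhead_s: float, slot_s: float) -> float:
    """Fraction of an idle slot left for real work after fixed grid overhead.

    overhead_s: time to authenticate, authorize, transmit, compile and start.
    slot_s: how long the idle cluster is actually available to the grid job.
    """
    if slot_s <= 0:
        raise ValueError("slot length must be positive")
    return max(0.0, slot_s - overhead_s) / slot_s


def worth_dispatching(overhead_s: float, expected_slot_s: float, min_fraction: float) -> bool:
    """Hypothetical scheduler rule: dispatch only if the expected slot clears a threshold."""
    return useful_fraction(overhead_s, expected_slot_s) >= min_fraction


print(round(useful_fraction(5, 30), 3))  # 0.833
print(worth_dispatching(5, 8, min_fraction=0.5))  # False
```

The weak spot is `expected_slot_s`: an opportunistic scheduler rarely knows it, and a job suspended 5 seconds in behind a 3-week all-cluster job gets essentially nothing until it is migrated or restarted elsewhere.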

grid is an interesting model, but taken to its logical conclusion,
I think it's extremely different from what people think of as clusters.

regards, mark hahn.
