IBM goes grid

Thu Aug 2 10:27:33 PDT 2001

On Thu, 2 Aug 2001 john.hearns at framestore.co.uk wrote:

> Or the use of remote archives of 3D medical datasets via fast links.
> Blue skying, a physician would be able to call up a series of known
> diagnosed cases of a certain pathology on his workstation,
> and have the data transferred (with suitable anonymity) to it.
> Or could submit computationally demanding tasks, like cancer treatment
> planning, to a fast central compute resource.

Sure, one can imagine lots of uses.  But.

> > The first one that comes to mind is overwhelmingly security.
> Absolutely.
> How do you authenticate someone from a partner institution who
> wants to use CPU on your farm, or access data in your silo?
> Efforts are currently underway in areas like this.

The problem is that I doubt that it can be done globally.  Ever.  I
think that you are on the wrong side of a critical regime in complexity.
The only two solvable limit points that I can see are:

  a) Universal "Trust".  Presuming that one can build a truly secure
sandbox on a per-system basis (which I believe to be possible but not
necessarily easy or robustly secure as kernels and libraries evolve), I
can see a kernel which always allows its system's surplus resources to
be used by basically anybody on the Grid perhaps weighted by a project
choice matrix ("I want my computer to be used to help beat cancer") or
("No distributed games, please").

It will be spoofed, abused, misused, and there will be a constant
policing problem that will probably need to be mostly ignored.  We all
agree that our front yards are part of the village green and just grit
our teeth and bear it when the cows come in to graze or folks decide
to make a temporary mudbath there for the pigs.  Obviously laws would
have to be passed protecting participants from liability or criminal
culpability when somebody uses their front lawn to launch surface-to-air
missiles at passing jets or to hawk stolen goods or to put on an open
air striptease.

  b) Universal "Mistrust".  STILL presuming that one can build a truly
secure sandbox, have only a very small handful of projects that are
"approved" to run within, hire vigilantes to guard the projects,
legalize some medieval tortures applied to those that compromise their
sanctity.  This is close to what one has now with SETI and RC5 but
without the medieval tortures -- participants can only hope and pray
that the RC5 and SETI vigilantes are successful in keeping out vermin
who might (for example) slip a few-hundred-line loader into the code
that on (pick a date) goes to (pick a site), grabs a carefully crafted
softbomb, and loads/runs it.  Even the vigilantes might miss such a
small thing in a really big application, and by loading the real bomb in
real time all sorts of virus software can be circumvented.

By mistrust here I obviously mean that the public only trusts particular
projects to be runnable universally, and those projects are subjected to
an extraordinary (and expensive!) degree of scrutiny to keep them
secure.  Even so, this sort of operation will only last until the first
time SETI or RC5 is used as a cracking/viral insertion mechanism, at
which point trust for this sort of thing will evaporate and their
success at harvesting cycles will quickly be forgotten.  Perhaps two
projects can be kept secure enough that this is "unlikely" to happen.
Perhaps five can.  Fifty cannot.

In between, the only solution that I see requires restricting the grid
to a domain where authentication and job validation are possible and
where users (and their jobs) are hence knowable and can be held
accountable.  Universities sure.  Businesses sure.  Public research
centers maybe.  Collectives of Universities and/or businesses or
centers, possibly, but the probability of mishaps grows inexorably as
you expand the pool of participants and dilute the direct administrative
control you have over any one of them across "natural" administrative
boundaries between organizations.

I see nothing surprising in turning a University's compute resources
into a giant compute cluster -- I did it myself with Duke's public
cluster using rsh, a shared AFS filespace, and a standard account seven
or eight years ago and harvested a few GFLOP-years of cycles back when
this was a LOT of computing.  Nowadays there are better tools for every
step of this -- ssh, PBS, Condor, and more -- within an administrative
domain.  I don't see the entire Internet (or any significant
cross-section of it that crosses administrative authentication domains)
becoming a giant compute cluster anytime soon, if ever, except in one of
the two limits above.  It would require something truly fascist to
create a personal authentication namespace that spans the globe...
something like get your driver's license, universal network
account/license, and hardware authentication device all at the same
time...(maybe even on the same card).  Shudder.

> I agree. I wonder how useful the 'traditional' model of
> a departmental Beowulf, with the compute nodes 'hidden' on a private
> network behind a dual-headed master compute node will be to
> these farms which need lots and lots of access to data stores, and data
> stores that are potentially across WAN links at that.

This is the interesting question, of course.  A grid concept turns a
beowulf "inside out" in some sense.  A grid is more like a natural
extension of a NOW to a CONOW (collection of NOWs), and one would
expect/prefer that the systems in some sense all be "publicly
accessible" rather than compute nodes simulating an SMP computer.  Of
course a grid is only useful for extremely coarse grained to
embarrassingly parallel tasks (in the outer regions of the Amdahlian
speedup scaling equations where communication is all but negligible
compared to computation) while a beowulf is designed for coarse to
moderately fine grained work.
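
To make the granularity point concrete, here is a rough sketch of an
Amdahl-style speedup estimate with a communication term bolted on.  The
linear cost model (a fixed communication cost for each additional node)
and all of the names and numbers are assumptions chosen purely for
illustration, not measurements of any real cluster or grid.

```python
def speedup(p, t_serial, t_parallel, t_comm_per_node=0.0):
    """Estimate the speedup of a job on p nodes.

    Assumed (hypothetical) cost model:
        T(1) = t_serial + t_parallel
        T(p) = t_serial + t_parallel / p + t_comm_per_node * (p - 1)
    so communication cost grows linearly with each node added.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    t_one = t_serial + t_parallel
    t_p = t_serial + t_parallel / p + t_comm_per_node * (p - 1)
    return t_one / t_p


if __name__ == "__main__":
    # Illustrative numbers only: 1 unit of serial work, 999 parallel.
    for label, t_comm in (("negligible comm", 0.0001),
                          ("heavy comm", 1.0)):
        for p in (1, 10, 100, 1000):
            s = speedup(p, 1.0, 999.0, t_comm)
            print(f"{label:16s} p={p:5d} speedup={s:8.2f}")
```

In this toy model the speedup keeps climbing with node count only while
the communication term stays tiny next to the computation, which is the
regime a grid has to live in.  Once the per-node communication cost is
comparable to the work each node does, adding nodes makes things worse.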

Still, one would like to be able to recover the wasted cycles in true
beowulfs as well as in more general NOW-style clusters and compute farms
(where the nodes support a shell and shared filesystem, if we can now
begin to use this to distinguish at least Scyld-like beowulfs from
sloppier clusters).  At any rate, I'll look over the resources you so
generously include below (and others sent to me offline) and see what I
can put together for Duke, as I have neither time nor inclination to
reinvent wheels if I can avoid it.  Thanks loads.

   rgb

>
>
> You can do worse than start with the pages of the European datagrid:
> http://www.eu-datagrid.org
>
> And the Globus project http://www.globus.org
>
> For those who haven't met it,
> "The Grid: Blueprint for a new Computing Infrastructure" is IMHO
> the canonical work in the field.
> http://www.amazon.com/exec/obidos/ASIN/1558604758/qid=996769001/sr=2-1/ref=aps_sr_b_1_1/102-0420104-7691336
>
>
> > Seems like a useful project for some grad students in CS departments
> > that concentrate on cluster computing, e.g. Clemson.  Walt?  Anybody?
>
> Being cheeky, if anyone knows of interesting jobs in this area let me
> know.
>
>
> John Hearns
>

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu