Who runs large capability jobs

W Bauske wsb at paralleldata.com
Thu Jul 27 20:27:32 PDT 2000


Greg Lindahl wrote:
> 
> There are people who run large, and there are many more people who want to
> run large. An example is all the systems at the top of the Top500 list.

One thing of interest to me on that list is that one can
take 124 Power3-II processors and rank 95th. So you're
talking about the performance of the 94 machines above that
entry, and I'm talking about the performance of the 406 systems
at or below it.

So, that list is dominated by the class of machines I've been
attempting to describe: up to 128 state-of-the-art processors.
Use older processors and the node counts can get much larger.

And, of course, wait until later this year and everything will
change again with vendors releasing new systems.

> The people who want to do it today are all the folks working on grand
> challenges. The NSF, for example, moved from a model where lots of
> researchers got little slices of their supercomputer time to a model where
> the little guys stay home and big projects get big pieces of the machine.
> The NSF Terascale procurement, results to be announced soon, is a 5 TF
> (peak) machine, and it's going to be used that way, too. So the general US
> researcher is going to have access to that kind of resource. 

Wait a minute. This is contradictory. First you say the little
guys stay home, and then you say the general researcher gets
access to the big systems. It sounds to me like only certain
groups will get access, and many researchers will be left to
fend for themselves for processing capacity, probably by
building their own small Beowulfs. Which is it?

> Commercial folks are headed in that direction. You can decide for yourself
> if you think that embarrassingly parallel sites like the 1,000 cpu genetic
> algorithm site is "a single job", or if Google's cluster is running "a
> single job". 

Probably not single parallel jobs, at least not in the sense
of one MPI/PVM task firing up 1000 nodes and working on a single
problem. It depends on one's definition of a "job".
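
For the sake of illustration, here's roughly what I mean by a single
parallel job: one mpirun launch where all the ranks cooperate on one
problem, rather than 1000 independent processes that happen to share a
cluster. The work below (summing a range of integers) is just a stand-in,
not anyone's actual workload.

/* Minimal sketch of a single MPI job: one launch, N ranks, one
 * cooperative computation. The summation is a placeholder problem. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    long i, n = 1000000L;              /* total work items (hypothetical) */
    double local = 0.0, total = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank takes a strided slice of the problem. */
    for (i = rank; i < n; i += size)
        local += (double)i;

    /* A single collective combines the partial results on rank 0. */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks=%d  sum=%.0f\n", size, total);

    MPI_Finalize();
    return 0;
}

Launch it with something like "mpirun -np 1000 a.out" and it's one job by
any reasonable definition; launch 1000 separate copies with different
inputs and I'd call it 1000 jobs, even if they fill the same cluster.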

> But I assure you that the protein folding guys are headed to
> 1,000+ cpu runs. And they have hundreds of millions of dollars to spend. On
> computers.

The question I have is: how many companies do this versus other
types of cluster processing?

> 
> > The large systems are an exception,
> > not the rule.
> 
> Right. Whatever. I don't care what the rule is. I do what I do, you do what
> you do, we compare notes on this mailing list so we both can learn. 

True.

> Arguing
> or discussing what's typical is about as interesting as the periodic
> comp.arch flamewar "Who cares about anything but x86, only x86 is
> important"... "who cares about x86, microcontrollers ship in 10 times the
> volume"... "well, I want to talk about Alpha, because it's best for my
> app"...
> 

You may not be interested in the "norm" since you're out on
the edge of the bell curve selling large systems. So be it.

The origin of this digression was your comment that SGI systems
don't scale. That may well be true in some situations, but it is
unlikely to be true in many others.

Also, the highest-ranking SGI Origin is number 3 on the Top500
list, so for some problems SGIs obviously do scale. The highest-ranking
Alpha is a Cray T3E at number 7; newer Alphas make it to
number 23.

One thing I learned when I worked for IBM is to never badmouth
another vendor's equipment. Let the numbers and the vendors speak
for themselves.

Do as you see fit.

How about back to Beowulfs now.

Wes



