[Beowulf] Why I want a microsoft cluster...

Joe Landman landman at scalableinformatics.com
Sun Nov 27 13:27:54 PST 2005

Mark Hahn wrote:
>>>> why?  because low-end clusters are mostly a mistake.
>> I disagree on this for a number of reasons.  It may make sense for some 
>> university and centralized computing facility folks for larger machines. 
> but my point is mainly that most researcher's workloads are bursty;
> burstiness argues for shared resource pools.

Again, I am avoiding generalizing about "most researchers".  I have seen 
very mixed usage over my career: some sites with queues weeks long, some 
with effectively idle large machines.  If a researcher has a large group, 
the bursty nature (if there is one) tends to be averaged out.  However, I 
am seeing more of the workflow-style analyses, which tend not to show 
bursty (impulse-like) usage, but rather bolus-like usage (long sustained 
bursts, if you prefer to cast everything in a bursty mode).

>>   However, one of the things I have seen relative to the large scale 
>> clusters out there is most are designed for particular workloads which, 
> but that's silly - you observe that there exist some bad examples,
> and generalize to all shared clusters?

???  What I observe is silly ???  Most of the large clusters we are 
seeing out there on the Top500, and going into customer sites, seem to 
have a fairly uniform configuration, designed with a specific purpose in 
mind, which often, sadly, has little to do with the appropriate end-user 
usage models.

We approach cluster design from a flexibility viewpoint, which is quite 
different from the approach of the folks who ship hundreds of racks of 
the same stuff per year.

[... deletion in order not to waste time arguing about stuff which is 
just silly, including the placing of words in my mouth  ...]

>>> towards the model of the "mainframe supercomputer" in it's special 
>>> machine room shrine tended by white garbed acolytes making sure the 
>>> precious bodily fluids continue circulating.
>> Yes!!!  Precisely.    But worse than this, it usually causes university 
> hey, I'm all for decentralized computing _if_ it makes sense.  there's no 
> question network+desktops are better than mainframe+terminals.  but that 
> doesn't mean that all centralization is bad.

Your bias is towards centralization given what you do and where you 
work.  The politics have all been ironed out (ok, mostly).  This isn't 
true everywhere.

>> But worse than this, it usually causes university 
>> administrators to demand to know why department X is going off on their 
>> own to get their own machine versus using the annointed/blessed hardware 
>> in the data center.  As someone who had to help explain in the past why 
> take a look at the fine print on your grant - the university is obliged to 
> ensure that grant money is spent effectively.  that clearly does mean that 
> every department/group should not go off and buy their own little toy.
> make that "underutilized, poorly configured, mal-administered toy".


Yeah, Mark, you are the arbiter of what is effective and correct.

Nor does it mean that a single large shared resource, mis-designed for 
many purposes, is the most effective expenditure of money.

My point (not the ones you ascribed to me in the deleted material) was 
that there are local optima, and there are ways of designing the 
"central" resource so that researchers have a fighting chance of getting 
time on machines designed for their problems, and not designed to score 
well on HPL.  That is, IMO, the most effective use of the money: making 
sure you can make the most effective use of your computing resource.  So 
if the central resource is designed for sparse linear algebra solvers 
with small RSS, but you need significant IO capacity per node and a 
high-speed data network (not low latency, just good transport 
properties), are you SOL when your only option is that nice central 
facility that Mark operates, with its really low-latency net, small 
memory per node, and low IO bandwidth per node?  Largely, yes, you are 
SOL in that case, unless there is a section of that resource better 
designed for your needs.  You might call the alternative a "piddling 
little cluster", but the groups who can use it effectively will consider 
it a great resource.

>> Department X may be getting its own machine due to the central resource 
>> being inappropriate for the task, or the wait times being unacceptably 
>> long, or the inability to control the software load that you need on it.
> then your central IT people are incompetent - fire them, since their purpose
> is to serve you.  it's just silly to claim that no centralized facility can 
> be responsive and appropriately configured.

That is absolutely not what was said, Mark.   If the central resource 
was designed before department X's needs arose (e.g. a new faculty 
member was hired, or a new effort was undertaken), should we then take 
your advice and label the "central IT people incompetent - fire them"? 
No, Mark, this is not nearly as black and white as you portray it.  It 
is not as simple as you claim.

>> folks manage the resource, and allow cycles to be used by others when 
>> the owners of the machine are not using so much of it.   Moreover they 
> cycle scavenging is better than nothing, but only a way to mitigate a bad
> solution.  and it really only helps people who have least-common-denominator
> kinds of jobs (basically, small-model MC).

"Cycle scavenging" is not what is being talked about here.  More along 
the lines of a functional policy for resource control and access.  The 
owners of the machine get first shot at the cluster.   Additional users 
get secondary access.  This helps people who need cycles, but as you 
imply here, unless the machine is architected well for a particular 
purpose,  few jobs could make effective use of the resource.

>> Forcing all users to use the same resource usually winds up with 
>> uniformly unhappy users (apart from the machine designers who built it 
>> for a particular purpose).
> this is overly pessimistic.

Heh....  No.  This is bitter experience.

>>> There is, I maintain, a real market for smallish clusters intended to be 
>>> operated by and under the control of a single person.  In one sense, I'm 
> unless the person has constant demand, I disagree.  OK, so let's start with
> your personal-super.  now, is there some reason why its location matters?
> perhaps you're using it for really high-end parallel-rendering.  OK, great.
> you almost certainly aren't.  suppose you were happy with a single gigabit
> connection to the cluster - is there any reason not to locate it in a central
> machineroom?  chances are excellent that power and cooling will be more 
> effective than your lab/office.  now, suppose you had exclusive/dedicated
> access to the same-sized chunk of a scaled up version of the cluster.
> identical components, just your own little partition.  would you be happy?

This last bit was a point I was trying to make, Mark.  It wouldn't be a 
scaled-up version of the identical components; it would be a cluster of 
clusters.  I don't care (and most of our customers don't care) where it 
is located, as long as it is fed and cared for properly, and they can 
get at it and get usable work out of it.  If you scale up the nice 
low-latency side of the cluster, are you going to be able to afford to 
scale up the high-IO side?  Remember, you have a zero-sum game going on 
with your dollar budget.  The high-IO folks have needs very different 
from those of the low-latency folks.

> one last question: do you mind of someone else uses your chunk of the 
> cluster when you aren't?  BINGO - you have just achieved what I'm talking 
> about: a shared resource pool that is co-located to make things like 
> better networking available, minimize infrastructure costs, etc.  there's
> absolutely no reason such a shared cluster can't incorporate all your choices
> of software, the hardware tuned to your application, etc.

No....  What I am talking about is a shared group of clusters, a cluster 
or grid of clusters (pick your terminology, I don't care), where you can 
do a fairly good job of optimizing job fit on one of the base-level 
clusters, and you can manage it in the large from a single site (or 
multiple sites).  Each lower-level (or, as you described it, "piddling 
little") cluster is different, adjusted to the purchaser's needs.  The 
cluster or grid of clusters can be designed to place particular users' 
jobs onto preferential subclusters or nodes, specifically those most 
appropriate for the jobs.   Extra cycles may be allocated for "bursty" 
workloads.

What I see you talking about is scaling one version of the cluster up, 
so everyone gets that nice fast Quadrics interconnect - and a small 
memory footprint, because the design point for the Quadrics-based 
cluster is large MPI jobs that get better performance from wider 
distribution and use little memory per node.  The folks with this 
workload are going to love this cluster.  The folks who need big fat 
local memory and a large local IO pipe are going to hate it.  For them 
it is going to be far slower than "that piddling little cluster" that 
they buy and "mismanage" themselves (to quote you).

>>   Second, the large market is shrinking.  Again, this is not a slap 
>> against Jim and Mark.  The real HPC cluster market is moving down scale, 
>> and the larger ones are growing more slowly or shrinking.  This is going 
>> to apply some rather Darwinian forces to the market (has been for a 
>> while).
> I believe this is mostly an artifact of the kind of lumping-together that 
> market-research companies do.

The data is very detailed.   This isn't a lumping effect as far as we 
can see.  Lots of others have looked at this and drawn the same 
conclusions we have.  This is the first time I have seen anything like 
your conclusions.

>  it's clear that the previous model for
> supercomputers (cray/vector/etc) is disappearing. 

No.  It is not disappearing; it is being relegated to a tiny niche. 
There are some things vector architectures simply do a better job at. 
The list is shrinking over time, but there remain some workloads on 
which vector machines cannot be beaten by clusters.

> and it's clear that 
> commodity-cluster HPC is going to replace it.  since the research is lumping
> together a decreasing number of $1e7 machines with an increasing number of 
> machines built from 1e3 blocks, wow, the average is decreasing.  that doesn't
> actually tell us much.

You are making some incorrect assumptions about the data.  You are 
assuming that people are buying $1e7 machines and reporting them as 
$1e3 purchases.  The data (and how it was analyzed) tell a completely 
different story.  It is the same story we have been observing for the 
last decade: supercomputing is moving down-market.  The folks who have 
been denying this are either out of business commercially, or in 
perilous danger of being so in short order (including a former employer 
of mine).

>> Its real.  Its just not growing in dollar volume.  Some of us wish it 
> largely because components are getting cheaper, I believe.  again, the 
> lumped statistic is hiding multiple things happening: no doubt there is 
> some increase in the number of personal clusters (say, <= 32 cores),

Well, at this point, rather than keep arguing over your preconceptions, 
I will suggest you actually go get access to the data and read it.

Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615
