[Beowulf] BMW Shifts Supercomputing To Iceland To Save Emissions

Mark Hahn hahn at mcmaster.ca
Mon Oct 15 11:07:47 PDT 2012

> Mind you, I'm a huge fan of small clusters under a single person's control,
> where nobody is watching to see if you are making 'effective utilization'
> and you can do whatever you want.  A personal supercomputer, as it were.
> But I recognize that for much of the HPC world, clusters are managed in the
> same way as big iron mainframes were in the 70s,

I think you're being a bit disingenuous here.  dedicated/personal 
clusters are perfectly sensible when the workload is non-bursty
or somehow otherwise high-duty-cycle.  or perhaps when you're 
talking about resources cheap enough to hand out like pencils.
(that is, let's be honest: cheap enough to waste.)

a larger, shared resource pool is ideal for bursty/low-duty-cycle environments.
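
To make the duty-cycle argument concrete, here is a rough sketch rather than a sizing method. It assumes a hypothetical group of users who are each independently active for a given fraction of the time and each need exactly one node while active. The function name `nodes_needed`, the group size and the 99% coverage target are all illustrative assumptions, not anything from a real scheduler.

```python
from math import comb


def nodes_needed(users, duty_cycle, coverage=0.99):
    """Smallest pool size k with P(simultaneously active users <= k) >= coverage.

    Assumes each user is independently active with probability duty_cycle
    and needs exactly one node while active (a deliberately crude model).
    """
    cumulative = 0.0
    for k in range(users + 1):
        # Binomial probability that exactly k users are active at once.
        cumulative += comb(users, k) * duty_cycle**k * (1 - duty_cycle) ** (users - k)
        if cumulative >= coverage:
            return k
    return users  # fallback if rounding keeps the sum just under coverage


if __name__ == "__main__":
    users = 20  # hypothetical group size
    for duty_cycle in (0.1, 0.5, 0.9):
        shared = nodes_needed(users, duty_cycle)
        print(f"duty cycle {duty_cycle:.0%}: dedicated={users} nodes, shared pool={shared} nodes")
```

Under these assumptions the shared pool is much smaller than one node per user when the duty cycle is low, and the saving shrinks as the duty cycle approaches 100%, which is where dedicated hardware starts to look sensible.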

as far as I can see, there are really only a couple problems with this:

- many people and most environments have a mixture of burstiness.

- schedulers are not awesome at managing latency of either flavor
   when both are mixed, especially when jobs are submitted with poorly
   specified resource requirements (bad runtime estimates, inaccurate
   memory requests, etc.)

- resource granularity becomes even more of a problem: serial jobs
   "contaminate" nodes that parallel jobs need whole, low-mem jobs
   land on high-mem nodes, etc.

- very short runtime limits permit more rebalancing of resources,
   but are incredibly harmful to most people's productivity.

- preemption (SIGSTOP/SIGCONT) seems to be a relatively little-used
   way to optimize for latency - enough so that it simply does not work
   right on major non-free schedulers.

- it's hard to get people to treat storage as ephemeral :(

- big resources are also big budget targets :(
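
For reference, the SIGSTOP/SIGCONT mechanism mentioned in the preemption item can be sketched at the process level as below. This is a minimal POSIX-only illustration, not how any particular scheduler implements preemption; the function name `suspend_and_resume` and the sleeping child process are placeholders for a real job.

```python
import os
import signal
import subprocess
import sys


def suspend_and_resume():
    """Stop a child process with SIGSTOP, then resume it with SIGCONT.

    Returns (was_stopped, still_alive). The child is a stand-in "job"
    that just sleeps; a real scheduler would signal the job's whole
    process tree, which this sketch does not attempt.
    """
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    try:
        # Preempt: SIGSTOP cannot be caught or ignored by the job.
        os.kill(child.pid, signal.SIGSTOP)
        # WUNTRACED makes waitpid report the stop instead of waiting for exit.
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        was_stopped = os.WIFSTOPPED(status)

        # Resume: the job continues from where it was stopped.
        os.kill(child.pid, signal.SIGCONT)
        still_alive = child.poll() is None
    finally:
        # Clean up the placeholder job.
        child.kill()
        child.wait()
    return was_stopped, still_alive


if __name__ == "__main__":
    print(suspend_and_resume())
```

A stopped process gives up its CPU time but keeps its memory, so suspending a job this way only helps the preempting job if the node has memory (or swap) to spare.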
