[Beowulf] vmware performance
Jim Lux
james.p.lux at jpl.nasa.gov
Fri Feb 15 15:24:39 PST 2008
Quoting Mark Hahn <hahn at mcmaster.ca>, on Fri 15 Feb 2008 02:25:26 PM PST:
>
> I'm skeptical how much sense VMs in HPC make, though. Yes, it would
> be nice to have a container for MPI jobs: checkpoints for free, the
> ability to do migration. Both these factors depend on the scale of
> your jobs: if all your jobs are 4k CPUs and up, even a modest node
> failure rate is going to make aggressive checkpointing necessary
> (versus jobs averaging 64p, which are almost never taken down by a
> node failure). Similarly, if your workload is all serial jobs,
> there's probably no need at all for migration (versus a workload
> with high variance in job size, length, priority, etc.).
Perhaps the added overhead of using VMs to do "user transparent
checkpointing" is worth it in the same sense that most folks are
willing to tolerate the overhead of using a compiler and linker
instead of working in hex, octal, or binary machine code. Rather than
force a researcher to figure out how to do checkpointing, you buy a
few dozen more nodes to make up for the extra work. You spend more on
hardware and less on bodies, and since the hardware is always getting
cheaper (per quantum of "work"), the trade gets more attractive with
time.
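For a sense of what the researcher is being spared, here's roughly
the sort of hand-rolled checkpoint/restart boilerplate they'd
otherwise have to write (a minimal Python sketch; the filename, the
10-minute interval, and the toy work loop are just illustrative):

    import os, pickle, time

    CKPT = "state.ckpt"    # illustrative checkpoint filename

    def load_state():
        # Resume from the last checkpoint if one exists, else start fresh.
        if os.path.exists(CKPT):
            with open(CKPT, "rb") as f:
                return pickle.load(f)
        return {"step": 0, "result": 0.0}

    def save_state(state):
        # Write to a temp file and rename, so a crash mid-write can't
        # corrupt the only good checkpoint.
        tmp = CKPT + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp, CKPT)

    state = load_state()
    last_ckpt = time.time()
    while state["step"] < 10_000_000:        # stand-in for the real job
        state["result"] += state["step"] ** 0.5
        state["step"] += 1
        if time.time() - last_ckpt > 600:    # checkpoint every 10 minutes
            save_state(state)
            last_ckpt = time.time()
    save_state(state)

A VM snapshot gets you the same thing (plus all the state the
programmer forgot about) without touching the application at all.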
{Leaving aside interesting philosophical discussions having to do
with the incremental cost of labor, especially one's own, versus the
capital and operating costs of the iron. I've also noticed that even
though we've gone through many, many Moore's Law doublings, with
probably a 5000-fold increase in computational horsepower on an
engineer's desk every 20 years (a doubling every 19-20 months or so
compounds to about that over two decades), design and analysis
methodologies change much more slowly. In the RF world, state of the
art in design tools in 1960 was a paper Smith chart and a slide rule,
and a healthy dose of simplified analytical approximations. State of
the art in 1980 was simple computer tools that essentially automated
the pencil-and-paper techniques, as well as some numerical analysis
things (e.g. SPICE for circuit simulation, which solves matrix
equations and does numerical integration, or early electromagnetics
codes). State of the art in 2000 (and today, really) is integrated
modeling tools with much larger matrices and tighter integration
between FEM codes and circuit-theory-type analysis (that is, you
might model the packaging with an EM code, but you'd use a behavioral
model for the semiconductor device, rather than using Maxwell's
equations all the way down to the atomic level).
However, even with such nifty tools, a huge number of engineers still
use paper-and-pencil style analysis. Granted, they use Excel instead
of their trusty HP45 and a quad pad... but the style of analysis and
design is the same. They even teach classes in "RF Design with Excel"
(which I view as anathema). Why isn't everyone using the new tools
(which hugely improve productivity and the quality of the resulting
design)?
Capital investment is required (gotta invest in the iron, and the
seat license).
Familiarity (if you learned to design 20 years ago, you're
comfortable with the methodology, you're aware of its limits, and you
are satisfied with the precision and accuracy of the results of that
methodology...).
The latter is another aspect of capital investment: it takes time to
get used to a new way of doing things, time that the engineer may not
have, in an environment that stresses getting the product out the
door (or, in the case of where I work, getting to the launch pad in
time for the every-two-year launch opportunity for Mars).}
So, against this background, giving up even 80% of the computational
horsepower in exchange for a tool that might make you 10 times more
productive is a good trade. Sometimes I think that folks developing
automatic parallelizers and similar tools are working too hard to
make them perfect. If I can take a chunk of software that takes, say,
1 day to run now (requiring periodic interaction, i.e., it's not a
batch overnight thing) and get it to run in 10 minutes, that's a huge
improvement.
Put it in numbers. Say it costs me $3000 for a computer to run it in
a day. If I can run it in 10 minutes (about 50 times faster, taking a
day as 8 working hours) and I do one run a day, I don't care if it
takes 100 processors to run 50x faster, as opposed to only 50. The
extra 50 processors cost me, say, $200K (with the extra overhead for
connectivity, facilities, etc.), which is a small fraction of the
value of the time saved, because I've essentially replaced 50
engineers with 1 (putting those 49 engineers out on the street, where
they will inevitably cause problems... idle hands, playgrounds, and
so forth).
In fact, you could have some hideously inefficient scheme that takes
1000 processors to go 10 times faster, and it's probably still a good
deal.
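A quick back-of-the-envelope makes the same point (the $200K figure
is from above; the loaded cost per engineer is my own illustrative
assumption):

    # Back-of-the-envelope for the speedup-vs-hardware trade above.
    extra_hardware  = 200_000   # 50 extra processors, connectivity, facilities
    engineer_year   = 150_000   # ASSUMED fully loaded annual cost per engineer
    speedup         = 50        # one working day (~480 min) -> ~10 min
    engineers_freed = speedup - 1   # one engineer now does the work of 50

    payback_years = extra_hardware / (engineers_freed * engineer_year)
    print(f"hardware pays for itself in about {payback_years * 12:.1f} months")
    # -> about 0.3 months; even the "hideously inefficient"
    #    1000-processor version of the trade comes out far ahead.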
Jim Lux