[Beowulf] large MPI adopters

Bogdan Costescu Bogdan.Costescu at iwr.uni-heidelberg.de
Wed Oct 8 11:00:19 PDT 2008

[ Trimmed CC list ]

On Wed, 8 Oct 2008, Vincent Diepeveen wrote:

> When you add up others.

It's interesting that you make this statement alongside another one 
saying that many companies that have clusters or other parallel 
computing resources don't make them public. So how can you arrive at 
such a figure?

> Note i see in IWR you got a 160 node opteron cluster, what software 
> eats system time on that cluster?

In my vocabulary, system time is the time used by the system (kernel, 
maybe daemons) as opposed to user time.
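The distinction is easy to see from a process's own CPU accounting; here is a minimal Python sketch (assuming a POSIX-like system, nothing cluster-specific):

```python
import os

# User time accrues while executing our own code; system time accrues
# while the kernel works on our behalf (syscalls, I/O, scheduling).
sum(i * i for i in range(500_000))   # pure computation: mostly user time
for _ in range(5_000):
    os.getpid()                      # each call crosses into the kernel

t = os.times()                       # per-process CPU accounting
print(f"user time:   {t.user:.3f} s")
print(f"system time: {t.system:.3f} s")
```

A well-behaved MPI job spends nearly all its time in the "user" column; a large "system" share usually points at heavy I/O or communication overhead.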

What probably interests you is what kind of software is running on 
HELICS; if so, how come you managed to see the hardware page but 
missed the projects list? It's linked from the main HELICS page, 
but to make it easier for you, here is the direct link:


The 160 nodes are only the newer part called HELICS II; and there are 
also other clusters in IWR :-)

> There must be graphs about it, want to share one?

Measuring what? Generally speaking, I don't produce graphs; text 
output is enough for me to know that the cluster is busy :-)

> Also interesting to know is how many cores most jobs eat!

Most jobs running on HELICS II start 4 processes per node. There are 4 
cores per node, so that's 1 process per core. Some of the software 
(mostly developed in-house) requires a lot of memory per process, so 
in those cases fewer cores per node are used - a tradeoff that was 
factored into the design.
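That memory/cores tradeoff amounts to a simple sizing rule: one rank per core, unless per-rank memory demand forces fewer ranks per node. A small sketch - the node figures used below (4 cores, 8 GB per node) are illustrative assumptions, not HELICS II's actual specification:

```python
def processes_per_node(cores: int, node_mem_gb: float,
                       mem_per_proc_gb: float) -> int:
    """How many MPI ranks fit on one node: one per core, unless
    each rank's memory footprint forces fewer ranks per node."""
    by_memory = int(node_mem_gb // mem_per_proc_gb)
    return max(1, min(cores, by_memory))

# Hypothetical 4-core, 8 GB node:
print(processes_per_node(4, 8, 1.5))   # lean code: all 4 cores -> 4
print(processes_per_node(4, 8, 3.0))   # memory-hungry code -> 2
```

With fewer ranks per node, the remaining cores sit idle, so jobs may need more nodes for the same rank count - which is why it pays to factor memory per process into the cluster design up front.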

> that's the amount of system time the air industry eats of supercomputers 
> designing new stuff. It is very significant together with car 
> manufacturers.

And how do you know that? :-)

> With 160 nodes machine, even if that would be power6 nodes, you 
> can't even design a wing for aircraft.

Aircraft design as a research subject can probably easily earn 
computer time on the many parallel computing resources in Germany that 
any researcher working in a German university can apply for. Some of 
the parallel computing resources are better suited for certain tasks 
(like the NEC SX8 in Stuttgart for vector software), so we're free to 
choose the one that fits best. Local resources are in some cases 
preferred because they can be more easily controlled (e.g. hardware 
choices tailored to a particular application that is run very often) 
or are easier to get access to. So the size of the local cluster(s) is not a 
limiting factor for the science that gets done.

Getting non-x86(_64) compatible CPUs (like Power-something) for HELICS 
was considered but rejected because of the already-mentioned problem 
of binary-only software from ISVs.

> Well, i sure avoid hospital visits, you?

I do go to the hospital to visit ill relatives or friends :-)

> As you know the healthcare commission is basically old men

You seem to have a lot of knowledge about everything, I don't - so I 
can't comment on your statements. ;-)

> Computing power or not, most manufacturers simply want to sell a 
> product, so the amount of systemtime invested on a computer, in 
> order to sell, is total irrelevant to them to get a drug accepted 
> usually.

Indeed, the computer is involved usually only in the initial stages, 
when many compounds are screened and only very few go further. 
Clinical trials are very expensive and time consuming, so the computer 
time used initially is only a fraction of the total time needed to put 
out a new drug. But this is not because the pharma companies want it 
to take this long; it's because all these further tests are 
supposed to keep the later users of these drugs out of trouble and to 
provide that (sometimes very long...) list of side effects they are 
required to put in the box. It's not the companies who find the 
computer time irrelevant, it's the whole system that makes it so.

> Maybe a new breed of company can do the job. A breed where making 
> profit is not most important :)

"Imagine there's no money" (paraphrasing John Lennon) :-)

> Some hardware guys can better comment here. Making your own FPGA 
> chippie is rather cheap. Finding and paying a few good guys to get 
> the chip done is toughest. You basically need 1 good programmer, the 
> rest can get hired parttime in fact.

Maybe you should read more about the Anton chip and the team that 
created it. That would better fit the 'lot of knowledge about 
everything' picture that I mentioned above :-)

> So $150k for computing power that total annihilates any 
> supercomputer of now and the NEAR future for your specific 
> application.

Whew, do you wanna become partners so that we can sell this idea to 
NASA? :-)))

> I tend to remember that good cpu designers, not to mention memory 
> controller guys, are rather expensive folks

Indeed. Then look at the Anton chip and see why your 1+peanuts 
expert/project idea doesn't fit ;-)

Bogdan Costescu

IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8869/8240, Fax: +49 6221 54 8868/8850
E-mail: bogdan.costescu at iwr.uni-heidelberg.de
