500 CPU Beowulf @ ETHZ

Eugene Leitl eugene.leitl at lrz.uni-muenchen.de
Thu Jul 6 12:57:12 PDT 2000


http://www.hoise.com/primeur/00/articles/corner/AE-PL-06-00-5.html

To start a supercomputer company it only takes two brothers - Swiss
Dalco delivers 500-processor system to ETH Zuerich

Mannheim, 10 June 2000. The ETH Zuerich has installed a large Linux
cluster that will be expanded later this month to 500 processors, 250
Gbyte of memory and 2 Tbyte of storage. Because the machine is worth
more than 1.5 million Swiss francs, European regulations require an
open tender that must adhere to strict rules. So who won this
tender? One of the big supercomputer companies, you would assume. Not
so: Dalco, a small Swiss company employing 8 people but with a yearly
turnover in the 10-million-franc range, solved the legal issues and the
technical problems, convinced the ETH it could do the job, and
offered the lowest price. Hence the cluster, called Asgard, was
installed by the company of Christian and Francois Dallman. The
cluster runs Linux, provided by SuSE, which includes a number of
additional tools for running a cluster and supporting parallel
programmes.

Matthias Troyer, from the ETH Zuerich, explained the whole acquisition
process at the Supercomputer 2000 Conference in Mannheim. The
Physics Department in Zuerich is used to consuming a lot of
computing power. Troyer had also spent an extended period in Japan,
where he had access to one of the largest supercomputers in the world,
a Hitachi machine. Accustomed to supercomputer access, he learned on
returning to Zuerich that the ETH planned to shut down the old Paragon
parallel supercomputer it was still operating and had no plans to
replace it. However, the ETH computer centre would support users
buying their own machines as much as possible.

Besides Troyer, other physicists also needed computing power. Hence
the idea was born to look for a Beowulf cluster. A Linux system with
Ethernet interconnects would be able to fulfil their computational
needs, and because they were only a small group (and physicists) it
would not be much of a problem to operate. Calculations showed,
however, that a big cluster would need special cooling. No problem at
ETH, where an air-conditioned computer room is available at the
computing centre.

With Linux clusters, you can choose between Alpha and Intel
processors. ETH ran some benchmarks and found that in general Alphas
are faster but Intel offers better price/performance. Hence, if
network speed is not important and Ethernet is sufficient, Intel
machines will do. Their applications (Monte Carlo simulations, series
expansions and teaching) would work well with an Intel solution. From
the benchmarks they also noted that the Linpack performance benchmark,
used in the TOP500, is irrelevant to their applications.

Because of the size of the budget and because ETH is a publicly funded
organisation, they soon learned that the procurement had to follow
strict European regulations, so they sought legal advice. The first
step in the process, after writing a request for proposals, was
publishing the tender officially in the Swiss "Handelsamtsblatt".
This newspaper is not one that fast-moving twenty-year-old IT people
like the Dallman brothers normally read, and they would have missed
the tender if someone had not drawn their attention to it. They
obtained the paperwork and saw they could build a cluster as required
by the ETH. Two weeks later they submitted their offer, written with
the help of their German software partner SuSE.

A number of other companies, small ones from other countries, did
not qualify because they had no support within the Swiss borders.
Hence Dalco's only competitors were the big computer companies. After
several presentations and refinements of the offer, Dalco was awarded
the contract on October 14, 1999. On November 8, 1999, the contract
was signed, and six weeks later the computer had already been
delivered to ETH. Before the end of the year the machine was up and
running, and in March it was accepted. This machine consisted of 192
compute nodes, each with dual 500 MHz Pentium II processors and
1 Gbyte of memory. In fact, memory accounts for the largest part of
the machine's cost. Unfortunately for ETH, memory prices rose
considerably during the acquisition phase. But overall, Troyer said,
they got a larger machine for their money than they had anticipated.
The best performance achieved on this machine is 29.6 Gflop/s, on one
of their codes. The whole system, including the SuSE cluster software,
runs smoothly, he said. The only bottleneck is the I/O file server.

Meanwhile, Dalco has received the order to upgrade the machine to 500
processors by the end of June 2000. Further expansion will probably
be needed, as other departments also want to add their share to the
Asgard cluster. So, as the 'Dalcon brothers' show, it still takes only
a few bright people to produce a supercomputer-class system.



