Beowulfs can compete with Supercomputers [was Beowulf: A theorical approach]

Fri Jun 23 06:50:03 PDT 2000

Hi,

just my two cents (of a Euro) to the "Beowulf vs. Supercomputer"
discussion:

I found that the real comparison is more often "building/buying your
own group or departmental cluster" vs. "applying for supercomputer
time at a nationwide computing center". Even our little 12-processor
cluster provides about 100000 processor hours a year, roughly what
you would get for a smaller project at a supercomputing center. The
128-processor ALiCE cluster of Wuppertal University may be a factor
of 5-10 smaller than a big Cray, but there are usually many more
than 10 research institutions sharing a supercomputing center. [BTW,
the Wuppertal cluster was chosen over established mid-range
supercomputers in a competition based on price/performance for
selected application benchmarks.]  Add to this the organizational
overhead and inconvenience of a supercomputing center.
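
To make the arithmetic above concrete, here is a rough Python sketch.
The processor counts come from the text. Full utilization, a "big
Cray" worth exactly 10 ALiCE clusters, and an even split among exactly
10 institutions are assumptions for illustration only, and the
function names are made up.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def processor_hours_per_year(n_procs, utilization=1.0):
    """Processor-hours delivered in one year (utilization is an assumption)."""
    return n_procs * HOURS_PER_YEAR * utilization


def share_per_institution(total_hours, n_institutions):
    """Even split of a shared machine among institutions (an idealization)."""
    return total_hours / n_institutions


# Our 12-processor cluster, running flat out all year.
small_cluster = processor_hours_per_year(12)

# The 128-processor ALiCE cluster, owned by a single institution.
alice = processor_hours_per_year(128)

# Assumption: the big machine has 10x ALiCE's capacity (the upper end of
# "a factor of 5-10") and is shared evenly by 10 institutions.
big_cray_share = share_per_institution(10 * alice, 10)

print(f"12-processor cluster: {small_cluster:.0f} processor-hours/year")
print(f"ALiCE, sole owner:    {alice:.0f} processor-hours/year")
print(f"Big Cray, 1/10 share: {big_cray_share:.0f} ALiCE-equivalent hours/year")
```

Under these assumptions a one-tenth share of the big machine is worth
no more than owning ALiCE outright, and with more than 10 institutions
sharing it the cluster comes out ahead.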

So unless you really need O(1024) processors, many projects should be
better off on a cluster. And if you really need that amount of
computer time for a prolonged period, you probably would not be able
to pay for the supercomputer. Some subfields, like ours (Lattice Field
Theory) or astrophysics, have for quite some time resorted to
building their own supercomputers, sometimes combining the Beowulf
idea of off-the-shelf components with custom interconnects. The
closest to that idea may be QCDSP from Columbia University, which is
built from Texas Instruments digital signal processors on custom printed-circuit
boards, and delivers in its largest installation about 400 GFlops
(they are aiming at 10 TFlops for their next project). Others are
CP-PACS in Japan (based on a modified HP chip), and APE in
Italy/Germany (custom-designed processors for a single-instruction,
multiple-data machine).

Also, when writing an application that needs O(100) GFlops-years, many
physics groups are happy to tailor their programs to the machine and
write message-passing codes (as long as graduate students come cheap),
so SMP is not really missed. Cray's top-of-the-line T3E is itself a
message-passing machine, so many programs are already written that way.

Finally, we found that processor speed is increasing so quickly that
even our application, once considered network-hungry, does not exhaust
Myrinet. Myrinet gives you maybe 100 MB/s data transfer, but the
memory transfer rate may also be only 300-500 MB/s per processor - a
10 GB/s network would run much faster than a current memory bus.
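
The bandwidth comparison can be spelled out with a few lines of
Python. The figures are the rough ones quoted above (not measurements),
1 GB/s is taken as 1000 MB/s, and the names are only illustrative.

```python
def bandwidth_ratio(network_mb_s, memory_mb_s):
    """Network bandwidth as a fraction (or multiple) of memory bandwidth."""
    return network_mb_s / memory_mb_s


MYRINET_MB_S = 100            # rough Myrinet transfer rate from the text
MEMORY_MB_S = (300, 500)      # rough per-processor memory rate, low and high
FAST_NETWORK_MB_S = 10_000    # the hypothetical 10 GB/s network (decimal units)

# Myrinet relative to the memory bus: between 1/3 and 1/5 of its speed.
myrinet_vs_memory = [bandwidth_ratio(MYRINET_MB_S, m) for m in MEMORY_MB_S]

# A 10 GB/s network relative to the memory bus: 20 to 33 times faster.
fast_net_vs_memory = [bandwidth_ratio(FAST_NETWORK_MB_S, m) for m in MEMORY_MB_S]

for mem, slow, fast in zip(MEMORY_MB_S, myrinet_vs_memory, fast_net_vs_memory):
    print(f"memory {mem} MB/s: Myrinet = {slow:.2f}x, 10 GB/s net = {fast:.1f}x")
```

So Myrinet is already within a factor of 3-5 of the memory bus, and
a much faster network could not be fed by the node anyway.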

Actually, the best argument, if any, against Beowulves that I found
was sheer size and power consumption, mainly because the average node
contains much more circuitry than needed. If someone came up with a
small board containing an Alpha processor, cache and main memory, and
a Myrinet or similar connection... But this, of course, would not be
much different from a T3E.

-Chris
-- 
Christoph Best                                        c.best at computer.org
John von Neumann Institute for Computing/DESY   http://www.oche.de/~cbest