UP2000 vs DS10 for beowulf?

Christoph Best c.best at fz-juelich.de
Sun Oct 29 16:27:46 PST 2000


Hi,

David Bremner writes:
 > I am looking at two different configurations of 600Mhz, 2M cache alpha
 > 21264w
 > 
 >    6*2 processor  UP2000, 512M RAM
 > 
 > or
 > 
 >   12*1 processor DS10L 256M RAM
 > 
 > both options include high speed networking (myrinet in one case,
 > wulfkit in the other; but possibly I could swap).  Both will run Linux.
 > 
 > The pricing works out approximately the same.  Is there anything 
 > obvious that favours one of these configurations over the other?
 > If the answer is "it depends", what does it depend on?
 > 
 > One thing that is a bit worrying is having two processors talking
 > through one PCI bus to the high speed network.

Our group uses just these two configurations, a 6x2-processor UP2000
cluster located at a lab in Juelich and a 128-processor DS10 cluster
at the university in Wuppertal. The web page is at
  http://nicse.hlrz.kfa-juelich.de:8888

 > The other is that I understand the UP2000 memory subsystem is not as
 > good as the ds20, and apparently can't provide the same bandwidth to
 > two processors. (see
 > e.g. http://www.dl.ac.uk/CFS/benchmarks/compchem.html, for anectdote,
 > but no numbers.) Does anyone have two processor stream numbers for 
 > UP2000s?

I do not have the current stream numbers at hand, but could get them
next week. DS10, DS20, UP2000, and ES40 all use the Tsunami chipset,
but with different numbers of chips. There is a good explanation on
Microway's web page
  http://www.microway.com/products/ws/alpha_21264.html 
Per processor, the UP2000 and the DS10 would be equivalent, and the
DS20 would have twice the bandwidth per processor. However, I do not
really understand how these peak burst bandwidths relate to actual
memory performance, in particular with respect to how many banks of
memory are used. Can somebody elucidate that?

We have a very memory-intensive benchmark for the systems (inversion
of a large sparse matrix) and found that the UP2000 suffers a 10-20%
performance loss when both processors are active (as compared to a
system where one processor is idle). I cannot really compare UP2000
and DS10 directly, as our DS10s were only recently upgraded to 619
MHz. I do have a comparison with the ES40 at 667 MHz, and the ES40
suffered a smaller penalty even when four processors were active.
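
To put the 10-20% figure in perspective, here is a rough sketch of
what it means for the throughput of a whole dual-processor node. It
assumes that the loss hits both processors equally and that the two
jobs are otherwise independent; the function name is just made up for
this illustration, and the numbers are only the 10% and 20% bounds
quoted above, not new measurements.

```python
def dual_node_throughput(loss_fraction):
    """Aggregate throughput of a 2-CPU node, in units of one CPU running alone.

    loss_fraction is the per-processor slowdown seen when both CPUs are
    busy (0.10 for a 10% loss). Assumption: both CPUs lose the same amount.
    """
    if not 0.0 <= loss_fraction < 1.0:
        raise ValueError("loss_fraction must be in [0, 1)")
    return 2 * (1.0 - loss_fraction)


if __name__ == "__main__":
    # The two bounds of the 10-20% range reported for the UP2000.
    for loss in (0.10, 0.20):
        total = dual_node_throughput(loss)
        print(f"loss {loss:.0%}: node delivers {total:.2f}x one idle-neighbour CPU")
```

So under these assumptions a dual UP2000 node would deliver somewhat
less than two single-processor nodes on such a memory-bound code, which
is the quantity to weigh against the savings on network cards.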

Another difference is that the UP2000 is also available with the 750
MHz processor and 8 MB cache, while the DS10 is at 619 MHz and 2MB, at
least the ones we were able to get. And the 8xx MHz version for the
UP2000 is expected in December. If you have a really memory-intensive
application, the increased cache might help (or not, if your code is
vector-like).

As for the Myrinet, the 64-bit PCI bus is fast enough to feed at least
the standard 1.2 GBit/s Myrinet, so as long as you use only one
Myrinet card per system, the bus should be no worry. Of course, you
only get half the network bandwidth per processor compared with the
DS10, but Myrinet provides a lot of bandwidth (it gets within a factor
of less than 10 of the memory bandwidth) and the savings in Myrinet
cost can be an advantage.
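
The bus argument can be checked with some back-of-the-envelope
arithmetic. The sketch below takes the 64-bit bus width and the 1.2
GBit/s Myrinet rate from the paragraph above; the 33 MHz PCI clock is
my assumption and should be checked against the board documentation.
The result is a theoretical peak, and sustained PCI throughput will be
lower. The function names are made up for this illustration.

```python
def pci_peak_gbit(bus_width_bits, clock_mhz):
    """Theoretical peak PCI bandwidth in Gbit/s (one transfer per clock)."""
    return bus_width_bits * clock_mhz * 1e6 / 1e9


def per_cpu_share(link_gbit, cpus_per_node):
    """Network bandwidth available per processor when one card is shared."""
    return link_gbit / cpus_per_node


MYRINET_GBIT = 1.2      # standard Myrinet rate quoted above
PCI_WIDTH_BITS = 64     # bus width quoted above
PCI_CLOCK_MHZ = 33.0    # ASSUMPTION: clock rate is not stated in this thread

if __name__ == "__main__":
    peak = pci_peak_gbit(PCI_WIDTH_BITS, PCI_CLOCK_MHZ)
    print(f"PCI peak:          {peak:.2f} Gbit/s")
    print(f"Myrinet link:      {MYRINET_GBIT:.2f} Gbit/s")
    print(f"Bus can feed link: {peak > MYRINET_GBIT}")
    # One card per node: two CPUs share it on a dual node, one CPU on a DS10.
    print(f"Per CPU, 2-CPU node: {per_cpu_share(MYRINET_GBIT, 2):.2f} Gbit/s")
    print(f"Per CPU, 1-CPU node: {per_cpu_share(MYRINET_GBIT, 1):.2f} Gbit/s")
```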

We ended up choosing the UP2000 for the small departmental cluster and
the DS10 for a high-throughput university research cluster. What spoke
for the UP2000 was that we could get large memory (1-2 GB) relatively
cheaply and that a single program could use the memory of both
processors. Also, SMPs tend to be more easily accepted by users. On
the other hand, a slight price advantage and the reduced risk of
memory congestion spoke for the DS10 on the large cluster, where SMPs
would not really be helpful as users are expected to run large
parallelized codes.

Hope that helps...

-Chris

-- 
Christoph Best                                        c.best at computer.org
John von Neumann Institute for Computing/DESY     http://tigertiger.de/cb



