Kidger's comments on Quadrics' design and performance

Joachim Worringen joachim at lfbs.RWTH-Aachen.DE
Wed Apr 24 09:16:01 PDT 2002


James Cownie wrote:
> 
> Sorry if you get something like this message twice, I submitted it
> once and nothing has come back, although my correction to one of the
> www addresses went through :-(
> 
> Joachim Worringen <joachim at lfbs.RWTH-Aachen.DE> wrote
> 
> > > This message also reminded me to ask if a long-held opinion is valid - and
> > > that opinion is "that a cache coherent interconnect would offer performance
> > > enhancement when applications are at the 'more tightly coupled' end of the
> > > spectrum."  I know that present PCI based interfaces can't do that without
> > > invoking software overhead and latencies.  Anyone have data - or an argument
> > > for invalidating this opinion?
> >
> > You would need another programming model than MPI for that (see below),
> > maybe OpenMP as you basically have the characteristics of a SMP system
> > with cc-NUMA architecture.
> 
> No, you are confusing two completely different issues. To support
> OpenMP you need a single address space which spans the processors.

You are right, these are two different issues. However, I did not mean
that connecting the nodes of a cluster with a cache-coherent interface
"gives you an SMP", but more precisely that it "gives the shared parts of
the distinct, distributed address spaces nearly SMP-like access
characteristics", given a suitable programming model.

This would enable a matching OpenMP compiler and run-time library to
generate and run code with (more or less) SMP-like performance. The Omni
OpenMP compiler takes the same approach today, but on top of SCASH, a
software DSM library that in turn runs on top of SCore (see
http://www.hpcc.jp/Omni). That is an all-software stack, which is much
more performance-sensitive to bad data placement and generally has a
much higher overhead than such a hardware-based solution would have.

There is something similar on top of SCI, namely the HAMSTER project
(http://hamster.informatik.tu-muenchen.de/), but without OpenMP, IIRC, and
it still has some software overhead to "simulate" cacheable remote memory
on top of SCI-connected PCs.

With Quadrics, this should be possible in an even more efficient manner
thanks to the hardware MMU and TLB on the adapter.

To have a real cc-NUMA SMP, the level of integration certainly needs to be
higher (HP X-Class, DG/IBM NUMA-Q, ...). The question is: are
large-scale SMPs as sold by IBM, Sun, ... not the better solution for
such tasks? Quadrics is expensive, and you still have to manage a bunch
of PCs instead of a nice, single SMP.

  Joachim

-- 
|  _  RWTH|  Joachim Worringen
|_|_`_    |  Lehrstuhl fuer Betriebssysteme, RWTH Aachen
  | |_)(_`|  http://www.lfbs.rwth-aachen.de/~joachim
    |_)._)|  fon: ++49-241-80.27609 fax: ++49-241-80.22339


