n/w ram or DSM ??

Mon Nov 25 17:12:40 PST 2002

On Fri, Nov 22, 2002 at 12:53:30PM -0500, Mark Hahn wrote:
> > IS DSM is more complex or its not a good idea ?
> 
> I'm always puzzled by interest in DSM.  why?  basically just because
> pages are big, pagefaults are expensive, and networks are slow.

Mark, as usual you say something that's correct.

Being an arrogant nit-picking bastard, I do have a nuance to add though  ;)

...
> I think most people believe that explicit message passing (MPI, etc) is a
> better solution to cluster programming.  specialized hardware could make DSM
> more interesting, but that starts to be more than a small matter of
> programming ;)

RAM today is very slow, compared to the CPU core speed.  Yet, with a
reasonably simple programming model (any recent RISC or CISC instruction
set), it is still possible to get "decent" or even "good" performance
even when using RAM.

How so?  Well, we introduce restrictions on the programming model (block
your access to avoid cache thrashing, don't jump if you can avoid it,
etc.), and the hardware adds layers of cache, pre-fetching, pipelining
and out-of-order execution.
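
To make "block your access" concrete, here is a minimal sketch of a
tiled matrix transpose.  The function name and the tile size of 32 are
made up for illustration.  It is written in Python only to show the
access pattern; the actual cache payoff would show up in a compiled
language running the same loop nest.

```python
def transpose_blocked(a, n, block=32):
    """Transpose an n x n matrix stored row-major in a flat list.

    The matrix is walked in block x block tiles, so the elements read
    and the elements written both stay inside a small working set,
    instead of striding across the whole matrix for every element.
    """
    out = [0] * (n * n)
    for i0 in range(0, n, block):            # tile row
        for j0 in range(0, n, block):        # tile column
            for i in range(i0, min(i0 + block, n)):
                for j in range(j0, min(j0 + block, n)):
                    out[j * n + i] = a[i * n + j]
    return out
```

The result is the same as a plain two-loop transpose; only the order in
which memory is touched changes, and that order is exactly the kind of
restriction on the programming model I mean.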

If one went on to implement DSM, this would be a natural extension of
the already hierarchical memory architecture of a modern computer
system.

As you say, if it is done naively (without caching, pipelining and
pre-fetching), it will perform as poorly as RAM would if the same
features were removed from the CPU->RAM access logic, or if the
register allocator (well, the optimizer part) were removed from the
compiler.

However, there is nothing stopping people from implementing caching
(which can be kept fairly isolated inside the DSM logic), pre-fetching
(either by "hinting" through an explicit API for the user to call, or
inserted at compile time by a parallelizing compiler, or inserted at
run-time by a virtual machine), and pipelining/OOE ("software
pipelining" in a VM).
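
A toy sketch of the caching and "hinting" ideas, under loud
assumptions: RemoteMemory, CachedDSM, fetch_pages and prefetch are all
hypothetical names, the "network" is just a counter of simulated round
trips, and the view is read-only, so there is no coherence protocol at
all.  It is not how any particular DSM system works.

```python
PAGE_SIZE = 4  # words per page; tiny so the example is easy to follow


class RemoteMemory:
    """Stands in for memory on another node; counts simulated round trips."""

    def __init__(self, words):
        self.words = list(words)
        self.round_trips = 0

    def fetch_pages(self, page_numbers):
        # Assumption: one round trip can return any number of pages.
        self.round_trips += 1
        return {
            p: self.words[p * PAGE_SIZE:(p + 1) * PAGE_SIZE]
            for p in page_numbers
        }


class CachedDSM:
    """Read-only view of a RemoteMemory with a local page cache."""

    def __init__(self, remote):
        self.remote = remote
        self.cache = {}  # page number -> list of words

    def prefetch(self, first_word, count):
        """Hint that words [first_word, first_word + count) are needed soon."""
        if count <= 0:
            return
        first = first_word // PAGE_SIZE
        last = (first_word + count - 1) // PAGE_SIZE
        missing = [p for p in range(first, last + 1) if p not in self.cache]
        if missing:
            # All missing pages are requested in a single batched round trip.
            self.cache.update(self.remote.fetch_pages(missing))

    def read(self, addr):
        page = addr // PAGE_SIZE
        if page not in self.cache:
            # The "page fault": fetch exactly one page on demand.
            self.cache.update(self.remote.fetch_pages([page]))
        return self.cache[page][addr % PAGE_SIZE]
```

Reading 16 consecutive words through a fresh CachedDSM costs 4 simulated
round trips (one fault per page).  Calling prefetch(0, 16) first brings
that down to 1.  The same call could just as well be emitted by a
compiler or a VM instead of being written by the user.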

I believe DSM can be done well - MPI is just a simple DSM API, seen from
a sufficiently contorted point of view  :)

But I'm pretty sure of several things:
*)  If the DSM is done as a library, it will require *significant* work
    from the user to make it run well - just like MPI does.
*)  DSM can be done efficiently, with few requirements from the user,
    just as it is in some way done in modern computer systems today (in
    hardware).  But this requires some *significant* work that goes way
    beyond simply implementing DSM. It requires either compiler support
    (a special compiler that *understands* DSM and the DSM environment),
    or runtime support (possibly a VM + JIT, which understands and
    possibly adapts to the changing DSM environment).

There is plenty of logic (both in hardware and software) working today
to make the slow speed of RAM bearable - it should be no surprise that
yet another significant amount of logic will be required to add the next
layer in the hierarchical memory model of a (now distributed) computing
system.

In other words:  implementing DSM is almost trivial (look at the number
of available MPI implementations!).  Creating a DSM system that runs
well *and* imposes few restrictions/requirements on the user is very
far from trivial.

-- 
................................................................
:   jakob at unthought.net   : And I see the elder races,         :
:.........................: putrid forms of man                :
:   Jakob Østergaard      : See him rise and claim the earth,  :
:        OZ9ABN           : his downfall is at hand.           :
:.........................:............{Konkhra}...............: