[Beowulf] CSharifi Next generation of HPC
Donald Becker
becker at scyld.com
Tue Dec 4 14:00:52 PST 2007
[[[ Hmmmm, OK, I seem to have moderation-approved pretty much a repeat of
a widespread posting. So I'll answer with the response I was planning a
few days ago. ]]]
On Tue, 4 Dec 2007, Ehsan Mousavi wrote:
> C-Sharifi Cluster Engine: The Second Success Story on "Kernel-Level
> Paradigm" for Distributed Computing Support
>
> Contrary to the two schools of thought on providing system software support
> for distributed computing (like MPI, Kerrighed, and Mosix), Dr. Mohsen
> Sharifi hypothesized another school of thought as his thesis in 1986,
> which holds that all distributed systems software requirements and supports
> can be and must be built at the kernel level of existing operating systems;
In 1986 I had been working for a few years on shared memory systems with
a hefty proportion of custom-designed hardware.
I learned from that experience. That's why I now work on distributed
memory systems based on off-the-shelf commodity hardware.
I also think that there are some important aspects of cluster
infrastructure that (at present) can only be implemented by tweaking the
kernel. But most of the features to make a cluster easy to use don't need
special kernel support, and indeed can't be implemented inside the kernel
at all.
You might initially think "you can put any program inside the kernel,
therefore you can do everything inside the kernel". But as a
counter-example, consider name services. Essentially all programs use the
standard library interface to name services, which in turn uses the Name
Service Switch. You can add a bunch of really powerful features by
using a cluster-specific name service. And this can only be done by
working with the existing user-level library code. (Well, unless you
build a new library within your kernel.)
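To make the point concrete, here is a hedged sketch of what such a hook
looks like on a glibc system. The switch is configured in
/etc/nsswitch.conf, and a module named "cluster" (hypothetical here,
standing in for a cluster-specific plugin shipped as libnss_cluster.so)
can be slotted in alongside the stock sources, entirely in user space:

```
# /etc/nsswitch.conf (illustrative fragment; "cluster" is a
# hypothetical cluster-aware NSS module, not a shipped one)
hosts:    files cluster dns
passwd:   files cluster
group:    files cluster
```

With that one line per database, every program that calls
gethostbyname() or getpwnam() transparently sees cluster-wide node names
and accounts, with no kernel modification and no application changes.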
This argument almost misses the main point:
Cluster systems exist to simplify the system for end users.
When you think in terms of kernel modifications, most of the changes end
up being tricks to prove to other developers how clever you are, not
features that make the system easier to use (example: Plan 9). And most of
the clever tricks end up getting in the way of the developer, rather than
speeding up the application or really simplifying the programming model.
DSM / Distributed Shared Memory (which I prefer to call NVM, Network
Virtual Memory) is a perfect example of this. It certainly doesn't help
the end user. The only aspect an end user or system administrator sees is
that NVM causes cascading system failures when one machine drops out of
the cluster.
The programmer doesn't benefit either. They initially
think that NVM gives them an easy-to-use shared memory model. They
quickly find that it only appears to be normal memory. To get even barely
acceptable performance they have to treat the shared memory very
differently from regular memory. Variables written by different processes
have to be segregated into different pages. Writes have to be grouped. You
have to think about when to manually cache structures to avoid a re-read
that might trigger a network page fault, but refresh that structure when
you need potentially updated values. Many independent attempts have
concluded that most application ports take a long time to tune for NVM,
and almost all end up using NVM as a stylized message passing mechanism.
--
Donald Becker becker at scyld.com
Penguin Computing / Scyld Software
www.penguincomputing.com www.scyld.com
Annapolis MD and San Francisco CA