[Beowulf] CSharifi Next generation of HOC

Ehsan Mousavi mousavi.ehsan at gmail.com
Fri Nov 30 21:34:32 PST 2007

C-Sharifi Cluster Engine: The Second Success Story on "Kernel-Level
Paradigm" for Distributed Computing Support

Contrary to the two schools of thought on providing system software support
for distributed computation, which advocate either the development of a
whole new distributed operating system (like Mach) or the development of
library-based or patch-based middleware on top of existing operating systems
(like MPI, Kerrighed, and Mosix), Dr. Mohsen Sharifi hypothesized a third
school of thought in his 1986 thesis: that all distributed systems software
requirements and supports can be, and must be, built at the kernel level of
existing operating systems. These requirements include Ease of Programming,
Simplicity, Efficiency, and Accessibility, and may collectively be coined
Usability.  Although this belief was hard to realize, a sample byproduct
called DIPC was built purely on this thesis and openly announced to the
Linux community worldwide in 1993.  DIPC was admired for providing, for the
first time in the world, the necessary support for distributed communication
at the kernel level of Linux, and for the Ease of Programming that followed
from its kernel-level realization.  At the same time, however, it was
criticized as inefficient.  This criticism did not force the school to trade
Ease of Programming for Efficiency; instead, the school worked hard to
achieve Efficiency alongside Ease of Programming and Simplicity, without
abandoning the principle that all needs be provided at the kernel level.
The result of this effort is now manifested in the C-Sharifi Cluster Engine.
 C-Sharifi is a cost-effective distributed system software engine that
supports high performance computing on clusters of off-the-shelf computers.
It is implemented wholly in the kernel and, as a consequence of following
this school, offers Ease of Programming, Ease of Clustering, and Simplicity;
it can also be configured to fit the efficiency requirements of applications
that need high performance as closely as possible.  It supports both the
distributed shared memory and message passing styles, it is built in Linux,
and its cost/performance ratio in some scientific applications (such as
meteorology and cryptanalysis) has been shown to be far better than that of
non-kernel-based solutions and engines (like MPI, Kerrighed, and Mosix).

Best regards,
~Ehsan Mousavi
C-Sharifi Development Team

-----Original Message-----
From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On
Behalf Of Mark Hahn
Sent: Friday, November 30, 2007 6:20 PM
To: Beowulf Mailing List
Subject: Re: [Beowulf] Using Autoparallel compilers or Multi-Threaded
libraries with MPI

> Many problems decompose well in large chunks that are well done
> with MPI on different nodes, and tight loops that are best done
> locally with shared memory processors.

I think the argument against this approach is more based on practice
than principles.  hybrid parallelism certainly is possible, and in 
the most abstract sense makes sense.

however, it means a lot of extra work, and it's not clear how much benefit
it brings.  if you had an existing code or library which very efficiently
used threads to handle some big chunk of data, it might be quite simple to
add some MPI to handle big chunks in aggregate.  but I imagine that would
most likely happen if you're already data-parallel, which also means
embarrassingly parallel.  for instance, apply this convolution to this giant image - 
it would work, but it's also not very interesting (ie, you ask "then what
happens to the image?  and how much time did we spend getting it distributed
and collecting the results?")

for more realistic cases, something like an AMR code, I suspect the code
would wind up being structured as a thread dedicated to inter-node data
movement, interfacing with a thread task queue to deal with the irregularity
of intra-node computations.  that's a reasonably complicated piece of code I guess,
and before undertaking it, you need to ask whether a simpler model of 
just one-mpi-worker-per-processor would get you similar speed but with
less effort.  consider, as well, that if you go to a work-queue for handing
bundles of work to threads, you're already doing a kind of message passing.

if we've learned anything about programming, I think it's that simpler
mental models are to be desired.  not that MPI is ideal, of course!
just that it's simpler than MPI+OpenMP.

-mark hahn
Beowulf mailing list, Beowulf at beowulf.org