[Beowulf] Checkpointing using flash

Sat Sep 22 18:12:14 PDT 2012

Jim Lux wrote that one giant [distributed] memory has scalability
problems for reasons of physical distance.  Yes indeed.  To clarify,
I was referring to a specific niche in parameter space (physical and
programmatic) associated with programs that use file I/O.  That is to say,
there is a realm in which Andrew Holway's idea is appropriate.
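
As a rough illustration of that niche, the sketch below compares the total
wait for a run of random accesses served from a hard disk with the same run
served from remote memory over the network.  The 5 ms disk access time and
the access count are assumed values picked for illustration; the 10
microsecond figure is the upper bound on Infiniband switch latency from my
earlier message quoted below.  The sketch ignores bandwidth, software
overhead and caching, and assumes the accesses cannot be overlapped.

```python
# Back-of-envelope comparison; all figures are illustrative assumptions.
DISK_ACCESS_S = 5e-3      # assumed hard disk random access time (millisecond range)
NETWORK_ACCESS_S = 10e-6  # upper bound on Infiniband switch latency cited below
N_ACCESSES = 1_000_000    # assumed number of random accesses


def total_wait(n_accesses, latency_s):
    """Total latency-bound wait when accesses cannot be overlapped."""
    return n_accesses * latency_s


disk_total = total_wait(N_ACCESSES, DISK_ACCESS_S)
network_total = total_wait(N_ACCESSES, NETWORK_ACCESS_S)
ratio = disk_total / network_total

print(f"disk:    {disk_total:.0f} s")
print(f"network: {network_total:.0f} s")
print(f"ratio:   {ratio:.0f}x")
```

With these assumed numbers the gap is a few orders of magnitude, which is
the only point of the exercise; real ratios depend on the workload.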

Regarding Jim Lux's comment, "Once you've got a generalized
fast approach using message passing, it's very scalable": in some
cases, asynchronous messages and/or one-sided communication allow
processes to avoid forced synchronization when it is not required.  Also,
in some cases PGAS languages have proven to be more scalable than MPI.
It is an interesting area of practical investigation.
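
To show what avoiding forced synchronization can buy, here is a toy timing
model, not a measurement and not MPI code.  One producer sends messages to
one consumer.  The function names, the per-message times and the model
itself are assumptions made up for illustration: transfer time is ignored
and the asynchronous case assumes an unbounded buffer.

```python
# Toy timing model (an assumption for illustration, not a measurement):
# one producer sends a sequence of messages to one consumer.

def finish_time_rendezvous(produce, consume):
    """Blocking send: the producer waits until the consumer takes each message."""
    handoff = 0.0        # time at which the previous message was handed over
    prev_consume = 0.0   # time the consumer spends on the previous message
    for p, c in zip(produce, consume):
        producer_ready = handoff + p             # producer builds the next message
        consumer_ready = handoff + prev_consume  # consumer finishes the previous one
        handoff = max(producer_ready, consumer_ready)
        prev_consume = c
    return handoff + prev_consume


def finish_time_buffered(produce, consume):
    """Asynchronous send: the producer never waits; messages queue up."""
    produced_at = 0.0
    consumer_free = 0.0
    for p, c in zip(produce, consume):
        produced_at += p
        consumer_free = max(produced_at, consumer_free) + c
    return consumer_free


produce_times = [1, 1, 4, 1]   # hypothetical time to produce each message
consume_times = [2, 2, 1, 2]   # hypothetical time to process each message

sync_finish = finish_time_rendezvous(produce_times, consume_times)
async_finish = finish_time_buffered(produce_times, consume_times)
print(sync_finish, async_finish)
```

In this model the buffered case never finishes later than the rendezvous
case, and it finishes earlier when the producer can run ahead while the
consumer is busy.  When the two sides are evenly matched the two cases tie,
which is consistent with the "in some cases" qualification above.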

Regards,
Alan Scheinine

Lux, Jim (337C) wrote:
> 
> On 9/22/12 12:47 PM, "Alan Louis Scheinine" <alscheinine at tuffmail.us>
> wrote:
> 
>> Andrew Holway wrote:
>>  > I've been playing around with GFS and Gluster a bit recently and this
>>  > has got me thinking... Given a fast enough, low enough latency network
>>  > might it be possible to have a Gluster-like or GFS-like memory space?
>>
>> For random access, hard disk access times are milliseconds when the r/w
>> head needs to move whereas Infiniband switch latency is less than ten
>> microseconds.  So if an algorithm needs highly random access over more
>> memory than a single node, combining memory of a cluster might be the
>> best solution for certain problems.  I do not know of any parallel
>> filesystem that uses memory mapping rather than parallel disks, but
>> it seems like a useful utility.  While it is possible to put a huge
>> amount of memory into a single node, that node would be specialized,
>> whereas using the memory of a cluster means that the same memory serves
>> a general-purpose cluster when not being used for the specialized
>> memory-based parallel file system.
> 
> But isn't that basically the old multiport memory or crossbar switch kind
> of thing? (Giant memory shared by multiple processors).
> 
> Aside from things like cache coherency, it has scalability problems (from
> physical distance reasons: propagation time, if nothing else)
> 
> Philosophically, giant virtual memory schemes seem to try to make life
> easier by trying to shoehorn a "single thread/processor" model.
> 
> Shared memory data passing is a bit better, with the problems of
> synchronization and semaphores.
> 
> I think the future is in explicitly recognizing that you have to pass
> messages serially and designing algorithms that are tolerant of things
> like missing messages, variable (but bounded) latency (or heck, latency at
> all).
> 
> Once you've got a generalized fast approach using message passing, it's
> very scalable.

-- 

  Alan Scheinine
  200 Georgann Dr., Apt. E6
  Vicksburg, MS  39180

  Email: alscheinine at tuffmail.us
  Mobile phone: 225 288 4176

  http://www.flickr.com/photos/ascheinine