[Beowulf] Checkpointing using flash

Lux, Jim (337C) james.p.lux at jpl.nasa.gov
Sat Sep 22 14:29:25 PDT 2012

On 9/22/12 12:47 PM, "Alan Louis Scheinine" <alscheinine at tuffmail.us> wrote:

>Andrew Holway wrote:
>  > I've been playing around with GFS and Gluster a bit recently and this
>  > has got me thinking... Given a fast enough, low enough latency network
>  > might it be possible to have a Gluster like or GFS like memory space?
>For random access, hard disk access times are milliseconds when the r/w
>head needs to move whereas Infiniband switch latency is less than ten
>microseconds.  So if an algorithm needs highly random access over more
>memory than a single node, combining memory of a cluster might be the
>best solution for certain problems.  I do not know of any parallel
>filesystem that uses memory mapping rather than parallel disks, but
>it seems like a useful utility.  While it is possible to put a huge
>amount of memory into a single node, that node would be specialized,
>whereas using the memory of a cluster means that the same memory serves
>a general-purpose cluster when not being used for the specialized
>parallel file system.

But isn't that basically the old multiport memory or crossbar switch kind
of thing? (Giant memory shared by multiple processors).

Aside from things like cache coherency, it has scalability problems (for
reasons of physical distance: propagation time, if nothing else).

Philosophically, giant virtual memory schemes seem to try to make life
easier by shoehorning everything into a "single thread/processor" model.

Shared-memory data passing is a bit better, though it brings the problems
of synchronization and semaphores.

I think the future is in explicitly recognizing that you have to pass
messages serially and designing algorithms that are tolerant of things
like missing messages, variable (but bounded) latency (or heck, latency at

Once you've got a generalized fast approach using message passing, it's
very scalable.
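
As a rough illustration of that style (a minimal sketch, not taken from any
real cluster stack): a coordinator asks each worker for a partial sum, waits
a bounded time for replies, and re-requests whatever went missing. Everything
named here is hypothetical: worker, gather_sum, the 0.5 s latency bound, and
the in-process queues standing in for a network link.

```python
import queue
import threading

TIMEOUT_S = 0.5  # assumed upper bound on reply latency (illustrative value)


def worker(rank, chunk, requests, replies, lose_first_reply):
    """Answer each request with (rank, partial_sum).

    If lose_first_reply is True, the first reply is silently discarded
    to simulate a missing message on a lossy link.
    """
    lose = lose_first_reply
    while True:
        req = requests.get()
        if req is None:      # shutdown signal from the coordinator
            return
        if lose:             # pretend the reply was lost in transit
            lose = False
            continue
        replies.put((rank, sum(chunk)))


def gather_sum(chunks, lossy_ranks, max_rounds=5):
    """Sum all chunks via message passing, re-requesting lost replies.

    Returns (total, rounds_used). Raises TimeoutError if some worker
    never answers within max_rounds.
    """
    replies = queue.Queue()
    request_queues = [queue.Queue() for _ in chunks]
    threads = [
        threading.Thread(
            target=worker,
            args=(r, c, request_queues[r], replies, r in lossy_ranks),
        )
        for r, c in enumerate(chunks)
    ]
    for t in threads:
        t.start()

    partial = {}  # keyed by rank, so duplicate or late replies are harmless
    rounds = 0
    try:
        while len(partial) < len(chunks) and rounds < max_rounds:
            rounds += 1
            missing = [r for r in range(len(chunks)) if r not in partial]
            for r in missing:
                request_queues[r].put("send")
            for _ in missing:
                try:
                    rank, value = replies.get(timeout=TIMEOUT_S)
                except queue.Empty:
                    break  # bound exceeded: treat the rest as lost, retry
                partial[rank] = value
    finally:
        for q in request_queues:
            q.put(None)
        for t in threads:
            t.join()

    if len(partial) < len(chunks):
        raise TimeoutError("some workers never replied")
    return sum(partial.values()), rounds


if __name__ == "__main__":
    total, rounds = gather_sum([[1, 2], [3, 4], [5, 6]], lossy_ranks={1})
    print(total, rounds)
```

The point of the sketch is that the coordinator never assumes a reply will
arrive: it only assumes a latency bound, and anything past that bound is
handled as a lost message rather than as an error.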

