[Beowulf] SSD caching for parallel filesystems

Fri Feb 8 08:28:00 PST 2013

On 02/08/2013 11:20 AM, Brock Palen wrote:
> To add another side note, in the process of interviewing the Gluster team for my podcast (www.rce-cast.com) he mentioned writing a plugin, that would first write data local to the host, and gluster would then take it to the real disk in the background.  There was constraints to doing this.  I assume because there was no locking to promise consistency, but for some workloads this might be useful, and combine it with local Flash.

Yeah, with specialized semantics (thinking write-once here) this should 
work fine.  There is a good reason why client-side write caches aren't 
really used with SSDs yet -- they are absolutely massive relative to 
write-back caches in DRAM and this makes it hard to come up with truly 
"generally usable" semantics for typical applications to make use of this.

This gluster addition would make it a good candidate for "checkpoint 
trickling."  It's much easier on the network and faster for the 
computation to just be able to checkpoint to local SSD (or near-local; 
think an I/O node in the Blue Gene architecture) and let that "trickle" the checkpoint 
down to "safe" disk over the next computation cycle.  This way, you 
aren't paying some huge wait for the checkpoint to complete to "safe" 
disk, but you also aren't relying solely on the local SSDs not going 
belly up -- if one does you just have to roll back two iterations, 
rather than just one.
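
To make the trickling idea concrete, here is a rough sketch in Python. It is
not the Gluster plugin Brock mentioned, just an illustration under my own
assumptions: the class name TricklingCheckpointer and both directories are
hypothetical, with local_dir standing in for the local SSD and safe_dir for
the "safe" parallel filesystem. The checkpoint call returns as soon as the
data is on local storage, and a background thread copies it down afterwards.

```python
import os
import shutil
import threading


class TricklingCheckpointer:
    """Hypothetical helper: checkpoint locally, copy to safe storage in the background."""

    def __init__(self, local_dir, safe_dir):
        self.local_dir = local_dir  # assumed: fast local SSD mount
        self.safe_dir = safe_dir    # assumed: slower, reliable shared storage
        self._worker = None

    def checkpoint(self, iteration, data):
        # Allow at most one trickle in flight: block only if the previous
        # copy has not finished within one computation cycle.
        self.wait()
        name = f"ckpt_{iteration:08d}.bin"
        local_path = os.path.join(self.local_dir, name)
        with open(local_path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the local copy is on the device
        # Return to the computation while the copy happens in the background.
        self._worker = threading.Thread(target=self._trickle, args=(name,))
        self._worker.start()

    def _trickle(self, name):
        src = os.path.join(self.local_dir, name)
        tmp = os.path.join(self.safe_dir, name + ".partial")
        dst = os.path.join(self.safe_dir, name)
        shutil.copyfile(src, tmp)
        # Rename only after the copy completes, so a half-copied file is
        # never mistaken for a usable checkpoint.
        os.replace(tmp, dst)

    def wait(self):
        if self._worker is not None:
            self._worker.join()
            self._worker = None

    def latest_safe(self):
        # Newest checkpoint that has fully reached safe storage, or None.
        names = sorted(n for n in os.listdir(self.safe_dir) if n.endswith(".bin"))
        return names[-1] if names else None
```

If the local SSD dies mid-trickle, a restart would use whatever latest_safe()
returns, which is the older checkpoint that already finished copying. That is
the "roll back two iterations" case.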

I think this kind of approach is getting a lot of examination in the 
supercomputing space, as checkpoints really, really hurt there.

Best,

ellis