[Beowulf] SSD caching for parallel filesystems
Ellis H. Wilson III
ellis at cse.psu.edu
Fri Feb 8 08:28:00 PST 2013
On 02/08/2013 11:20 AM, Brock Palen wrote:
> To add another side note, in the process of interviewing the Gluster team for my podcast (www.rce-cast.com) he mentioned writing a plugin that would first write data local to the host, and gluster would then take it to the real disk in the background. There were constraints to doing this, I assume because there was no locking to promise consistency, but for some workloads this might be useful, especially combined with local Flash.
Yeah, with specialized semantics (thinking write-once here) this should
work fine. There is a good reason why client-side write caches aren't
really used with SSDs yet -- they are absolutely massive relative to
write-back caches in DRAM, and that makes it hard to come up with truly
"generally usable" semantics that typical applications can rely on.
This gluster addition would make it a good candidate for "checkpoint
trickling." It's much easier on the network and faster for the
computation to just be able to checkpoint to local SSD (or near-local;
think I/O node in the Blue Gene architecture) and let that "trickle" the
checkpoint down to "safe" disk over the next computation cycle. This way,
you aren't paying some huge wait for the checkpoint to complete to "safe"
disk, but you also aren't relying solely on the local SSDs not going
belly up -- if one does, you just have to roll back two iterations,
rather than just one.
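To make the trickling idea concrete, here is a rough sketch in Python. It is not how the gluster plugin works (I haven't seen it); the class name, the method names, and the ckpt_<iteration>.bin file naming are all made up for illustration. It assumes the "local SSD" and the "safe" disk are simply two directories, and that at most one trickle is in flight at a time.

```python
import shutil
import threading
from pathlib import Path


class TricklingCheckpointer:
    """Hypothetical helper: checkpoint to fast local storage, then copy
    the file to "safe" storage in a background thread."""

    def __init__(self, local_dir, safe_dir):
        self.local_dir = Path(local_dir)   # assumed: local SSD mount
        self.safe_dir = Path(safe_dir)     # assumed: parallel FS mount
        self.local_dir.mkdir(parents=True, exist_ok=True)
        self.safe_dir.mkdir(parents=True, exist_ok=True)
        self._trickle = None               # in-flight background copy

    def checkpoint(self, iteration, data):
        # Allow only one trickle in flight: if the previous one has not
        # finished yet, wait for it before starting the next.
        self.wait()
        local_path = self.local_dir / f"ckpt_{iteration}.bin"
        local_path.write_bytes(data)       # the only write the compute waits on
        self._trickle = threading.Thread(
            target=self._copy_to_safe, args=(local_path,)
        )
        self._trickle.start()              # compute resumes while this runs
        return local_path

    def _copy_to_safe(self, local_path):
        # Copy under a temporary name, then rename, so a half-copied
        # file is never mistaken for a complete checkpoint.
        tmp = self.safe_dir / (local_path.name + ".part")
        shutil.copyfile(local_path, tmp)
        tmp.replace(self.safe_dir / local_path.name)

    def wait(self):
        if self._trickle is not None:
            self._trickle.join()
            self._trickle = None

    def latest_safe(self):
        # Newest checkpoint that has fully reached safe storage.
        done = sorted(
            self.safe_dir.glob("ckpt_*.bin"),
            key=lambda p: int(p.stem.split("_")[1]),
        )
        return done[-1] if done else None
```

If the local SSD dies mid-trickle, latest_safe() returns the checkpoint from the iteration before the one being copied, which is where the "roll back two iterations" cost comes from.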
I think this kind of approach is getting a lot of examination in the
supercomputing space, as checkpoints really, really hurt there.