[Beowulf] SSD caching for parallel filesystems
Ellis H. Wilson III
ellis at cse.psu.edu
Fri Feb 8 08:12:35 PST 2013
On 02/06/2013 04:36 PM, Prentice Bisbal wrote:
> Beowulfers,
>
> I've been reading a lot about using SSD devices to act as caches for
> traditional spinning disks or filesystems over a network (SAN, iSCSI,
> SAS, etc.). For example, Fusion-io had directCache, which works with
> any block-based storage device (local or remote) and Dell is selling
> LSI's CacheCade, which will act as a cache for local disks.
>
> http://www.fusionio.com/data-sheets/directcache/
> http://www.dell.com/downloads/global/products/pedge/en/perc-h700-cachecade.pdf
>
> Are there any products like this that would work with parallel
> filesystems, like Lustre or GPFS? Has anyone done any research in this
> area? Would this even be worthwhile?
Coming late to this discussion, but I'm currently doing research in this
area and have a publication in submission about it. What are you trying
to do with it specifically? NAND flash, with its particularly nuanced
performance behavior, is not right for all applications, but it can help
if you think through most of your workloads and architect your system
cleverly around them.
For instance, there has been some discussion about PCIe vs. SATA -- this
is a good conversation, but what's left out is that many manufacturers
do not actually use native PCIe "inside" the SSD. Data is piped out of
the individual NAND packages in something like a SATA format, and then
transcoded to PCIe before leaving the device. This results in latency
and bandwidth degradation, and although a bunch of the devices listed
under the PCIe category on Newegg and elsewhere report 2, 3, or 4 GB/s,
the reality is closer to 1 GB/s or under. Latency is still better on
these than on SATA-based drives, but if I just wanted bandwidth I'd buy
a few cheaper SATA-based ones and strap them together with RAID.
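To put rough numbers on that, here is a back-of-envelope sketch in
Python. Every figure in it (advertised vs. effective PCIe bandwidth,
per-drive SATA bandwidth, RAID striping efficiency) is an illustrative
assumption, not a measurement of any particular product:

# Rough, illustrative numbers only -- swap in your own measurements.
pcie_advertised_gbps = 3.0   # what the spec sheet claims
pcie_effective_gbps = 1.0    # closer to what many transcoded designs deliver
sata_ssd_gbps = 0.5          # ~500 MB/s for a decent SATA III SSD
raid_efficiency = 0.9        # assumed RAID-0 striping overhead

def raid0_bandwidth(n_drives):
    """Aggregate sequential bandwidth of n SATA SSDs striped together."""
    return n_drives * sata_ssd_gbps * raid_efficiency

for n in (2, 3, 4):
    print(f"{n} SATA SSDs in RAID-0: ~{raid0_bandwidth(n):.1f} GB/s "
          f"(vs. ~{pcie_effective_gbps:.1f} GB/s effective from one PCIe card, "
          f"{pcie_advertised_gbps:.1f} GB/s advertised)")

The point is just that three or four commodity SATA drives striped
together can plausibly match or beat the effective bandwidth of one of
those transcoded PCIe cards at lower cost, though you give up the
latency advantage.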
On a similar note (discussing the nuances of the setup and the
components), if your applications are embarrassingly parallel or your
network is really slow (1 Gb Ethernet), then client-side caching is
definitely the way to go. But if there are ever sync points in the
applications, or you have a higher-throughput, lower-latency network
available to you, going for a storage-side cache allows for write-back
capabilities as well as higher throughput and lower latency than a
single client-side cache could provide. Basically, this boils down to:
do you want M cheaper devices, one in each node, that don't have to go
over the network but are higher latency and lower bandwidth, or do you
want an aggregation of N more expensive devices that can give lower
latency and much higher bandwidth, but over the network?
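As a way to reason about that trade-off, here's a toy model in Python of
what a single client sees from each option. All of the numbers are
placeholder assumptions (pick your own device specs, node count, and
network figures); the shape of the comparison is the point, not the
values:

# Toy model of the client-side vs. storage-side cache trade-off.
# All figures are placeholder assumptions, not measurements.
local_ssd_bw_gbps = 0.5    # one cheaper SATA SSD in each compute node
local_ssd_lat_us = 100.0
remote_ssd_bw_gbps = 1.5   # one faster PCIe SSD in the storage tier
remote_ssd_lat_us = 50.0
net_lat_us = 20.0          # per-request latency of a fast interconnect
net_bw_gbps = 4.0          # what one client can pull over that network

clients = 64   # M nodes, one local device each
servers = 8    # N aggregated devices behind the parallel filesystem

aggregate_bw = servers * remote_ssd_bw_gbps

# One client, uncontended: limited by its network link or the aggregate.
storage_bw_uncontended = min(net_bw_gbps, aggregate_bw)
# All clients hitting the cache at once: fair share of the aggregate.
storage_bw_contended = min(net_bw_gbps, aggregate_bw / clients)
storage_lat = net_lat_us + remote_ssd_lat_us

print(f"client-side cache : ~{local_ssd_bw_gbps:.2f} GB/s, "
      f"~{local_ssd_lat_us:.0f} us, no network hop")
print(f"storage-side cache: ~{storage_bw_uncontended:.2f} GB/s uncontended / "
      f"~{storage_bw_contended:.2f} GB/s contended, ~{storage_lat:.0f} us")

Under light contention the aggregated storage-side cache looks strictly
better; once every client hammers it at the same time, the fair share
per client can drop below what a local device delivers, which is why the
workload's sync behavior matters so much.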
For an example of how some folks at LBNL did it with storage-local
caches, see the following paper:
Zheng Zhou et al., "An Out-of-core Eigensolver on SSD-equipped Clusters,"
in Proc. of Cluster '12.
If anybody has any other worthwhile papers on this, or other projects
that look specifically at client-side SSD caching, I'd love to hear
about them. I think in about five years all new compute nodes will be
built with some kind of non-volatile cache or storage -- it's just too
good a solution, particularly with network bandwidth and latency not
scaling as fast as NVM properties.
Best,
ellis