[Beowulf] SSD caching for parallel filesystems
Ellis H. Wilson III
ellis at cse.psu.edu
Fri Feb 8 08:12:35 PST 2013
On 02/06/2013 04:36 PM, Prentice Bisbal wrote:
> Beowulfers,
>
> I've been reading a lot about using SSD devices to act as caches for
> traditional spinning disks or filesystems over a network (SAN, iSCSI,
> SAS, etc.). For example, Fusion-io had directCache, which works with
> any block-based storage device (local or remote) and Dell is selling
> LSI's CacheCade, which will act as a cache for local disks.
>
> http://www.fusionio.com/data-sheets/directcache/
> http://www.dell.com/downloads/global/products/pedge/en/perc-h700-cachecade.pdf
>
> Are there any products like this that would work with parallel
> filesystems, like Lustre or GPFS? Has anyone done any research in this
> area? Would this even be worthwhile?
Coming late to this discussion, but I'm currently doing research in this
area and have a publication in submission about it. What are you trying
to do with it specifically? NAND flash, with its particularly nuanced
performance behavior, is not right for all applications, but it can help
if you think through most of your workloads and architect your system
cleverly around them.
For instance, there has been some discussion about PCIe vs. SATA -- this
is a good conversation, but what's left out is that many manufacturers
do not actually use native PCIe "inside" the SSD. Data is piped out of
the individual NAND packages in something like a SATA format, and then
transcoded to PCIe before leaving the device. This results in latency
and bandwidth degradation, and although a bunch of the devices listed
under the PCIe category on Newegg and elsewhere report 2, 3, or 4 GB/s,
the reality is closer to 1 GB/s or under. Latency is still better on
these than on SATA-based drives, but if I just wanted bandwidth I'd buy
a few cheaper SATA-based ones and strap them together with RAID.
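To put rough numbers on that, here is a back-of-envelope sketch in
Python. Every figure in it (advertised vs. effective PCIe bandwidth,
per-drive SATA bandwidth, RAID striping efficiency) is an illustrative
assumption, not a measurement of any particular product:

# Rough, illustrative numbers only -- swap in your own measurements.
pcie_advertised_gbps = 3.0   # what the spec sheet claims
pcie_effective_gbps = 1.0    # closer to what many transcoded designs deliver
sata_ssd_gbps = 0.5          # ~500 MB/s for a decent SATA III SSD
raid_efficiency = 0.9        # assumed RAID-0 striping overhead

def raid0_bandwidth(n_drives):
    """Aggregate sequential bandwidth of n SATA SSDs striped together."""
    return n_drives * sata_ssd_gbps * raid_efficiency

for n in (2, 3, 4):
    print(f"{n} SATA SSDs in RAID-0: ~{raid0_bandwidth(n):.1f} GB/s "
          f"(vs. ~{pcie_effective_gbps:.1f} GB/s effective from one PCIe card, "
          f"{pcie_advertised_gbps:.1f} GB/s advertised)")

The point is just that three or four commodity SATA drives striped
together can plausibly match or beat the effective bandwidth of one of
those transcoded PCIe cards at lower cost, though you give up the
latency advantage.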
On a similar note (discussing the nuances of the setup and the
components), if your applications are embarrassingly parallel or your
network is really slow (1 Gb Ethernet), then client-side caching is
definitely the way to go. But if there are ever sync points in the
applications, or you have a higher-throughput, lower-latency network
available to you, going for a storage-side cache allows for write-back
capabilities as well as higher throughput and lower latency than a
single client-side cache could provide. Basically, this boils down to:
do you want M cheaper devices, one in each node, that don't have to go
over the network but are higher latency and lower bandwidth, or do you
want an aggregation of N more expensive devices that can give lower
latency and much higher bandwidth, but over the network?
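As a way to reason about that trade-off, here's a toy model in Python of
what a single client sees from each option. All of the numbers are
placeholder assumptions (pick your own device specs, node count, and
network figures); the shape of the comparison is the point, not the
values:

# Toy model of the client-side vs. storage-side cache trade-off.
# All figures are placeholder assumptions, not measurements.
local_ssd_bw_gbps = 0.5    # one cheaper SATA SSD in each compute node
local_ssd_lat_us = 100.0
remote_ssd_bw_gbps = 1.5   # one faster PCIe SSD in the storage tier
remote_ssd_lat_us = 50.0
net_lat_us = 20.0          # per-request latency of a fast interconnect
net_bw_gbps = 4.0          # what one client can pull over that network

clients = 64   # M nodes, one local device each
servers = 8    # N aggregated devices behind the parallel filesystem

aggregate_bw = servers * remote_ssd_bw_gbps

# One client, uncontended: limited by its network link or the aggregate.
storage_bw_uncontended = min(net_bw_gbps, aggregate_bw)
# All clients hitting the cache at once: fair share of the aggregate.
storage_bw_contended = min(net_bw_gbps, aggregate_bw / clients)
storage_lat = net_lat_us + remote_ssd_lat_us

print(f"client-side cache : ~{local_ssd_bw_gbps:.2f} GB/s, "
      f"~{local_ssd_lat_us:.0f} us, no network hop")
print(f"storage-side cache: ~{storage_bw_uncontended:.2f} GB/s uncontended / "
      f"~{storage_bw_contended:.2f} GB/s contended, ~{storage_lat:.0f} us")

Under light contention the aggregated storage-side cache looks strictly
better; once every client hammers it at the same time, the fair share
per client can drop below what a local device delivers, which is why the
workload's sync behavior matters so much.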
For an example of how some folks at LBNL did it with storage-local
caches, see the following paper:
Zheng Zhou et al., "An Out-of-core Eigensolver on SSD-equipped Clusters,"
in Proc. of Cluster '12.
If anybody has any other worthwhile papers on this, or other projects
that look specifically at client-side SSD caching, I'd love to hear
about them. I think in about five years all new compute nodes will be
built with some kind of non-volatile cache or storage -- it's just too
good a solution, particularly with network bandwidth and latency not
scaling as fast as NVM properties.
Best,
ellis