[Beowulf] SSD caching for parallel filesystems
Mark Hahn
hahn at mcmaster.ca
Mon Feb 11 21:04:07 PST 2013
this is getting absurd. I think we all know the relative prices
and performances of off-the-shelf disks/ssd/ram. each has peculiarities
that make their use somewhat complex.
- with disks, you have to think about seek time, since it can range from
zero to ~15ms. for some workloads, a saving grace is that with request
sorting (helped also by disk-level queue reordering), you can perform
several transactions along the way. disks have historically followed
a pretty steep curve of improved density, somewhat akin to Moore's law,
which has delivered ever-higher capacity and attendant bandwidth.
- with ram, you get the proverbial random access. like most proverbs,
that's only a little true: ram has banking and paging effects, and
emphatically rewards sequential access. it also suffers from a very
stiff industry that hasn't managed to adopt a transactional interface,
even though cpus have ever more internal concurrency. (caches have let
dram designers stay very lazy...)
- with flash, you'll probably never have a random-access interface -
it'll always be a disk-like block-transfer thing. why? because flash
has to be remapped to be useful, and that remapping has to change
during use (indeed, *because*of* use).
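
to make the remapping point concrete, here is a minimal sketch of a toy flash translation layer. everything in it is an assumption for illustration: the name ToyFTL, the page-granular map, and the trivial reclaim policy are hypothetical, and real devices erase in multi-page blocks and do wear leveling, which this ignores. the only thing it shows is that a rewrite of the same logical block lands on a different physical page, so the map changes because of use.

```python
class ToyFTL:
    """Toy flash translation layer (hypothetical, for illustration only).

    Flash pages cannot be overwritten in place, so every write goes to a
    fresh physical page and the logical-to-physical map is updated.
    """

    def __init__(self, num_pages):
        self.free = list(range(num_pages))  # erased pages, ready to be written
        self.map = {}                       # logical block number -> physical page
        self.data = {}                      # physical page -> payload
        self.stale = set()                  # superseded pages awaiting erase

    def write(self, lbn, payload):
        old = self.map.get(lbn)
        if old is not None:
            self.stale.add(old)             # the previous copy becomes garbage
        if not self.free:
            self._erase_stale()
        page = self.free.pop(0)
        self.map[lbn] = page                # the remapping step
        self.data[page] = payload

    def read(self, lbn):
        return self.data[self.map[lbn]]

    def _erase_stale(self):
        # Simplification: erase every stale page at once. Real flash erases
        # whole blocks and must first relocate any live pages inside them.
        if not self.stale:
            raise RuntimeError("no free or reclaimable pages")
        for page in sorted(self.stale):
            del self.data[page]
            self.free.append(page)
        self.stale.clear()


ftl = ToyFTL(num_pages=4)
ftl.write(7, "a")
first_page = ftl.map[7]
ftl.write(7, "b")                          # same logical block, new physical page
second_page = ftl.map[7]
```

the host only ever sees logical block numbers, which is why the interface stays block-oriented: the device needs a unit of transfer it can relocate behind the host's back.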
the discussion of PCIe and NVMe was pretty much a diversion, since
neither of them substantially alters the block-transfer nature of flash.
yes, something like NVMe does simplify the protocol being employed, but
it's still a mechanism for queueing block-transfer requests, like any
IO device (SATA, SCSI, RAID, even eth/IB networks.)
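
whatever the transport, a queue of block requests gives the driver or device room to reorder them, and for a disk that is the request sorting mentioned earlier. a minimal sketch, assuming head travel in tracks is a fair proxy for seek cost (it is only roughly so) and using made-up track numbers; head_travel and elevator_order are hypothetical helper names, not any real scheduler's API.

```python
def head_travel(start, requests):
    """Total tracks the head crosses when servicing requests in the given order."""
    pos, total = start, 0
    for track in requests:
        total += abs(track - pos)
        pos = track
    return total


def elevator_order(start, requests):
    """Reorder requests as one upward sweep from start, then one downward sweep."""
    up = sorted(t for t in requests if t >= start)
    down = sorted((t for t in requests if t < start), reverse=True)
    return up + down


pending = [98, 183, 37, 122, 14, 124, 65, 67]   # queued track numbers (made up)
start = 53                                      # current head position (made up)

fifo_cost = head_travel(start, pending)
swept_cost = head_travel(start, elevator_order(start, pending))
print(fifo_cost, swept_cost)
```

for this particular queue the sorted order crosses fewer than half as many tracks as arrival order, which is the "several transactions along the way" effect. flash has no head to move, so a queue in front of it buys concurrency rather than shorter seeks.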
it would be amusing to see a flash vendor take a page from networks,
and offer "flash rdma". but frankly, I'm not sure there's enough of a niche
for high-end flash at all. high-volume devices will always just follow
the capacity-performance bounds of current flash fabs. an awfully big
chunk of the IT industry wants *distributed* performance (the Googles
of the world) and won't normally want to pay the premium of a PCIe flash
device, since they can get arbitrary aggregate performance with big clusters,
and need big clusters anyway for other reasons.