[Beowulf] SSD caching for parallel filesystems

Lux, Jim (337C) james.p.lux at jpl.nasa.gov
Mon Feb 11 14:44:57 PST 2013

>> In any event, your original statement used to be wholly correct. It has
>> changed to a certain degree to "SSDs are about IOPs," which isn't
>> quite the same thing. However, more pointedly, with modern HDDs
>> barely approaching 200MB/s and SSD solutions approaching 2-4GB/s,
>> this is an increasingly limited viewpoint. We have to start
>> considering their use for bandwidth.
> Find me an application that needs big bandwidth and doesn't need 
> massive storage.
> Digital waveform recording and playback, e.g. in radar simulators.
> You need very wide bandwidth, but not a huge amount of storage (e.g.
> if I'm playing back a synthetic response to a 1 millisecond pulse with
> 2 GHz BW, I only need 10s of Megasamples at most, but you need 10
> Gsample/second sorts of bandwidth).
> One might think, heck, just slap a few GByte of RAM in there and be 
> done with it, but if you're simulating a radar with 10 different pulse 
> types, and you have 10-20 simulated targets each with several 
> different viewing aspects, you pretty quickly need a "library" of 
> several thousand pulses/returns to choose from.
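The sizing in the quoted reply can be made concrete with a little arithmetic. This is a minimal sketch using the quoted figures (a 1 ms pulse played back at 10 Gsample/s; 10 pulse types; 20 targets). The number of viewing aspects per target is not stated ("several"), so the value 15 below is purely an assumption for illustration, as are the helper names.

```python
def samples_per_pulse(pulse_s: float, rate_sps: float) -> int:
    """Samples needed to play back one pulse/return at the given sample rate."""
    return round(pulse_s * rate_sps)


def library_entries(pulse_types: int, targets: int, aspects_per_target: int) -> int:
    """Distinct waveforms in the library: one per (pulse type, target, aspect)."""
    return pulse_types * targets * aspects_per_target


# 1 ms pulse at 10 Gsample/s -> 10 Msamples, i.e. "10s of Megasamples at most"
print(samples_per_pulse(1e-3, 10e9))
# 10 pulse types x 20 targets x 15 aspects (15 is an assumed value)
print(library_entries(10, 20, 15))
```

With those assumptions the library already holds a few thousand waveforms, which is the point being made: each one is small, but together they outgrow "a few GByte of RAM".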

Yeah, well, I remember negotiating about writing CUDA code for simulation software for something similar.

I don't think this example applies. You want it in RAM for a proper simulation :)

Nope... you want to store it on disk:
a) 4 bytes/sample @ 20 Megasamples/pulse is 80 Mbyte/pulse
b) × 1000 pulses is 80 GB.
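The storage estimate in (a) and (b) is a straight product of sample size, samples per pulse and library size. A minimal Python sketch of that arithmetic follows, using decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes) to match the figures above; the function name is illustrative only.

```python
def library_bytes(bytes_per_sample: int, samples_per_pulse: int, pulses: int) -> int:
    """Total bytes needed to hold a library of pre-computed pulses/returns."""
    return bytes_per_sample * samples_per_pulse * pulses


per_pulse = library_bytes(4, 20_000_000, 1)   # one pulse
total = library_bytes(4, 20_000_000, 1000)    # the whole 1000-pulse library

print(per_pulse / 1e6, "MB per pulse")  # 80.0 MB per pulse
print(total / 1e9, "GB total")          # 80.0 GB total
```

Integer arithmetic is used for the byte counts so the totals are exact; only the final unit conversion is done in floating point.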

That's a lot of RAM (and a lot of power, if you DID buy that much RAM).

A few Gbyte/second coming out of an SSD makes it actually feasible to "stream from disk array" and keep that 1-2 GSample/second pipeline full.
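How many bytes per second that pipeline actually consumes follows from the sample rate and the sample size. This sketch assumes the 4 bytes/sample figure from the storage estimate also applies during playback; the helper name is hypothetical.

```python
def stream_bandwidth(rate_sps: int, bytes_per_sample: int) -> int:
    """Sustained read rate (bytes/s) needed to feed a playback pipeline."""
    return rate_sps * bytes_per_sample


for gsps in (1, 2):
    bw = stream_bandwidth(gsps * 10**9, 4)  # assumes 4 bytes/sample
    print(f"{gsps} Gsample/s -> {bw / 1e9:.0f} GB/s")
```

Under that assumption a 1-2 GSample/s pipeline needs roughly 4-8 GB/s sustained, which is at or beyond the 2-4 GB/s quoted for SSD solutions and suggests why the data is streamed from an array rather than a single device.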

And on the receive side, where you want to capture the transmitted pulses (or returns), a similar sort of thing applies. Although SSDs aren't a ball o' fire for write speed, they ARE faster than spinning magnetic media, so fewer drives are needed to reach a given throughput.

Sometimes, it's the "number of drives" that determines the cost. You don't need a lot of space, but you do need a very fast transfer rate, and ganging up drives in parallel is how you get it. The near-instantaneous seek of an SSD is also nice, because you don't have to worry about seek time or rotational latency in this kind of application.
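The "number of drives" argument reduces to a ceiling division of the required rate by the per-drive rate. This is a minimal sketch that assumes ideal striping (aggregate rate scales linearly with drive count, no controller or bus limits). The HDD rate is the ~200 MB/s figure quoted earlier in the thread; treating 2 GB/s as a single SSD device's rate is an assumption, as is the 8 GB/s target (2 Gsample/s at an assumed 4 bytes/sample).

```python
def drives_needed(required_bps: int, per_drive_bps: int) -> int:
    """Smallest drive count whose combined rate meets the requirement,
    assuming perfect linear scaling across striped drives."""
    return -(-required_bps // per_drive_bps)  # integer ceiling division


REQUIRED = 8 * 10**9  # assumed target: 2 Gsample/s x 4 bytes/sample
HDD = 200 * 10**6     # ~200 MB/s per HDD, the figure quoted in the thread
SSD = 2 * 10**9       # assumed 2 GB/s per SSD device (low end of quoted range)

print("HDDs:", drives_needed(REQUIRED, HDD))
print("SSDs:", drives_needed(REQUIRED, SSD))
```

For the capture side the same calculation applies, but with each drive's sustained write rate substituted for its read rate, since SSD writes are slower than reads.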
