[Beowulf] SSD caching for parallel filesystems
Lux, Jim (337C)
james.p.lux at jpl.nasa.gov
Mon Feb 11 14:44:57 PST 2013
>> In any event, your original statement used to be wholly correct.
>> It has changed to a certain degree to "SSDs are about IOPs," which
>> isn't
>> quite the same thing. However, more pointedly, with modern HDDs
>> barely approaching 200MB/s and SSD solutions approaching 2-4GB/s,
>> this is an increasingly limited viewpoint. We have to start
>> considering their use for bandwidth.
>
> Find me an application that needs big bandwidth and doesn't need
> massive storage.
>
>
>
> Digital waveform recording and playback, e.g. in radar simulators.
> You need very wide bandwidth, but not a huge amount of storage (e.g.
> if I'm playing back a synthetic response to a 1 millisecond pulse with
> 2 GHz BW, I only need 10s of Megasamples at most, but I need 10
> Gsample/second sorts of bandwidth).
>
> One might think, heck, just slap a few GByte of RAM in there and be
> done with it, but if you're simulating a radar with 10 different pulse
> types, and you have 10-20 simulated targets each with several
> different viewing aspects, you pretty quickly need a "library" of
> several thousand pulses/returns to choose from.
Yeah, well, I remember negotiating about writing CUDA code for simulation software for something similar.
Don't think that this example applies. You want it in RAM for a proper simulation :)
---
Nope... you want to store it on disk.
a) 4 bytes/sample @ 20 Megasamples/pulse is 80 Mbyte/pulse
b) * 1000 pulses is 80 GB.
That's a lot of RAM (and a lot of power, if you DID buy that much RAM).
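As a quick check of the arithmetic in a) and b), here is a minimal Python sketch. The function name and the decimal unit convention (1 Mbyte = 10^6 bytes, 1 GB = 10^9 bytes) are assumptions for illustration; the sample size, pulse length, and pulse count are the figures given above.

```python
def library_size_bytes(bytes_per_sample, samples_per_pulse, num_pulses):
    """Total storage for a library of pre-computed pulses/returns.

    Hypothetical helper: a plain product of the three quantities.
    """
    return bytes_per_sample * samples_per_pulse * num_pulses

# Figures from the post: 4 bytes/sample, 20 Megasamples/pulse, 1000 pulses.
per_pulse = library_size_bytes(4, 20_000_000, 1)
library = library_size_bytes(4, 20_000_000, 1000)

print(per_pulse / 1e6, "Mbyte per pulse")
print(library / 1e9, "GB for the whole library")
```

Changing the pulse count or sample width shows how quickly the library outgrows what is practical to hold in RAM.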
A few Gbyte/second coming out of an SSD makes it actually feasible to "stream from disk array" and keep that 1-2 GSample/second pipeline full.
And on the receive side, where you want to capture the transmitted pulses (or returns), a similar sort of thing applies. SSDs aren't a ball o'fire for write speed, but they ARE faster than spinning magnetic media, so it takes fewer drives to get a given throughput.
Sometimes, it's the "number of drives" that is the cost-determining aspect. You don't need a lot of space, but you do need a very fast transfer rate, and ganging up drives in parallel is how it's done. The near-instantaneous seek of an SSD is also nice, because you don't have to worry about rotational latency in this kind of application.
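To put rough numbers on the "number of drives" point, here is a back-of-the-envelope Python sketch. It assumes ideal linear scaling when drives are ganged in parallel, and it borrows the per-device rates quoted earlier in the thread (about 200 MB/s for an HDD, 2 GB/s at the low end for an SSD solution) as sustained rates. The function name is hypothetical, and real arrays will fall short of these figures.

```python
def drives_needed(target_bytes_per_s, per_drive_bytes_per_s):
    """Minimum number of drives to reach a target rate, assuming
    throughput adds up perfectly across drives (an idealization)."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-target_bytes_per_s // per_drive_bytes_per_s)

# 1 GSample/second at 4 bytes/sample is 4 GB/s of sustained streaming.
target = 1_000_000_000 * 4

hdd_rate = 200_000_000      # ~200 MB/s, figure quoted in the thread
ssd_rate = 2_000_000_000    # ~2 GB/s, low end of the quoted 2-4 GB/s

hdd_count = drives_needed(target, hdd_rate)
ssd_count = drives_needed(target, ssd_rate)
print("HDDs needed:", hdd_count)
print("SSD devices needed:", ssd_count)
```

Under these assumptions the spinning-disk array needs an order of magnitude more devices for the same stream, even though the capacity is not needed.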