[Beowulf] BIG 'ram' using SSDs - was single machine with 500 GB of RAM

Ellis H. Wilson III ellis at cse.psu.edu
Wed Jan 9 11:02:38 PST 2013


On 01/09/2013 01:21 PM, Vincent Diepeveen wrote:
> On Jan 9, 2013, at 4:33 PM, Ellis H. Wilson III wrote:
>> On 01/09/2013 08:27 AM, Vincent Diepeveen wrote:
>>> What would be a rather interesting thought for building a single box
>>> dirt cheap with huge 'RAM'
>>> is the idea of having 1 fast RAID array of SSD's function as the
>>> 'RAM'.
>>
>> This may be a more inexpensive route, but let's all note that the raw
>> latency differences between DDR2/3 RAM and /any/ SSD is multiple
>> orders
>> of magnitude.  So for a single threaded application that has been
>> asked
>> to run on all RAM, I have a strong suspicion that RAM latencies are
>> what
>> it really does need -- not just reasonable latency and high
>> throughput.
>>    But we should await Jorg's response on the application nature to
>> better flesh that out.
>>
>
> I kind of disagree here.
>
> Latency to a 4 socket box randomly to a block of 500GB ram will be in
> the 600 ns range.
> And total calculation probably will be several microseconds
> (depending upon what you do).
>
> 5 TB of SSD will be faster than 60 us. That's a factor 100 slower in

Maybe, if you can find really nice SLC flash devices available right now, 
which is dubious.  Almost all flash devices (even high-performance ones) 
have gone the MLC route at this juncture, which means your 60 us figure is 
a very best-case one (i.e. NOT behind a RAID controller -- direct via 
PCI-E) and is only relevant for reads.  Write latencies will be higher, 
particularly as the disk fills; flash has asymmetric R/W speeds.  Also, 
please don't reference marketeering numbers in trying to refute this 
statement -- many of those largely take advantage of the DRAM cache in 
the SSD.

This is all ignoring the fact that 5TB (why so much?  the poster only 
said he needs 500GB...) of good SSDs will run multiple thousands of 
dollars -- a good chunk of the quoted budget of 10 thousand.  Maybe 
that will work for him, but either way, as I shared, parallelizing 
across cores/CPUs/machines is possible with this application, so this 
entire conversation is moot.

Last, you keep coming back to throughput, but from everything I can 
tell the real metric of interest with this application is 
random-access latency (else why would the poster be interested in 
getting a 500GB RAM machine?).  Buying multiple machines with fast 
DDR3 RAM will dominate a RAIDed SSD setup for this case, certainly in 
terms of cost and likely also in performance.  At the very worst it 
will perform about the same.
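
If anybody wants actual numbers rather than spec sheets, a crude 
random-access latency probe is maybe forty lines of C.  This is only a 
sketch -- the file path, size, and iteration count are placeholders -- 
but run it once against a file sitting on the SSD (much larger than 
RAM, or with the page cache dropped first) and once against the same 
program pointed at a file on tmpfs, and the per-access gap speaks for 
itself:

/* latrand.c -- crude random-access latency probe (sketch, not a
 * benchmark suite).  Usage: ./latrand <file> <size_GB>
 * The file must already exist and be at least <size_GB> large; for the
 * SSD numbers to mean anything it must be much larger than RAM (or
 * drop the page cache first).  Compile: gcc -O2 -o latrand latrand.c
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <file> <size_GB>\n", argv[0]);
        return 1;
    }
    size_t len = (size_t)atol(argv[2]) << 30;
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }
    madvise(p, len, MADV_RANDOM);   /* defeat readahead so we time true
                                       random accesses */

    const int N = 100000;           /* number of random touches to time */
    volatile char sink;
    struct timespec t0, t1;
    srand(42);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N; i++) {
        size_t off = (((size_t)rand() << 16) ^ (size_t)rand()) % len;
        sink = p[off];              /* one random read per iteration */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("%.0f ns per random access\n", ns / N);
    (void)sink;
    munmap(p, len);
    close(fd);
    return 0;
}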

>>> You can get a bandwidth of 2 GB/s then to the SSD's "RAM" pretty
>>> easily, yet for some calculations that bandwidth might be enough
>>> given the fact you can then parallelize a few cores.
>>
>> I am at a loss as to how you can achieve that high of bandwidth
>> "pretty
>> easily."
>
> Most modern raid controllers easily deliver against 3GB/s.
>
>> In the /absolute/ best case a single SATA SSD can serve reads
>> at close to 400-500MB/s, and software RAIDing them will definitely not
>> get you the 4x you speak of.
>
> Easily 2.7GB/s, hands down, for a $300 raid controller.
>
> Just put 16 SSD's in that array. This is not rocket science!

Yeah, it's computer science, and I'd love to see you try to toss 16 
crappy SSDs in a box with a crappy RAID controller and get the easy 
2.7GB/s of random accesses you are touting.  Not going to happen.

Typical case of 1) Read quoted speeds on Newegg, 2) Multiply speeds by 
number of drives, 3) Form terrible expectations, 4) Try to force those 
expectations down other people's throats by saying it's not "rocket 
science."

>> Random read maybe -- certainly not random write latency.  Again, it's
>> probably best to wait on Jorg to comment on the nature of the
>> application to decide whether we should care about read or write
>> latency
>> more.
>
> That's all going in parallel. A single SSD has 16 parallel channels
> or so.
> Just pick each time a different channel.

Fire up the roflcopter, because you're reaching absurd altitudes of 
hand-waving.  No, you cannot just rewrite the FTL on commodity drives to 
redirect data, nor is it that simple even if you did.  There are 
multiple channels, multiple packages per channel, multiple dies per 
package, and multiple planes per die.  Companies rise and fall on the 
efficacy of their FTL to manage all of this complexity.  Some of the 
enterprise SSDs let you do fancier things with where your data 
actually goes within the SSD, but this is absolutely, 100% not 
possible with your COTS SSD.  Nor would you want to even if it were, 
unless you're the most expert flash architect on the planet.

>> You could instead just mmap the flash device directly if you really
>> just
>> want to use it truly like 'RAM.'  Optimizing filesystems is entirely
>> nontrivial.
>
> Flash goes via USB - central lock here - central lock there. Not a
> good idea.
> Flash is in fact the same thing like what SSD's have, performancewise
> spoken.

?????

Flash is a storage medium.  I'm not referring to pen-drives here, by 
any means.  "Flash device" lets me speak about PCI-E flash storage 
(which doesn't act anything like a typical SSD internally) and your 
traditional SSD in the same sentence.  Nobody is suggesting plugging a 
bunch of USB drives into the machine -- that would of course be absurd.
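
And to be concrete about the mmap suggestion I made above: on Linux 
you can map a raw block device and then use ordinary loads and stores 
against it, no filesystem involved.  The sketch below is illustrative 
only -- the device name is a placeholder, it needs root, and writing 
through the mapping destroys whatever is on that device -- but it is 
what "use the flash truly like RAM" looks like in practice:

/* devmap.c -- sketch of mmap'ing a raw flash block device.
 * The device path is a placeholder; anything written through the
 * mapping clobbers whatever is on that device, so only try this on a
 * scratch disk.  Compile: gcc -O2 -o devmap devmap.c
 */
#include <fcntl.h>
#include <linux/fs.h>     /* BLKGETSIZE64 */
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const char *dev = "/dev/sdX";          /* placeholder raw SSD device */
    int fd = open(dev, O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    uint64_t bytes = 0;
    if (ioctl(fd, BLKGETSIZE64, &bytes) < 0) { perror("ioctl"); return 1; }

    /* Map the whole device and treat it as one big byte array.  Pages
     * fault in from flash on first touch and are written back by the
     * kernel, so it behaves like slow, persistent "RAM" -- with all the
     * latency caveats above. */
    char *mem = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) { perror("mmap"); return 1; }

    mem[0] = 42;                           /* ordinary loads/stores from here */
    printf("mapped %llu bytes at %p\n", (unsigned long long)bytes, (void *)mem);

    munmap(mem, bytes);
    close(fd);
    return 0;
}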

> I'm on a mailing list optimizing for SSD and Flash: NILFS, check it out.

Thanks, but I get enough of you here.  No need to double my dosage.

Best,

ellis


