[Beowulf] SSDs for HPC?

Tue Apr 8 07:25:30 PDT 2014

On 04/08/2014 10:12 AM, Lux, Jim (337C) wrote:
> On 4/7/14 6:48 PM, "Ellis H. Wilson III" <ellis at cse.psu.edu> wrote:
>
>> On 04/07/2014 09:34 PM, Prentice Bisbal wrote:
>>>> Was it wear out, or some other failure mode?
>>>>
>>>> And if wear out, was it because consumer SSDs have lame leveling or
>>>> something like that?
>>>>
>>> Here's how I remember it. You took the capacity of the disk, figured out
>>> how much data would have to be written to it to wear it out, and then
>>> divided that by the bandwidth of the drive to figure out how long it
>>> would take to write that much data to the disk if data was constantly
>>> being written to it. I think the answer was on the order of 5-10 years,
>>> which is a bit more than the expected lifespan of a cluster, making it a
>>> non-issue.
>>
>> This would be the ideal case, but it requires perfect wear-leveling and
>> a write amplification factor of 1.  Unfortunately, those properties rarely
>> hold.
>>
>> However, again, in the case of using it as a Hadoop intermediate disk,
>> write amp would be a non-issue because you'd be blowing away data after
>> runs (make sure to use a scripted trim or something, unless the FS
>> auto-trims, which you may not want), and wear-leveling would be less
>> important because the data written/read would be large and highly
>> sequential.  Wear-leveling would be trivial under those conditions.
>>
>
> Wear leveling would be trivial, if one were designing the wear leveling
> algorithms.

Or if you were using a workload that would operate well under any given
wear-leveling algorithm, such as the example I gave.  Hence, "under those
conditions."
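The back-of-the-envelope endurance estimate from the quoted exchange (capacity times rated program/erase cycles, divided by write bandwidth), with the write-amplification caveat folded in as an extra divisor, might be sketched as below. The function name and all drive parameters (capacity, P/E cycles, bandwidth, duty cycle) are hypothetical inputs you would take from a datasheet and your own workload; it still assumes perfect wear-leveling, so it is only an upper bound for a given WAF.

```python
# Back-of-the-envelope SSD endurance estimate (capacity x P/E cycles / write rate).
# All drive parameters are hypothetical inputs, not specs of any real drive.

SECONDS_PER_YEAR = 365 * 24 * 3600


def years_to_wear_out(capacity_gb, pe_cycles, write_mb_per_s,
                      waf=1.0, duty_cycle=1.0):
    """Years of writing until the rated P/E cycles are used up.

    capacity_gb    -- usable capacity in GB (10**9 bytes)
    pe_cycles      -- rated program/erase cycles per cell (from a datasheet)
    write_mb_per_s -- sustained host write bandwidth in MB/s (10**6 bytes)
    waf            -- write amplification factor; 1.0 is the ideal case
    duty_cycle     -- fraction of wall-clock time spent writing
    Assumes perfect wear-leveling (all cells wear evenly), so this is an
    upper bound for the given WAF.
    """
    # Total bytes the NAND can absorb before reaching its rated cycles.
    total_nand_bytes = capacity_gb * 1e9 * pe_cycles
    # Every host byte costs `waf` bytes of NAND programming.
    nand_bytes_per_s = write_mb_per_s * 1e6 * waf * duty_cycle
    return total_nand_bytes / nand_bytes_per_s / SECONDS_PER_YEAR
```

Calling it with waf=2.0 halves the estimate relative to waf=1.0, which is why the "perfect wear-leveling and WAF of 1" assumption matters so much.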

> I could easily see a consumer device having a different algorithm from an
> enterprise device, either because they just spend more time and money
> getting a good algorithm, or because of different underlying assumptions
> about write/read patterns.

This is my understanding.  Further, sometimes different algorithms are
in use due to acquisitions: the old algorithm stays in the commodity
drives, and the new one is used only in enterprise drives.  Sometimes
there are resource reasons for this (the enterprise algorithm is more
CPU-intensive or requires more DRAM within the SSD).

> Even in an enterprise environment, there are some very different write
> patterns possible.  A "scratch" device might get written randomly, while a
> "logging" device will tend to be written sequentially.  Consider something
> like a credit card processing system.  This is going to have a lot of "add
> at the end" transaction data.  As opposed to, say, a library catalog where
> books are checked out essentially at random, and you update the "check
> out/check in" status, and writes are sprinkled randomly throughout the
> data.

I agree, which is what makes wear-leveling such an interesting (and 
well-researched) area in the SSD field.  However, my suggestion for 
Prentice on how to use it in his system (keeping the discussion on 
point) avoided dealing with the wide variety of issues SSD manufacturers 
have to cope with.
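To see why the "logging" versus "library catalog" distinction matters, here is a toy simulation of a page-mapped flash translation layer with greedy garbage collection. Everything in it (the hypothetical `ToyFTL` class, 64 blocks of 64 pages, 25% of physical space held back as spare area, greedy victim selection) is an illustrative assumption, not how any vendor's firmware actually works. It measures write amplification (NAND page writes divided by host page writes), the garbage-collection cost that sits alongside wear-leveling in SSD firmware.

```python
import random


class ToyFTL:
    """Hypothetical page-mapped flash translation layer with greedy GC.

    Illustrative only: real SSD firmware is proprietary and far more complex.
    """

    def __init__(self, num_blocks=64, pages_per_block=64, overprovision=0.25):
        self.ppb = pages_per_block
        # Host-visible pages; the remaining physical pages are spare area.
        self.logical_pages = int(num_blocks * pages_per_block * (1 - overprovision))
        # owner[b][i]: logical page held in block b, slot i (None = empty/stale).
        self.owner = [[None] * pages_per_block for _ in range(num_blocks)]
        self.valid = [0] * num_blocks        # live-page count per block
        self.l2p = {}                        # logical page -> (block, slot)
        self.free = list(range(num_blocks))  # erased blocks ready to program
        self.active, self.wptr = None, 0     # block currently being filled
        self.host_writes = 0
        self.nand_writes = 0

    def _program(self, lpn):
        """Write lpn to the next free slot and mark its old copy stale."""
        if self.active is None or self.wptr == self.ppb:
            self.active, self.wptr = self.free.pop(), 0
        old = self.l2p.get(lpn)
        if old is not None:
            b, i = old
            self.owner[b][i] = None
            self.valid[b] -= 1
        self.owner[self.active][self.wptr] = lpn
        self.valid[self.active] += 1
        self.l2p[lpn] = (self.active, self.wptr)
        self.wptr += 1
        self.nand_writes += 1

    def _collect_one(self):
        """Greedy GC: pick the full block with the fewest live pages,
        copy its live pages elsewhere, then erase it."""
        free_set = set(self.free)
        candidates = [b for b in range(len(self.owner))
                      if b not in free_set and b != self.active]
        victim = min(candidates, key=lambda b: self.valid[b])
        for lpn in list(self.owner[victim]):
            if lpn is not None:
                self._program(lpn)   # relocation: a NAND write with no host write
        self.owner[victim] = [None] * self.ppb
        self.valid[victim] = 0
        self.free.append(victim)

    def write(self, lpn):
        """One host page write; GC first keeps at least two erased blocks."""
        while len(self.free) < 2:
            self._collect_one()
        self._program(lpn)
        self.host_writes += 1

    def waf(self):
        return self.nand_writes / self.host_writes


def simulate(pattern, passes=10, seed=0):
    """WAF after a sequential fill plus `passes` drive-fulls of overwrites."""
    ftl = ToyFTL()
    n = ftl.logical_pages
    rng = random.Random(seed)
    for lpn in range(n):                  # fill the drive once
        ftl.write(lpn)
    for k in range(passes * n):
        if pattern == "sequential":       # "logging": append-style overwrite
            ftl.write(k % n)
        elif pattern == "random":         # "catalog": scattered in-place updates
            ftl.write(rng.randrange(n))
        else:
            raise ValueError(f"unknown pattern: {pattern}")
    return ftl.waf()


if __name__ == "__main__":
    for pattern in ("sequential", "random"):
        print(pattern, round(simulate(pattern), 2))
```

Under these assumptions the sequential pattern always leaves fully stale blocks for GC to erase, so nothing is copied and WAF stays at 1.0, while random overwrites force GC to relocate still-live pages and push WAF above 1.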

> Sadly, much of this will not be particularly well documented, if at all.

Supposedly more APIs are being exposed to control wear-leveling, when GC
kicks in, etc. (I believe Samsung is at the forefront here).  But this
is just what I have heard; I don't have examples to share just yet.
Very little has been said in this space in the past because these were
the most closely guarded of the proprietary algorithms in the SSD arena.
As more and more algorithms get researched and are effectively made
open-source (i.e., yet another sad case of computer science catching up
with industry), the pressure is off to protect them so much, and on to
give the reins to the user.
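Until such controls are documented, the host-side knob that is widely available today is TRIM. The "scripted trim" idea for a Hadoop scratch SSD could look roughly like the sketch below, run as a post-job hook instead of mounting with the continuous `discard` option. The helper names (`clear_directory`, `trim_command`, `post_job_cleanup`) and any paths you pass in are hypothetical; `fstrim` is the util-linux tool that discards unused blocks on a mounted filesystem, and it generally needs root plus a filesystem and device that support discard.

```python
import shutil
import subprocess
from pathlib import Path


def clear_directory(path):
    """Remove everything inside `path` but keep the directory itself."""
    for child in Path(path).iterdir():
        if child.is_dir() and not child.is_symlink():
            shutil.rmtree(child)
        else:
            child.unlink()


def trim_command(mountpoint):
    """Command line for a batched TRIM of the filesystem's free space."""
    # -v makes fstrim report how many bytes were trimmed.
    return ["fstrim", "-v", str(mountpoint)]


def post_job_cleanup(scratch_dir, mountpoint):
    """Hypothetical post-run hook: wipe intermediates, then TRIM freed blocks.

    Raises CalledProcessError if fstrim fails (e.g. no root, no discard support).
    """
    clear_directory(scratch_dir)
    result = subprocess.run(trim_command(mountpoint),
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

A job epilogue would call something like `post_job_cleanup("/scratch/hadoop-tmp", "/scratch")`, with both paths standing in for wherever the scratch SSD is actually mounted.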

Best,

ellis

-- 
Ph.D. Candidate
Department of Computer Science and Engineering
The Pennsylvania State University
www.ellisv3.com