[Beowulf] SSDs for HPC?
Ellis H. Wilson III
ellis at cse.psu.edu
Tue Apr 8 07:25:30 PDT 2014
On 04/08/2014 10:12 AM, Lux, Jim (337C) wrote:
> On 4/7/14 6:48 PM, "Ellis H. Wilson III" <ellis at cse.psu.edu> wrote:
>
>> On 04/07/2014 09:34 PM, Prentice Bisbal wrote:
>>>> Was it wear out, or some other failure mode?
>>>>
>>>> And if wear out, was it because consumer SSDs have lame leveling or
>>>> something like that?
>>>>
>>> Here's how I remember it. You took the capacity of the disk, figured out
>>> how much data would have to be written to it to wear it out, and then
>>> divided that by the bandwidth of the drive to figure out how long it
>>> would take to write that much data to the disk if data was constantly
>>> being written to it. I think the answer was on the order of 5-10 years,
>>> which is a bit more than the expected lifespan of a cluster, making it a
>>> non-issue.
>>
>> This would be the ideal case, but it requires perfect wear-leveling and
>> a write amplification factor of 1. Unfortunately, those properties rarely
>> hold.
>>
>> However, again, in the case of using it as a Hadoop intermediate disk,
>> write amp would be a non-issue because you'd be blowing away data after
>> runs (make sure to use a scripted trim or something, unless the FS
>> auto-trims, which you may not want), and wear-leveling would be less
>> important because the data written/read would be large and highly
>> sequential. Wear-leveling would be trivial under those conditions.
>>
>
> Wear leveling would be trivial, if one were designing the wear leveling
> algorithms.
Or if you were using a workload that would operate well under any given
wear-leveling algorithm, as in the example I gave. Hence, "under those
conditions."
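To make the write-amplification point concrete, here is a rough sketch of
the arithmetic Prentice describes. Every figure in it is a placeholder
assumption rather than a spec for any particular drive; the only point is
that the lifetime estimate scales inversely with the write amplification
factor, so a sloppy FTL can eat a large chunk of that headroom.

# Back-of-envelope SSD wear-out estimate under constant writing.
# All parameter values below are illustrative assumptions, not drive specs.

def years_to_wear_out(capacity_gb, pe_cycles, write_bw_mb_s, write_amp):
    """Years until the flash endurance budget is exhausted."""
    # Total flash the drive can absorb (MB) = capacity * rated P/E cycles.
    flash_budget_mb = capacity_gb * 1024 * pe_cycles
    # Write amplification: each host MB costs write_amp MB of flash writes.
    host_budget_mb = flash_budget_mb / write_amp
    seconds = host_budget_mb / write_bw_mb_s
    return seconds / (3600 * 24 * 365)

if __name__ == "__main__":
    # Whether this comes out at months or years depends entirely on the
    # assumed figures; the ratio between the two cases is what matters.
    for waf in (1.0, 3.0):
        print(waf, years_to_wear_out(capacity_gb=512, pe_cycles=3000,
                                     write_bw_mb_s=400, write_amp=waf))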
> I could easily see a consumer device having a different algorithm from an
> enterprise device, either because they just spend more time and money
> getting a good algorithm, or because of different underlying assumptions
> about write/read patterns.
This is my understanding. Further, sometimes different algorithms are
in use due to acquisitions -- the old algorithm ends up in the commodity
drives, and the new one goes into the enterprise drives. Sometimes there
are resource reasons for this (the enterprise algorithm is more
CPU-intensive or requires more DRAM within the SSD).
> Even in an enterprise environment, there are some very different write
> patterns possible. A "scratch" device might get written randomly, while a
> "logging" device will tend to be written sequentially. Consider something
> like a credit card processing system. This is going to have a lot of "add
> at the end" transaction data. As opposed to, say, a library catalog where
> books are checked out essentially at random, and you update the "check
> out/check in" status, and writes are sprinkled randomly through out the
> data.
I agree, which is what makes wear-leveling such an interesting (and
well-researched) area in the SSD field. However, my suggestion to
Prentice on how to use it in his system (keeping the discussion on
point) deliberately sidestepped the wide variety of issues SSD
manufacturers have to cope with.
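To be concrete about the "scripted trim" bit above: something along these
lines is what I had in mind, run after each job finishes. The mount point
and directory names are made up for illustration, and it assumes fstrim(8)
from util-linux is available (and that the filesystem is not already
mounted with the 'discard' option).

# Minimal sketch: blow away a run's intermediate data, then TRIM the free
# space explicitly so the FTL can reclaim blocks on its own schedule rather
# than during the next job's writes.  Paths here are hypothetical.

import shutil
import subprocess

SCRATCH_MOUNT = "/mnt/ssd-scratch"                 # hypothetical scratch mount
INTERMEDIATE_DIR = SCRATCH_MOUNT + "/hadoop-tmp"   # hypothetical intermediate dir

def clean_and_trim():
    # Delete the run's intermediate data.
    shutil.rmtree(INTERMEDIATE_DIR, ignore_errors=True)
    # Tell the SSD which blocks are now free (needs root; fstrim ships with
    # util-linux).
    subprocess.run(["fstrim", "-v", SCRATCH_MOUNT], check=True)

if __name__ == "__main__":
    clean_and_trim()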
> Sadly, much of this will not be particularly well documented, if at all.
Supposedly more APIs are being exposed to control wear-leveling, when GC
kicks in, etc. (I believe Samsung is at the forefront here). But this is
just what I have heard; I don't have examples to share just yet. Very
little has been said in this space in the past because these were the
most highly guarded of the proprietary algorithms in the SSD arena. As
more and more of these algorithms get researched and made effectively
open-source (i.e., yet another sad case of computer science catching up
with industry), the pressure to protect them so closely is easing, and
the focus is shifting toward handing the reins to the user.
Best,
ellis
--
Ph.D. Candidate
Department of Computer Science and Engineering
The Pennsylvania State University
www.ellisv3.com