[Beowulf] SSDs for HPC?

Mon Apr 7 18:34:19 PDT 2014

On 04/07/2014 03:38 PM, Lux, Jim (337C) wrote:
>
> On 4/7/14 12:11 PM, "Lockwood, Glenn" <glock at sdsc.edu> wrote:
>
>> On Apr 7, 2014, at 11:53 AM, Prentice Bisbal
>> <prentice.bisbal at rutgers.edu> wrote:
>>
>>> 4. SSDs wearing out. Is that still a concern, or are lifespans getting
>>> better? I think Jim Lux once did calculations on that list to show that
>>> with wear-leveling and everything else, even if you wrote to an SSD
>>> constantly, it would still outlive the average lifespan of a cluster.
>> As long as you use enterprise-grade SSDs (e.g., Intel's stuff) with
>> overprovisioning, the NAND endurance shouldn't be an issue over the
>> lifetime of a cluster.  We've used SSDs as our nodes' system disks for a
>> few years now (going on four with our oldest, 324-node production
>> system), and there have been no major problems.  The major problems
>> happened when we were using the cheaper commodity SSDs.  Don't give in to
>> the temptation to save a few pennies there.
>>
> Was it wear out, or some other failure mode?
>
> And if wear out, was it because consumer SSDs have lame leveling or
> something like that?
>
Here's how I remember it. You took the capacity of the disk, figured out 
how much data would have to be written to it to wear it out, and then 
divided that by the bandwidth of the drive to figure out how long it 
would take to write that much data to the disk if data was constantly 
being written to it. I think the answer was on the order of 5-10 years, 
which is a bit more than the expected lifespan of a cluster, making it a 
non-issue.
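The estimate described above can be written out as a small Python function. This is only a sketch: the parameter names, the assumption of ideal wear leveling, and the optional write-amplification factor are illustrative additions, not part of the original calculation.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, in seconds


def years_to_wear_out(capacity_bytes: float,
                      pe_cycles: float,
                      write_bw_bytes_per_s: float,
                      write_amplification: float = 1.0) -> float:
    """Years of nonstop writing before the rated NAND endurance is used up.

    Assumes (hypothetically) ideal wear leveling, so every cell is erased
    equally often and the host can write
    capacity_bytes * pe_cycles / write_amplification bytes in total.
    """
    if min(capacity_bytes, pe_cycles, write_bw_bytes_per_s,
           write_amplification) <= 0:
        raise ValueError("all parameters must be positive")
    # Total host data the drive can absorb before its cells are worn out.
    host_bytes_until_worn = capacity_bytes * pe_cycles / write_amplification
    # Time to write that much data at the drive's full sustained bandwidth.
    seconds_until_worn = host_bytes_until_worn / write_bw_bytes_per_s
    return seconds_until_worn / SECONDS_PER_YEAR
```

To use it, plug in the capacity, rated program/erase cycles, and sustained write bandwidth from a specific drive's datasheet. The result scales linearly with capacity and P/E rating and inversely with write bandwidth and write amplification, so it is quite sensitive to which drive class you assume.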

That's how I remember it; how it actually happened could be different. I 
could try to search the archives, but if there's one thing I've learned 
from CNN, it's that speculation is better than actual facts.

-- 
Prentice Bisbal
Manager of Information Technology
Rutgers Discovery Informatics Institute (RDI2)
Rutgers University
http://rdi2.rutgers.edu