[Beowulf] [External] Power Cycling Question
Lux, Jim (US 7140)
james.p.lux at jpl.nasa.gov
Mon Jul 19 18:08:32 UTC 2021
On 7/19/21, 9:12 AM, "Beowulf on behalf of Prentice Bisbal via Beowulf" <beowulf-bounces at beowulf.org on behalf of beowulf at beowulf.org> wrote:
Doug,
<snip>
I know that there is a direct relationship between system failure and
operating temperature, but I don't know if that applies to all
components, or just those with moving parts. Someone somewhere must
have done research on this. I know Google did research on hard drive
failure that was pretty popular. I would imagine they would have
researched this, too.
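In general, it follows the Arrhenius relationship with some TBD activation energy (the constant in the exponent), which depends on the part and the failure mechanism. A common rule of thumb is that every 10 C rise doubles the aging rate.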
There's all sorts of background physics to this: drift (migration) of metallization and dopants, radiation accumulation, etc.
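To put numbers on the Arrhenius rule, here is a minimal Python sketch. The function names are hypothetical, and the activation energy `ea_ev` is an assumed input, since the real value depends on the part and the failure mechanism. It compares the Arrhenius acceleration factor AF = exp[(Ea/k)(1/T_use - 1/T_stress)], with temperatures in kelvin, against the "2x per 10 C" rule, and backs out the activation energy at which the two agree.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_af(t_use_c: float, t_stress_c: float, ea_ev: float) -> float:
    """Aging-rate acceleration at t_stress_c relative to t_use_c (Arrhenius model).

    ea_ev is an assumed activation energy in eV; it is mechanism-specific.
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k))


def rule_of_thumb_af(t_use_c: float, t_stress_c: float) -> float:
    """'Every 10 C rise doubles the aging rate' rule of thumb."""
    return 2.0 ** ((t_stress_c - t_use_c) / 10.0)


def ea_for_doubling(t_c: float, delta_c: float = 10.0) -> float:
    """Activation energy (eV) at which a delta_c rise above t_c exactly doubles the rate."""
    t1 = t_c + 273.15
    t2 = t1 + delta_c
    return BOLTZMANN_EV_PER_K * math.log(2.0) / (1.0 / t1 - 1.0 / t2)


if __name__ == "__main__":
    ea = ea_for_doubling(50.0)
    print(f"Ea implied by the 10 C rule near 50 C: {ea:.3f} eV")
    print(f"Arrhenius AF, 50 C -> 70 C: {arrhenius_af(50.0, 70.0, ea):.2f}")
    print(f"Rule-of-thumb AF, 50 C -> 70 C: {rule_of_thumb_af(50.0, 70.0):.2f}")
```

Near 50 C, `ea_for_doubling(50.0)` comes out at roughly 0.64 eV, so the rule of thumb amounts to assuming an activation energy in that neighborhood. Mechanisms with a different activation energy will deviate from it, and the two models drift apart the farther you get from the reference temperature.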
Cycling is a different failure mechanism: there it's the propagation of microscopic defects with each cycle, as well as the more obvious "cracks in solder/PWB traces" kind of thing. One of the big issues today is the mismatch in coefficient of thermal expansion (CTE) between the chips (or their packages) and the printed wiring board (PWB). With grid arrays (ball or column) that are soldered in, the corner pins/balls/columns are stressed more than those along the sides, and any time you have cyclic stress, you have the prospect of work hardening and microcrack propagation. Sockets with interposers do help with this, because they can accommodate changing misalignment without failing. OTOH, now you have a socket and an interposer, which can themselves fail.
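One first-order way to see why the corner joints take the most strain is the distance-from-neutral-point (DNP) estimate: the shear strain per thermal cycle is roughly (CTE mismatch) x (temperature swing) x (distance of the joint from the package center) / (joint height). This is a standard simplification swapped in here for illustration, not something from the post; the function names are hypothetical and all the numbers in the usage block are assumed, illustrative values.

```python
import math


def dnp_corner_mm(side_mm: float) -> float:
    """Distance from the center of a square package to a corner joint (half-diagonal)."""
    return side_mm * math.sqrt(2.0) / 2.0


def dnp_edge_mid_mm(side_mm: float) -> float:
    """Distance from the center of a square package to the middle of one edge."""
    return side_mm / 2.0


def cyclic_shear_strain(cte_pkg_ppm: float, cte_pwb_ppm: float, delta_t_c: float,
                        dnp_mm: float, joint_height_mm: float) -> float:
    """First-order (DNP) shear strain in a solder joint for one thermal swing.

    Ignores board/package warpage and joint compliance; CTEs are in ppm/K.
    """
    delta_alpha = abs(cte_pwb_ppm - cte_pkg_ppm) * 1e-6  # CTE mismatch in 1/K
    return delta_alpha * delta_t_c * dnp_mm / joint_height_mm


if __name__ == "__main__":
    # Illustrative, assumed values: 35 mm square package at 6 ppm/K on a
    # 17 ppm/K board, 100 C swing per cycle, 0.5 mm joint height.
    side = 35.0
    corner = cyclic_shear_strain(6.0, 17.0, 100.0, dnp_corner_mm(side), 0.5)
    edge = cyclic_shear_strain(6.0, 17.0, 100.0, dnp_edge_mid_mm(side), 0.5)
    print(f"corner: {corner:.4f}  edge-mid: {edge:.4f}  ratio: {corner / edge:.3f}")
```

For a square package the corner-to-edge ratio is always sqrt(2), whatever the materials, because only the DNP differs. Taller joints (columns rather than balls) lower the strain, which is one reason column grid arrays exist; a socket with an interposer sidesteps the rigid solder joint altogether.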