[Beowulf] Exascale by the end of the year?
Ellis H. Wilson III
ellis at cse.psu.edu
Wed Mar 5 08:06:16 PST 2014
On 03/05/2014 10:55 AM, Douglas Eadline wrote:
>> On 05/03/14 13:52, Joe Landman wrote:
>> The one everyone thinks about first is power, but the other one she
>> touched on was reliability and uptime.
> Indeed, the fact that these issues were not even mentioned
> means to me the project is not very well thought out.
> At exascale (using current tech) failure recovery must be built
> into any design, in software, hardware, or both.
Although, as stated, I think this guy's full of it, he did touch on both
of them in the call. Very high level (fuel cells for power, modular
enclosures for failure tolerance, etc.), but they were "mentioned."
Some supposed NDA with Intel prevented him from talking about a chunk of it.
Also, failure may or may not be a huge issue for the application
described (genetic algorithm-based HFT applications). Not tightly
coupled. Lose a node (or a rack) and you just lose some fraction of
your population. GAs (like real biology) are by design intended to
deal with losses like that.
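
To make the point about losing part of the population concrete, here is a minimal sketch of an island-style GA in which one "node" disappears mid-run. Everything in it is an assumption for illustration: the toy fitness function (count of 1 bits), the names `run` and `evolve_island`, the parameters, and the lack of migration between islands. It is not the application discussed on the call.

```python
import random

GENOME_LEN = 32  # hypothetical toy problem: maximize the number of 1 bits


def fitness(genome):
    return sum(genome)


def evolve_island(island, rng):
    """One generation on one node: keep the best individual, breed the rest."""
    elite = max(island, key=fitness)
    children = [elite]  # elitism: the island's best never gets worse
    while len(children) < len(island):
        # Tournament selection of two parents
        a = max(rng.sample(island, 2), key=fitness)
        b = max(rng.sample(island, 2), key=fitness)
        # Single-point crossover builds a new list, parents are untouched
        cut = rng.randrange(1, GENOME_LEN)
        child = a[:cut] + b[cut:]
        # Occasional single-bit mutation
        if rng.random() < 0.2:
            child[rng.randrange(GENOME_LEN)] ^= 1
        children.append(child)
    return children


def run(n_nodes=4, per_node=8, generations=40, fail_at=20, seed=0):
    rng = random.Random(seed)
    # Each island stands in for the sub-population held by one node
    islands = [
        [[rng.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(per_node)]
        for _ in range(n_nodes)
    ]
    best_at_failure = None
    for gen in range(generations):
        if gen == fail_at:
            islands.pop()  # simulate a node failure: its individuals are gone
            best_at_failure = max(fitness(g) for isl in islands for g in isl)
        # The surviving islands carry on with no recovery step at all
        islands = [evolve_island(isl, rng) for isl in islands]
    best_final = max(fitness(g) for isl in islands for g in isl)
    return len(islands), best_at_failure, best_final


if __name__ == "__main__":
    print(run())
```

The run needs no checkpoint or restart: the lost island only shrinks the search, and the surviving islands keep improving from wherever they were.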
Making HPL run through successfully is a whole 'nother beast.
Department of Computer Science and Engineering
The Pennsylvania State University