[Beowulf] Exascale by the end of the year?

Ellis H. Wilson III ellis at cse.psu.edu
Wed Mar 5 08:06:16 PST 2014

On 03/05/2014 10:55 AM, Douglas Eadline wrote:
>> On 05/03/14 13:52, Joe Landman wrote:
>> The one everyone thinks about first is power, but the other one she
>> touched on was reliability and uptime.
> Indeed, the fact that these issues were not even mentioned
> means to me the project is not very well thought out.
> At exascale (using current tech) failure recovery must be built
> into any design, either software and/or hardware.

Although, as stated, I think this guy's full of it, he did touch on both 
of them in the call.  Very high level (fuel cells for power, modular 
enclosures for failure tolerance, etc.), but they were "mentioned."
Some supposed NDA with Intel prevented him from talking about a chunk of it.

Also, failure may or may not be a huge issue for the application 
described (genetic algorithm-based HFT applications).  Not tightly 
coupled.  Lose a node (or a rack) and you just lose some fraction of 
your population.  GAs (like real biology) are by design intended to 
deal with losses like that.
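
To make that concrete, here is a toy sketch of the idea, not anything from 
the call: a population split across "nodes", one of which dies partway 
through the run.  Everything in it (the fitness function, the node and 
population counts, the names fitness/evolve/run) is made up for 
illustration.

```python
import random

def fitness(x):
    # Toy objective (an assumption): best possible value is at x = 3.
    return -(x - 3.0) ** 2

def evolve(island, rng):
    # One generation on one node: keep the better half of the
    # individuals and refill with slightly mutated copies of them.
    ranked = sorted(island, key=fitness, reverse=True)
    survivors = ranked[: len(ranked) // 2]
    children = [x + rng.gauss(0.0, 0.1) for x in survivors]
    return survivors + children

def run(n_nodes=8, per_node=20, generations=50, fail_at=10, seed=0):
    rng = random.Random(seed)
    # Each inner list stands in for the sub-population held by one node.
    islands = [[rng.uniform(-10.0, 10.0) for _ in range(per_node)]
               for _ in range(n_nodes)]
    for gen in range(generations):
        if gen == fail_at:
            # A node dies: its individuals are gone, with no checkpoint
            # and no restart.  The other nodes never notice.
            islands.pop()
        islands = [evolve(island, rng) for island in islands]
    best = max((x for island in islands for x in island), key=fitness)
    return len(islands), best

if __name__ == "__main__":
    nodes_left, best = run()
    print(nodes_left, "nodes left, best individual:", best)
```

The run finishes on the surviving nodes with a smaller population.  Nothing 
has to be rolled back, because no node depends on another node's state.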

Making HPL run through successfully is a whole 'nother beast.



Ph.D. Candidate
Department of Computer Science and Engineering
The Pennsylvania State University
