[Beowulf] MPI, fault handling, etc.

Charlie Peck charliep at cs.earlham.edu
Mon Mar 14 17:35:43 PDT 2016


> On Mar 14, 2016, at 13:55, Lux, Jim (337C) <james.p.lux at jpl.nasa.gov> wrote:
> 
> … And communication, even between nodes of a cluster, isn’t free, nor infinitely scalable. I think that with a lot of problems, it’s the communication bottleneck that is the “rate limiting” step, whether it’s CPU:cache; CPU:RAM; or node:node communications.  ...

Henry Neeman calls this the tyranny of the memory hierarchy and focuses on it quite heavily in his Supercomputing in Plain English series. My experience benchmarking and tuning scientific software has borne this out many times over the years.

charlie



