[Beowulf] Cluster consistency checks

Kenneth Hoste kenneth.hoste at ugent.be
Tue Mar 29 06:16:48 PDT 2016



On 29/03/16 15:10, Michael Di Domenico wrote:
> On Tue, Mar 29, 2016 at 9:01 AM, Olli-Pekka Lehto
> <olli-pekka.lehto at csc.fi> wrote:
>>>>>> - Simple MPI latency / bandwidth test called mpisweep that tests every
>>>>>> link (I'll put this up on github later as well)
>>>> Any reference to mpisweep yet?
>>>>
>>>> Google didn't give me much...
>>>>
>>> That's an internal code I whipped up at some point. Pretty much the minimum
>>> viable program to do a sweep of all the connections. I'll try to clean it up a
>>> bit and put it up in the next few days.
>> I have now put it up on GitHub. Very simple and short :)
>>
>> https://github.com/CSC-IT-Center-for-Science/mpisweep
> as a programming exercise it would be handy to extend this to do
> around-the-world ping tests, by which i mean 1 xfers to all, 2 xfers
> to all, 3 xfers to all, and so on
>
> i recently found a misbehaving IB card inside a multi-card switch
> using this approach. it was a very odd result, whereby only certain
> pairings of hosts in certain directions were failing, showing high
> latency and low bandwidth

Maybe take a look at https://github.com/hpcugent/mympingpong (it has 
pretty pictures in the README).
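
For anyone who wants to play with the idea offline first, here is a rough
sketch of the two pieces discussed above: a schedule that visits every host
pair, and a check that flags slow links per direction. It is not taken from
mpisweep or mympingpong. The names sweep_rounds and slow_links, the
median-based threshold and the factor of 3 are all assumptions made for
illustration, and the latencies would have to come from a real ping-pong run.

```python
from statistics import median


def sweep_rounds(hosts):
    """Yield rounds of disjoint (a, b) pairs so that every unordered
    pair of hosts appears exactly once (round-robin circle method)."""
    ring = list(hosts)
    if len(ring) % 2:
        ring.append(None)  # placeholder "bye" slot for an odd host count
    n = len(ring)
    for _ in range(n - 1):
        pairs = [(ring[i], ring[n - 1 - i]) for i in range(n // 2)]
        # drop the pair that contains the bye slot, if any
        yield [(a, b) for a, b in pairs if a is not None and b is not None]
        # keep the first entry fixed and rotate the others by one position
        ring = [ring[0], ring[-1]] + ring[1:-1]


def slow_links(latency_us, factor=3.0):
    """Return the directed (src, dst) links whose measured latency is
    more than `factor` times the median over all links (assumed rule)."""
    typical = median(latency_us.values())
    return sorted(link for link, t in latency_us.items() if t > factor * typical)


if __name__ == "__main__":
    hosts = ["n1", "n2", "n3", "n4"]  # hypothetical host names
    for i, pairs in enumerate(sweep_rounds(hosts), start=1):
        print("round", i, pairs)
```

Keeping (src, dst) and (dst, src) as separate keys is what lets a
one-directional problem show up, which matches the kind of failure described
above. Pairs within one round share no host, so they can be measured
concurrently without disturbing each other at the endpoints.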


K.

