[Beowulf] Odd Infiniband scaling behaviour

Tom Elken tom.elken at qlogic.com
Mon Oct 8 09:32:55 PDT 2007


> -----Original Message-----
> [mailto:beowulf-bounces at beowulf.org] On Behalf Of Chris Samuel
> Sent: Sunday, October 07, 2007 10:25 PM
> To: beowulf at beowulf.org
> Subject: [Beowulf] Odd Infiniband scaling behaviour
> 
> Hi fellow Beowulfers..
> 
> We're currently building an Opteron based IB cluster, and are 
> seeing some rather peculiar behaviour that has had us puzzled 
> for a while.

To give us more info about your "scaling" problem, can you tell us:

1) the elapsed run-time of the four scenarios you mention (or relative
run-times)?

2) how you measured the CPU usage? 
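
If the CPU percentages come from watching top, one cross-check that takes
the sampling out of the picture is to time a fixed chunk of work inside
each rank and compare CPU seconds (getrusage) against wall-clock seconds
(MPI_Wtime).  A rough sketch of what I mean -- my own untested sketch, not
anything taken from NAMD or MVAPICH:

/* cpu_frac.c -- each rank spins for ~10 s of wall time and reports how
 * much of that it actually got as CPU time.  Hypothetical sketch. */
#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <mpi.h>

static double cpu_secs(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_utime.tv_sec + ru.ru_stime.tv_sec
         + (ru.ru_utime.tv_usec + ru.ru_stime.tv_usec) * 1e-6;
}

int main(int argc, char **argv)
{
    int rank;
    double w0, w1, c0, c1;
    volatile double x = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    w0 = MPI_Wtime();
    c0 = cpu_secs();
    while (MPI_Wtime() - w0 < 10.0)   /* ~10 seconds of pure computation */
        x += 1.0;
    w1 = MPI_Wtime();
    c1 = cpu_secs();

    printf("rank %d: %.1f%% CPU  (%.2f cpu s / %.2f wall s)\n",
           rank, 100.0 * (c1 - c0) / (w1 - w0), c1 - c0, w1 - w0);

    MPI_Finalize();
    return 0;
}

Run one copy, then two, then eight, and the printed percentages should
track what you see in top if the effect is real rather than a sampling
artefact.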

Thanks,
Tom

> 
> If I take a CPU-bound application, like NAMD, I can run an 8 CPU job
> on a single node and it pegs the CPUs at 100% (this is built using
> Charm++ configured as an MPI system and using MVAPICH 0.9.8p3
> with the Portland Group Compilers).
> 
> If I then run 2 x 4 CPU jobs of the *same* problem, they all 
> run at 50% CPU.
> 
> If I run 4 x 2 CPU jobs, again the same problem, they run at 25%..
> 
> ..and yes, if I run 8 x 1 CPU jobs they run at around 12-13% CPU!
> 
> I then replicated the same problem with the example MPI cpi.c 
> program, to rule out some odd behaviour in NAMD.
> 
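
For anyone following the thread: cpi is the usual pi-by-numerical-integration
example that ships with MPICH-derived distributions.  A minimal program of
that kind -- reconstructed from memory rather than copied from the actual
cpi.c -- looks roughly like this; each rank does nothing but arithmetic
between one Bcast and one Reduce, which is what makes the low CPU figures
so strange:

#include <stdio.h>
#include <math.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    const double PI25DT = 3.141592653589793238462643;
    long n = 100000000L;          /* intervals; raise it to lengthen the run */
    long i;
    int rank, size;
    double h, x, sum = 0.0, mypi, pi;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Bcast(&n, 1, MPI_LONG, 0, MPI_COMM_WORLD);

    h = 1.0 / (double)n;
    for (i = rank; i < n; i += size) {        /* midpoint rule, strided by rank */
        x = h * ((double)i + 0.5);
        sum += 4.0 / (1.0 + x * x);
    }
    mypi = h * sum;

    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("pi is approximately %.16f, error is %.2e\n",
               pi, fabs(pi - PI25DT));

    MPI_Finalize();
    return 0;
}
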
> What really surprised me was that when testing CPI built using
> OpenMPI (which doesn't use IB on our system), the problem
> vanished and I could run 8 x 1 CPU jobs, each using 100%!
> 
> So (at the moment) it looks like we're seeing some form of 
> contention on the Infiniband adapter..
> 
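
Before concluding that the contention is in the adapter itself, it might be
worth ruling out overlapping processor affinity: MVAPICH sets CPU affinity
by default in some builds (VIADEV_USE_AFFINITY is the knob in the 0.9.x
series, if I remember right), and if two concurrent jobs both get bound to
cores 0-3 you would see exactly the 50% / 25% / 12% pattern with the HCA
doing nothing wrong.  Here's a quick check each rank can print for itself,
assuming a Linux node and glibc's sched_getaffinity() -- again just a
sketch I haven't compiled:

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sched.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, cpu;
    cpu_set_t mask;
    char cpus[512] = "";

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {   /* pid 0 = self */
        for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
            if (CPU_ISSET(cpu, &mask))
                snprintf(cpus + strlen(cpus),
                         sizeof(cpus) - strlen(cpus), "%d ", cpu);
    }

    printf("rank %d (pid %ld) allowed cpus: %s\n",
           rank, (long)getpid(), cpus);

    MPI_Finalize();
    return 0;
}

If ranks belonging to different jobs report the same core list, the problem
is more likely on the affinity side than in ib_mthca.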
> 07:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost 
> III Lx HCA] (rev a0)
>         Subsystem: Mellanox Technologies MT25204 [InfiniHost 
> III Lx HCA]
>         Flags: fast devsel, IRQ 19
>         Memory at feb00000 (64-bit, non-prefetchable) [size=1M]
>         Memory at fd800000 (64-bit, prefetchable) [size=8M]
>         Capabilities: [40] Power Management version 2
>         Capabilities: [48] Vital Product Data
>         Capabilities: [90] Message Signalled Interrupts: 
> 64bit+ Queue=0/5 Enable-
>         Capabilities: [84] MSI-X: Enable- Mask- TabSize=32
>         Capabilities: [60] Express Endpoint IRQ 0
> 
> We see this problem with the standard CentOS kernel, with the 
> latest stable kernel (2.6.22.9) and with 2.6.23-rc9-git5 
> (which completely rips out and replaces the CPU scheduler 
> with Ingo Molnar's CFS).
> 
> This is on a SuperMicro based system with AMD's Barcelona 
> quad core CPU (1.9GHz), but I see the same behaviour (scaled 
> down) on dual core Opterons too.
> 
> I've looked at what "modinfo ib_mthca" says are the tuneable 
> options, but the few I've played with ("msi_x" and 
> "tune_pci") haven't made any noticeable difference, sadly..
> 
> Has anyone else run into this or got any clues they could 
> pass on, please?
> 
> cheers,
> Chris
> --
> Christopher Samuel - (03) 9925 4751 - Systems Manager
> The Victorian Partnership for Advanced Computing
> P.O. Box 201, Carlton South, VIC 3053, Australia
> VPAC is a not-for-profit Registered Research Agency
> 



