[Beowulf] NAMD/CUDA scaling: QDR Infiniband sufficient?

Tom Elken tom.elken at qlogic.com
Mon Feb 9 17:13:28 PST 2009


On Mon, Feb 09, 2009 at 03:37:06PM -0500, Dow Hurst DPHURST wrote:
> Subject: [Beowulf] NAMD/CUDA scaling: QDR Infiniband sufficient?
> 
>    Has anyone tested scaling of NAMD/CUDA over QLogic or ConnectX QDR
>    interconnects for a large number of IB cards and GPUs?

We haven't.  We don't have a CUDA/GPU-equipped cluster on which to test.

>   I've listened
>    to John Stone's presentation on VMD and NAMD CUDA acceleration.  The
>    consensus I took away from the presentation was that one QDR link per
>    GPU would probably be necessary to scale efficiently.  The 60-node,
>    60-GPU, DDR IB-enabled cluster that was used for initial testing was
>    saturating the interconnect.

It would be interesting to do some MPI profiling on the NAMD/CUDA cluster to find out whether message rate or bandwidth was saturating the interconnect.  Molecular Dynamics applications often generate high message rates.
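
If someone with access to such a cluster wants a quick first cut, the MPI standard's PMPI profiling interface makes it easy to tally message counts and bytes per rank; the average message size then suggests which way the traffic leans, before reaching for a full tool like mpiP or IPM.  Below is a minimal sketch, assuming a recent MPI with const-correct prototypes; it intercepts only MPI_Send and MPI_Finalize, whereas a real profile would also cover nonblocking sends and collectives.

/* PMPI sketch: count blocking sends and bytes per rank, report at
 * MPI_Finalize.  Compile into the application, or preload it as a
 * shared library so it links ahead of the MPI library. */
#include <mpi.h>
#include <stdio.h>

static long long send_count = 0;
static long long send_bytes = 0;

int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
{
    int type_size;
    PMPI_Type_size(datatype, &type_size);
    send_count += 1;
    send_bytes += (long long)count * type_size;
    return PMPI_Send(buf, count, datatype, dest, tag, comm);
}

int MPI_Finalize(void)
{
    int rank;
    PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank %d: %lld sends, %lld bytes, %.1f bytes/send average\n",
           rank, send_count, send_bytes,
           send_count ? (double)send_bytes / send_count : 0.0);
    return PMPI_Finalize();
}

A small average message size at a high aggregate send count points toward message rate as the bottleneck; a large average points toward bandwidth.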

With traditional NAMD, the highest results on the NAMD Benchmark Page
http://www.ks.uiuc.edu/Research/namd/performance.html 
were achieved with QLogic InfiniPath InfiniBand adapters (which are known for supporting a very high MPI message rate) and a SilverStorm IB switch.  The demand for a high MPI message rate goes up as you scale out, which is probably why the InfiniPath advantage grows with increasing core count (as shown on the above web page).  The top two results on this (slightly dated) benchmark page are both with SDR InfiniBand adapters.  We have not yet run NAMD on a large DDR- or QDR-equipped cluster.
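
For a quick sense of which limit a given fabric hits first, independent of the application, a two-rank MPI loop timed once with tiny messages and once with large ones gives a rough read: messages per second for the former, MB/s for the latter.  Here is a minimal sketch along those lines (the stream() helper and the ITERS count are just illustrative; proper suites such as the OSU micro-benchmarks do this far more carefully):

/* Crude message-rate vs. bandwidth probe: rank 0 streams ITERS
 * messages to rank 1 for each payload size and reports both metrics. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define ITERS 10000

static void stream(int rank, int bytes)
{
    char *buf = calloc(bytes, 1);
    double t0, t1;
    int i;

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < ITERS; i++) {
        if (rank == 0)
            MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }
    MPI_Barrier(MPI_COMM_WORLD);
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("%8d bytes: %12.0f msgs/s  %10.1f MB/s\n",
               bytes, ITERS / (t1 - t0),
               (double)ITERS * bytes / (t1 - t0) / 1.0e6);
    free(buf);
}

int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    stream(rank, 8);        /* tiny payloads: limited by message rate */
    stream(rank, 1 << 20);  /* 1 MB payloads: limited by bandwidth    */

    MPI_Finalize();
    return 0;
}

Run it with the two ranks pinned to different nodes so the traffic actually crosses the fabric.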

The NAMD/CUDA code and algorithms have probably changed significantly since those benchmarks, so profiling would be necessary to see what causes the interconnect saturation.  It might be bandwidth, message rate, or a combination of the two.

-Tom Elken
Mgr., Performance Engineering
QLogic Corp.

> Later tests on the new GT200-based cards show even
>    more performance gains for the GPUs.  One GPU performing the work of 12
>    CPU cores, or 8 GPUs equaling 96 cores, were the numbers I saw.  So with
>    a ratio of 1 GPU per 12 cores, interconnect performance will be very
>    important.
>    Thanks,
>    Dow


