[Beowulf] latency vs bandwidth for NAMD

Wed Aug 22 10:52:59 PDT 2007

I ran the TACC benchmarks myself, and the config files are posted at 
http://www.ks.uiuc.edu/Research/namd/wiki/index.cgi?NamdAtTexas

Part of the performance difference may be due to differences between the 
released 2.6 and newer CVS versions of NAMD, but even the released amd64 
binary ran really fast at TACC.  There may be some special sauce in the 
servers or OS that I'm not aware of.

-Jim

On Wed, 22 Aug 2007, Kevin Ball wrote:

> Hi Dow,
>
> On Wed, 2007-08-22 at 09:52, Dow Hurst DPHURST wrote:
>> Jim and Kevin,
>> Why would the 4 core point on the performance benchmark be reversed between
>> the 2.66GHz and 3.0GHz?  I'm pretty sure that the Lonestar NAMD was
>> compiled with the Intel compilers.  I don't know what was used on the
>> Cambridge Darwin cluster.  Both machines are Intel Woodcrest dual cores and
>> dual physical CPUs per node.
>
> I believe this is likely due to my lack of knowledge with regard to
> tuning the Intel compilers.  If whoever submitted the TACC results would
> be willing to send out their configuration files, that would help resolve
> this question.  Thanks!
>
> -Kevin
>
>>
>> Both Infinipath clusters listed on the performance benchmark have the best
>> scaling for the apoa1 benchmark between 128 to 512 cores.
>>
>> It sure seems that if SDR is good enough for an Intel Clovertown-based
>> cluster, it would be the more cost-effective choice.  The Woodcrest and
>> Clovertown are priced about the same.
>>
>> Thanks for your comments!
>> Dow
>>
>> __________________________________
>> Dow P. Hurst, Research Scientist
>> Department of Chemistry and Biochemistry
>> University of North Carolina at Greensboro
>> 435 New Science Bldg.
>> Greensboro, NC 27402-6170
>> dphurst at uncg.edu
>> Dow.Hurst at mindspring.com
>> 336-334-4766 lab
>> 336-334-5122 office
>> 336-334-5402 fax
>>
>> -----Jim Phillips <jim at ks.uiuc.edu> wrote: -----
>>
>> To: Kevin Ball <kevin.ball at qlogic.com>
>> From: Jim Phillips <jim at ks.uiuc.edu>
>> Date: 08/22/2007 12:25PM
>> cc: Dow Hurst DPHURST <DPHURST at uncg.edu>, beowulf at beowulf.org
>> Subject: Re: [Beowulf] latency vs bandwidth for NAMD
>>
>>
>> Those NAMD results are up now ("Cambridge Xeon/3.0 InfiniPath" at
>> http://www.ks.uiuc.edu/Research/namd/performance.html).  My opinion is
>> that SDR is sufficient for NAMD, but I haven't had a chance to see if
>> there is any benefit to DDR.  I did hear that the new TACC Ranger cluster
>> with 16 cores per node will use SDR.  I assume that on larger clusters the
>> switch is more likely to be the limiting factor than the card (I know
>> precious little about either).
>>
>> -Jim
>>
>>
>> On Tue, 21 Aug 2007, Kevin Ball wrote:
>>
>>> Hi Dow,
>>>
>>>  The QLE7240 DDR HCA is not available yet, but we do not expect that it
>>> would have any substantial advantage on NAMD as compared to the QLE7140
>>> (SDR), because we don't believe that NAMD requires substantial pt to pt
>>> bandwidth from the interconnect.
>>>
>>>  The TACC cluster is not using QLogic InfiniBand (IB) cards, but I
>>> believe they are SDR IB cards from another vendor.
>>>
>>>  Just last week I submitted a result to the folks at UIUC with results
>>> on a similar cluster with the QLE7140.  It has not yet shown up on their
>>> results page, but in essence, the scalability is similar until around
>>> 256 cores, at which point the results diverge with the QLE7140 cluster
>>> dramatically outperforming the TACC cluster at 512 cores.
>>>
>>>  I expect the QLE7140 results will show up in the next week or so on
>>> that website, (http://www.ks.uiuc.edu/Research/namd/performance.html) so
>>> you can compare to TACC performance at that time.  On that site you can
>>> also see performance with a number of other machines, including an SGI
>>> Altix with much higher pt to pt bandwidth yet worse scaling than IB,
>>> which is part of why I don't think DDR will improve results.
>>>
>>>  If you are interested in other MD codes, we have found advantages on
>>> codes like CHARMM and GROMACS as well.  Some of these are detailed in a
>>> white paper on our website:
>>>
>>> http://www.qlogic.com/documents/datasheets/knowledge_data/whitepapers/HSG-WP07005.pdf
>>>
>>>  Fair notice:  I work for QLogic on the InfiniPath product line.  I
>>> have tried my best to make what bias I have open and clear.
>>>
>>> -Kevin
>>>
>>>
>>> On Fri, 2007-08-17 at 14:03, Dow Hurst DPHURST wrote:
>>>> I'd like to get advice on how latency affects scaling of molecular dynamics
>>>> codes versus total bandwidth of the interconnect card.  We use NAMD as the
>>>> molecular dynamics code and have had Ammasso RDMA interconnects.  Right
>>>> now, we have a chance to upgrade and add nodes to our cluster using
>>>> Infiniband.  I've found that NAMD was coded to be latency tolerant,
>>>> however, I'd like to scale up to 64 cores and beyond.  I'm going blind
>>>> reading IB card specs, performance benchmarks, and searching Google.  I'd
>>>> love some advice from someone who knows whether a consistent very low
>>>> latency IB card, such as the Infinipath QLE7140, is better/worse for NAMD
>>>> than a higher latency but higher bandwidth card such as the QLE7240?  I can
>>>> tell that Lonestar at TACC has great NAMD performance but I can't tell what
>>>> IB card is used.  I imagine that switch performance plays a large role too.
>>>> Thanks for your time,
>>>> Dow
>>>>
>>>> _______________________________________________
>>>> Beowulf mailing list, Beowulf at beowulf.org
>>>> To change your subscription (digest mode or unsubscribe) visit
>>>> http://www.beowulf.org/mailman/listinfo/beowulf
>>>
>>>
>>
>>
>
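
The latency-versus-bandwidth tradeoff asked about at the start of this thread can be roughed out with the usual first-order message cost model: time = latency + size / bandwidth. The sketch below is only an illustration of that model, not a measurement. The two card profiles are hypothetical placeholders (the latencies are invented, and the bandwidths are just the nominal 1 GB/s and 2 GB/s data rates of 4x SDR and 4x DDR InfiniBand), and the function names are made up for this example. It says nothing about switch contention or about how NAMD overlaps communication with computation.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order cost of one message: fixed latency plus size / bandwidth."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


def crossover_bytes(latency_s, bandwidth_bytes_per_s):
    """Message size at which the latency term equals the bandwidth term."""
    return latency_s * bandwidth_bytes_per_s


# Hypothetical card profiles, assumed for illustration only.
# They are NOT measured figures for the QLE7140, QLE7240, or any other card.
LOW_LATENCY_CARD = {"latency_s": 1.5e-6, "bandwidth_bytes_per_s": 1.0e9}
HIGH_BANDWIDTH_CARD = {"latency_s": 3.0e-6, "bandwidth_bytes_per_s": 2.0e9}


def faster_card(message_bytes):
    """Return which hypothetical profile delivers a message of this size sooner."""
    low = transfer_time(message_bytes, **LOW_LATENCY_CARD)
    high = transfer_time(message_bytes, **HIGH_BANDWIDTH_CARD)
    return "low_latency" if low < high else "high_bandwidth"


if __name__ == "__main__":
    for size in (256, 4096, 65536, 1048576):
        low = transfer_time(size, **LOW_LATENCY_CARD)
        high = transfer_time(size, **HIGH_BANDWIDTH_CARD)
        print(f"{size:>8} bytes: low-latency {low * 1e6:8.2f} us, "
              f"high-bandwidth {high * 1e6:8.2f} us -> {faster_card(size)}")
```

Under these assumed numbers the lower-latency profile wins for messages below roughly 3000 bytes and the higher-bandwidth profile wins above that, so which card is "better" in this model depends on the message sizes the application actually sends.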