[Beowulf] MPI2007 out - strange pop2 results?
atchley at myri.com
Sat Jul 21 03:43:52 PDT 2007
Presentation at ISC? I did not attend this year and, while I did last
year, I did not give any presentations. I simply talked to customers
in our booth and walked the floor. I even stopped by the Mellanox
booth and chatted awhile. :-)
On Jul 20, 2007, at 9:31 PM, Gilad Shainer wrote:
> Hi Scott,
> I always try to mention exactly what I am comparing to, and not to make
> it into what it is not. And in most cases, I use the exact same platform
> and mention the details. This makes the information much more credible,
> don't you agree?
> By the way, in the presentation you gave at ISC, you did exactly the
> same as what my dear friends from QLogic did... sorry, I could not
> resist.
> -----Original Message-----
> From: Scott Atchley [mailto:atchley at myri.com]
> Sent: Friday, July 20, 2007 6:21 PM
> To: Gilad Shainer
> Cc: Kevin Ball; beowulf at beowulf.org
> Subject: Re: [Beowulf] MPI2007 out - strange pop2 results?
> And you would never compare your products against our deprecated
> and five-year-old hardware. ;-)
> Sorry, couldn't resist. My colleagues are rolling their eyes...
> On Jul 20, 2007, at 2:55 PM, Gilad Shainer wrote:
>> Hi Kevin,
>> I believe that your company has been using this list for pure marketing
>> for a long time, so don't be surprised when someone responds.
>> If you want to present technical or performance data, and then draw
>> conclusions from it, be sure to compare apples to apples. It is
>> use the lower-performance device results of your competitor and then
>> to attack his "architecture" or his entire product line. If this is
>> not a marketing war, then I would be interested to know what you call
>> a marketing war....
>> -----Original Message-----
>> From: Kevin Ball [mailto:kevin.ball at qlogic.com]
>> Sent: Friday, July 20, 2007 11:27 AM
>> To: Gilad Shainer
>> Cc: Brian Dobbins; beowulf at beowulf.org
>> Subject: RE: [Beowulf] MPI2007 out - strange pop2 results?
>> Hi Gilad,
>> Thank you for the personal attack, which apparently came without
>> reading the email I sent. Brian asked why the publicly
>> available, independently run MPI2007 results from HP were worse on a
>> particular benchmark than the Cambridge cluster MPI2007 results. I talked
>> about three contributing factors to that. If you have other reasons you
>> want to put forward, please do so based on data, rather than engaging
>> in a blatant ad hominem attack.
>> If you want to engage in a marketing war, there are venues with
>> which to do it, but I think on the Beowulf mailing list data and
>> coherent thought are probably more appropriate.
>> On Fri, 2007-07-20 at 10:43, Gilad Shainer wrote:
>>> Dear Kevin,
>>> You continue to set world records in providing misleading
>>> You had previously compared Mellanox-based products on dual
>>> single-core machines to the "InfiniPath" adapter on dual dual-core
>>> machines and claimed that with InfiniPath there are more Gflops....
>>> Your latest release follows the same lines...
>>> Unlike QLogic with its InfiniPath adapters, Mellanox provides several
>>> different InfiniBand HCA silicon devices and adapters. There are 4
>>> different silicon chips, each with a different size, different power,
>>> different price and different performance. There is the PCI-X device
>>> (InfiniHost), the single-port device that was designed for best
>>> price/performance (InfiniHost III Lx), the dual-port device that was
>>> designed for best performance (InfiniHost III Ex) and the new ConnectX
>>> device that was designed to extend the performance capabilities of the
>>> dual-port device. Each device provides different price and performance
>>> points (did I say different?).
>>> The SPEC results that you are using for Mellanox are from the
>>> single-port device. And even that device (whose list price is probably
>>> half that of your InfiniPath) had better results with 8 server nodes
>>> Your comparison of InfiniPath to the Mellanox single-port device
>>> should have been on price/performance and not on performance.
>>> Now, if you really want to compare performance to performance, why
>>> don't you use the dual-port device, or even better, ConnectX? Well...
>>> I will do it for you.
>>> Every time I have compared my performance adapters to yours, your
>>> adapters did not even come close...
>>> -----Original Message-----
>>> From: beowulf-bounces at beowulf.org [mailto:beowulf-
>>> bounces at beowulf.org] On Behalf Of Kevin Ball
>>> Sent: Thursday, July 19, 2007 11:52 AM
>>> To: Brian Dobbins
>>> Cc: beowulf at beowulf.org
>>> Subject: Re: [Beowulf] MPI2007 out - strange pop2 results?
>>> Hi Brian,
>>> The benchmark 121.pop2 is based on a code that was already
>>> important to QLogic customers before the SPEC MPI2007 suite was
>>> released (POP, Parallel Ocean Program), and we have done a fair amount
>>> of analysis trying to understand its performance characteristics.
>>> There are three things that stand out in performance analysis on pop2.
>>> The first point is that it is a very demanding code on the compiler.
>>> There has been a fair amount of work on pop2 by the PathScale
>>> team, and the fact that the Cambridge submission used the PathScale
>>> compiler while the HP submission used the Intel compiler accounts
>>> for some (the serial portion) of the advantage at small core counts,
>>> though scalability should not be affected by this.
>>> The second point is that pop2 is fairly demanding of IO. Another
>>> example to look at for this is the comparison of the AMD Emerald
>>> Cluster results with the Cambridge results; the Emerald cluster is
>>> using NFS over GigE from a single server/disk, while Cambridge has a
>>> much more optimized IO subsystem. While on some results Emerald scales
>>> well, for pop2 it scales only from 3.71 to 15.0 (4.04X) while Cambridge
>>> scales from 4.29 to 21.0 (4.90X). The HP system appears to be using
>>> NFS over DDR IB from a single server with a RAID; thus it should fall
>>> somewhere between Emerald and Cambridge in this regard.
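The scaling factors quoted above are simply the ratio of the largest-configuration score to the smallest one. A minimal Python sketch that reproduces them from the quoted pop2 scores; the core counts behind each endpoint are not given in the message, so the sketch only divides the two scores, and the function name is illustrative.

```python
# 121.pop2 scores quoted above (first and last configuration per system).
# The core counts for each endpoint are not stated, so only the ratio is used.
RESULTS = {
    "Emerald (NFS over GigE, single server/disk)": (3.71, 15.0),
    "Cambridge (optimized IO subsystem)": (4.29, 21.0),
}


def scaling_factor(first_score: float, last_score: float) -> float:
    """Ratio of the largest-configuration score to the smallest one."""
    return last_score / first_score


if __name__ == "__main__":
    for name, (first, last) in RESULTS.items():
        print(f"{name}: {scaling_factor(first, last):.2f}X")
```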
>>> The first two points account for some of the difference, but by no
>>> means all. The final one is probably the most crucial. The code
>>> uses a communication pattern consisting of many small/medium-sized
>>> (between 512 bytes and 4 KB) point-to-point messages punctuated by
>>> periodic tiny (8-byte) allreduces. The QLogic InfiniPath architecture
>>> performs far better in this regime than the Mellanox InfiniHost
>>> architecture. This is consistent with what we have seen in other
>>> application benchmarking; even SDR InfiniBand based on the QLogic
>>> architecture performs in general as well as DDR InfiniBand based on
>>> the Mellanox InfiniHost architecture, and in some cases better.
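To illustrate why message size matters in this regime, a simple latency/bandwidth cost model (the Hockney or alpha-beta model) can be used: each message costs a fixed latency plus its size divided by bandwidth. The sketch below uses hypothetical parameters (2 us latency, 1000 MB/s bandwidth) that are NOT measurements of either vendor's adapter; it only shows that for 512-byte to 4 KB messages and 8-byte allreduces, the fixed per-message cost is a large share of the total. The allreduce estimate additionally assumes a recursive-doubling algorithm with ceil(log2 P) latency-bound steps.

```python
import math

# Hypothetical interconnect parameters, chosen for illustration only;
# they are NOT measured values for InfiniPath or InfiniHost.
LATENCY_US = 2.0          # alpha: fixed cost per message, in microseconds
BANDWIDTH_MB_S = 1000.0   # beta: sustained bandwidth; 1 MB/s == 1 byte/us


def message_time_us(size_bytes: int,
                    latency_us: float = LATENCY_US,
                    bandwidth_mb_s: float = BANDWIDTH_MB_S) -> float:
    """Hockney (alpha-beta) model: t = alpha + n / beta."""
    return latency_us + size_bytes / bandwidth_mb_s


def latency_share(size_bytes: int) -> float:
    """Fraction of the modeled message time spent in the fixed latency term."""
    return LATENCY_US / message_time_us(size_bytes)


def allreduce_time_us(num_ranks: int, size_bytes: int = 8) -> float:
    """Assumed recursive-doubling allreduce: ceil(log2(P)) message exchanges."""
    steps = math.ceil(math.log2(num_ranks))
    return steps * message_time_us(size_bytes)


if __name__ == "__main__":
    for size in (8, 512, 1024, 2048, 4096):
        print(f"{size:5d} B: {message_time_us(size):6.3f} us, "
              f"latency share {latency_share(size):.0%}")
    print(f"8-byte allreduce on 256 ranks: {allreduce_time_us(256):.2f} us")
```

Under this model, lowering the per-message latency helps the 512-byte to 4 KB range and the tiny allreduces far more than raising bandwidth would; which adapter actually has the lower effective latency is an empirical question the model does not answer.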
>>> Full disclosure: I work for QLogic on the InfiniPath product line.
>>> On Wed, 2007-07-18 at 18:50, Brian Dobbins wrote:
>>>> Hi guys,
>>>> Greg, thanks for the link! It will no doubt take me a little
>>>> while to parse all the MPI2007 info (even though there are only a
>>>> few submitted results at the moment!), but one of the first things I
>>>> noticed was that performance of pop2 on the HP blade system was
>>>> atrocious... any thoughts on why this is the case? I can't see any
>>>> logical reason for the scaling they have, which (being the first thing
>>>> I noticed) makes me somewhat hesitant to put much stock into the
>>>> results at the moment. Perhaps this system is just a statistical blip
>>>> on the radar which will fade into noise when additional results are
>>>> posted, but until that time, it'd be nice to know why the results
>>>> are the way they are.
>>>> To spell it out a bit, the reference platform is at 1 (ok, 0.994) at
>>>> 16 cores, but then the HP blade system at 16 cores is at 1.94. Not
>>>> bad there. However, moving up we have:
>>>> 32 cores - 2.36
>>>> 64 cores - 2.02
>>>> 128 cores - 2.14
>>>> 256 cores - 3.62
>>>> So not only does it hover at 2.x for a while, but then going from
>>>> 128 -> 256 it gets a decent relative improvement. Weird.
>>>> On the other hand, the Cambridge system (with the same processors
>>>> and a roughly similar interconnect, it seems) has the following
>>>> from 32->256 cores:
>>>> 32 cores - 4.29
>>>> 64 cores - 7.37
>>>> 128 cores - 11.5
>>>> 256 cores - 15.4
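Because a SPEC ratio is the reference run time divided by the measured run time, the ratio of two scores on the same system is a speedup. A short Python sketch that turns the pop2 scores listed above into parallel efficiency relative to 32 cores (speedup divided by the increase in core count); the function name is illustrative, and normalizing to 32 cores is simply the smallest count available for both systems.

```python
# 121.pop2 scores as quoted above (SPEC MPI2007 ratios, higher is better).
HP_BLADES = {16: 1.94, 32: 2.36, 64: 2.02, 128: 2.14, 256: 3.62}
CAMBRIDGE = {32: 4.29, 64: 7.37, 128: 11.5, 256: 15.4}


def relative_efficiency(scores: dict, base_cores: int) -> dict:
    """Speedup over base_cores divided by the ideal (linear) speedup."""
    base_score = scores[base_cores]
    return {
        cores: (score / base_score) / (cores / base_cores)
        for cores, score in sorted(scores.items())
        if cores >= base_cores
    }


if __name__ == "__main__":
    for name, scores in (("HP blades", HP_BLADES), ("Cambridge", CAMBRIDGE)):
        eff = relative_efficiency(scores, base_cores=32)
        print(name, {cores: round(e, 2) for cores, e in eff.items()})
```

On these numbers the HP system falls below 50% efficiency already at 64 cores (its score actually drops from 32 to 64 cores), while the Cambridge system stays above 50% through 128 cores.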
>>>> ... So, I'm mildly confused as to the first results. Granted,
>>>> different compilers are being used, and presumably there are other
>>>> differences, too, but I can't see how -any- of them could result in
>>>> the scores the HP system got. Any thoughts? Anyone from HP (or
>>>> QLogic) care to comment? I'm not terribly knowledgeable about the
>>>> 2007 suite yet, unfortunately, so maybe I'm just overlooking something.
>>>> - Brian
>>>> __ _______________________________________________
>>>> Beowulf mailing list, Beowulf at beowulf.org To change your
>>>> subscription (digest mode or unsubscribe) visit