[Beowulf] MPI2007 out - strange pop2 results?
atchley at myri.com
Fri Jul 20 18:20:54 PDT 2007
And you would never compare your products against our deprecated
drivers and five-year-old hardware. ;-)
Sorry, couldn't resist. My colleagues are rolling their eyes...
On Jul 20, 2007, at 2:55 PM, Gilad Shainer wrote:
> Hi Kevin,
> I believe that your company has been using this list for pure marketing
> wars for a long time, so don't be surprised when someone responds.
> If you want to post technical or performance data and then draw
> conclusions from it, be sure to compare apples to apples. It is easy to
> use the results of your competitor's lower-performance device and then
> attack his "architecture" or his entire product line. If this is not a
> marketing war, then I would be interested to know what you call a
> marketing war....
> -----Original Message-----
> From: Kevin Ball [mailto:kevin.ball at qlogic.com]
> Sent: Friday, July 20, 2007 11:27 AM
> To: Gilad Shainer
> Cc: Brian Dobbins; beowulf at beowulf.org
> Subject: RE: [Beowulf] MPI2007 out - strange pop2 results?
> Hi Gilad,
> Thank you for the personal attack that came, apparently without even
> reading the email I sent. Brian asked why the independently run,
> publicly posted MPI2007 results from HP were worse on a particular
> benchmark than the Cambridge cluster MPI2007 results. I talked about
> three contributing factors to that. If you have other reasons you want
> to put forward, please do so based on data, rather than engaging in a
> blatant ad hominem attack.
> If you want to engage in a marketing war, there are venues in which
> to do it, but I think on the Beowulf mailing list data and coherent
> thought are probably more appropriate.
> On Fri, 2007-07-20 at 10:43, Gilad Shainer wrote:
>> Dear Kevin,
>> You continue to set world records in providing misleading
>> You had previously compared Mellanox-based products on dual
>> single-core machines to the "InfiniPath" adapter on dual dual-core
>> machines and claimed that with InfiniPath there are more Gflops....
>> latest release follows the same lines...
>> Unlike QLogic InfiniPath adapters, Mellanox provides several different
>> InfiniBand HCA silicon devices and adapters. There are 4 different
>> silicon chips, each with a different size, different power, different
>> price and different performance. There is the PCI-X device (InfiniHost),
>> the single-port device that was designed for best price/performance
>> (InfiniHost III Lx), the dual-port device that was designed for best
>> performance (InfiniHost III Ex) and the new ConnectX device that was
>> designed to extend the performance capabilities of the dual-port
>> device. Each device provides different price and performance points
>> (did I say different?).
>> The SPEC results that you are using for Mellanox are for the single-port
>> device. And even that device (whose list price is probably half that
>> of your InfiniPath) had better results with 8 server nodes than
>> Your comparison of InfiniPath to the Mellanox single-port device
>> should have been on price/performance and not on performance. Now, if
>> you really want to compare performance to performance, why don't you
>> use the dual-port device, or even better, ConnectX? Well... I will do
>> it for you.
>> Every time I have compared my performance adapters to yours, your
>> adapters did not even come close...
>> -----Original Message-----
>> From: beowulf-bounces at beowulf.org [mailto:beowulf-
>> bounces at beowulf.org]
>> On Behalf Of Kevin Ball
>> Sent: Thursday, July 19, 2007 11:52 AM
>> To: Brian Dobbins
>> Cc: beowulf at beowulf.org
>> Subject: Re: [Beowulf] MPI2007 out - strange pop2 results?
>> Hi Brian,
>> The benchmark 121.pop2 is based on a code (POP, the Parallel Ocean
>> Program) that was already important to QLogic customers before the
>> SPEC MPI2007 suite was released, and we have done a fair amount of
>> analysis trying to understand its performance characteristics.
>> There are three things that stand out in performance analysis of pop2.
>> The first point is that it is a very demanding code on the compiler.
>> There has been a fair amount of work on pop2 by the PathScale
>> team, and the fact that the Cambridge submission used the PathScale
>> compiler while the HP submission used the Intel compiler accounts for
>> some (the serial portion) of the advantage at small core counts,
>> though scalability should not be affected by this.
>> The second point is that pop2 is fairly demanding of IO. Another
>> example to look at for this is in comparing the AMD Emerald Cluster
>> results to the Cambridge results; the Emerald cluster is using NFS
>> over GigE from a single server/disk, while Cambridge has a much more
>> optimized IO subsystem. While Emerald scales better on some benchmarks,
>> for pop2 it scales only from 3.71 to 15.0 (4.04X) while Cambridge
>> scales from 4.29 to 21.0 (4.90X). The HP system appears to be using
>> NFS over DDR IB from a single server with a RAID; thus it should fall
>> somewhere between Emerald and Cambridge in this regard.
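The "X" figures above are simply the ratio of each system's largest pop2 score to its smallest. What follows is a minimal Python sketch of that arithmetic, using only the scores quoted in this paragraph. The helper name `scaling_ratio` is hypothetical. The core counts behind each score are not given here, so the sketch computes only the raw ratio.

```python
# Score-growth ratios for 121.pop2, from the SPEC MPI2007 scores quoted above.
# Core counts are not stated in the paragraph, so this is only the ratio of
# the largest to the smallest score for each system (hypothetical helper).

def scaling_ratio(low_score: float, high_score: float) -> float:
    """Return how many times the score grew between the smallest and largest run."""
    return high_score / low_score

emerald = scaling_ratio(3.71, 15.0)    # NFS over GigE from a single server/disk
cambridge = scaling_ratio(4.29, 21.0)  # more optimized IO subsystem

print(f"Emerald:   {emerald:.2f}X")
print(f"Cambridge: {cambridge:.2f}X")
```

Rounded to two decimals, this reproduces the 4.04X and 4.90X figures in the text.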
>> The first two points account for some of the difference, but by no
>> means all. The final one is probably the most crucial. The code
>> uses a communication pattern consisting of many small/medium-sized
>> (between 512 bytes and 4 KB) point-to-point messages punctuated by
>> periodic tiny (8-byte) allreduces. The QLogic InfiniPath architecture
>> performs far better in this regime than the Mellanox InfiniHost
>> architecture. This is consistent with what we have seen in other
>> application benchmarking; even SDR InfiniBand based on the QLogic
>> InfiniPath architecture generally performs as well as DDR InfiniBand
>> based on the Mellanox InfiniHost architecture, and in some cases better.
>> Full disclosure: I work for QLogic on the InfiniPath product line.
>> On Wed, 2007-07-18 at 18:50, Brian Dobbins wrote:
>>> Hi guys,
>>> Greg, thanks for the link! It will no doubt take me a little
>>> while to parse all the MPI2007 info (even though there are only a
>>> few submitted results at the moment!), but one of the first things I
>>> noticed was that performance of pop2 on the HP blade system was
>>> atrocious... any thoughts on why this is the case? I can't see any
>>> logical reason for the scaling they have, which (being the first thing
>>> I noticed) makes me somewhat hesitant to put much stock into the
>>> results at the moment. Perhaps this system is just a statistical blip
>>> on the radar which will fade into noise when additional results are
>>> posted, but until that time, it'd be nice to know why the results
>>> are the way they are.
>>> To spell it out a bit, the reference platform is at 1 (ok, 0.994) at
>>> 16 cores, but then the HP blade system at 16 cores is at 1.94. Not
>>> bad there. However, moving up we have:
>>> 32 cores - 2.36
>>> 64 cores - 2.02
>>> 128 cores - 2.14
>>> 256 cores - 3.62
>>> So not only does it hover at 2.x for a while, but then going from
>>> 128 -> 256 it gets a decent relative improvement. Weird.
>>> On the other hand, the Cambridge system (with the same processors
>>> and a roughly similar interconnect, it seems) has the following scaling
>>> from 32->256 cores:
>>> 32 cores - 4.29
>>> 64 cores - 7.37
>>> 128 cores - 11.5
>>> 256 cores - 15.4
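One way to make the contrast between these two lists concrete is to convert each score into a parallel efficiency relative to the 32-core run. That means dividing the speedup over 32 cores by the ideal linear speedup. The following is a minimal Python sketch. It assumes SPEC MPI2007 scores are proportional to 1/runtime (higher is better), so score ratios can be read as speedups. The function name `relative_efficiency` is hypothetical.

```python
# 121.pop2 SPEC MPI2007 scores as listed above (cores -> score, higher is better).
hp_blade = {32: 2.36, 64: 2.02, 128: 2.14, 256: 3.62}
cambridge = {32: 4.29, 64: 7.37, 128: 11.5, 256: 15.4}

def relative_efficiency(scores: dict[int, float], base: int = 32) -> dict[int, float]:
    """Speedup over the base core count divided by the ideal (linear) speedup.

    Assumes a score ratio equals a speedup, i.e. score ~ 1 / runtime.
    """
    base_score = scores[base]
    return {
        cores: (score / base_score) / (cores / base)
        for cores, score in sorted(scores.items())
    }

for name, scores in (("HP blade", hp_blade), ("Cambridge", cambridge)):
    eff = relative_efficiency(scores)
    row = ", ".join(f"{c}: {e:.0%}" for c, e in eff.items())
    print(f"{name:10s} {row}")
```

With these inputs, the HP blade system falls to roughly 19% efficiency at 256 cores, versus roughly 45% for Cambridge. The HP system also drops below 50% already at 64 cores.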
>>> ... So, I'm mildly confused as to the first results. Granted,
>>> different compilers are being used, and presumably there are other
>>> differences, too, but I can't see how -any- of them could result in
>>> the scores the HP system got. Any thoughts? Anyone from HP (or
>>> QLogic) care to comment? I'm not terribly knowledgeable about the
>>> 2007 suite yet, unfortunately, so maybe I'm just overlooking something.
>>> - Brian
>>> _______________________________________________
>>> Beowulf mailing list, Beowulf at beowulf.org To change your
>>> subscription (digest mode or unsubscribe) visit