[Beowulf] Intel buys QLogic InfiniBand business

Joe Landman landman at scalableinformatics.com
Fri Jan 27 18:24:10 PST 2012

On 01/27/2012 05:27 PM, Greg Lindahl wrote:

> I'm not surprised, as this 10ge adapter is aimed at the same part of
> the market that uses fibre channel, which isn't that common in HPC. It
> doesn't have the kind of TCP offload features which have been
> (futilely) marketed in HPC; it's all about running the same fibre
> channel software most enterprises have run for a long time, but having
> the network be ethernet.

That makes sense.

>> Haven't looked much at FDR or EDR latency.  Was it a huge delta (more
>> than 30%) better than QDR?  I've been hearing numbers like 0.8-0.9 us
>> for a while, and switches are still ~150-300ns port to port.
> Are you talking about the latency of 1 core on 1 system talking to 1
> core on one system, or the kind of latency that real MPI programs see,
> running on all of the cores on a system and talking to many other
> systems? I assure you that the latter is not 0.8 for any IB system.

I am looking at these things from a "best of all possible cases" 
scenario.  So when someone comes at me with new "best of all possible 
cases" numbers, I can compare.  Sadly this seems to be the state of many 

In storage, we see small disk form factor SSDs marketed generally, with 
statements like 50k IOPs, and 500 MB/s.  Though they neglect to mention 
several specific issues with these, such as writing all zeros, or the 
75k IOPs are sequential IOPs you get from taking the 600 MB/s interface, 
dividing by 8k byte operations on a sequential read.  Actually do a real 
random read and write and you get very ... very different results. 
Especially with non-zero (real) data.
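
As a rough check on where a figure like 75k IOPs comes from, the division described above can be sketched as follows. This is a back-of-envelope sketch only: the function name is hypothetical, and "8k" is taken as 8000 bytes (with 8192 bytes the bound drops to roughly 73k). It says nothing about what a drive delivers on random I/O with real data.

```python
def interface_bound_iops(bandwidth_mb_per_s, op_size_bytes):
    """Upper bound on operations/second if the whole link carries payload.

    Assumes decimal megabytes (1 MB = 1,000,000 bytes) and ignores protocol
    overhead, seek/flash-translation costs, and data compressibility.
    """
    return bandwidth_mb_per_s * 1_000_000 / op_size_bytes


if __name__ == "__main__":
    # 600 MB/s interface divided by 8k-byte sequential reads.
    print(interface_bound_iops(600, 8000))   # decimal "8k"
    print(interface_bound_iops(600, 8192))   # binary 8 KiB
```

The point of the sketch is that the headline number is a property of the interface, not of the device under a random, non-zero workload.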

>> At some
>> point I think you start hitting a latency floor, bounded in part by "c",
> Last time I did the computation, we were 10X that floor. And, of
> course, each increase in bandwidth usually makes latency worse, absent
> heroic efforts of implementers to make that headline latency look
> better.

I think that's the point though, that moving that performance "knee" down 
to lower latency involves (potentially) significant cost, for a modest 
return ... in terms of real performance benefit to a code.

Thanks for the pointer on the computation.  If we are 1000x off the 
floor, we can probably come up with a way to do better.  At 10x, it's 
probably much harder than we think and not necessarily worth the effort.
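
For anyone who wants to redo the "how far off the floor are we" comparison, here is a minimal sketch. It is not the computation Greg refers to (that is not shown in this thread); it only covers the propagation-delay part of the floor set by "c". The function names, the cable length, and the velocity factor are all assumptions you would supply yourself.

```python
C_M_PER_S = 299_792_458  # speed of light in vacuum, metres per second


def propagation_floor_ns(distance_m, velocity_factor=1.0):
    """One-way propagation delay in nanoseconds over distance_m.

    velocity_factor is the fraction of c at which the signal travels in the
    medium (1.0 = vacuum; real copper or fibre is slower, value assumed).
    """
    return distance_m / (C_M_PER_S * velocity_factor) * 1e9


def ratio_to_floor(measured_ns, distance_m, velocity_factor=1.0):
    """How many times larger a measured latency is than the propagation floor."""
    return measured_ns / propagation_floor_ns(distance_m, velocity_factor)


if __name__ == "__main__":
    # Hypothetical inputs: a 0.9 us end-to-end number over 10 m of cable.
    print(propagation_floor_ns(10.0))
    print(ratio_to_floor(900.0, 10.0))
```

Plugging in your own measured latency and cable run gives the multiple of the floor, which is the number that decides whether chasing the knee is worth the cost.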

Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
