[Beowulf] cloudy HPC?

Rayson Ho raysonlogin at gmail.com
Mon Feb 10 12:40:19 PST 2014


On Thu, Feb 6, 2014 at 6:36 PM, Christopher Samuel
<samuel at unimelb.edu.au> wrote:
> Glenn Lockwood has a nice in-depth post reporting results (which he
> presented at SC'13, though I missed his talk) about "High-Performance
> Virtualization: SR-IOV and InfiniBand" which sounds like it might have
> the sort of hard numbers you're after.
>
> http://glennklockwood.blogspot.com.au/2013/12/high-performance-virtualization-sr-iov_14.html

We (Scalable Logic) benchmarked AWS Enhanced Networking last year. We
made sure we set everything up correctly, double-checked that Enhanced
Networking was enabled, and got much closer to the max bandwidth of
the 10GbE.

Glenn Lockwood's EC2 results just don't look right, and I strongly
believe that he did not enable AWS Enhanced Networking. He only got
450 MB/s with the 10-gigabit Ethernet NICs. With c3.8xlarge we got
over 950 MB/s with Enhanced Networking enabled and slightly over
500 MB/s with standard EC2 networking:

http://blogs.scalablelogic.com/2013/12/enhanced-networking-in-aws-cloud.html
http://blogs.scalablelogic.com/2014/01/enhanced-networking-in-aws-cloud-part-2.html
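
As a rough sanity check on those numbers, here is how each one compares
with the raw 10GbE line rate. This is only a back-of-envelope sketch: it
assumes MB means 10^6 bytes and ignores Ethernet/IP/TCP framing overhead,
so the real achievable ceiling is somewhat below 1250 MB/s.

```python
# Back-of-envelope comparison of the measured throughputs against the
# raw 10GbE line rate. Assumptions (not from the benchmarks themselves):
# 1 MB = 10**6 bytes, and protocol overhead is ignored.

LINE_RATE_MBPS = 10 * 10**9 / 8 / 10**6  # 10 Gbit/s -> 1250 MB/s

# Figures quoted in this thread; "over" / "slightly over" values are
# taken at their stated lower bounds.
measured = {
    "Lockwood's EC2 result": 450,
    "c3.8xlarge, standard networking": 500,
    "c3.8xlarge, Enhanced Networking": 950,
}

def fraction_of_line_rate(mb_per_s):
    """Return measured throughput as a fraction of the raw line rate."""
    return mb_per_s / LINE_RATE_MBPS

for label, mb_per_s in measured.items():
    pct = 100 * fraction_of_line_rate(mb_per_s)
    print(f"{label}: {mb_per_s} MB/s = {pct:.0f}% of line rate")
```

Under these assumptions 950 MB/s is about three quarters of the raw line
rate, while 450 to 500 MB/s is well under half.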

Note that Glenn did not mention that he was using a VPC, which is
actually a requirement for AWS Enhanced Networking. We believe he was
just using standard EC2 networking without even knowing. Note that
he admits "Amazon's EC2 High Performance Network Virtualization team
say they have gotten much better SR-IOV bandwidth than I
demonstrated".
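
One quick way to rule this out on a running instance is to look at which
driver the NIC is bound to. The following is a minimal sketch, not
anything taken from Glenn's scripts or ours. It assumes that Enhanced
Networking on these instance types shows up as the "ixgbevf" driver in
`ethtool -i` output (as the AWS documentation describes), and the helper
names are made up for illustration.

```python
# Sketch: decide from `ethtool -i <iface>` output whether the interface
# is bound to the SR-IOV virtual-function driver. Assumption: Enhanced
# Networking appears as the "ixgbevf" driver; helper names are
# hypothetical.
import subprocess

SRIOV_VF_DRIVER = "ixgbevf"

def driver_from_ethtool(text):
    """Extract the value of the 'driver:' line, or None if absent."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "driver":
            return value.strip()
    return None

def enhanced_networking_active(text):
    """True if the reported driver is the SR-IOV VF driver."""
    return driver_from_ethtool(text) == SRIOV_VF_DRIVER

def check_interface(iface="eth0"):
    """Run ethtool on a live instance (needs ethtool installed)."""
    out = subprocess.check_output(["ethtool", "-i", iface], text=True)
    return enhanced_networking_active(out)
```

Calling check_interface() on each node before a benchmark run would catch
the case where the instance silently fell back to standard networking.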


** Amazon EC2 is like any other platform in that it has a learning
curve, and my coworkers and I have contributed lots of changes back to
the open source MIT StarCluster project so that others don't need to
worry about all the AWS details... For example, in the latest
StarCluster 0.95 release we have all of that prepackaged, so you don't
need to spend time learning how to set up a VPC, enable AWS Enhanced
Networking, or adapt to the new g2, c3, and i2 instance types:

http://star.mit.edu/cluster/docs/latest/changelog.html#version-0-95
http://star.mit.edu/cluster/


We are also going to contribute the code we used to get the size &
object counts of huge S3 buckets to the boto library. So far we have
only documented our experience in "Getting Size and File Count of a 25
Million Object S3 Bucket":

http://blogs.scalablelogic.com/2014/01/getting-size-and-file-count-of-25.html
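
For readers who have not done this before, the straightforward way to
get those two numbers is to walk the full key listing and add things up.
The sketch below shows only that naive serial approach with boto 2.x; it
is not the code mentioned above, and the function names are
illustrative.

```python
# Sketch: total size and object count of an S3 bucket by walking the
# full key listing. This is the plain serial approach, not the code
# referred to in this post; function names are hypothetical.

def summarize_keys(keys):
    """Return (object_count, total_bytes) for an iterable of objects
    that each expose a numeric .size attribute, as boto Key objects do."""
    count = 0
    total = 0
    for key in keys:
        count += 1
        total += key.size
    return count, total

def summarize_bucket(bucket_name):
    """Summarize a real bucket with boto 2.x (needs AWS credentials)."""
    import boto  # imported lazily so summarize_keys has no dependency
    conn = boto.connect_s3()
    bucket = conn.get_bucket(bucket_name)
    # bucket.list() pages through the listing transparently.
    return summarize_keys(bucket.list())
```

Since S3 returns at most 1000 keys per listing request, a bucket with
tens of millions of objects needs tens of thousands of sequential
requests this way, which is why the serial approach gets painful at that
scale.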


On Thu, Jan 30, 2014 at 3:57 PM, Mark Hahn <hahn at mcmaster.ca> wrote:
> I won't describe the current sad state of Canadian HPC, except that it's hard to imagine *anything* that wouldn't be an improvement ;)

At least my former coworker Edward Walker worked at a Canadian company
for 6+ years before moving to the States... Back in 2008 he published
the famous "Benchmarking Amazon EC2 for High-Performance Scientific
Computing" paper:

https://www.usenix.org/legacy/publications/login/2008-10/openpdfs/walker.pdf

Rayson

==================================================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html


