[Beowulf] 18 hours, $33K, and 156,314 cores: Amazon cloud HPC hits a “petaflop”

Wed Nov 13 08:18:43 PST 2013

http://arstechnica.com/information-technology/2013/11/18-hours-33k-and-156314-cores-amazon-cloud-hpc-hits-a-petaflop/

18 hours, $33K, and 156,314 cores: Amazon cloud HPC hits a “petaflop”

1.21 petaflops? Great scott!

by Jon Brodkin - Nov 12 2013, 8:00pm WEST

What do you do if you need more than 150,000 CPU cores but don't have
millions of dollars to spend on a supercomputer? Go to the Amazon cloud, of
course.

For the past few years, HPC software company Cycle Computing has been helping
researchers harness the power of Amazon Web Services when they need serious
computing power for short bursts of time. The company has completed its
biggest Amazon cloud run yet, creating a cluster that ran for 18 hours,
hitting 156,314 cores at its largest point and a theoretical peak speed of
1.21 petaflops. (A petaflop is one quadrillion floating point operations per
second, or a million billion.)

To get all those cores, Cycle's cluster ran simultaneously in Amazon data
centers across the world, in Virginia, Oregon, Northern California, Ireland,
Singapore, Tokyo, Sydney, and São Paulo. The bill from Amazon ended up being
$33,000.

USC chemistry professor Mark Thompson needed the cluster to design materials
that might be well-suited to converting sunlight into electricity.

"For any possible material, just figuring out how to synthesize it, purify
it, and then analyze it typically takes a year of grad student time and
hundreds of thousands of dollars in equipment, chemicals, and labor for that
one molecule," Cycle Computing CEO Jason Stowe wrote in a blog post today.

Instead of doing that, Thompson uses simulation software made by Schrödinger.
With that software running on Amazon, Thompson was able to simulate 205,000
molecules and do the equivalent of 2.3 million hours of science (counting the
compute time for each core separately). The cluster ran only last week, so
it's too early to find out what its impact on solar science will be. Still,
from a computing standpoint, it's impressive.
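The figures quoted so far can be cross-checked with a little arithmetic. The sketch below uses only numbers from the article; the variable names are illustrative, and the "ceiling" assumes every core ran for the full 18 hours, which the article does not claim.

```python
# Figures quoted in the article.
peak_cores = 156_314
run_hours = 18
core_hours_used = 2_300_000   # "2.3 million hours", counted per core
amazon_bill_usd = 33_000
molecules = 205_000

# Upper bound if all cores had run for the whole 18 hours (an assumption:
# the cluster only reached its largest size partway through the run).
core_hours_ceiling = peak_cores * run_hours

# Derived quantities; none of these are stated in the article itself.
compute_years = core_hours_used / (24 * 365)
cost_per_core_hour = amazon_bill_usd / core_hours_used
core_hours_per_molecule = core_hours_used / molecules

print(f"ceiling:            {core_hours_ceiling:,} core-hours")
print(f"compute time:       {compute_years:.0f} years")
print(f"cost per core-hour: ${cost_per_core_hour:.4f}")
print(f"per molecule:       {core_hours_per_molecule:.1f} core-hours")
```

The 2.3 million core-hours reported sit below the ceiling, as expected for a cluster that took a few hours to scale up.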

[Figure: Cycle's software scaled to more than 150,000 cores in a few hours.
Credit: Cycle Computing]

That’s a “petaflop,” not a petaflop

While Stowe says the Amazon cluster hit 1.21 petaflops, that's the
theoretical peak speed rather than the actual performance. In the Linpack
benchmark used to test supercomputer speeds, the theoretical peak is always
reported, but the real-world results are what count when ranking the world's
fastest machines.

The Cycle cluster on Amazon would have a much lower real-world max on the
Linpack benchmark. To score high, you need machines that are physically close
to each other to reduce latency, Stowe said. Cycle's cluster was spread
around the world and did not require a blazing-fast interconnect because the
calculations could be performed independently by each virtual machine.

Supercomputing applications tend to require cores to work in concert with
each other, which is why IBM, Cray, and other companies have built incredibly
fast interconnects. Cycle's work with the Amazon cloud has focused on HPC
workloads without that requirement.

"There are whole categories of problems that are pleasantly parallel, and in
those cases the [Linpack maximum] number is really not as important because
we did intentionally make use of the entire 1.2 petaflops because they were
all concurrently executing the workloads," Stowe told Ars. "Maybe there does
need to be a different metric for analytics and big data and genomics—and all
these pleasantly parallel workloads that are becoming more pervasive."
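To make "pleasantly parallel" concrete, here is a minimal sketch of a workload in which each task depends only on its own input, so no worker ever waits on another. The scoring function and its name are hypothetical stand-ins, not Schrödinger's software or Cycle's code, and a thread pool stands in for what would really be thousands of separate virtual machines.

```python
from concurrent.futures import ThreadPoolExecutor


def score_molecule(molecule_id: int) -> float:
    """Hypothetical stand-in for one independent simulation."""
    x = float(molecule_id)
    for _ in range(1000):
        x = (x * 1.000001 + 1.0) % 97.0
    return x


def screen(molecule_ids, workers=8):
    """Score every molecule; no task reads another task's result."""
    ids = list(molecule_ids)
    # Threads only illustrate the independence of the tasks. CPU-bound work
    # in Python would need processes or separate machines to run faster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(ids, pool.map(score_molecule, ids)))


if __name__ == "__main__":
    results = screen(range(20), workers=4)
    print(len(results), "molecules scored")
```

Because the tasks never communicate, the result is the same however they are spread across workers, which is why interconnect latency matters so little for this kind of job.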

Amazon itself used its cloud to build a cluster that ranks 127th on the most
recent Top 500 list, with 17,024 cores, a real-world max of 240.1 teraflops,
and a theoretical peak of 354.1 teraflops, a little under a third of the peak
number claimed by Stowe.
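The gap between real-world and theoretical numbers is easy to quantify for Amazon's own Top 500 entry. This sketch uses only the figures quoted above; the variable names are illustrative, and it makes no estimate of what Cycle's cluster would score on Linpack, since the article gives no such figure.

```python
# Amazon's Top 500 entry, as quoted in the article (teraflops).
amazon_real_world_tflops = 240.1
amazon_peak_tflops = 354.1

# Cycle's claimed theoretical peak: 1.21 petaflops = 1,210 teraflops.
cycle_peak_tflops = 1.21 * 1000

# Fraction of theoretical peak that Amazon's cluster achieved on Linpack.
linpack_efficiency = amazon_real_world_tflops / amazon_peak_tflops

# Amazon's theoretical peak as a share of Cycle's theoretical peak.
peak_ratio = amazon_peak_tflops / cycle_peak_tflops

print(f"Linpack efficiency of Amazon's entry: {linpack_efficiency:.1%}")
print(f"Amazon peak / Cycle peak:             {peak_ratio:.1%}")
```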

Building the cluster

The Cycle cluster's 156,314 cores were spread across 16,788 instances, an
average of 9.3 cores per virtual machine.

Cycle kept costs down by mostly using Amazon's auction-style spot
marketplace, buying up a variety of instance types. Cycle used a mix of
compute-optimized and general purpose instances, including Amazon's
cc2.8xlarge instance with 32 cores; the cr1.8xlarge with 32 cores; m3.xlarge
with four cores; and m3.2xlarge with eight cores.

"To deploy this cluster, our software [CycleCloud] automated bidding,
acquiring, testing, and assembling this large environment, plus distributing
the data and the workload," Stowe wrote.

Cycle also used Opscode's Chef software as well as a new task distribution
system Cycle developed, called Jupiter. Jupiter can schedule work for massive
numbers of compute cores across regions and data centers, and this work can
continue running even when Amazon virtual machines, availability zones, or
regions fail.

"In order to reliably move workload tasks between different cloud computing
regions on AWS, we needed to build software with low overhead that would be
resilient to failure and able to scale to massive sizes," Stowe wrote. "We
needed something that supported millions of cores doing tens of millions of
tasks. Jupiter was designed to do just this."
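The article does not describe how Jupiter works internally. As a rough illustration of the general idea only, the sketch below simulates a task queue in which any task whose worker fails simply goes back on the queue and is retried elsewhere. Every name here (run_with_requeue, the region labels, fail_rate) is hypothetical and is not Cycle's API.

```python
import random
from collections import deque


def run_with_requeue(tasks, workers, fail_rate=0.2, seed=0, max_attempts=50):
    """Simulate running independent tasks on workers that may fail.

    Returns (done, attempts): done maps each task to the worker that
    finished it; attempts counts how many times each task was tried.
    """
    rng = random.Random(seed)          # seeded so the simulation repeats
    pending = deque(tasks)
    attempts = {task: 0 for task in pending}
    done = {}
    while pending:
        task = pending.popleft()
        attempts[task] += 1
        if attempts[task] > max_attempts:
            raise RuntimeError(f"task {task!r} failed too many times")
        worker = rng.choice(workers)
        if rng.random() < fail_rate:
            # Simulated loss of a VM, zone, or region: requeue the task.
            pending.append(task)
            continue
        done[task] = worker
    return done, attempts


if __name__ == "__main__":
    done, attempts = run_with_requeue(
        list(range(100)), ["region-a", "region-b", "region-c"], fail_rate=0.3
    )
    print(len(done), "tasks finished;", sum(attempts.values()), "attempts")
```

This only works because the tasks are independent: a lost worker costs one retry of one task, and nothing else has to roll back.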

Cycle gave Thompson's research team a university discount, charging no fee
for building the cluster beyond the $33,000 owed to Amazon. Most other
customers who want Cycle's help with cloud supercomputing will have to pay
the usual prices, though.