[Beowulf] China to eclipse Titan with 48,000 Intel MICs?

Thu Jun 6 00:51:12 PDT 2013

On Thu, Jun 6, 2013 at 5:48 AM, Nathan Moore <ntmoore at gmail.com> wrote:
> Does anyone know,
> off-hand, how often these big machines run with all compute nodes dedicated
> to a single message-passing job?

Once, for the initial HPL run? :)

> Am I far off the mark?

No, that's indeed what happens on most supers I've seen. Even if some
codes are capable of running to the (almost) full scale of the
machine, such large runs are quite marginal. They often take place
before the system enters production, to demonstrate its capabilities,
or from time to time on a specific breakthrough project, usually along
with some media coverage. But most of the time, smaller jobs occupy
the slots. Those systems are mainly shared resources, and monopolizing
the whole thing for a single application/group/user is often not the
best way to make the investment profitable.

> Along similar lines, has Google or Amazon ever published their batch job
> capacity (it must be in petaflops...)

I'm not sure how it can compare. From what I understand, their systems
are not quite HPC-like, in the sense that they're less tightly coupled, and
often geographically distributed. That means high latencies and grumpy
MPI. So even if we could count the number of processors running their
servers, and multiply that by the individual Flops of said CPUs, we
would probably get a large number, but not necessarily a point of
comparison with the Top500 machines.
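To make that naive estimate concrete, here is a rough sketch in Python.
The fleet numbers are made up for illustration (they are not actual
Google or Amazon figures), and the peak_flops name and its parameters
are just mine. It only gives a theoretical peak, assuming every core
retires its maximum number of floating-point operations on every cycle.

```python
def peak_flops(num_cpus, cores_per_cpu, clock_hz, flops_per_cycle):
    """Naive theoretical peak: CPU count x cores x clock x flops per cycle.

    Ignores interconnect latency, memory bandwidth and scheduling,
    which is exactly why it says little about tightly-coupled MPI jobs.
    """
    return num_cpus * cores_per_cpu * clock_hz * flops_per_cycle


# Hypothetical fleet: every value below is an assumption, not published data.
fleet_peak = peak_flops(
    num_cpus=100_000,    # assumed number of CPU sockets
    cores_per_cpu=8,     # assumed cores per socket
    clock_hz=2.5e9,      # assumed 2.5 GHz clock
    flops_per_cycle=8,   # assumed double-precision flops per core per cycle
)

print(f"Theoretical peak: {fleet_peak / 1e15:.1f} PFlops")
```

Keep in mind that the Top500 ranks machines by measured HPL performance
(Rmax), not by this kind of theoretical peak, so a number obtained this
way would not be directly comparable.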

Cheers,
--
Kilian