[Beowulf] 3.79 TFlops sp, 0.95 TFlops dp, 264 TByte/s, 3 GByte, 198 W @ 500 EUR

Prentice Bisbal prentice at ias.edu
Thu Dec 22 08:53:39 PST 2011

Just for the record - I'm only the messenger. I noticed a
not-insignificant number of booths touting FPGAs at SC11 this year, so I
reported on it. I also mentioned other forms of accelerators, like GPUs
and Intel's MIC architecture.

The Anton computer architecture isn't just an FPGA - it also has
custom-designed processors (ASICs). The ASICs handle the parts of the
molecular dynamics (MD) algorithms that are well understood and
unlikely to change, and the FPGAs handle the parts of the algorithms
that may change or might have room for further optimization.

As far as I know, only 8 or 9 Antons have been built. One is at the
Pittsburgh Supercomputing Center (PSC), the rest are for internal use at
DE Shaw. A single Anton consists of 512 cores, and takes up 6 or 8
racks. Despite its small size, it's orders of magnitude faster at
doing MD calculations than even supercomputers like Jaguar and
Roadrunner with hundreds of thousands of processors. So overall, Anton
is several orders of magnitude faster than a supercomputer built from
general-purpose processors. And I'm sure it uses a LOT less power. I don't
think the Antons are clustered together, so I'm pretty sure the
published performance on MD simulations is for a single Anton with 512

Keep in mind that Anton was designed to do only one thing: MD, so it
probably can't even run LINPACK, and if it did, I'm sure its score
would be awful. Also, the designers cut corners where they knew they
safely could, like using fixed-precision (or is it fixed-point?) math,
so the hardware design is only half the story in this example.
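
For anyone unsure what fixed-point math means in practice, here is a
minimal sketch in Python. It is only an illustration of the general idea
and makes no claim about how Anton itself does arithmetic: the Q16.16
format and the helper names (to_fixed, to_float, fixed_mul) are
assumptions chosen for the example. Values are stored as plain integers
scaled by 2^16, so addition is ordinary integer addition and gives the
same result in any order, which floating point does not guarantee.

```python
# Illustrative Q16.16 fixed-point helpers (hypothetical names, not from any
# real library and not a description of Anton's hardware).
FRAC_BITS = 16
SCALE = 1 << FRAC_BITS  # 65536


def to_fixed(x: float) -> int:
    """Convert a float to a Q16.16 integer, rounding to the nearest step."""
    return int(round(x * SCALE))


def to_float(q: int) -> float:
    """Convert a Q16.16 integer back to a float."""
    return q / SCALE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q16.16 values; the raw product has 32 fractional bits,
    so shift right by FRAC_BITS to get back to 16."""
    return (a * b) >> FRAC_BITS


# Floating-point addition depends on the order of operations...
float_left = (0.1 + 0.2) + 0.3
float_right = 0.1 + (0.2 + 0.3)

# ...while fixed-point addition is integer addition and does not.
a, b, c = to_fixed(0.1), to_fixed(0.2), to_fixed(0.3)
fixed_left = (a + b) + c
fixed_right = a + (b + c)

print("float sums equal:", float_left == float_right)
print("fixed sums equal:", fixed_left == fixed_right)
print("1.5 * 2.0 =", to_float(fixed_mul(to_fixed(1.5), to_fixed(2.0))))
```

The trade-off is range and resolution: with 16 fractional bits nothing
smaller than 1/65536 can be represented, which is the kind of corner that
can only be cut when the range of values in the problem is known in advance.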


On 12/22/2011 11:27 AM, Lux, Jim (337C) wrote:
> The problem with FPGAs (and I use a fair number of them) is that you're
> never going to get the same picojoules/bit transition kind of power
> consumption that you do with a purpose-designed processor.  The extra
> logic needed to make it "reconfigurable", and the physical junction sizes
> as well, make it so.
> What you will find is that on certain kinds of problems, you can implement
> a more efficient algorithm in FPGA than you can in a conventional
> processor or GPU.  So, for that class of problem, the FPGA is a winner
> (things lending themselves to fixed point systolic array type processes
> are a good candidate).
> Bear in mind also that while an FPGA may have, say, 10-million gate
> equivalent, any given practical design is going to use a small fraction of
> those gates.  Fortunately, most of those unused gates aren't toggling, so
> they don't consume clock related power, but they do consume leakage
> current, so the whole clock rate vs core voltage trade winds up a bit
> different for FPGAs.
> The biggest problem with FPGAs is that they are difficult to write high
> performance software for.  With FORTRAN on conventional and vectorized and
> pipelined processors, we've got 50 years of compiler writing expertise,
> and real high performance libraries.   And, literally millions of people
> who know how to code in FORTRAN or C or something, so if you're looking
> for the highest performance coders, even at the 4 sigma level, you've got
> a fair number to choose from.  For numerical computation in FPGAs, not so
> many. I'd guess that a large fraction of FPGA developers are doing one of
> two things: 1) digital signal processing, flow-through kinds of stuff
> (error correcting codes, compression/decompression, crypto); 2) bus
> interface and data handling (PCI bus, disk drive controls, etc.).
> Interestingly, even with the relative scarcity of FPGA developers versus
> conventional CPU software, the average salaries aren't that far apart.
> The distribution on "generic coders" is wider (particularly on the low
> end. Barriers to entry are lower for C, Java, what-have-you code monkeys),
> but there are very, very few people making more than, say, 150-200k/yr
> doing either.  (except in a few anomalous industries, where compensation
> is higher than normal in general).  (also leaving out "equity
> participation" type deals)
> On 12/22/11 7:42 AM, "Prentice Bisbal" <prentice at ias.edu> wrote:
>> On 12/22/2011 09:57 AM, Eugen Leitl wrote:
>>> On Thu, Dec 22, 2011 at 09:43:55AM -0500, Prentice Bisbal wrote:
>>>> Or if your German is rusty:
>>>> http://www.zdnet.com/blog/computers/amd-radeon-hd-7970-graphics-card-launched-benchmarked-fastest-single-gpu-board-available/7204
>>> Wonder what kind of response will be forthcoming from nVidia,
>>> given developments like
>>> http://www.theregister.co.uk/2011/11/14/arm_gpu_nvidia_supercomputer/
>>> It does seem that x86 is dead, despite good Bulldozer performance
>>> in Interlagos
>>> http://www.heise.de/newsticker/meldung/AMDs-Serverprozessoren-mit-Bulldozer-Architektur-legen-los-1378230.html
>>> (engage dekrautizer of your choice).
>> At SC11, it was clear that everyone was looking for ways around the
>> power wall. I saw 5 or 6 different booths touting the use of FPGAs for
>> improved performance/efficiency. I don't remember there being a single
>> FPGA booth in the past. Whether the accelerator is GPU, FPGA, GRAPE,
>> Intel MIC, or something else, I think it's clear that the future of HPC
>> architecture is going to change radically in the next couple years,
>> unless some major breakthrough occurs for commodity processors.
>> I think DE Shaw Research's Anton computer, which uses FPGAs and custom
>> processors, is an excellent example of what the future of HPC might look
>> like.
>> --
>> Prentice