[Beowulf] 3.79 TFlops sp, 0.95 TFlops dp, 264 GByte/s, 3 GByte, 198 W @ 500 EUR

Lux, Jim (337C) james.p.lux at jpl.nasa.gov
Thu Dec 22 09:33:37 PST 2011


That's an interesting approach: combining ASICs with FPGAs. ASICs will
blow the doors off anything else in a FLOP/Joule, FLOPS/kg, or
FLOPS/dollar contest, for the tasks the ASIC is designed for. Using FPGAs
to handle the routing/sequencing/variable parts of the problem and ASICs
to do the crunching is a great idea. It's sort of the same idea as
including DSP or PowerPC cores on a Xilinx FPGA, at a more macro scale.
(It's also of interest in the HPC world, since early 2nd-generation
Hypercubes from Intel used Xilinx FPGAs as their routing fabric.)

The challenge with this kind of hardware design is the printed wiring
board (PWB) design. Sure, you have 1100+ pins coming out of that FPGA, but
now you have to route them somewhere, and do it on a manufacturable board.
I've worked recently with a board that had 22 layers, and we were at the
ragged edge of tolerances with the close-pitch column grid array parts we
had to use.

I would expect that the clever folks at DE Shaw did an integrated design
with their ASIC: make the ASIC pinouts line up with the FPGAs' pinouts, and
the routing problem gets simpler.




On 12/22/11 8:53 AM, "Prentice Bisbal" <prentice at ias.edu> wrote:

>Just for the record, I'm only the messenger. I noticed a
>not-insignificant number of booths touting FPGAs at SC11 this year, so I
>reported on it. I also mentioned other forms of accelerators, like GPUs
>and Intel's MIC architecture.
>
>The Anton computer architecture isn't just FPGAs; it also has
>custom-designed processors (ASICs). The ASICs handle the parts of the
>molecular dynamics (MD) algorithms that are well understood and
>unlikely to change, and the FPGAs handle the parts of the algorithms
>that may change or might have room for further optimization.
>
>As far as I know, only 8 or 9 Antons have been built. One is at the
>Pittsburgh Supercomputing Center (PSC); the rest are for internal use at
>DE Shaw. A single Anton consists of 512 cores and takes up 6 or 8
>racks. Despite its small size, it's orders of magnitude faster at
>doing MD calculations than even supercomputers like Jaguar and
>Roadrunner with hundreds of thousands of processors. So for MD, Anton
>is several orders of magnitude faster than a supercomputer built from
>general-purpose processors, and I'm sure it uses a LOT less power. I
>don't think the Antons are clustered together, so I'm pretty sure the
>published performance on MD simulations is for a single Anton with 512
>cores.
>
>Keep in mind that Anton was designed to do only one thing, MD, so it
>probably can't even run LINPACK, and if it did, I'm sure its score
>would be awful. Also, the designers cut corners where they knew they
>safely could, like using fixed-precision (or is it fixed-point?) math,
>so the hardware design is only half the story in this example.
>
>Prentice
>
>
>
>On 12/22/2011 11:27 AM, Lux, Jim (337C) wrote:
>> The problem with FPGAs (and I use a fair number of them) is that you're
>> never going to get the same picojoules-per-bit-transition kind of power
>> consumption that you do with a purpose-designed processor. The extra
>> logic needed to make it "reconfigurable", along with the physical
>> junction sizes, makes it so.
>>
>> What you will find is that on certain kinds of problems, you can
>> implement a more efficient algorithm in an FPGA than you can in a
>> conventional processor or GPU. So, for that class of problem, the FPGA
>> is a winner (things lending themselves to fixed-point, systolic-array
>> type processes are good candidates).
>>
>> Bear in mind also that while an FPGA may have, say, the equivalent of
>> 10 million gates, any given practical design is going to use a small
>> fraction of those gates. Fortunately, most of those unused gates aren't
>> toggling, so they don't consume clock-related (dynamic) power, but they
>> do draw leakage current, so the whole clock rate vs. core voltage
>> trade-off winds up a bit different for FPGAs.
>>
>> The biggest problem with FPGAs is that it is difficult to write
>> high-performance software for them. With FORTRAN on conventional,
>> vectorized, and pipelined processors, we've got 50 years of
>> compiler-writing expertise and real high-performance libraries. And
>> there are literally millions of people who know how to code in FORTRAN
>> or C or something, so if you're looking for the highest-performance
>> coders, even at the 4-sigma level, you've got a fair number to choose
>> from. For numerical computation in FPGAs, not so many. I'd guess that a
>> large fraction of FPGA developers are doing one of two things:
>> 1) digital signal processing and flow-through kinds of stuff (error
>> correcting codes, compression/decompression, crypto); 2) bus interface
>> and data handling (PCI bus, disk drive controls, etc.).
>>
>> Interestingly, even with the relative scarcity of FPGA developers
>> versus conventional CPU software developers, the average salaries
>> aren't that far apart. The distribution for "generic coders" is wider
>> (particularly on the low end; barriers to entry are lower for C, Java,
>> whathaveyou code monkeys), but there are very, very few people making
>> more than, say, 150-200k/yr doing either (except in a few anomalous
>> industries where compensation is higher than normal in general, and
>> leaving out "equity participation" type deals).
>>
>>
>>
>> On 12/22/11 7:42 AM, "Prentice Bisbal" <prentice at ias.edu> wrote:
>>
>>> On 12/22/2011 09:57 AM, Eugen Leitl wrote:
>>>> On Thu, Dec 22, 2011 at 09:43:55AM -0500, Prentice Bisbal wrote:
>>>>
>>>>> Or if your German is rusty:
>>>>>
>>>>> http://www.zdnet.com/blog/computers/amd-radeon-hd-7970-graphics-card-launched-benchmarked-fastest-single-gpu-board-available/7204
>>>> Wonder what kind of response will be forthcoming from nVidia,
>>>> given developments like
>>>> http://www.theregister.co.uk/2011/11/14/arm_gpu_nvidia_supercomputer/
>>>>
>>>> It does seem that x86 is dead, despite good Bulldozer performance
>>>> in Interlagos:
>>>>
>>>> http://www.heise.de/newsticker/meldung/AMDs-Serverprozessoren-mit-Bulldozer-Architektur-legen-los-1378230.html
>>>>
>>>> (engage dekrautizer of your choice).
>>>>
>>> At SC11, it was clear that everyone was looking for ways around the
>>> power wall. I saw 5 or 6 different booths touting the use of FPGAs for
>>> improved performance/efficiency. I don't remember there being a single
>>> FPGA booth in the past. Whether the accelerator is a GPU, FPGA, GRAPE,
>>> Intel MIC, or something else, I think it's clear that HPC
>>> architecture is going to change radically in the next couple of years,
>>> unless some major breakthrough occurs for commodity processors.
>>>
>>> I think DE Shaw Research's Anton computer, which uses FPGAs and custom
>>> processors, is an excellent example of what the future of HPC might
>>> look like.
>>>
>>> --
>>> Prentice
>>> _______________________________________________
>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>>> To change your subscription (digest mode or unsubscribe) visit
>>> http://www.beowulf.org/mailman/listinfo/beowulf
>>
