[Beowulf] Power calculations , double precision, ECC and power of APU's

Craig Tierney - NOAA Affiliate craig.tierney at noaa.gov
Mon Mar 18 11:17:57 PDT 2013

On Thu, Mar 14, 2013 at 5:42 AM, Vincent Diepeveen <diep at xs4all.nl> wrote:
> On Mar 12, 2013, at 5:45 AM, Mark Hahn wrote:
>> trinity a10-5700 has 384 radeon 69xx cores running at 760 MHz,
>> delivering 584 SP gflops - 65W iirc.  but only 30 GB/s for it and
>> the CPU.
>> let's compare that to a 6930 card: 1280 cores, 154 GB/s, 1920 Gflops.
>> about 1/3 the cores, flops, and something less than 1/5 the bandwidth.
>> no doubt the lower bandwidth will hurt some codes, and the lower
>> host-gpu latency will help others.  I don't know whether APUs have
>> the same SP/DP ratio as comparable add-in cards.
> Since when in HPC do we want SP gflops?


I thought we learned on this list to not generalize as it can create
flame-wars?  The people in HPC who care about SP gflops are those who
understand the mathematics in their algorithms and don't want to waste
very precious memory bandwidth by unnecessarily promoting their
floating point numbers to double unless there is a good (and
measurable) reason to do so.  We use double precision, but only when
it is required to maintain numerical stability.  The result is lower
memory requirements to store model state and codes that run faster.
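
To put rough numbers on the bandwidth argument, here is a minimal
sketch using only the Python standard library.  The grid size is a
made-up illustrative value, and the 30 GB/s figure is the trinity
bandwidth quoted above; the time is only a lower bound for streaming
the state once, not a measurement of any real code.

```python
from array import array


def state_bytes(n_values, typecode):
    """Bytes needed to hold n_values numbers of the given array typecode."""
    return n_values * array(typecode).itemsize


def sweep_seconds(n_values, typecode, bandwidth_gb_s):
    """Lower bound on the time to stream the whole state once."""
    return state_bytes(n_values, typecode) / (bandwidth_gb_s * 1e9)


N_VALUES = 100_000_000   # hypothetical model state size (assumption)
BANDWIDTH_GB_S = 30.0    # trinity memory bandwidth quoted in this thread

for label, typecode in (("single", "f"), ("double", "d")):
    size_mb = state_bytes(N_VALUES, typecode) / 1e6
    t_ms = sweep_seconds(N_VALUES, typecode, BANDWIDTH_GB_S) * 1e3
    print(f"{label}: {size_mb:.0f} MB, at least {t_ms:.1f} ms per sweep")
```

For a bandwidth-bound code, halving the bytes per value halves both
the memory footprint and the minimum time per pass over the data.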


> Double precision rules. Let's talk about double precision and ECC.
> How many APU's have ECC?
> As for power calculations: a single box here eats 170 watt under full
> load (all CPUs under full load),
> feeding a Tesla which in practice eats a tad over 300 watt. So the
> total is under 500 watt.
> Now you go run 100 APUs to get to the same double precision
> crunching power, each box eating 150 watt.
> That's 15 kilowatt for the APUs.
> Even if you would be using 10, which in single precision delivers the
> same, you still eat 1500 watt.
> For that you have to maintain 10 computers, versus just 1 Tesla.
> Now even if an APU were able to deliver a lot of double precision
> and have ECC,
> even then you have got 10 InfiniBand cables from those 10 APUs, and
> you need another 250 watt for
> that and another bunch of switches of 300 watt that the Tesla
> solution doesn't need.
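
The power arithmetic quoted above is easy to check with a few lines
of Python.  This is only a sketch: every wattage is taken as quoted
(with the Tesla rounded to 300 W), the function name is made up for
illustration, and nothing here is a measurement.

```python
def total_watts(n_boxes, watts_per_box, extra_watts=0.0):
    """Total draw of n identical boxes plus any shared extras."""
    return n_boxes * watts_per_box + extra_watts


# One host at 170 W feeding a Tesla at roughly 300 W (quoted figures).
tesla_box = total_watts(1, 170, extra_watts=300)

# 100 APU boxes at 150 W each, the quoted double precision case.
apu_dp = total_watts(100, 150)

# 10 APU boxes at 150 W each, plus the quoted 250 W for InfiniBand
# and 300 W for switches, the quoted single precision case.
apu_sp_boxes = total_watts(10, 150)
apu_sp_all = total_watts(10, 150, extra_watts=250 + 300)

print(f"Tesla box:          {tesla_box:.0f} W")
print(f"100 APUs:           {apu_dp / 1000:.1f} kW")
print(f"10 APUs:            {apu_sp_boxes / 1000:.1f} kW")
print(f"10 APUs + network:  {apu_sp_all / 1000:.2f} kW")
```

So the 10-box case comes to 1.5 kW for the boxes alone, or about
2 kW with the quoted interconnect overhead included.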
>>> I assume you will not build 10 nodes with 10 cpu's with integrated
>>> gpu in order to rival a
>>> single card.
>> no, as I said, the premise of my suggestion of in-package ram is
>> that it would permit glueless tiling of these chips.  the number
>> you could tile in a 1U chassis would primarily be a question of
>> power dissipation.
>> 32x 40W units would be easy.  perhaps 20 60W units.  since I'm just
>> making up numbers here, I'm going to claim that performance will be
>> twice that of trinity (a nice round 1 Tflop apiece), or 20 Tflops/RU.
>> I speculate that 4x 4Gb in-package gddr5 would deliver 64 GB/s, 2GB/
>> socket - a total capacity of 40 GB/RU at 1280 GB/s.
>> compare this to a 1U server hosting 2-3 K10 cards = 4.6 Tflops and
>> 320 GB/s each.  13.8 Tflops, 960 GB/s.  similar power dissipation.
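
For anyone who wants to replay the per-rack-unit numbers quoted
above, here is a small sketch.  It assumes one fused multiply-add
(2 flops) per core per cycle for the peak figure, and it treats the
K10 figure as 4.6 Tflops SP per card; both are my assumptions, and
the helper names are made up for illustration.

```python
def peak_sp_gflops(cores, clock_mhz, flops_per_cycle=2):
    """Peak single precision Gflops, assuming 2 flops/core/cycle."""
    return cores * clock_mhz * flops_per_cycle / 1000.0


def aggregate(n_units, per_unit):
    """Scale every per-unit figure by the number of units."""
    return {key: n_units * value for key, value in per_unit.items()}


# trinity a10-5700 as quoted: 384 cores at 760 MHz.
trinity = peak_sp_gflops(384, 760)

# 4 chips of 4 Gbit each per socket, converted to GB.
gbytes_per_socket = 4 * 4 / 8

# 20 speculative 60 W sockets per rack unit.
tiled = aggregate(20, {"tflops": 1.0,
                       "gbytes": gbytes_per_socket,
                       "gb_per_s": 64.0})

# 3 K10 cards in a 1U server (per-card Tflops is an assumption).
k10 = aggregate(3, {"tflops": 4.6, "gb_per_s": 320.0})

print(f"trinity peak: {trinity:.0f} SP Gflops")
print(f"tiled/RU: {tiled['tflops']:.1f} Tflops, "
      f"{tiled['gbytes']:.0f} GB, {tiled['gb_per_s']:.0f} GB/s")
print(f"3x K10:   {k10['tflops']:.1f} Tflops, {k10['gb_per_s']:.0f} GB/s")
```

The made-up tiled box only comes out ahead on paper, of course; the
point is just that the quoted totals follow from the per-unit figures.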