[Beowulf] Power calculations , double precision, ECC and power of APU's

Vincent Diepeveen diep at xs4all.nl
Thu Mar 14 04:42:14 PDT 2013


On Mar 12, 2013, at 5:45 AM, Mark Hahn wrote:
> trinity a10-5700 has 384 radeon 69xx cores running at 760 MHz,  
> delivering 584 SP gflops - 65W iirc.  but only 30 GB/s for it and  
> the CPU.
>
> let's compare that to a 6930 card: 1280 cores, 154 GB/s, 1920 Gflops.
> about 1/3 the cores, flops, and something less than 1/5 the bandwidth.
> no doubt the lower bandwidth will hurt some codes, and the lower
> host-gpu latency will help others.  I don't know whether APUs have
> the same SP/DP ratio as comparable add-in cards.
>

Since when in HPC do we want SP gflops?

Double precision rules. Let's talk about double precision and ECC.

How many APU's have ECC?

As for the power calculations: a single box here draws 170 watts under
full load (all CPUs fully loaded) and feeds a Tesla which in practice
draws a tad over 300 watts. So the total is under 500 watts.

Now go run 100 APUs to get the same double precision crunching
power, each box drawing 150 watts.

That's 15 kilowatts for the APUs.

Even if you used only 10, which deliver the same in single precision,
you would still draw 1500 watts (1.5 kilowatts).

For that you have to maintain 10 computers, versus just 1 Tesla.

Now even if an APU could deliver plenty of double precision and had
ECC, you would still have 10 InfiniBand cables from those 10 APUs.
You need another 250 watts for that, plus another bunch of switches
at 300 watts, none of which the Tesla solution needs.
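
The arithmetic above can be checked with a few lines of Python. This is only a rough sketch using the wattages quoted in this mail; the variable and function names are made up for illustration, and the Tesla is counted at a flat 300 watts even though in practice it draws a tad more.

```python
# Wattages as quoted in the mail; names are illustrative only.
HOST_W = 170       # one host box, all CPUs under full load
TESLA_W = 300      # Tesla card (nominal; really "a tad over" this)
APU_BOX_W = 150    # one APU box
IB_W = 250         # InfiniBand overhead for the 10 APU boxes
SWITCH_W = 300     # extra switches the Tesla setup does not need


def cluster_watts(boxes, watts_per_box, overhead_w=0):
    """Total draw of `boxes` identical boxes plus fixed overhead."""
    return boxes * watts_per_box + overhead_w


tesla_total = HOST_W + TESLA_W                                   # 1 box + 1 Tesla
apu_dp_total = cluster_watts(100, APU_BOX_W)                     # DP parity
apu_sp_total = cluster_watts(10, APU_BOX_W)                      # SP parity
apu_sp_with_net = cluster_watts(10, APU_BOX_W, IB_W + SWITCH_W)  # plus network

for label, watts in [
    ("1 box + Tesla", tesla_total),
    ("100 APU boxes (DP parity)", apu_dp_total),
    ("10 APU boxes (SP parity)", apu_sp_total),
    ("10 APU boxes + IB + switches", apu_sp_with_net),
]:
    print(f"{label:<30}{watts / 1000:6.2f} kW")
```

Even the 10-box single precision case comes out at more than three times the draw of the single box with a Tesla, before counting the interconnect.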




>> I assume you will not build 10 nodes with 10 cpu's with integrated
>> gpu in order to rival a
>> single card.
>
> no, as I said, the premise of my suggestion of in-package ram is  
> that it would permit glueless tiling of these chips.  the number  
> you could tile in a 1U chassis would primarily be a question of  
> power dissipation.
> 32x 40W units would be easy.  perhaps 20 60W units.  since I'm just  
> making up numbers here, I'm going to claim that performance will be  
> twice that of trinity (a nice round 1 Tflop apiece or 20 Tflops/RU).
> I speculate that 4x 4Gb in-package gddr5 would deliver 64 GB/s, 2GB/ 
> socket - a total capacity of 40 GB/RU at 1280 GB/s.
>
> compare this to a 1U server hosting 2-3 K10 cards = 4.6 Tflops and  
> 320 GB/s each.  13.8 Tflops, 960 GB/s.  similar power dissipation.
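
For reference, the per-rack-unit totals in the quoted comparison can be reproduced as below. This is only a sketch of the quoted (and admittedly made-up) numbers: it assumes 20 tiled 60W units and 3 K10 cards, takes the K10 figure as 4.6 Tflops per card (an assumption about the intended unit), and the function name is hypothetical.

```python
# Per-RU totals for the two configurations in the quoted text.
# Assumptions: 20 tiled units, the upper end of "2-3" K10 cards,
# and the K10 figure read as 4.6 Tflops per card.


def per_ru(units, tflops_each, gbs_each):
    """Return (total Tflops, total GB/s) for `units` identical devices."""
    return units * tflops_each, units * gbs_each


tiled_tflops, tiled_gbs = per_ru(20, 1.0, 64)   # speculative tiled chips
k10_tflops, k10_gbs = per_ru(3, 4.6, 320)       # 3x K10 cards
tiled_mem_gb = 20 * 2                           # 2 GB in-package per socket

print(f"tiled: {tiled_tflops:.1f} Tflops, {tiled_gbs} GB/s, {tiled_mem_gb} GB")
print(f"K10:   {k10_tflops:.1f} Tflops, {k10_gbs} GB/s")
```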



