[Beowulf] GPU's - was Westmere EX

Gus Correa gus at ldeo.columbia.edu
Thu Apr 7 18:03:07 PDT 2011

Thank you for the information about AMD-CAL and the AMD GPUs.
Does AMD plan any GPU product with 64-bit and ECC,
similar to Tesla/Fermi?

The lack of a language standard may still be a hurdle here.
I guess there were old postings here about CUDA and OpenGL.
What fraction of the (non-gaming) GPU code is being written these days
in CUDA, in AMD-CAL, and in OpenCL (if any), or perhaps using
compiler directives like those in the PGI compilers?

Thank you,
Gus Correa

Vincent Diepeveen wrote:
> On Apr 7, 2011, at 6:25 PM, Gus Correa wrote:
>> Vincent Diepeveen wrote:
>>> GPU monster box, which is basically a few videocards inside such a
>>> box stacked up a tad, will only add a couple of
>>> thousands.
>> This price may be OK for the videocard-class GPUs,
>> but sounds underestimated, at least for Fermi Tesla.
> Tesla (448 cores @ 1.15 GHz, 3 GB GDDR5): $2,200
> Note there is a 6 GB version; I'm not aware of its price, but it will be
> $$$$, I bet.
> Or AMD 6990 (3072 PEs @ 0.83 GHz, 4 GB GDDR5): 519 euro
> 8-socket Nehalem-EX, 512 GB DDR3 RAM, basic configuration: $205k.
> A factor of 100 difference to those cards.
> A couple of thousand versus a couple of hundred thousand.
> Hope I made my point clear.
>> Last I checked, a NVidia S2050 pizza box with four Fermi Tesla C2050,
>> with 448 cores and 3GB RAM per GPU, cost around $10k.
>> For the beefed up version with C2070 (6GB/GPU) it bumps to ~$15k.
>> If you care about ECC, that's the price you pay, right?
> When Fermi was released it was a great GPU.
> Regrettably they lobotomized the gamer cards' double precision, as I
> understand it, so it hardly has double precision capabilities; if you go
> for Nvidia you surely need a Tesla, no question about it.
> As a company I would buy 6990s though; they're a lot cheaper and roughly
> 3x faster than the Nvidias (in some cases more than 3x, in others less
> than 3x; note the card has 2 GPUs and 2 x 2 GB == 4 GB RAM on board, so
> 2 GB per GPU).
> 3072 cores @ 0.83 GHz, with 50% of them 32-bit multiplication units, for
> AMD versus 448 cores for Nvidia with 448 execution units of 32-bit
> multiplication.
> Especially because multiplication has improved a lot.
> Having already written CUDA code some while ago, I wanted the cheap gamer
> card with big horsepower at home, so I'm toying with a 6970 now and will
> be able to report to you what is possible to achieve on that card with
> respect to prime numbers and such.
> I'm a bit amazed that so few public initiatives write code for the AMD
> GPUs.
> Note that GDDR5 RAM doesn't have ECC by default, but in the case of AMD it
> has a CRC calculation (if I understand it correctly). It's a bit more
> primitive than ECC, but works pretty OK and also shows you when problems
> occurred there, so figuring out what goes on is possible.
> Make no mistake: this isn't ECC.
> We know some HPC centers have ECC as a hard requirement; only Nvidia is
> an option then.
> In earlier posts, some time ago and some years ago, I already wrote that
> governments should adapt more to how hardware develops rather than demand
> that hardware follow them.
> HPC has too little cash to demand that from industry.
> OpenCL I cannot advise at this moment (for a number of reasons).
> AMD-CAL and CUDA are somewhat similar. Sure, there are differences, but
> the majority of codes can be ported quite well (there are exceptions) or
> have easy workarounds.
> Any company doing GPGPU I would advise to develop both branches of code
> at the same time, as that gives the company a lot of extra choices for
> really very little extra work. Maybe 1 coder, and it always allows you to
> have the fastest setup run your production code.
> That said, we can safely expect that in raw performance AMD will keep the
> leading edge in the coming years from a crunching viewpoint. Elsewhere I
> pointed out why.
> Even then I'd never bet on just 1 manufacturer. Go for both, considering
> how cheap it is.
> For a lot of HPC centers the choice of Nvidia will be an easy one, as the
> price of the Fermi cards is peanuts compared to the price of the rest of
> the system, and considering their other demands that's what they'll go
> for.
> That might change once you stick bunches of videocards in nodes.
> Please note that the GPU 'streamcores' or PEs, whatever name you want to
> give them, are so bloody fast that your code has to work within the PEs
> themselves and hardly use the RAM.
> For both Nvidia and AMD, the streamcores are so fast that you simply
> don't want to lose time on the RAM when your software runs, let alone use
> huge amounts of RAM.
> Add to that that Nvidia (I still have to figure this out for AMD) can
> stream in the background between the CPU and the GPU's RAM, so if you do
> really large calculations involving many nodes, all that shouldn't be an
> issue in the first place.
> So if you really need 3 GB or 6 GB rather than 2 GB of RAM, that would
> really amaze me, though I'm sure there are cases where that happens. If
> we look at what was ordered, however, it is mostly the 3 GB Teslas, at
> least from what has been reported; I have no global statistics on that...
> Now all choices are valid there, but even then we are talking about
> peanuts compared to the price of a single 8-socket Nehalem-EX box, which
> fully configured will be maybe $300k-$400k or something?
> Whereas a set of 4x Nvidia will probably be under $15k and 4x AMD 6990 is
> 2000 euro.
> There won't be dual-GPU Nvidia cards any time soon because of the choice
> they have historically made for the memory controllers.
> See the explanation by Intel fanboy David Kanter at realworldtech, in a
> special article he wrote there.
> Please note I'm not judging AMD or Nvidia; they have made their choices
> based upon totally different business models, I suspect, and we should be
> happy we have this rich choice right now between CPUs from different
> manufacturers and GPUs from different manufacturers.
> Nvidia really seems to aim at supercomputers, offering their Tesla line
> without lobotomization and lobotomizing their gamer cards, whereas AMD
> aims at gamers and their gamer cards have full functionality without
> lobotomization.
> Totally different business models. Both have their advantages and
> disadvantages.
> From a pure performance viewpoint it's easy to see what's faster though.
> Yet right now I realize all too well that too many still hesitate about
> offering GPU services in addition to CPU services, in which case having a
> GPU, whether Nvidia or AMD, kicks butt of course from a throughput
> viewpoint.
> To be really honest with you guys, I had expected that by 2011 we would
> have a GPU reaching far over 1 Teraflop double precision hands down. If
> we see that Nvidia delivers somewhere around 515 Gflop and AMD needs 2
> GPUs on a single card to get over that Teraflop double precision (the
> claim is 1.27 Teraflop double precision), that really is below my
> expectations from a few years ago.
> Now of course I hope you realize I'm not coding double precision code at
> all; I'm writing everything in 32-bit integers for the AMD card, and the
> Nvidia equivalent also uses 32-bit integers. The ideal way to do
> calculations on those cards, including very big transforms, is using the
> 32 x 32 == 64 bits instructions (that's 2 instructions in the case of
> AMD).
> Regards,
> Vincent
>> Gus Correa
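
The double precision figures quoted above (around 515 Gflop for the Tesla,
a claimed 1.27 Teraflop for the 6990) can be roughly reproduced from the
core counts and clocks given earlier in the thread. The sketch below is
only a back-of-the-envelope check. It assumes one multiply-add (2 flops)
per core per clock in single precision, and it assumes double precision
runs at 1/2 of that rate on the Fermi Tesla and 1/4 on the 6990; neither
ratio is stated in the thread. The function name peak_gflops is made up
for this illustration.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle=2, dp_ratio=1.0):
    """Theoretical peak in Gflop/s: cores * clock * flops/cycle * DP rate.

    flops_per_cycle=2 assumes one multiply-add per core per clock.
    dp_ratio is the assumed double/single precision throughput ratio.
    """
    return cores * clock_ghz * flops_per_cycle * dp_ratio


# Core counts and clocks as quoted in the thread; the DP ratios are assumptions.
tesla_dp = peak_gflops(448, 1.15, dp_ratio=0.5)
amd_6990_dp = peak_gflops(3072, 0.83, dp_ratio=0.25)

print(f"Tesla (448 cores @ 1.15 GHz):   {tesla_dp:.0f} Gflop/s DP")
print(f"AMD 6990 (3072 PEs @ 0.83 GHz): {amd_6990_dp:.0f} Gflop/s DP")
```

These are theoretical peaks only; they say nothing about what real code
achieves on either card.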

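The "32 x 32 == 64 bits" multiplication mentioned at the end of the quoted
message can be illustrated on the host side. The sketch below assumes the
two instructions are a low-half and a high-half multiply, which the thread
does not spell out; the helper names mul_lo, mul_hi and mul_wide are
hypothetical and do not correspond to any particular GPU instruction set.

```python
MASK32 = 0xFFFFFFFF  # keeps a value within unsigned 32 bits


def mul_lo(a, b):
    """Low 32 bits of the product of two unsigned 32-bit integers."""
    return (a * b) & MASK32


def mul_hi(a, b):
    """High 32 bits of the product of two unsigned 32-bit integers."""
    return ((a * b) >> 32) & MASK32


def mul_wide(a, b):
    """Full 64-bit product, assembled from the two 32-bit halves."""
    return (mul_hi(a, b) << 32) | mul_lo(a, b)


a, b = 0xFFFFFFFF, 0xFFFFFFFE
# The two halves together carry the exact 64-bit product.
assert mul_wide(a, b) == a * b
print(hex(mul_hi(a, b)), hex(mul_lo(a, b)))
```

Keeping both halves is what lets 32-bit integer units carry exact 64-bit
intermediate results, at the cost of two multiplies instead of one.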