[Beowulf] GPU's - was Westmere EX

Thu Apr 7 13:37:46 PDT 2011

Vincent Diepeveen wrote:
> 
> On Apr 7, 2011, at 6:25 PM, Gus Correa wrote:
> 
>> Vincent Diepeveen wrote:
>>
>>> GPU monster box, which is basically a few videocards inside such a
>>> box stacked up a tad, will only add a couple of
>>> thousand.
>>>
>>
>> This price may be OK for the videocard-class GPUs,
>> but sounds underestimated, at least for Fermi Tesla.
> 
> Tesla (448 cores @ 1.15 GHz, 3 GB GDDR5): $2,200
> (note there is a 6 GB version; I'm not aware of its price, but it will be $$$$, I bet)
> or AMD 6990 (3072 PEs @ 0.83 GHz, 4 GB GDDR5): 519 euro
> 
> VERSUS
> 
> 8 socket Nehalem-ex, 512GB ram DDR3, basic configuration, $205k.
> 
> Factor 100 difference to those cards.
> 
> A couple of thousand versus a couple of hundred thousand.
> Hope i made my point clear.
> 

Not so much.

In your original message you said:

"GPU monster box, which is basically a few videocards inside such a
box stacked up a tad, will only add a couple of thousands."

So, first it was a few GPUs in a box (whatever else the box
might have inside) for a couple of thousand (you did not specify
whether dollars or euros).

Now you checked out the real prices, and said
that a *single* Fermi Tesla C2050 costs ~$2,200
(just the GPU alone, price in US dollars I suppose),
which is more like the real thing.

However, instead of admitting that your previous numbers were mistaken,
you insist that:

"Hope i made my point clear."

Is this how you play chess?  :)
Even if your opponent is a computer, he/she/it might get
a bit discouraged.
You always win, even before the game starts.

Anyway, I don't play chess, I am no GPU expert,
I don't know about the lobotomizing of Fermi (I hope you're not talking 
about Enrico, he's dead),
and I don't think we're going anywhere with this discussion.
However, the GPU prices you sent in your original
email to the list were underestimated,
although I am afraid I may not be able to get this point
across to you.
The prices you sent were too low,
at least when it comes to GPUs with ECC,
which is what is reliable for HPC.

Thank you,
Gus Correa
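
A rough check of the price arithmetic in this thread, using only the US dollar figures quoted in the messages (the 519 euro AMD 6990 price is left out, since no exchange rate is given). The variable names are made up for illustration, and the prices are the posters' 2011 estimates, not verified list prices. Under these figures the "factor 100" roughly holds for a single Tesla card, but shrinks to somewhere between 14 and 21 for a four-GPU box:

```python
# Prices in US dollars, exactly as quoted in the thread (2011 estimates).
NEHALEM_EX_BOX = 205_000   # 8 socket Nehalem-ex, 512GB RAM, basic configuration
TESLA_SINGLE = 2_200       # one Tesla card, 448 cores, 3GB
FOUR_C2050_BOX = 10_000    # S2050 pizza box with four C2050 (3GB/GPU)
FOUR_C2070_BOX = 15_000    # same kind of box with four C2070 (6GB/GPU)


def price_ratio(expensive: float, cheap: float) -> float:
    """Return how many times more the first item costs than the second."""
    return expensive / cheap


ratio_single = price_ratio(NEHALEM_EX_BOX, TESLA_SINGLE)
ratio_c2050_box = price_ratio(NEHALEM_EX_BOX, FOUR_C2050_BOX)
ratio_c2070_box = price_ratio(NEHALEM_EX_BOX, FOUR_C2070_BOX)

print(f"vs single Tesla card: {ratio_single:.1f}x")
print(f"vs four-C2050 box:    {ratio_c2050_box:.1f}x")
print(f"vs four-C2070 box:    {ratio_c2070_box:.1f}x")
```

So whether the difference is "a factor 100" depends on whether one compares against a single card or against a box with several ECC-capable GPUs.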

> 
>> Last I checked, a NVidia S2050 pizza box with four Fermi Tesla C2050,
>> with 448 cores and 3GB RAM per GPU, cost around $10k.
>> For the beefed up version with C2070 (6GB/GPU) it bumps to ~$15k.
>> If you care about ECC, that's the price you pay, right?
> 
> When Fermi was released it was a great GPU.
> 
> Regrettably, as I understand it, they lobotomized the double precision
> of the gamers' cards, so those hardly have double precision capabilities;
> if you go for Nvidia you sure need a Tesla, no question about it.
> 
> As a company I would buy 6990s though; they're a lot cheaper and
> roughly 3x faster than the Nvidia cards (in some cases more than 3x,
> in others less; note the card has 2 GPUs and 2 x 2 GB == 4 GB RAM
> on board, so 2 GB per GPU).
> 
> 3072 cores @ 0.83 GHz, with 50% of 'em 32-bit multiplication units, for AMD
> versus 448 cores for Nvidia, with 448 execution units for 32-bit multiplication.
> 
> Especially because multiplication has improved a lot.
> 
> Having already written CUDA code a while ago, I wanted the cheap
> gamers' card with big horsepower at home now, so I'm toying with a 6970
> and will be able to report to you what can be achieved on that card
> with respect to prime numbers and such.
> 
> I'm a bit amazed that so few public initiatives write code for the AMD GPUs.
> 
> Note that GDDR5 RAM doesn't have ECC by default, but in AMD's case it has a
> CRC calculation (if I understand it correctly). It's a bit more primitive
> than ECC, but works pretty OK and also shows you when problems occurred
> there, so figuring out what goes on is possible.
> 
> Make no mistake: this isn't ECC.
> We know some HPC centers have ECC as a hard requirement; only Nvidia is
> an option then.
> 
> In earlier posts, some time ago and some years ago, I already wrote
> that governments should adapt more to how hardware develops rather
> than demand that hardware follow them.
> 
> HPC has too little cash to demand that from industry.
> 
> I cannot advise OpenCL at this moment (for a number of reasons).
> 
> AMD-CAL and CUDA are somewhat similar. Sure, there are differences, but
> the majority of codes can be ported quite well (there are exceptions),
> or have easy workarounds.
> 
> I would advise any company doing GPGPU to develop both branches of code
> at the same time, as that gives the company a lot of extra choices for
> really very little extra work, maybe 1 coder, and it always allows you
> to run your production code on the fastest setup.
> 
> That said, we can safely expect that in raw performance AMD will keep
> the leading edge in the coming years from a crunching viewpoint.
> Elsewhere I pointed out why.
> 
> Even then I'd never bet on just 1 manufacturer. Go for both, considering
> how cheap it is.
> 
> For a lot of HPC centers the choice of Nvidia will be an easy one, as
> the price of the Fermi cards is peanuts compared to the price of the rest
> of the system, and considering other demands that's what they'll go for.
> 
> That might change once you stick bunches of videocards in nodes.
> 
> Please note that the GPU 'streamcores' or PEs, whatever name you want to
> give them, are so bloody fast that your code has to work within the PEs
> themselves and hardly use the RAM.
> 
> For both Nvidia and AMD, the streamcores are so fast that you
> simply don't want to lose time on the RAM when your software runs,
> let alone use huge amounts of RAM.
> 
> Add to that that Nvidia (I still have to figure this out for AMD) can
> stream in the background between the CPU and the GPU's RAM,
> so if you do really large calculations involving many nodes,
> all that shouldn't be an issue in the first place.
> 
> So if you really need 3 GB or 6 GB rather than 2 GB of RAM, that would
> really amaze me, though I'm sure there are cases where that happens.
> However, if we look at what was ordered, it is mostly the 3 GB Teslas,
> at least from what has been reported; I have no global statistics on that...
> 
> Now all choices are valid there, but even then we're talking about peanuts
> compared to the price of a single 8 socket Nehalem-ex box, which fully
> configured will be maybe $300k-$400k or something?
> 
> Whereas a set of 4x Nvidia will probably be under $15k and 4x AMD 6990
> is 2000 euro.
> 
> There won't be 2-GPU Nvidia cards any time soon because of the choice they
> have historically made for the memory controllers.
> See the explanation by Intel fanboy David Kanter at realworldtech,
> in a special article he wrote there.
> 
> Please note I'm not judging AMD or Nvidia; they have made their choices
> based upon totally different business models, I suspect, and we must be
> happy we have this rich choice right now between CPUs from different
> manufacturers and GPUs from different manufacturers.
> 
> Nvidia really seems to aim at supercomputers, offering their Tesla line
> without lobotomization and lobotomizing their gamers' cards, whereas AMD
> aims at gamers and their gamers' cards have full functionality
> without lobotomization.
> 
> Totally different business models. Both have their advantages and
> disadvantages.
> 
> From a pure performance viewpoint it's easy to see what's faster though.
> 
> Yet right now I realize all too well that just too many still hesitate
> about offering GPU services in addition to CPU services, in which case
> having a GPU, regardless of Nvidia or AMD, kicks butt of course from a
> throughput viewpoint.
> 
> To be really honest with you guys, I had expected that by 2011 we would
> have a GPU reaching far over 1 Teraflop double precision, hands down. If
> we see that Nvidia delivers somewhere around 515 Gflop and AMD needs 2
> GPUs on a single card to get over that Teraflop double precision (the
> claim is 1.27 Teraflop double precision),
> that really is below my expectations from a few years ago.
> 
> Now of course I hope you realize I'm not coding double precision code at
> all; I'm writing everything in 32-bit integers for the AMD card, and
> the Nvidia equivalent also uses 32-bit integers. The ideal way to
> do calculations on those cards, including very big transforms, is using
> the 32 x 32 == 64 bits instructions (that's 2 instructions in case of AMD).
> 
> Regards,
> Vincent
> 
> 
>>
>> Gus Correa
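
The "32 x 32 == 64 bits" multiplication mentioned at the end of the quoted message can be sketched as follows. This is only an illustration in plain Python, not GPU code: it assumes the "2 instructions" are one multiply returning the low 32 bits and one returning the high 32 bits of the product, which the message itself does not spell out. The helper names `mul_lo`, `mul_hi` and `mul_32x32_to_64` are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # keeps a value within unsigned 32 bits


def mul_lo(a: int, b: int) -> int:
    """Low 32 bits of the product of two unsigned 32-bit integers."""
    return (a * b) & MASK32


def mul_hi(a: int, b: int) -> int:
    """High 32 bits of the product of two unsigned 32-bit integers."""
    return ((a * b) >> 32) & MASK32


def mul_32x32_to_64(a: int, b: int) -> int:
    """Full 64-bit product assembled from the two 32-bit halves."""
    a &= MASK32
    b &= MASK32
    return (mul_hi(a, b) << 32) | mul_lo(a, b)


# Largest possible case: (2**32 - 1) squared still fits in 64 bits.
product = mul_32x32_to_64(0xFFFFFFFF, 0xFFFFFFFF)
print(hex(product))
```

Python integers are arbitrary precision, so the masking is what stands in for the fixed 32-bit registers here.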