[Beowulf] GPU's - was Westmere EX

Fri Apr 8 05:45:09 PDT 2011

All:

This video may help clear things up:

  http://www.youtube.com/watch?v=usGkq7tAhfc

Have a nice weekend.

--
Doug

>
> On Apr 7, 2011, at 6:25 PM, Gus Correa wrote:
>
>> Vincent Diepeveen wrote:
>>
>>> GPU monster box, which is basically a few videocards inside such a
>>> box stacked up a tad, will only add a couple of
>>> thousand.
>>>
>>
>> This price may be OK for the videocard-class GPUs,
>> but sounds underestimated, at least for Fermi Tesla.
>
> Tesla (448 cores @ 1.15 GHz, 3 GB GDDR5): $2,200
> Note there is a 6 GB version; I'm not aware of its price, but it will be
> $$$$, I bet.
> Or AMD 6990 (3072 PEs @ 0.83 GHz, 4 GB GDDR5): 519 euro
>
> VERSUS
>
> 8-socket Nehalem-EX, 512 GB DDR3 RAM, basic configuration: $205k.
>
> Roughly a factor of 100 difference compared to those cards.
>
> A couple of thousand versus a couple of hundred thousand.
> I hope I made my point clear.
>
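As a rough check on the "factor 100" figure, the ratios can be computed directly from the prices quoted above. This is only a minimal sketch: the euro-to-dollar rate `USD_PER_EUR` is an assumed placeholder and does not come from the thread, and `price_ratio` is a hypothetical helper name.

```python
# Prices as quoted in the thread.
TESLA_USD = 2200.0           # Tesla, 448 cores, 3 GB (2,200 USD)
HD6990_EUR = 519.0           # AMD 6990, 4 GB
NEHALEM_EX_USD = 205_000.0   # 8-socket Nehalem-EX, basic configuration

# ASSUMPTION: illustrative exchange rate, not a figure from the thread.
USD_PER_EUR = 1.4


def price_ratio(system_usd: float, card_usd: float) -> float:
    """How many cards could be bought for the price of one big system."""
    return system_usd / card_usd


tesla_ratio = price_ratio(NEHALEM_EX_USD, TESLA_USD)
hd6990_ratio = price_ratio(NEHALEM_EX_USD, HD6990_EUR * USD_PER_EUR)

print(f"Nehalem-EX box vs. one Tesla:    {tesla_ratio:.0f}x")
print(f"Nehalem-EX box vs. one AMD 6990: {hd6990_ratio:.0f}x")
```

With these inputs the Tesla ratio lands a little under 100 and the 6990 ratio well above it, so "factor 100" is an order-of-magnitude statement rather than an exact one.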
>
>> Last I checked, a NVidia S2050 pizza box with four Fermi Tesla C2050,
>> with 448 cores and 3GB RAM per GPU, cost around $10k.
>> For the beefed up version with C2070 (6GB/GPU) it bumps to ~$15k.
>> If you care about ECC, that's the price you pay, right?
>
> When Fermi was released it was a great GPU.
>
> Regrettably, they lobotomized the gamer cards' double precision, as I
> understand it, so those cards hardly have any double precision
> capability; if you go for Nvidia you surely need a Tesla,
> no question about it.
>
> As a company I would buy 6990s though; they're a lot cheaper and
> roughly 3x faster than the Nvidia cards (more than 3x for some
> workloads, less than 3x for others; note the card has 2 GPUs and
> 2 x 2 GB == 4 GB of RAM on board, so 2 GB per GPU).
>
> That is 3072 cores @ 0.83 GHz, with 50% of them being 32-bit
> multiplication units, for AMD,
> versus 448 cores for Nvidia with 448 execution units for 32-bit
> multiplication.
>
> Especially because multiplication has improved a lot.
>
> Having already written CUDA code a while ago, I wanted the cheap
> gamer card with big horsepower at home, so I'm now toying with a 6970
> and will be able to report to you what is possible to achieve on that
> card with respect to prime numbers and such.
>
> I'm a bit amazed that so few public initiatives write code for the AMD
> GPUs.
>
> Note that GDDR5 RAM doesn't have ECC by default, but in the case of
> AMD it has a CRC calculation (if I understand it correctly). It's a bit
> more primitive than ECC, but it works pretty OK and also shows you
> when problems occurred there, so figuring out what goes on is possible.
>
> Make no mistake: this isn't ECC.
> We know some HPC centers have ECC as a hard requirement; Nvidia is
> then the only option.
>
> In earlier posts, some time ago and some years ago, I already
> wrote that governments should adapt more to how hardware develops
> rather than demand that hardware follow them.
>
> HPC has too little cash to demand that from industry.
>
> I cannot recommend OpenCL at this moment (for a number of reasons).
>
> AMD-CAL and CUDA are somewhat similar. Sure, there are differences, but
> the majority of codes can be ported quite well (there are exceptions)
> or have easy workarounds.
>
> I would advise any company doing GPGPU to develop both branches of
> code at the same time, as that gives the company a lot of extra
> choices for very little extra work, maybe 1 coder,
> and it always allows you to run your production code on the
> fastest setup.
>
> That said, we can safely expect that in the coming years AMD will
> keep the lead in raw number-crunching performance.
> I pointed out why elsewhere.
>
> Even then I'd never bet on just 1 manufacturer. Go for both,
> considering how cheap it is.
>
> For a lot of HPC centers the choice of Nvidia will be an easy one, as
> the price of the Fermi cards is peanuts compared to the price of the
> rest of the system, and considering their other demands that's what
> they'll go for.
>
> That might change once you stick bunches of videocards in nodes.
>
> Please note that the GPU 'streamcores' or PEs, whatever name you want
> to give them, are so bloody fast
> that your code has to work within the PEs themselves and hardly use
> the RAM.
>
> For both Nvidia and AMD, the streamcores are so fast that you
> simply don't want to lose time on RAM accesses
> while your software runs, let alone use a huge amount of RAM.
>
> Add to that that Nvidia (I still have to figure this out for AMD) can
> stream to and from the GPU's RAM from the CPU in the background,
> so if you do really large calculations involving many nodes,
> all that shouldn't be an issue in the first place.
>
> So if you really need 3 GB or 6 GB rather than 2 GB of RAM, that
> would really amaze me, though I'm sure
> there are cases where that happens. If we look at what was ordered,
> however, it is mostly the 3 GB Teslas, at least going by what has been
> reported; I have no global statistics on that...
>
> Now all choices are valid there, but even then we are talking about
> peanuts compared to the price of
> a single 8-socket Nehalem-EX box, which fully configured will be
> maybe $300k-$400k or something?
>
> Whereas a set of 4x Nvidia will probably be under $15k and 4x AMD
> 6990 is about 2000 euro.
>
> There won't be dual-GPU Nvidia cards any time soon because of the
> choice they have historically made for the memory controllers.
> For that, see the explanation by Intel fanboy David Kanter in a
> special article he wrote at realworldtech.
>
> Please note I'm not judging AMD or Nvidia; I suspect they have made
> their choices based upon totally different
> business models, and we must be happy we have this rich
> choice right now between CPUs from different
> manufacturers and GPUs from different manufacturers.
>
> Nvidia really seems to aim at supercomputers, offering their Tesla line
> without lobotomization and lobotomizing their
> gamer cards, whereas AMD aims at gamers, and their gamer cards have
> full functionality
> without lobotomization.
>
> Totally different business models. Both have their advantages and
> disadvantages.
>
> From a pure performance viewpoint it's easy to see which is faster though.
>
> Yet right now I realize all too well that too many still
> hesitate about offering GPU services in addition to
> CPU services, in which case having a GPU, whether Nvidia or AMD,
> of course kicks butt from a throughput viewpoint.
>
> To be really honest with you guys, I had expected that by 2011 we
> would have a GPU reaching far over 1 Teraflop double precision
> hands down. Seeing that Nvidia delivers somewhere around 515 Gflop
> and AMD needs 2 GPUs on a single card to get over that Teraflop of
> double precision (the claim is 1.27 Teraflop double precision),
> that really is below my expectations from a few years ago.
>
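The 515 Gflop and 1.27 Teraflop figures above can be reproduced from the core counts and clocks quoted earlier in the thread. The sketch below is a back-of-the-envelope calculation, not vendor data: it assumes each core retires one fused multiply-add (2 flops) per cycle in single precision, and that double precision runs at 1/2 of that rate on the Tesla and 1/4 on the 6990. Those ratios and the helper name `peak_gflops` are assumptions added here for illustration.

```python
def peak_gflops(cores: int, clock_ghz: float,
                flops_per_cycle: int = 2, dp_ratio: float = 1.0) -> float:
    """Theoretical peak in Gflop/s: cores * clock * flops/cycle * DP ratio."""
    return cores * clock_ghz * flops_per_cycle * dp_ratio


# Core counts and clocks as quoted in the thread.
# ASSUMPTION: DP rate is 1/2 of SP on the Tesla and 1/4 of SP on the 6990.
tesla_dp = peak_gflops(448, 1.15, dp_ratio=0.5)
hd6990_dp = peak_gflops(3072, 0.83, dp_ratio=0.25)

print(f"Tesla double precision peak:    {tesla_dp:.1f} Gflop/s")
print(f"AMD 6990 double precision peak: {hd6990_dp:.1f} Gflop/s")
```

These are theoretical peaks only; sustained throughput on real code depends on how well the kernel keeps the execution units busy.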
> Now of course I hope you realize I'm not writing double precision code
> at all; I'm writing everything in 32-bit integers for the AMD
> card, and the Nvidia equivalent also uses 32-bit integers. The
> ideal way to do calculations on those cards, including very big
> transforms, is using the 32 x 32 == 64 bits instructions (that's 2
> instructions in case of AMD).
>
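To illustrate what a 32 x 32 == 64 bits multiply built from 2 instructions looks like, here is a minimal sketch in plain Python. It assumes the two instructions are a "low half" and a "high half" multiply, which is how GPU integer intrinsics such as CUDA's `__umulhi` are typically paired with an ordinary 32-bit multiply; the thread does not spell this out, so treat that reading, and the names `mul_lo`, `mul_hi` and `mul_wide`, as assumptions.

```python
MASK32 = 0xFFFFFFFF


def mul_lo(a: int, b: int) -> int:
    """Low 32 bits of the product of two unsigned 32-bit values."""
    return (a * b) & MASK32


def mul_hi(a: int, b: int) -> int:
    """High 32 bits of the product of two unsigned 32-bit values."""
    return ((a * b) >> 32) & MASK32


def mul_wide(a: int, b: int) -> int:
    """Full 64-bit product assembled from the two 32-bit halves."""
    return (mul_hi(a, b) << 32) | mul_lo(a, b)


a, b = 0xFFFFFFFF, 0xFFFFFFFF
print(hex(mul_lo(a, b)), hex(mul_hi(a, b)), hex(mul_wide(a, b)))
```

Python integers are arbitrary precision, so the masks stand in for the 32-bit register width that the hardware would enforce.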
> Regards,
> Vincent
>
>
>>
>> Gus Correa
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
>> Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>
> --
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.
>
