[Beowulf] GPU's - was Westmere EX

Douglas Eadline deadline at eadline.org
Fri Apr 8 05:45:09 PDT 2011


This video may help clear things up:


Have a nice weekend.


> On Apr 7, 2011, at 6:25 PM, Gus Correa wrote:
>> Vincent Diepeveen wrote:
>>> GPU monster box, which is basically a few videocards inside such a
>>> box stacked up a tad, will only add a couple of
>>> thousand.
>> This price may be OK for the videocard-class GPUs,
>> but sounds underestimated, at least for Fermi Tesla.
> Tesla (448 cores @ 1.15 GHz, 3 GB GDDR5): $2,200
> Note there is a 6 GB version; i'm not aware of the price, but it will
> be $$$$ i bet.
> Or AMD 6990 (3072 PEs @ 0.83 GHz, 4 GB GDDR5): 519 euro
> 8-socket Nehalem-EX, 512 GB DDR3 RAM, basic configuration: $205k.
> That is a factor 100 difference to those cards:
> a couple of thousand versus a couple of hundred thousand.
> Hope i made my point clear.
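
The "factor 100" above can be checked with a few lines of arithmetic. This is only a sketch using the dollar figures quoted in the thread; the function name `price_ratio` is made up for illustration, and the 519 euro AMD 6990 is left out because comparing it would need an assumed exchange rate.

```python
# Prices as quoted in the thread, in US dollars.
TESLA_PRICE_USD = 2_200        # one Fermi Tesla card (3 GB)
NEHALEM_EX_BOX_USD = 205_000   # 8-socket Nehalem-EX, 512 GB RAM, basic config


def price_ratio(big_box_usd: float, card_usd: float, cards: int = 1) -> float:
    """Return how many times more the big box costs than `cards` GPU cards."""
    return big_box_usd / (card_usd * cards)


one_card = price_ratio(NEHALEM_EX_BOX_USD, TESLA_PRICE_USD)
four_cards = price_ratio(NEHALEM_EX_BOX_USD, TESLA_PRICE_USD, cards=4)

print(f"box vs 1 Tesla:  {one_card:.1f}x")
print(f"box vs 4 Teslas: {four_cards:.1f}x")
```

With these numbers a single Tesla comes out a little under the stated factor of 100, and four of them at the same unit price are still more than 20 times cheaper than the basic 8-socket box.
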
>> Last I checked, a NVidia S2050 pizza box with four Fermi Tesla C2050,
>> with 448 cores and 3GB RAM per GPU, cost around $10k.
>> For the beefed up version with C2070 (6GB/GPU) it bumps to ~$15k.
>> If you care about ECC, that's the price you pay, right?
> When Fermi was released it was a great GPU.
> Regrettably they lobotomized the gamer cards' double precision, as i
> understand it,
> so those cards hardly have double precision capabilities; if you go
> for Nvidia you definitely need a Tesla,
> no question about it.
> As a company i would buy 6990s though; they're a lot cheaper and
> roughly 3x faster
> than the Nvidia cards (in some cases more than 3x, in others less
> than 3x; note the card
> has 2 GPUs and 2 x 2 GB == 4 GB RAM on board, so 2 GB per GPU).
> That is 3072 cores @ 0.83 GHz, 50% of them 32-bit multiplication
> units, versus 448 Nvidia cores with 448 execution units for 32-bit
> multiplication.
> Especially because multiplication has improved a lot.
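
To put numbers on the comparison above, here is a rough back-of-the-envelope sketch. It assumes each multiplication unit retires one 32-bit multiply per clock cycle, which the post does not state; `peak_mul_rate` is a hypothetical helper name, and the result is a theoretical peak, not a measurement.

```python
def peak_mul_rate(units: int, clock_ghz: float, mul_fraction: float = 1.0) -> float:
    """Peak 32-bit multiplies per second, in billions (G mul/s).

    Assumes one multiply per multiplication unit per clock cycle.
    `mul_fraction` is the share of units that can multiply.
    """
    return units * clock_ghz * mul_fraction


# Figures from the post: 3072 PEs @ 0.83 GHz, half of them multipliers,
# versus 448 cores @ 1.15 GHz, all of them multipliers.
amd_6990_rate = peak_mul_rate(3072, 0.83, mul_fraction=0.5)
tesla_rate = peak_mul_rate(448, 1.15)
ratio = amd_6990_rate / tesla_rate

print(f"AMD 6990: {amd_6990_rate:.1f} G mul/s")
print(f"Tesla:    {tesla_rate:.1f} G mul/s")
print(f"ratio:    {ratio:.2f}x")
```

Under that assumption the dual-GPU 6990 comes out at roughly 2.5 times the Tesla's peak multiply rate, in the same ballpark as the "roughly 3x" quoted above.
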
> Having already written CUDA code a while ago, i wanted the cheap
> gamer card with big
> horsepower at home, so i'm toying with a 6970 now and will be able
> to report to you what is possible to
> achieve on that card with respect to prime numbers and such.
> I'm a bit amazed so few public initiatives write code for the AMD
> GPUs.
> Note that GDDR5 RAM doesn't have ECC by default, but in the case of
> AMD it has a CRC calculation
> (if i understand it correctly). It's a bit more primitive than ECC,
> but it works pretty OK and also shows you
> when problems occurred there, so figuring out what goes on
> is possible.
> Make no mistake: this isn't ECC.
> We know some HPC centers have ECC as a hard requirement; in that case
> Nvidia is the only option.
> In earlier posts, some time ago and some years ago, i already
> wrote that governments should
> adapt more to how hardware develops rather than demand that hardware
> follow them.
> HPC has too little cash to demand that from industry.
> OpenCL i cannot recommend at this moment (for a number of reasons).
> AMD-CAL and CUDA are somewhat similar. Sure, there are differences, but
> the majority of codes can be ported
> quite well (there are exceptions), or have easy workarounds.
> I would advise any company doing GPGPU to develop both branches of
> code at the same time,
> as that gives the company a lot of extra choices for really very
> little extra work, maybe 1 coder,
> and it always allows you to have the fastest setup run your
> production code.
> That said, we can safely expect that in raw performance AMD will
> keep the leading edge in the coming years
> from a crunching viewpoint. Elsewhere i pointed out why.
> Even then i'd never bet on just 1 manufacturer. Go for both,
> considering how cheap it is.
> For a lot of HPC centers the choice of Nvidia will be an easy one, as
> the price of the Fermi cards
> is peanuts compared to the price of the rest of the system, and
> considering their other demands that's what they'll go for.
> That might change once you stick bunches of videocards in nodes.
> Please note that the GPU 'streamcores' or PEs, whatever name you want
> to give them, are so bloody fast
> that your code has to work within the PEs themselves and hardly use
> the RAM.
> For both Nvidia and AMD, the streamcores are so fast that you
> simply don't want to lose time on the RAM
> when your software runs, let alone use huge amounts of RAM.
> Add to that that Nvidia (i still have to figure this out for AMD) can
> stream in the background to and from the GPU's RAM
> from the CPU, so if you do really large calculations involving many
> nodes,
> all that shouldn't be an issue in the first place.
> So if you really need 3 GB or 6 GB rather than 2 GB of RAM, that
> would really amaze me, though i'm sure
> there are cases where that happens. If we look at what was ordered,
> however, it is mostly the 3 GB Teslas,
> at least going by what has been reported; i have no global statistics
> on that...
> Now all choices are valid there, but even then we speak about peanuts
> money compared to the price of
> a single 8 socket Nehalem-ex box, which fully configured will be
> maybe $300k-$400k or something?
> Whereas a set of 4x nvidia will be probably under $15k and 4x AMD
> 6990 is 2000 euro.
> There won't be dual-GPU Nvidia cards any time soon because of the
> choice they have historically made for the memory controllers.
> See the explanation by intel fanboy David Kanter for that at
> realworldtech, in a special article he wrote there.
> Please note i'm not judging AMD or Nvidia; they have made their
> choices based upon totally different
> business models, i suspect, and we should be happy we have this rich
> choice right now between CPUs from different
> manufacturers and GPUs from different manufacturers.
> Nvidia really seems to aim at supercomputers, offering their Tesla line
> without lobotomization and lobotomizing their
> gamer cards, whereas AMD aims at gamers and their gamer cards have full
> functionality
> without lobotomization.
> Totally different business models. Both have their advantages and
> disadvantages.
> From a pure performance viewpoint it's easy to see what's faster though.
> Yet right now i realize all too well that just too many still
> hesitate about offering GPU services in addition to
> CPU services, even though having a GPU, regardless of whether it is
> Nvidia or AMD, kicks butt of course from a throughput viewpoint.
> To be really honest with you guys, i had expected that by 2011 we
> would have a GPU reaching far over 1 Teraflop double precision
> hands down. If we see that Nvidia delivers somewhere around 515 Gflop
> and AMD needs 2 GPUs on a single card to get over that Teraflop double
> precision (the claim is 1.27 Teraflop double precision),
> that really is below my expectations from a few years ago.
> Now of course i hope you realize i'm not coding double precision code
> at all; i'm writing everything in integers of 32 bits for the AMD
> card and the Nvidia equivalent also is using 32 bits integers. The
> ideal way to do calculations on those cards, so also very big
> transforms, is using the 32 x 32 == 64 bits instructions (that's 2
> instructions in case of AMD).
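
The 32 x 32 == 64 bits multiply mentioned above can be modelled as two operations, one producing the low 32 bits of the product and one the high 32 bits. That split is an assumption about what the "2 instructions" are; the post does not spell it out. On the Nvidia side, CUDA exposes the high half as the `__umulhi` intrinsic. The sketch below is plain Python, not GPU code, and the names `mul_lo`, `mul_hi` and `mul_wide` are made up for illustration.

```python
MASK32 = 0xFFFFFFFF  # keeps the low 32 bits of a Python integer


def mul_lo(a: int, b: int) -> int:
    """Low 32 bits of the product of two unsigned 32-bit values."""
    return (a * b) & MASK32


def mul_hi(a: int, b: int) -> int:
    """High 32 bits of the product of two unsigned 32-bit values."""
    return ((a * b) >> 32) & MASK32


def mul_wide(a: int, b: int) -> int:
    """Full 64-bit product rebuilt from the two 32-bit halves."""
    return (mul_hi(a, b) << 32) | mul_lo(a, b)


a, b = 0xFFFFFFFF, 0xFFFFFFFF  # largest unsigned 32-bit operands
print(hex(mul_hi(a, b)), hex(mul_lo(a, b)))
print(mul_wide(a, b) == a * b)
```

Big-integer and transform code built on 32-bit integers would chain such half products with carries, so the cost of each half multiply is what determines throughput.
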
> Regards,
> Vincent
>> Gus Correa
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
>> Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
> --
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.

