[Beowulf] [tt] Nvidia unveils Tesla, moves into supercomputing

Thu Jun 21 15:06:33 PDT 2007

Instead of the marketing blah-blah that NVIDIA and ATI produce, please let 
them publish a few clear PDFs that state, for each specific graphics card, 
EXACTLY how BIG the caches on that card are.

How on earth can you program for a card without knowing how its caches 
work, let alone how big they are?

Intel and AMD definitely don't make a big secret of the size of their 
caches.

What we see in reviews of new graphics cards, however, is that the hardware 
sites simply have to GUESS how big the caches are.

Nearly all descriptions are written from a graphics programmer's viewpoint 
instead of a CPU programmer's viewpoint.

That makes it a really big step to get a CPU-intensive program working on a GPU.

Is it so hard for ATI/NVIDIA to write a clear document like that about their 
latest flagship and put it online as a free download?

Additionally, I miss one very important instruction on those GPUs, which 
CPUs have had from the 386 onward.
If your GPU can only do 32-bit integer data types, then provide a parallel 
multiplication that takes 2x32 bits of input and gives 2x32 bits of output 
(the full 64-bit product as a high and a low 32-bit word).
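
To make the request concrete, here is a minimal Python sketch of the operation 
meant above: two unsigned 32-bit inputs, with the 64-bit product returned as a 
high and a low 32-bit word. The function names are made up for illustration, 
and nothing here describes what any actual GPU provides. The second function 
shows roughly what you have to do by hand when no multiply result may exceed 
32 bits: four 16x16-bit multiplies plus carry handling.

```python
MASK32 = 0xFFFFFFFF
MASK16 = 0xFFFF


def mul32_wide(a, b):
    """Return (high, low) 32-bit words of the 64-bit product a * b."""
    if not (0 <= a <= MASK32 and 0 <= b <= MASK32):
        raise ValueError("operands must be unsigned 32-bit values")
    product = a * b  # Python ints are unbounded, so this is the exact product
    return (product >> 32) & MASK32, product & MASK32


def mul32_wide_from_limbs(a, b):
    """Same result, but no single multiply result exceeds 32 bits.

    Each operand is split into two 16-bit limbs, so every partial
    product is at most (2**16 - 1)**2, which fits in 32 bits.
    """
    if not (0 <= a <= MASK32 and 0 <= b <= MASK32):
        raise ValueError("operands must be unsigned 32-bit values")
    a_lo, a_hi = a & MASK16, a >> 16
    b_lo, b_hi = b & MASK16, b >> 16

    ll = a_lo * b_lo
    lh = a_lo * b_hi
    hl = a_hi * b_lo
    hh = a_hi * b_hi

    # Middle column: upper half of ll plus the lower halves of the cross terms.
    mid = (ll >> 16) + (lh & MASK16) + (hl & MASK16)

    low = ((mid & MASK16) << 16) | (ll & MASK16)
    high = (hh + (lh >> 16) + (hl >> 16) + (mid >> 16)) & MASK32
    return high, low
```

With a single widening instruction the first form is one operation per pair of 
inputs; without it, every product costs the four multiplies and the additions 
of the second form.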

It is a fairy tale that FFT is inherently faster in floating point; it just 
happens that most SIMD instruction sets have no integer equivalent so far.
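
As an illustration of an integer FFT, here is a small Python sketch of a 
number-theoretic transform (NTT), which replaces the complex roots of unity by 
roots of unity modulo a prime. Choosing the NTT, the prime 998244353 
(119 * 2^23 + 1) and its primitive root 3 are my assumptions for the example; 
the names are illustrative only. Every butterfly multiplies two values below 
2^30 and reduces the result, so it needs the full 64-bit product of two 32-bit 
integers, and there are no rounding errors.

```python
P = 998244353  # prime of the form 119 * 2**23 + 1 (assumed for this sketch)
G = 3          # a primitive root modulo P


def ntt(values, invert=False):
    """Iterative radix-2 transform modulo P; len(values) must be a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1) or n > 1 << 23:
        raise ValueError("length must be a power of two, at most 2**23")
    a = [v % P for v in values]

    # Bit-reversal permutation.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    length = 2
    while length <= n:
        w_len = pow(G, (P - 1) // length, P)  # root of unity of order `length`
        if invert:
            w_len = pow(w_len, P - 2, P)      # modular inverse via Fermat
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u = a[k]
                v = a[k + half] * w % P       # the 32x32 -> 64-bit multiply
                a[k] = (u + v) % P
                a[k + half] = (u - v) % P
                w = w * w_len % P
        length *= 2

    if invert:
        n_inv = pow(n, P - 2, P)
        a = [x * n_inv % P for x in a]
    return a


def multiply_exact(x, y):
    """Multiply two coefficient lists; exact as long as every result is below P."""
    out_len = len(x) + len(y) - 1
    size = 1
    while size < out_len:
        size *= 2
    fx = ntt(list(x) + [0] * (size - len(x)))
    fy = ntt(list(y) + [0] * (size - len(y)))
    product = [p * q % P for p, q in zip(fx, fy)]
    return ntt(product, invert=True)[:out_len]
```

For example, multiply_exact([1, 2, 3], [4, 5]) gives the coefficients of 
(1 + 2x + 3x^2)(4 + 5x).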

Thanks,
Vincent

----- Original Message ----- 
From: "Eugen Leitl" <eugen at leitl.org>
To: <Beowulf at beowulf.org>
Sent: Thursday, June 21, 2007 10:44 AM
Subject: [Beowulf] [tt] Nvidia unveils Tesla, moves into supercomputing

> ----- Forwarded message from Brian Atkins <brian at posthuman.com> -----
>
> From: Brian Atkins <brian at posthuman.com>
> Date: Wed, 20 Jun 2007 16:23:29 -0500
> To: transhumantech <tt at postbiota.org>
> Subject: [tt] Nvidia unveils Tesla, moves into supercomputing
> User-Agent: Thunderbird 2.0.0.4 (Windows/20070604)
>
> http://www.tgdaily.com/content/view/32557/135/
>
> Santa Clara (CA) – Nvidia today announced Tesla, a third product line next 
> to
> the GeForce and Quadro graphics products. The company aims to use Tesla 
> cards
> and the massive floating point horsepower of its graphics processors to 
> take
> over a portion of the lucrative supercomputing market.
>
> The core of each Tesla device is a GeForce 8-series GPU as well as the 
> general
> component layout of the high-end Quadro FX 5600 workstation graphics card 
> with
> 1.5 GB of memory. The only noteworthy difference between the FX 5600 and a 
> Tesla
> card is the fact that the supercomputing-targeted devices lack the 
> graphics
> outputs on the backpanel, which we were told, allows Nvidia to increase 
> the
> clock speed on Tesla.
>
> While the actual clock speed of the Tesla GeForce GPU is kept under wraps,
> Nvidia said that one processor (used in the C870 add-in card) is good for 
> a
> performance of 518 GFlops, two processors (used in the deskside 
> supercomputer
> D870, which integrates two C870 cards) will bring 1 TFlops; the Tesla GPU 
> server
> with four processors will hit 2 TFlops.
>
> In terms of pure number crunching horsepower, Nvidia told us that one 
> GeForce
> GPU can match the combined performance of 40 x86 processors. In addition 
> to the
> raw performance, Tesla also makes a case for power efficiency: The C870 is 
> rated
> at a maximum power consumption of 170 watts and the GPU server at 800 
> watts,
> which may sound like a lot at first glance. However, 40 low-power x86 processors 
> would
> run at a typical 1600 watts. With a common power budget of about 25 
> kilowatts
> per rackserver, a Tesla GPU server rack has a theoretical maximum 
> performance of
> more than 60 TFlops – which would put the floating point rating of such a 
> device
> among the 15 fastest supercomputers currently ranked on the Top 500
> Supercomputer list.
>
>
> Similarities to ATI’s stream processor card, implications for developers
>
> Readers who have been following recent general-purpose GPU announcements will
> remember that ATI has a product in its portfolio that is very similar to the
> Tesla C870 – the stream processor card (which is based on an R580 GPU and 1 GB
> of memory). Both products follow the same concept: making the massive
> processing capability provided by shader processors available to run
> arbitrary code instead of graphics code.
>
> Developers such as John Stone and James Philips, senior research 
> programmers at
> the Beckman Institute of Advanced Science and Technology at the University 
> of
> Illinois, have been looking at accelerators such as GPUs for some time, but have been
> limited mainly by bugs in shader drivers. Stone told us that much of his 
> work
> with GPUs in the past was focused “on finding driver bugs” and “writing 
> his
> applications around them” in order to make the technology usable for 
> scientific
> simulations. “There can be a lot of rounding errors and because of this 
> very
> fact, I wasn’t very excited about working with GPUs,” he said.
>
> However, both AMD and Nvidia came up with a programming model to solve 
> this
> problem. On AMD’s side, it is called CTM (“close to metal”) and on Nvidia’s 
> side
> it is CUDA (“Compute Unified Device Architecture”). At this time, it 
> appears to
> come down to personal liking which model is preferred by a developer, as, 
> for
> example, there are some universities that are working with CTM (such as
> Stanford’s Folding@Home project) and there are some that are working with 
> CUDA.
> Stone and Philips are focusing on the Nvidia model as they claim its 
> C++-based
> language model is easier to deal with than AMD’s CTM version, which uses a
> low-level assembly language.
>
> While CUDA works very much like a regular programming model and, according 
> to
> Stone, can deliver results very quickly, the big challenge in exploiting 
> these
> devices will be knowledge to write advanced parallelized code for these 
> GPGPUs.
> Stone believes that especially coders who have written code for (massively
> parallel) supercomputers before will have an easy transition opportunity. 
> Of
> course, knowledge of the hardware, graphics processing and a good look at 
> the
> parallelizable parts of applications help to take advantage of the 
> technology.
>
> Shane Ryoo, a graduate research assistant at the University of Illinois at
> Urbana-Champaign, said that CUDA will allow programmers with some 
> experience in
> developing threaded applications to get “really good results right off the 
> bat.”
> However, it will be the fine-tuning process that increases the value of
> GPGPUs: Ryoo noted that expert knowledge, which allows developers to squeeze
> the best possible performance out of GPUs, can sometimes accelerate
> application code by a factor of 5x or greater.
>
> Nvidia is well aware of this challenge and has begun assisting 
> universities in
> establishing classes and developing course material focusing on massively
> parallel programming and CUDA in particular. Eventually, the company hopes that
> GPGPU programming will become a standard part of computer science course work
> and help to educate a whole new generation of programmers. So far, Nvidia 
> has
> taught courses at the University of Illinois, The University of 
> California, the
> University of North Carolina and Purdue University. Nvidia said that 
> several
> universities are developing their own courses, including the University of
> Virginia, the University of Pennsylvania, Oregon State University, the
> University of Wisconsin. Caltech, MIT, Berkeley and Stanford have been 
> offering
> “legacy” GPGPU and GPU programming classes, according to Nvidia chief 
> scientist
> David Kirk.
>
> The payoff: Accelerated applications
>
>
> If the capabilities of these GPGPUs are exploited, there can be a big 
> payoff.
> Stone, who is working on Nanoscale Molecular Dynamics (NAMD) as well as 
> Visual
> Molecular Dynamics (VMD), said that a virus simulation that took 110 CPU 
> hours
> on a SGI Altix Itanium 2 supercomputer at NCSA required only 27 GPU 
> minutes on a
> GeForce 8 graphics processor – which translates into a 240x speedup.
>
> In an example that showcases an impact that can touch many lives, Ryoo and 
> his
> team are working on an interactive, medical MRI application that 
> substantially
> increases the resolution of MRI scans thanks to the added processing 
> power. As a
> result, they expect to be able to deliver much finer images, which allow
> physicians to detect tumors at an earlier state or differentiate between a 
> blip
> or an actual tumor.
>
> In a demonstration shown during an Nvidia event, a representative from
> Headwave, a company that provides geophysical data analysis, highlighted a 
> 4D
> application, which allows users to visualize gigabytes and apparently even
> terabytes of data in a three-dimensional scale and even apply a time 
> filter to
> display changes to geological layers over time. The company claims that 
> GPUs are
> accelerating their application by about 2000% and are delivering an output 
> of
> about 2000 MB/s.
>
> In fairness, we should mention that Tesla (or stream processor cards for 
> that
> matter) will not be able to replace supercomputers, which continue to 
> provide a
> memory bandwidth a few Tesla cards cannot match. Scientists such as Stone
> believe that products such as Tesla will make their way into 
> supercomputers to
> create an overall more balanced environment. “Number crunching was the 
> limiting
> factor up until now. Now Infiniband will be a problem,” he said.
>
> GPGPUs are likely to have a greater impact on deskside supercomputers in 
> the
> short term. While scientists today have to apply for expensive 
> supercomputer
> time and in most cases have to wait several days until their application 
> can be
> processed - if those requests are not turned down anyway – there is now an
> opportunity to run many of those tests on a desk right in the lab. 
> Conceivably,
> GPGPUs will allow more scientists to run more and higher quality 
> simulations in
> less time.
>
>
> Cost and impact on the consumer
>
> Nvidia’s Tesla products will start at $1300 for the single GPU add-in 
> card; the
> 2-GPU deskside unit will run for $7500 and the 4 GPU server, which soon 
> will
> also be offered in an 8 GPU version, will sell for $12,000. Leaving out of
> consideration that, at least to our knowledge, Tesla is not yet available, 
> these
> apparently lofty price tags turn out to be bargains at a closer look.
>
> The C870 not only undercuts the ATI stream processor card, which currently 
> sells
> for about $2000, but also Nvidia’s own workstation products. The C870, at 
> $1300,
> compares to a Quadro FX 5600 graphics card, which requires an investment 
> in the
> neighborhood of $3000 and up. Clearspeed’s CSX600 accelerator card, which
> provides a performance of about 100 GFlops, is selling in volume for about 
> $7500.
>
> A representative of Evolved Machines told us that the company plans to be
> offering a 12 TFlops Tesla server, which will cost somewhere between 
> $60,000 and
> $70,000, but will be fast enough to match the floating point performance 
> of the
> 19th fastest supercomputer on the Top-500 list.
>
> Stone told us that even if the GPUs per se may appear to be expensive from a
> consumer point of view, they “are available for far less money than the 
> next
> best thing that is available today.”
>
> So, what does that mean for the consumer? Clearly, there is only an 
> indirect
> benefit for most consumers that we may see in improved research results 
> down the
> road. However, as with all technologies, these GPUs will get cheaper over time 
> and
> even today, a $1300 card would be in reach for enthusiasts, who often 
> spend
> substantially more than $5000 on their rig. The fact is that there is no 
> magic
> necessary to make these cards work on a PC - and CUDA even works with 
> GeForce 8
> graphics cards, which can be had for less than $250 in the case of 
> 8600-series
> models. The real question is: When will there be applications that take
> advantage of this technology and will they provide enough incentive for
> consumers to purchase a GeForce 8 card? Industry experts believe that it 
> will be
> up to developers to come up with new applications that will take advantage 
> of
> the capability of GPGPUs on the desktop.
>
> Nvidia CEO Jen-Hsun Huang told TG Daily that Tesla will be strictly focused on
> the enterprise market and will not be making its way to the consumer 
> market. In
> the end, it will be up to the GeForce product groups to leverage CUDA on 
> desktop
> computers, but at least for now, Nvidia has little motivation to push this
> technology for the average consumer: “Perhaps in the future,” said Huang, 
> “[this
> technology] could do physics on the PC, but this would need a Windows 
>  API.”
>
> -- 
> Brian Atkins
> Singularity Institute for Artificial Intelligence
> http://www.singinst.org/
>
> ----- End forwarded message -----
> -- 
> Eugen* Leitl <a href="http://leitl.org">leitl</a> http://leitl.org
> ______________________________________________________________
> ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
> 8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE
>