[Beowulf] Nvidia FERMI/gt300 GPU

Bill Broadley bill at cse.ucdavis.edu
Thu Oct 1 14:42:32 PDT 2009


Craig Tierney wrote:
> Bill Broadley wrote:
>> Impressive:
>> * IEEE floating point, doubles 1/2 as fast as single precision (6 times or
>>   so faster than the gt200).
>> * ECC
> 
> The GDDR5 says it supports ECC, but what is the card going to do?
> Is it ECC just from the memory controller, or is it ECC all the way
> through the chip?  Is it 1-bit correct, 2-bit error message?

Nvidia is pleasingly specific in their white paper:
http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIAFermiComputeArchitectureWhitepaper.pdf

Specifically:
 Fermi supports Single-Error Correct Double-Error Detect (SECDED) ECC codes
 that correct any single bit error in hardware as the data is accessed.
 ...
 Fermi’s register files, shared memories, L1 caches, L2 cache, and DRAM memory
 are ECC protected
 ...
 All NVIDIA GPUs include support for the PCI Express standard for CRC check
 with retry at the data link layer. Fermi also supports the similar GDDR5
 standard for CRC check with retry (aka “EDC”) during transmission of data
 across the memory bus.
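
To make the "CRC check with retry" idea concrete, here is a minimal sketch of the general pattern: attach a checksum, verify it on receipt, and resend on mismatch. It is an illustration only, not how PCI Express or GDDR5 implement it. zlib.crc32 stands in for whatever CRC the real links use, and transfer_with_retry and make_flaky_channel are hypothetical names invented for this example.

```python
import zlib


def make_flaky_channel(bad_transfers):
    """Hypothetical link that flips one bit in the first `bad_transfers` sends."""
    remaining = [bad_transfers]

    def channel(data):
        if remaining[0] > 0:
            remaining[0] -= 1
            corrupted = bytearray(data)
            corrupted[0] ^= 0x01  # simulate a single-bit transmission error
            return bytes(corrupted)
        return data

    return channel


def transfer_with_retry(payload, channel, max_attempts=4):
    """Send payload over channel, resending until the CRC matches.

    Assumption: the checksum itself arrives intact. A real link sends the
    CRC alongside the data and protects the retry handshake as well.
    """
    expected_crc = zlib.crc32(payload)
    for attempt in range(1, max_attempts + 1):
        received = channel(payload)
        if zlib.crc32(received) == expected_crc:
            return received, attempt
    raise IOError("CRC mismatch persisted after %d attempts" % max_attempts)
```

Note the difference from ECC: the CRC only detects the error, and the recovery is a resend, so it protects data in flight but not data sitting in memory.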

Kudos to Nvidia for being very clear.
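
For anyone wondering what "1-bit correct, 2-bit detect" looks like in practice, below is a toy SECDED sketch using an extended Hamming(8,4) code: 4 data bits, 3 Hamming check bits, and 1 overall parity bit. This is only an illustration of the principle. The excerpt quoted above does not say which code or word size Fermi uses, and real memory ECC works on much wider words. The names encode and decode are made up for this example.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit SECDED codeword (list of bits).

    Index 0 holds the overall parity bit; indices 1..7 are the classic
    Hamming(7,4) positions, with check bits at 1, 2 and 4.
    """
    code = [0] * 8
    code[3] = (nibble >> 0) & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    for pos in range(1, 8):
        code[0] ^= code[pos]  # overall parity over positions 1..7
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions locates a single flipped bit
    overall = 0
    for bit in code:
        overall ^= bit  # 0 when the total parity is still even

    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: treat as one error and repair it.
        # syndrome == 0 here means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Parity looks even but the syndrome is nonzero: two bits flipped.
        return "uncorrectable", None

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble
```

Flipping any one bit of a codeword from encode() gives back the original data with status 'corrected'; flipping any two gives 'uncorrectable' rather than silently wrong data, which is the behaviour the white paper describes.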



