[Beowulf] Nvidia FERMI/gt300 GPU
    Bill Broadley 
    bill at cse.ucdavis.edu
       
    Thu Oct  1 14:42:32 PDT 2009
    
    
  
Craig Tierney wrote:
> Bill Broadley wrote:
>> Impressive:
>> * IEEE floating point, doubles 1/2 as fast as single precision (6 times or
>>   so faster than the gt200).
>> * ECC
> 
> The GDDR5 says it supports ECC, but what is the card going to do?
> Is it ECC just from the memory controller, or is it ECC all the way
> through the chip?  Is it 1-bit correct, 2-bit error message?
Nvidia is pleasingly specific in their white paper:
http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIAFermiComputeArchitectureWhitepaper.pdf
Specifically:
 Fermi supports Single-Error Correct Double-Error Detect (SECDED) ECC codes
 that correct any single bit error in hardware as the data is accessed.
 ...
 Fermi’s register files, shared memories, L1 caches, L2 cache, and DRAM memory
 are ECC protected
 ...
 All NVIDIA GPUs include support for the PCI Express standard for CRC check
 with retry at the data link layer. Fermi also supports the similar GDDR5
 standard for CRC check with retry (aka “EDC”) during transmission of data
 across the memory bus.
Kudos to Nvidia for being very clear.
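
For anyone who wants to see what "1-bit correct, 2-bit detect" means in practice, here is a toy sketch of SECDED using an extended Hamming(8,4) code: 4 data bits, 3 Hamming parity bits, plus 1 overall parity bit. This is only an illustration of the general technique. The white paper excerpt does not say what code width or layout Fermi actually uses, and real hardware protects much wider words. The function names encode and decode are mine, not anything from Nvidia.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword (toy example)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                      # index 0: overall parity, 1..7: Hamming positions
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # parity over positions with bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]   # parity over positions with bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]   # parity over positions with bit 2 set
    code[0] = sum(code[1:]) % 2             # makes the parity of all 8 bits even
    return code


def decode(code):
    """Return (nibble, status); nibble is None when a double error is detected."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos             # XOR of set positions is 0 for a valid codeword
    overall = sum(code) % 2             # 1 means an odd number of bits flipped

    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Single-bit error: the syndrome is the flipped position
        # (0 means the overall parity bit itself was hit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return None, "double-error detected"

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status


if __name__ == "__main__":
    word = encode(0b1011)
    word[5] ^= 1                        # inject a single-bit error
    print(decode(word))
    word[2] ^= 1                        # inject a second error
    print(decode(word))
```

The overall parity bit is what separates the two cases: a single flip makes total parity odd and can be located by the syndrome, while two flips leave total parity even with a non-zero syndrome, so the decoder can only report the error rather than fix it.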
    
    