[Beowulf] Nvidia FERMI/gt300 GPU
Bill Broadley
bill at cse.ucdavis.edu
Thu Oct 1 14:42:32 PDT 2009
Craig Tierney wrote:
> Bill Broadley wrote:
>> Impressive:
>> * IEEE floating point, doubles 1/2 as fast as single precision (6 times or
>> so faster than the gt200).
>> * ECC
>
> The GDDR5 says it supports ECC, but what is the card going to do?
> Is it ECC just from the memory controller, or is it ECC all the way
> through the chip? Is it 1-bit correct, 2-bit error message?
Nvidia is pleasingly specific in their white paper:
http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIAFermiComputeArchitectureWhitepaper.pdf
Specifically:
Fermi supports Single-Error Correct Double-Error Detect (SECDED) ECC codes
that correct any single bit error in hardware as the data is accessed.
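To make the "1-bit correct, 2-bit detect" behaviour concrete, here is a toy sketch of a SECDED code: an extended Hamming (8,4) code, which is Hamming(7,4) plus one overall parity bit. This is not from the white paper, which doesn't name the exact code. Real memory ECC typically protects wider words (e.g. 64 data bits), so treat the word size and bit layout here as illustrative assumptions only.

```python
def encode(data4):
    """Encode 4 data bits into an 8-bit extended Hamming (SECDED) codeword.

    Positions 1..7 use the classic Hamming(7,4) layout (parity bits at the
    power-of-two positions 1, 2, 4); position 0 holds an overall parity bit
    that lets the decoder tell single errors from double errors.
    """
    d1, d2, d3, d4 = data4
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    word = [0, p1, p2, d1, p4, d2, d3, d4]
    word[0] = sum(word) % 2  # make the parity of all 8 bits even
    return word


def decode(word):
    """Return (data_bits, status); data_bits is None if uncorrectable."""
    w = list(word)
    # XOR of the positions of all set bits in 1..7: zero for a valid
    # codeword, otherwise the position of a single flipped bit.
    syndrome = 0
    for pos in range(1, 8):
        if w[pos]:
            syndrome ^= pos
    overall = sum(w) % 2  # 1 means an odd number of bits flipped

    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume a single error. Syndrome 0 means the
        # overall parity bit itself was the one that flipped.
        w[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a nonzero syndrome: two bits flipped.
        # This is detectable but not correctable.
        return None, "double_error_detected"
    return [w[3], w[5], w[6], w[7]], status


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    cw = encode(data)
    one = cw.copy()
    one[5] ^= 1
    assert decode(one) == (data, "corrected")
    two = cw.copy()
    two[2] ^= 1
    two[6] ^= 1
    assert decode(two) == (None, "double_error_detected")
```

Three or more flipped bits can be silently miscorrected. That is the usual SECDED trade-off, and it is why the answer to "1-bit correct, 2-bit error message?" is yes, and nothing stronger.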
...
Fermi’s register files, shared memories, L1 caches, L2 cache, and DRAM memory
are ECC protected
...
All NVIDIA GPUs include support for the PCI Express standard for CRC check
with retry at the data link layer. Fermi also supports the similar GDDR5
standard for CRC check with retry (aka “EDC”) during transmission of data
across the memory bus.
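The "CRC check with retry" idea is different from ECC. Nothing gets corrected: the receiver only detects a bad transfer and asks for it again. Below is a rough sketch of that loop. It uses `zlib.crc32` purely as a stand-in checksum; the actual CRC widths and polynomials used by PCI Express and GDDR5 EDC are not modeled. The `link` channel and `make_flaky_link` helper are hypothetical, for illustration only.

```python
import zlib


def send_with_retry(payload: bytes, link, max_retries: int = 3):
    """Send payload over an unreliable link, retrying on CRC mismatch.

    `link` is a hypothetical callable modeling the wire: it takes bytes and
    returns the (possibly corrupted) bytes that arrive. For simplicity the
    CRC itself is assumed to arrive intact. Returns (data, attempts).
    """
    expected = zlib.crc32(payload)  # checksum computed by the sender
    for attempt in range(1, max_retries + 2):
        received = link(payload)
        if zlib.crc32(received) == expected:  # receiver-side check
            return received, attempt
        # Mismatch: the receiver discards the data and requests a resend.
    raise IOError("CRC mismatch persisted after all retries")


def make_flaky_link(fail_first: int):
    """Hypothetical channel that flips one bit in the first `fail_first` sends."""
    calls = {"n": 0}

    def link(data: bytes) -> bytes:
        calls["n"] += 1
        if calls["n"] <= fail_first:
            corrupted = bytearray(data)
            corrupted[0] ^= 0x01  # single-bit error on the wire
            return bytes(corrupted)
        return data

    return link
```

For example, `send_with_retry(b"burst", make_flaky_link(2))` succeeds on the third attempt. The cost of an error is latency rather than lost data, as long as errors are rare enough that a retry eventually gets through.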
Kudos to Nvidia for being very clear.