[Beowulf] GPU diagnostics?

David Mathog mathog at caltech.edu
Mon Mar 30 09:56:02 PDT 2009

Have any of you CUDA folks produced diagnostic programs that you run during
"burn in" of new GPU-based systems, in order to weed out problem units
before putting them into service?  At a minimum, something resembling
memtest86, to be used to find faulty memory associated with the GPU?
Ideally, it would also exercise the GPU's capabilities more directly.

I asked on the NV Linux forum whether there were any official Nvidia graphics
card diagnostic programs, and nobody there answered with one.  The question
originally arose from some VDPAU issues, where it looked at first
like there might be a hardware problem on a small set of systems,
including mine, although in the end it turned out to be an uninitialized
variable (it was not my code).  There was no objective way to
demonstrate for VDPAU-based software that "this graphics card is
functioning normally" to help sort this out.  I figured the CUDA folks
should have something like this, or else how could you trust the results
from the GPU calculations?


David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

