[Beowulf] GPU diagnostics?
Joe Landman
landman at scalableinformatics.com
Mon Mar 30 10:10:17 PDT 2009
David Mathog wrote:
> Have any of you CUDA folks produced diagnostic programs you run during
> "burn in" of new GPU based systems, in order to weed out problem units
> before putting them into service? Minimally, something resembling
> memtest86, to be used to find buggy memory associated with the GPU?
> Optimally, it would also more directly exercise the GPU's capabilities.
>
> I asked on the NV linux forum if there were any official Nvidia graphics
> card diagnostic programs, and nobody there answered with one. This was
> originally with respect to some VDPAU issues, where it looked at first
> like there might be a hardware problem on a small set of systems,
> including mine, although in the end it turned out to be an uninitialized
> variable (it was not my code). There was no objective way to
> demonstrate for VDPAU based software that "this graphics card is
> functioning normally" to help sort this out. I figured the CUDA folks
> should have something like this, else how could you trust the results
> from the GPU calculations?
Vendors have an nVidia-supplied *GEMM-based burn-in test. I have been
thinking about a set of diagnostics that end users can run as a sanity
check.
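
A rough sketch of what a GEMM-style sanity check might look like follows. It
is not the vendor test, which is not described here. It only shows the general
pattern: run the same matrix product many times and flag any run that
disagrees with a trusted reference. Every name in it (`reference_gemm`,
`burn_in`, `make_flaky_gemm`, the `gemm` callable, the `tol` value) is
hypothetical, and the whole thing runs on the host so that it stays
self-contained.

```python
import random


def reference_gemm(a, b):
    """Plain triple-loop matrix product C = A * B, used as the trusted baseline."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def burn_in(gemm, n=8, iterations=20, seed=0, tol=1e-9):
    """Call `gemm` repeatedly on fixed inputs and return the failing iterations.

    `gemm` is a hypothetical stand-in for the routine under test; on a real
    system it would wrap the device call. `tol` is an assumed tolerance.
    """
    rng = random.Random(seed)  # fixed seed so every run sees the same inputs
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    expected = reference_gemm(a, b)

    failures = []
    for it in range(iterations):
        got = gemm(a, b)
        # Largest absolute element-wise difference from the reference result.
        worst = max(abs(got[i][j] - expected[i][j])
                    for i in range(n) for j in range(n))
        if worst > tol:
            failures.append(it)
    return failures


def make_flaky_gemm(bad_iteration):
    """Build a gemm that corrupts one element on a chosen call, mimicking a bad unit."""
    calls = {"count": 0}

    def flaky(a, b):
        c = reference_gemm(a, b)
        if calls["count"] == bad_iteration:
            c[0][0] += 1.0  # inject a single wrong value
        calls["count"] += 1
        return c

    return flaky


if __name__ == "__main__":
    print("healthy:", burn_in(reference_gemm))
    print("flaky:  ", burn_in(make_flaky_gemm(3), iterations=10))
```

To test an actual card, `gemm` would be replaced by a wrapper around the GPU
call, and `tol` would presumably need to be loosened for single precision,
since device and host results will not normally match bit for bit.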
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615