[Beowulf] GPU diagnostics?
Joe Landman
landman at scalableinformatics.com
Mon Mar 30 16:09:31 PDT 2009
Greg Lindahl wrote:
> On Mon, Mar 30, 2009 at 06:31:17PM -0400, Joe Landman wrote:
>
>> This said, there really isn't a memory checker for GPUs just yet. Could
>> be done, and probably should be ...
>
> But will it be like memtest86, which isn't as good as HPL at finding
> problems? If you've got DGEMM for your GPU, you're there.

Heh... I erased the paragraph where I tore into using memtest* as
anything other than a gross checker ... felt it wasn't too relevant.

We run a few parallel codes as our testers. They beat the heck out of the
system (you can hear the fans spin up on variable-speed systems).
Specifically, we deliberately overload the unit computationally and
make sure it doesn't throw EDAC errors or MCEs.
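
A minimal sketch of the "make sure we don't throw EDACs" half of that
check, assuming a Linux host that exposes the EDAC memory-controller
counters under /sys/devices/system/edac/mc. The function names are
hypothetical and this is not the tester described above. It does not
cover MCEs, which are logged elsewhere.

```python
import glob
import os

def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} from EDAC sysfs counters."""
    counts = {}
    for mc in sorted(glob.glob(os.path.join(root, "mc[0-9]*"))):
        values = []
        for name in ("ce_count", "ue_count"):
            try:
                with open(os.path.join(mc, name)) as fh:
                    values.append(int(fh.read().strip()))
            except (OSError, ValueError):
                values.append(0)  # assumption: a missing/unreadable counter counts as zero
        counts[os.path.basename(mc)] = tuple(values)
    return counts

def new_errors(before, after):
    """Return controllers whose corrected or uncorrected count grew between snapshots."""
    grown = {}
    for mc, (ce, ue) in after.items():
        ce0, ue0 = before.get(mc, (0, 0))
        if ce > ce0 or ue > ue0:
            grown[mc] = (ce - ce0, ue - ue0)
    return grown
```

Take one snapshot with read_edac_counts() before starting the load and
one after it finishes; a non-empty result from new_errors(before, after)
means the run provoked memory errors.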

Yeah, *GEMM is good (though some GPU cards can't run DGEMM ... older
nVidia/ATI cards don't do double precision).

Too bad CUDA won't run on the ATIs. That would make maintaining a
checker like this much easier.

If people can live with SGEMMs and FFT-like kernels, we can
probably leverage (and make available) an older code we used a while
ago. Actually, for another project, we just did a DGETF and a few other
ports. Let me know if you want me to clean it up and make it available.
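
For anyone wondering what a *GEMM-based checker boils down to, here is a
rough sketch of the idea only, not the code mentioned above: run the
same multiply over and over and flag any run whose result differs from
the first. The pure-Python matmul is a stand-in for whatever GPU SGEMM
call is available, all names are hypothetical, and the bitwise
comparison assumes the multiply is deterministic from run to run.

```python
import random

def matmul(a, b):
    """Naive dense multiply; stand-in for a GPU SGEMM call."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def gemm_burn_in(multiply, n=16, iterations=20, seed=0):
    """Repeat one multiply; return the iterations whose result differs from the first."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n)] for _ in range(n)]
    b = [[rng.random() for _ in range(n)] for _ in range(n)]
    reference = multiply(a, b)  # first result is the baseline
    bad = []
    for it in range(1, iterations):
        if multiply(a, b) != reference:
            bad.append(it)  # same inputs, different output: suspect hardware
    return bad
```

An empty list from gemm_burn_in(matmul) means every repetition matched.
On real hardware the matrices would be sized to fill device memory and
the loop run long enough to heat the card.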
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 734 786 8452
cell : +1 734 612 4615