Diagnostic tools

alvin at Maggie.Linux-Consulting.com
Mon Oct 21 19:50:31 PDT 2002


hi don

thanx for your link ... will check it later..
( i can't seem to get thru to your server right now )

thanx
alvin


On Mon, 21 Oct 2002, Donald Becker wrote:

> On Mon, 21 Oct 2002 alvin at Maggie.Linux-Consulting.com wrote:
> 
> > On Mon, 21 Oct 2002, Manel Soria wrote:
> > 
> > > We are looking for a diagnostic tool that (ideally) would
> > > allow us to determine what component/s of a node fail. It should
> > > test the processor, RAM, disk and network cards under heavy load
> > > but in repeatable conditions.
> > 
> > testing those items individually is a lot of work ...
> > 
> > test process/procedure is more important than the actual test ??
> > 
> > - many different cpu/disk/memory/nic tests
> > 	http://www.Linux-1U.net/Diags/
> 
> The only Linux hardware tests you list are a CPU test (cpuburn) and many
> entries for memtest86.  You missed several Linux "SMART"-based disk
> diagnostics tools and the NIC diagnostics at
>   http://www.scyld.com/diag/index.html
> 
> > > -Monitor the CPU temperature.
> > 
> > use i2c-2.6.5 and lm_sensors to read the health monitors on the
> > motherboard
> > 
> > also get a regular digital thermometer from your local hw store
> > for sanity checking
> 
> Good advice, since lm_sensors can only guess what type of thermal sensor
> is on the motherboard.  When the guessed calibration is off, it is
> usually way off, but you cannot count on that.



