Diagnostic tools

Donald Becker becker at scyld.com
Mon Oct 21 08:42:53 PDT 2002

On Mon, 21 Oct 2002 alvin at Maggie.Linux-Consulting.com wrote:

> On Mon, 21 Oct 2002, Manel Soria wrote:
> > We are looking for a diagnostic tool that (ideally) would
> > allow us to determine what component/s of a node fail. It should
> > test the processor, RAM, disk and network cards under heavy load
> > but in repeatable conditions.
> testing those items individually is a lot of work ...
> test process/procedure is more important than the actual test ??
> - many different cpu/disk/memory/nic tests
> 	http://www.Linux-1U.net/Diags/

The only Linux hardware tests you list are a CPU test (cpuburn) and many
entries for memtest86.  You missed several Linux "SMART"-based disk
diagnostic tools and the NIC diagnostics at

> > -Monitor the CPU temperature.
> use i2c-2.6.5 and lm_sensors to read the health monitors on the
> motherboard
> also get a regular digital thermometer from your local hw store
> for sanity checking

Good advice, since lm_sensors can only guess what type of thermal sensor
is on the motherboard.  When the guessed calibration is off, it is
usually way off, but you cannot count on that.
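The sanity check against an external thermometer could be scripted roughly
as below.  This is only a sketch under assumptions that are not from the
posts above: it assumes a kernel that exposes sensor readings as
millidegrees Celsius in /sys/class/hwmon/hwmon*/temp*_input, and the
function names and the 5 degree tolerance are hypothetical choices made for
illustration.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {sensor name: degrees C} for each temp*_input file under root.

    Assumes the hwmon sysfs layout, where each file holds an integer
    number of millidegrees Celsius.
    """
    temps = {}
    for f in sorted(Path(root).glob("hwmon*/temp*_input")):
        try:
            millideg = int(f.read_text().strip())
        except (OSError, ValueError):
            # Unreadable or malformed sensor file: skip it.
            continue
        temps[f"{f.parent.name}/{f.name}"] = millideg / 1000.0
    return temps


def check_against_reference(sensor_c, reference_c, tolerance_c=5.0):
    """Compare a sensor reading with an external thermometer reading.

    Returns (offset, ok), where offset = sensor_c - reference_c and ok is
    True when the absolute offset is within tolerance_c (an arbitrary
    illustrative threshold, not a recommended value).
    """
    offset = sensor_c - reference_c
    return offset, abs(offset) <= tolerance_c
```

To use it, read the thermometer by hand, pass that value as reference_c for
each entry returned by read_hwmon_temps(), and treat any sensor flagged as
not ok as suspect.  A reading that passes is not thereby proven correct,
since a wrong calibration can still land close to the reference.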

Donald Becker				becker at scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Scyld Beowulf cluster system
Annapolis MD 21403			410-990-9993

More information about the Beowulf mailing list