[Beowulf] Stress / torture test cluster hardware
deadline at clustermonkey.net
Sun Oct 8 07:23:46 PDT 2006
While I would not call it torture (biting lip to not make
a sarcastic remark), the BPS (Beowulf Performance Suite)
was developed to:
1) make sure things were working at various levels
2) generate baseline performance data to help see what
happens when things change (e.g., compilers, MPI versions,
or hardware)
You can read about it here (and download the tarball):
One of the main components is the NAS Parallel Benchmark
suite, which is based on real computational patterns and is
self-checking -- which is important when things get near the edge.
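Because the NAS codes verify their own answers, a small wrapper can
grep each run's output for the verification line and turn a
performance run into a pass/fail hardware check. A minimal sketch --
the log paths are placeholders, and the exact verification string
should be checked against your NPB version:

```shell
#!/bin/sh
# Check a benchmark log for the NPB self-verification result.
# NPB codes print a "Verification = SUCCESSFUL" line when the
# computed answers match the reference values; wrong answers on
# one node are a strong sign of flaky hardware.
run_and_check() {
    logfile="$1"
    if grep -q "SUCCESSFUL" "$logfile"; then
        echo "PASS: $logfile"
    else
        echo "FAIL: $logfile -- wrong answers, suspect hardware"
    fi
}
```

A run that finishes fast but verifies wrong is exactly the failure
mode a pure timing harness would miss.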
I have been improving my main driver script that allows
various compilers, MPI's, problem sizes, and numbers of nodes
to be tested.
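The shape of such a driver script is just nested loops over the test
matrix. A sketch under assumptions (the compiler/MPI names, binary
layout, and naming scheme are all hypothetical placeholders, not the
actual BPS script):

```shell
#!/bin/sh
# Sweep the test matrix: every compiler x MPI x problem size x
# node count combination. Here we only echo the command that a
# real driver would launch and time.
COMPILERS="gcc pgi"
MPIS="mpich openmpi"
CLASSES="A B"          # NAS problem sizes
NODES="2 4 8"

for cc in $COMPILERS; do
  for mpi in $MPIS; do
    for class in $CLASSES; do
      for n in $NODES; do
        # e.g. the CG benchmark built for this combination
        echo "mpirun -np $n bin/$cc-$mpi/cg.$class.$n"
      done
    done
  done
done
```

The real script also records timings per combination so that a later
compiler or MPI upgrade can be compared against the baseline.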
Another real-world test is to try running Gromacs,
which is known to stress systems quite a bit.
From the Gromacs web page:
"Gromacs is running the x86 CPUs hotter than any other program
we know of, including dedicated testing programs like cpu-burnin."
There is more here:
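Whatever stress code you pick, the burn-in wrapper usually looks the
same: repeat the job many times and count failures, since marginal
hardware often passes once and only fails on iteration N. A small
sketch (the command is a placeholder for your stress job):

```shell
#!/bin/sh
# Run any stress command repeatedly and report how many
# iterations failed (non-zero exit status).
burn_in() {
    cmd="$1"
    iters="$2"
    fails=0
    i=1
    while [ "$i" -le "$iters" ]; do
        $cmd >/dev/null 2>&1 || fails=$((fails + 1))
        i=$((i + 1))
    done
    echo "$fails"
}
```

Running this per node and diffing the failure counts is a quick way
to single out the one flaky box in a rack of identical hardware.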
> Dear Beowulf mailing list members,
> we are building a new Beowulf cluster at the moment. New hardware is
> arriving every day. Now we want to make certain that this hardware has
> no errors. Therefore we want to stress test them.
> Do you know of any papers, articles, proceedings, tools... concerning
> this topic beside the ones below?
> Till now I found the following:
> "Cluster Hardware Torture Test" http://www.linuxjournal.com/article/6940
> "Stresslinux" http://www.stresslinux.org/
> "Cerberus Test Control" http://sourceforge.net/projects/va-ctcs
> "CPU Burn-In" http://users.bigpond.net.au/cpuburn/
> "mprime2414" http://www.mersenne.org/freesoft.htm
> "memtest86" http://www.memtest86.com/
> Nico Mittenzwey
> Beowulf mailing list, Beowulf at beowulf.org