[Beowulf] Memory stress testing tools.
Prentice Bisbal
prentice at ias.edu
Thu Dec 9 07:59:16 PST 2010
On 12/07/2010 04:35 PM, David Mathog wrote:
>> True, but this is a multi-user system, so I don't know which user's code
>> is triggering the errors, nor do I know what usage pattern causes the
>> errors, so I'm looking for something more consistent. Well, I hope it
>> will be more consistent.
>
> Try setting up a script to take snapshots of the system every 15 seconds
> or so. Something like:
>
> while true; do
>     # append a timestamp and the top of the process list to the log
>     ( date; top -b -n 1 | head -10 ) >> "$LOGFILE"
>     sleep 15
> done
>
> Then using the memory error time stamps go back through those logs to
> find the most likely culprits.
That will identify the program, but not the problem size or data set
being used that triggers the error.
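
For reference, the log search David describes could be sketched roughly as
below. This is only an illustration under assumptions that are not in the
original script: each snapshot is assumed to begin with a marker line of the
form "=== <epoch seconds>" (for example written with echo "=== $(date +%s)"
in place of plain date), and the function name snapshots_near is hypothetical.

```shell
# Print every snapshot whose timestamp lies within WINDOW seconds of TARGET.
# Assumed (hypothetical) log format: each snapshot starts with a marker line
# "=== <epoch seconds>", followed by the captured top output.
snapshots_near() {
    logfile=$1
    target=$2
    window=${3:-30}   # default search window: 30 seconds either side
    awk -v t="$target" -v w="$window" '
        # On a marker line, decide whether this snapshot is close enough.
        /^=== [0-9]+$/ { d = $2 - t; if (d < 0) d = -d; show = (d <= w) }
        # Print the marker and every following line while the flag is set.
        show { print }
    ' "$logfile"
}
```

Called as snapshots_near "$LOGFILE" <error time in epoch seconds> 30, it
prints the snapshots taken within 30 seconds of the memory error.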
Using a stress test that I control removes this detective work. I've
decided to go with mprime from the GIMPS project, which has a stress test
feature:
http://www.mersenne.org/
--
Prentice