[Beowulf] Memory stress testing tools.

Prentice Bisbal prentice at ias.edu
Thu Dec 9 07:59:16 PST 2010


On 12/07/2010 04:35 PM, David Mathog wrote:
>> True, but this is a multi-user system, so I don't know which user's code
>> is triggering the errors, nor do I know what usage pattern causes the
>> errors, so I'm looking for something more consistent. Well, I hope it
>> will be more consistent.
>
> Try setting up a script to take snapshots of the system every 15 seconds
> or so.  Something like:
>
> while [ 1 ]; do
>    ( date; top -b -n 1 | head -10 ) >> "$LOGFILE"
>    sleep 15
> done
>
> Then using the memory error time stamps go back through those logs to
> find the most likely culprits.

That will identify the program, but not the problem size or data set 
being used that triggers the error.
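
For anyone who does want to try the logging approach, here is a minimal
sketch of that loop as a POSIX shell script. The function names
take_snapshot and log_snapshots are made up for illustration, and it uses
ps with the args column instead of top, on the assumption that the full
command line sometimes hints at the input file or problem size. Often it
won't, which is the limitation described above.

```shell
#!/bin/sh
# Sketch only: take_snapshot and log_snapshots are hypothetical helper
# names, not part of any existing tool.

# Print a timestamp followed by the processes using the most CPU.
# The args column records the full command line of each process.
# $1 = number of ps lines to keep (default 10).
take_snapshot() {
    date '+%Y-%m-%d %H:%M:%S'
    ps -eo pid,user,pcpu,pmem,args | sort -k3,3nr | head -n "${1:-10}"
}

# Append snapshots to a log file at a fixed interval.
# $1 = log file, $2 = seconds between snapshots (default 15),
# $3 = number of snapshots to take (default 0 = run until interrupted).
log_snapshots() {
    logfile=$1
    interval=${2:-15}
    count=${3:-0}
    n=0
    while :; do
        take_snapshot >> "$logfile"
        n=$((n + 1))
        # Stop once the requested number of snapshots has been written.
        if [ "$count" -ne 0 ] && [ "$n" -ge "$count" ]; then
            break
        fi
        sleep "$interval"
    done
}
```

Calling "log_snapshots /var/tmp/memwatch.log 15" (the path is just an
example) would append a snapshot every 15 seconds until interrupted; the
timestamps can then be matched against the memory error time stamps.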

Using a stress test that I control removes this detective work. I've 
decided to go with mprime from the GIMPS project, which has a stress test
feature:

http://www.mersenne.org/

-- 
Prentice


