[torqueusers] Re: [Beowulf] job runs with mpirun on a node but not if submitted via Torque.

Ling C. Ho ling at fnal.gov
Wed Apr 1 07:53:35 PDT 2009


Rahul Nabar wrote:

> On Tue, Mar 31, 2009 at 6:43 PM, Don Holmgren <djholm at fnal.gov> wrote:
>> Instead of logging into the node directly, you might want to try an
>> interactive job (use "qsub -I") and then try your mpirun.  This may give
>> you messages that for some reason aren't getting back to you in your
>> job's .o or .e files.
> 
> I tried an interactive job; this seems the key:
> 
> forrtl: error (78): process killed (SIGTERM)
> mpirun noticed that job rank 5 with PID 10580 on node node17 exited on
> signal 11 (Segmentation fault).
> 
> I do not get this segfault when I run directly on the node but only
> when I run via Torque.
> 
> Any clues?
> 

We had a problem where resources_max.pmem was accidentally set too low for the Torque queue, and the
user's login shell was segfaulting. Torque showed an Exit_status of 267.
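
For reference, an Exit_status of 267 is consistent with the segfault reported above if Torque encodes "killed by signal N" as 256 + N, since 267 - 256 = 11 and signal 11 is SIGSEGV. The sketch below decodes a status under that assumption. The 256 offset is an assumption (some systems are documented as using 128 instead), and decode_exit_status is a hypothetical helper for illustration, not part of Torque.

```python
import signal


def decode_exit_status(exit_status, offset=256):
    """Interpret a Torque Exit_status value.

    Assumes values at or above `offset` mean the job's top-level process
    was killed by signal (exit_status - offset). The offset is an
    assumption; adjust it if your installation uses 128.
    """
    if exit_status < 0:
        # Negative values are generally Torque-side failures (the job
        # could not be started), not something the job itself returned.
        return {"kind": "torque_error", "value": exit_status}
    if exit_status >= offset:
        signum = exit_status - offset
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not defined on this platform.
            name = "unknown"
        return {"kind": "signal", "value": signum, "name": name}
    # Otherwise it is the ordinary exit code of the job script.
    return {"kind": "exit_code", "value": exit_status}


if __name__ == "__main__":
    print(decode_exit_status(267))
```

To check whether a queue has such a limit, "qmgr -c 'list queue <queue name>'" lists the queue attributes, including resources_max.pmem if it is set.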

...
ling


