[torqueusers] Re: [Beowulf] job runs with mpirun on a node but not if submitted via Torque.


Ling C. Ho ling at fnal.gov
Wed Apr 1 07:53:35 PDT 2009


Rahul Nabar wrote:

> On Tue, Mar 31, 2009 at 6:43 PM, Don Holmgren <djholm at fnal.gov> wrote:
>> Instead of logging into the node directly, you might want to try an
>> interactive job (use "qsub -I") and then try your mpirun.  This may give
>> you messages that for some reason aren't getting back to you in your
>> job's .o or .e files.
> 
> I tried an interactive job; this seems the key:
> 
> forrtl: error (78): process killed (SIGTERM)
> mpirun noticed that job rank 5 with PID 10580 on node node17 exited on
> signal 11 (Segmentation fault).
> 
> I do not get this segfault when I run directly on the node but only
> when I run via Torque.
> 
> Any clues?
> 

We had a problem where resources_max.pmem was accidentally set too low for the Torque queue, and the
user's login shell was segfaulting. Torque showed an Exit_status of 267.
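
For what it's worth, that 267 lines up with the "signal 11" in the mpirun message above if you read it as 256 + 11. Below is a rough Python sketch of that decoding. The function name decode_exit_status and the default offset of 256 are my own assumptions for illustration (some systems are said to use 128 instead), so treat it as a reading aid rather than Torque's definitive rule.

```python
import signal


def decode_exit_status(status, offset=256):
    """Interpret a Torque Exit_status value (hypothetical helper).

    Assumption: values at or above `offset` mean the job was killed by
    signal number (status % offset); negative values are Torque-specific
    codes; anything else is the plain exit code of the job's top process.
    """
    if status < 0:
        return ("torque", status, None)
    if status >= offset:
        signum = status % offset
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform
            name = None
        return ("signal", signum, name)
    return ("exit", status, None)


if __name__ == "__main__":
    # 267 is the Exit_status reported in this thread
    print(decode_exit_status(267))
```

If the decoded signal is a segfault and the job runs fine outside the batch system, the queue's memory limits (such as resources_max.pmem) are one place to look.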

...
ling


