going nutty with permission errors etc.!
Eric Linenberg
elinenbe at umich.edu
Mon Aug 6 14:35:06 PDT 2001
I am trying to run LS-DYNA on a 5-node cluster. Each node is a
dual-processor machine, and for some reason it does not work! The program is
compiled for both LAM and MPICH, and I have both of these installed on my
system. The example programs for both libraries work flawlessly, and all
security features are turned off everywhere on the cluster. I am running
Red Hat 7.1.
I can 'ssh hostname command' or 'rsh hostname command' and both work as
expected.
Every node mounts the LS-DYNA, MPICH, and LAM directories, plus the /root
and /home directories, read-write over NFS at boot time.
I can't figure out what is wrong, and I have been stuck here for a while.
Here is some output that the smarter ones out there may be able to make
sense of (this is from lam_mpirun):
[guest@kitkat lstc]$ mpirun -c 4 lam_mpp960
>>>>> Process 0 <<<<<
>>>>> Signal 11 : Segmentation Violation <<<<<
PGFIO/stdio: Permission denied
PGFIO-F-/formatted write/unit=13/error code returned by host stdio - 13.
In source file /net/ultra5/ultra5_2/jason/mpp/ls960/src/atemp.F, at line
number 2097
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.
PID 4331 failed on node n0 with exit status 1.
-----------------------------------------------------------------------------
PGFIO/stdio: Permission denied
PGFIO-F-/formatted write/unit=13/error code returned by host stdio - 13.
In source file /net/ultra5/ultra5_2/jason/mpp/ls960/src/atemp.F, at line
number 2097
PGFIO/stdio: Permission denied
PGFIO-F-/formatted write/unit=13/error code returned by host stdio - 13.
In source file /net/ultra5/ultra5_2/jason/mpp/ls960/src/atemp.F, at line
number 2097
bash: kill: (1245) - No such pid
And here is the output from mpich_mpirun:
[guest@kitkat lstc]$ mpich_mpirun -np 4 mpich_mpp960
>>>>> Process 0 <<<<<
>>>>> Signal 11 : Segmentation Violation <<<<<
PGFIO/stdio: Permission denied
PGFIO-F-/formatted write/unit=13/error code returned by host stdio - 13.
In source file /net/ultra5/ultra5_2/jason/mpp/ls960/src/atemp.F, at line
number 2097
rm_l_1_1257: p4_error: net_recv read: probable EOF on socket: 1
rm_l_2_1165: p4_error: net_recv read: probable EOF on socket: 1
PGFIO/stdio: Permission denied
PGFIO-F-/formatted write/unit=13/error code returned by host stdio - 13.
In source file /net/ultra5/ultra5_2/jason/mpp/ls960/src/atemp.F, at line
number 2097
PGFIO/stdio: Permission denied
PGFIO/stdio: Permission denied
PGFIO-F-/formatted write/unit=13/error code returned by host stdio - 13.
In source file /net/ultra5/ultra5_2/jason/mpp/ls960/src/atemp.F, at line
number 2097
PGFIO-F-/formatted write/unit=13/error code returned by host stdio - 13.
In source file /net/ultra5/ultra5_2/jason/mpp/ls960/src/atemp.F, at line
number 2097
rm_l_3_1094: p4_error: net_recv read: probable EOF on socket: 1
bm_list_4667: p4_error: net_recv read: probable EOF on socket: 1
[guest@kitkat lstc]$
----------------------
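For reference, error code 13 from the host stdio is errno EACCES
("Permission denied") on Linux, and it is reported on a formatted write to
unit 13. One plausible reading is that the guest user cannot create or write
a file where the program is run. Below is a minimal sketch of a check for
that, assuming Python is available on the node. The function name
check_writable is made up for illustration and is not part of LS-DYNA, LAM,
or MPICH; it only tests the directory it is given and says nothing about
what LS-DYNA actually opens on unit 13.

```python
import errno
import os
import tempfile


def check_writable(directory):
    """Try to create and remove a scratch file in `directory`.

    Hypothetical helper: returns (True, None) on success, or
    (False, errno_name) if the scratch file cannot be created.
    """
    try:
        fd, path = tempfile.mkstemp(prefix="write_test_", dir=directory)
    except OSError as exc:
        # Map the numeric errno (e.g. 13) to its symbolic name (e.g. EACCES).
        return False, errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    os.remove(path)
    return True, None


if __name__ == "__main__":
    # Run as the same user, and from the same directory, as mpirun.
    cwd = os.getcwd()
    ok, err = check_writable(cwd)
    print(cwd, "writable" if ok else "NOT writable (%s)" % err)
```

Running this as guest and then as root, from the directory where mpirun is
started on each node, would show whether the two users differ in write
access there.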
But if I run mpich_mpirun as root, I get the output below, which seems to
work okay! If I go above -np 5, though (remember, I have 10 processors
here), it seems to hang for a long while. Ugh! Any help at all is
appreciated! Thanks, Eric
[root@kitkat lstc]# mpich_mpirun -np 5 -v mpich_mpp960
running /usr/local/lstc/mpich_mpp960 on 5 LINUX ch_p4 processors
Created /usr/local/lstc/PI5843
Date: 08/06/2001 Time: 17:31:35
Executing with local workstation license
 ___________________________________________________
|                                                   |
|  Livermore Software Technology Corporation        |
|                                                   |
|  7374 Las Positas Road                            |
|  Livermore, CA 94550                              |
|  Tel: (925) 449-2500 Fax: (925) 449-2507          |
|  www.lstc.com                                     |
|___________________________________________________|
|                                                   |
|  LS-DYNA, A Program for Nonlinear Dynamic         |
|  Analysis of Structures in Three Dimensions       |
|  Version: 960 Date: 07/22/2001                    |
|  Revision: 447 Time: 14:04:43                     |
|                                                   |
|  Licensed to: Exponent Failure Analysis           |
|                                                   |
|  Platform : PC (MPICH-P4)                         |
|  OS Level : Linux 2.12                            |
|  Hostname : kitkat                                |
|  Precision : Single precision (I4R4)              |
|                                                   |
|  Unauthorized use infringes LSTC copyrights       |
|___________________________________________________|
please define input file names or change defaults :
>