going nutty with permission errors etc.!

Eric Linenberg elinenbe at umich.edu
Mon Aug 6 14:35:06 PDT 2001


I am trying to run LS-DYNA on a 5-node cluster.  Each node is a 
dual-processor machine, and for some reason it does not work!  The program 
is compiled for both LAM and MPICH, and I have both of these installed on 
my system.  The example programs for both libraries work flawlessly, and 
all security is turned off everywhere on the cluster.  I am running 
RedHat 7.1.

I can 'ssh hostname command' or 'rsh hostname command' and both work as 
expected.

Every node mounts rw the ls-dyna dir, the mpich dir, the lam dir, and the 
/root and /home directories by nfs at boot time.
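
Whether a given user can actually write to those mounted directories on a 
given node can be checked with a small script.  The following is only a 
minimal sketch, not part of LS-DYNA, LAM, or MPICH: the function name 
can_write and the list of directories are hypothetical and should be 
replaced with the real mount points.  It tries to create a scratch file in 
each directory and reports the errno name on failure (errno 13 is EACCES, 
"Permission denied").

```python
import errno
import os
import tempfile


def can_write(directory):
    """Try to create and delete a scratch file in `directory`.

    Returns (True, None) on success, or (False, errno_name) on failure,
    e.g. (False, "EACCES") for errno 13, "Permission denied".
    """
    try:
        # The scratch file is removed automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
    except OSError as exc:
        return False, errno.errorcode.get(exc.errno, str(exc.errno))
    return True, None


if __name__ == "__main__":
    # Hypothetical list: substitute the directories the nodes mount over NFS.
    for path in [os.getcwd(), os.path.expanduser("~")]:
        ok, err = can_write(path)
        status = "writable" if ok else "NOT writable (" + err + ")"
        print(path + ": " + status)
```

Running it as the non-root user on every node, from the directory where 
mpirun is started, would show whether any node refuses writes there.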

I can't figure out what is wrong, and I have been stuck here for a while.  
Here is some output that the smarter ones out there may be able to figure 
something out from (this is from lam_mpirun):

[guest@kitkat lstc]$ mpirun -c 4 lam_mpp960
>>>>>  Process 0  <<<<<
>>>>> Signal 11 : Segmentation Violation <<<<<
PGFIO/stdio: Permission denied
PGFIO-F-/formatted write/unit=13/error code returned by host stdio - 13.
 In source file /net/ultra5/ultra5_2/jason/mpp/ls960/src/atemp.F, at line 
number 2097
-----------------------------------------------------------------------------
 
One of the processes started by mpirun has exited with a nonzero exit
code.  This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.
 
PID 4331 failed on node n0 with exit status 1.
-----------------------------------------------------------------------------
PGFIO/stdio: Permission denied
PGFIO-F-/formatted write/unit=13/error code returned by host stdio - 13.
 In source file /net/ultra5/ultra5_2/jason/mpp/ls960/src/atemp.F, at line 
number 2097
PGFIO/stdio: Permission denied
PGFIO-F-/formatted write/unit=13/error code returned by host stdio - 13.
 In source file /net/ultra5/ultra5_2/jason/mpp/ls960/src/atemp.F, at line 
number 2097
bash: kill: (1245) - No such pid



And here is the output from mpich_mpirun:

[guest@kitkat lstc]$ mpich_mpirun -np 4 mpich_mpp960
>>>>>  Process 0  <<<<<
>>>>> Signal 11 : Segmentation Violation <<<<<
PGFIO/stdio: Permission denied
PGFIO-F-/formatted write/unit=13/error code returned by host stdio - 13.
 In source file /net/ultra5/ultra5_2/jason/mpp/ls960/src/atemp.F, at line 
number 2097
rm_l_1_1257:  p4_error: net_recv read:  probable EOF on socket: 1
rm_l_2_1165:  p4_error: net_recv read:  probable EOF on socket: 1
PGFIO/stdio: Permission denied
PGFIO-F-/formatted write/unit=13/error code returned by host stdio - 13.
 In source file /net/ultra5/ultra5_2/jason/mpp/ls960/src/atemp.F, at line 
number 2097
PGFIO/stdio: Permission denied
PGFIO/stdio: Permission denied
PGFIO-F-/formatted write/unit=13/error code returned by host stdio - 13.
 In source file /net/ultra5/ultra5_2/jason/mpp/ls960/src/atemp.F, at line 
number 2097
PGFIO-F-/formatted write/unit=13/error code returned by host stdio - 13.
 In source file /net/ultra5/ultra5_2/jason/mpp/ls960/src/atemp.F, at line 
number 2097
rm_l_3_1094:  p4_error: net_recv read:  probable EOF on socket: 1
bm_list_4667:  p4_error: net_recv read:  probable EOF on socket: 1
 
[guest@kitkat lstc]$

----------------------

But if I am root and run mpich_mpirun, I get the output below (this seems 
to work okay!!!!!).  If I go above -np 5, though (remember I have 10 
processors here), it seems to hang for a long while!!! UGH!  Any help at 
all is appreciated!  Thanks, eric



[root@kitkat lstc]# mpich_mpirun -np 5 -v  mpich_mpp960
running /usr/local/lstc/mpich_mpp960 on 5 LINUX ch_p4 processors
Created /usr/local/lstc/PI5843
      Date: 08/06/2001      Time: 17:31:35
 Executing with local workstation license
 
     ___________________________________________________
     |                                                 |
     |  Livermore  Software  Technology  Corporation   |
     |                                                 |
     |  7374 Las Positas Road                          |
     |  Livermore, CA 94550                            |
     |  Tel: (925) 449-2500  Fax: (925) 449-2507       |
     |  www.lstc.com                                   |
     |_________________________________________________|
     |                                                 |
     |  LS-DYNA, A Program for Nonlinear Dynamic       |
     |  Analysis of Structures in Three Dimensions     |
     |  Version:  960          Date: 07/22/2001        |
     |  Revision: 447          Time: 14:04:43          |
     |                                                 |
     |  Licensed to: Exponent Failure Analysis         |
     |                                                 |
     |  Platform   : PC (MPICH-P4)                     |
     |  OS Level   : Linux 2.12                        |
     |  Hostname   : kitkat                            |
     |  Precision  : Single precision (I4R4)           |
     |                                                 |
     |  Unauthorized use infringes LSTC copyrights     |
     |_________________________________________________|
 
 
  please define input file names or change defaults :
 >




