MPIrun errors

Joseph E. Rose, Jr. jerosejr at cajunbro.com
Mon Jul 31 14:58:08 PDT 2000


I get the message below whenever I run the program on nodes across the
network, even after resetting the switch and the nodes.  The program runs
fine on 1 or 2 processors (dual-processor motherboard, no network involved).

(master-node: root):/usr/local/mpi/bin/mpirun -v -machinefile nodes -np 4 fdtdrfhat < man3mm.dat

running /home/kevinj/d.Fortran/d.compile/d.execute/fdtdrfhat on 4 LINUX ch_p4 processors
Created /home/kevinj/d.Fortran/d.compile/d.execute/PI1995
rm_1_506: (0.343660) process not in process table; my_unix_id = 506 my_host=node
rm_1_506: (0.343828) Probable cause:  local slave on uniprocessor without shared memory
rm_1_506: (0.343848) Probable fix:  ensure only one process on node
rm_1_506: (0.343863) (on master process this means 'local 0' in the procgroup file)
rm_1_506: (0.343876) You can also remake p4 with SYSV_IPC set in the OPTIONS file
rm_1_506: (0.343889) Alternate cause:  Using localhost as a machine name in the progroup
rm_1_506: (0.343904) file.  The names used should match the external network names.
rm_1_506:  p4_error: p4_get_my_id_from_proc: 0
rm_l_1_507:  p4_error: interrupt SIGINT: 2
p0_2124:  p4_error: net_recv read:  probable EOF on socket: 118457
rm_l_2_507:  p4_error: interrupt SIGINT: 2
rm_l_3_502:  p4_error: interrupt SIGINT: 2
bm_list_2125:  p4_error: interrupt SIGINT: 2
rm_506:  p4_error: interrupt SIGINT: 2
p3_501:  p4_error: interrupt SIGINT: 2
etc., etc., etc.
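
The "Alternate cause" line in the log says the machine names should match the
external network names rather than localhost. As a rough way to check that,
here is a minimal sketch that scans a machinefile (such as the `nodes` file
passed to mpirun above) for entries that are loopback names, resolve to a
127.x address, or do not resolve at all. The function names are hypothetical,
and the handling of a `host:ncpus` suffix and `#` comments is an assumption
about the machinefile layout, not something shown in this post. It may or may
not point at the actual cause here.

```python
import socket

# Names that always mean "this machine" and never an external interface.
LOOPBACK_NAMES = {"localhost", "localhost.localdomain"}


def parse_machinefile(text):
    """Return host names from machinefile text, skipping blanks and # comments."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # Assumption: an entry may look like "host:ncpus"; keep only the host.
        hosts.append(line.split()[0].split(":", 1)[0])
    return hosts


def suspicious_hosts(hosts, resolve=socket.gethostbyname):
    """Return (host, reason) pairs for entries that look like loopback names."""
    problems = []
    for host in hosts:
        if host.lower() in LOOPBACK_NAMES:
            problems.append((host, "loopback name"))
            continue
        try:
            addr = resolve(host)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            problems.append((host, f"does not resolve: {exc}"))
            continue
        if addr.startswith("127."):
            problems.append((host, f"resolves to loopback address {addr}"))
    return problems
```

To use it, read the machinefile and print what comes back, for example
`suspicious_hosts(parse_machinefile(open("nodes").read()))`. Running it on
each node is more telling than running it on the master alone, since name
resolution can differ from machine to machine.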


Thanks in advance,
Joe

Joseph E. Rose, Jr.
BRISK System Analyst
Illgen Simulation Technologies, Inc.
(210) 348-6886/ Cell: (210) 710-9060
San Antonio, Texas, U.S.A

"What you can do, or dream you can, begin it;
Boldness has genius, power, and magic in it."
                                       --Goethe





