mpich error (on 2.4.7 intel)

tekka99 at libero.it
Wed Aug 1 05:35:57 PDT 2001


Hello,

I have an Intel machine running a 2.4.7 kernel, with mpich 1.2.1 and pgf90.
The system is Red Hat 7.1.
I'm trying to run a program with this command:

/usr/local/mpich/bin/mpirun -np 4 hydrompi

At the moment I'm using only one SMP machine, with 8 CPUs and
2.5 GB of RAM.

I receive this kind of error:

PGFIO/stdio: Resource temporarily unavailable
PGFIO-F-/list-directed write/unit=6/error code returned by host stdio - 11.
File name = stdout     formatted, sequential access   record = 32
In source file main.F90, at line number 74 
rm_l_1_3349:  p4_error: net_recv read:  probable EOF on socket: 1
rm_l_2_3354:  p4_error: net_recv read:  probable EOF on socket: 1
rm_l_3_3359:  p4_error: net_recv read:  probable EOF on socket: 1
bm_list_3344:  p4_error: net_recv read:  probable EOF on socket: 1


At line 74 of main.F90 I have:

          if(mype.eq.0)then
            year=told*tnow*year_in_secs
            write(*,*)'Calculating step : ',nstep
            write(*,99)'t_i, t_f, dt : ',told,t,dt
            write(*,99)'a_i, a_f     : ',at,atnew
            write(*,99)'z_i, z_f     : ',redshiftold,redshift
            write(*,99)'Hubble const.: ',hubble
            write(*,*)'age of the universe (Myears) : ',year
            write(*,*)''        ! <<<<< THIS IS LINE 74 >>>>>
99 format(1x,a31,3(1x,e13.7))
          endif

Can anyone give me some hints on where to look?
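
A minimal test case along these lines might help separate the stdout
problem from the rest of the code. It is only a sketch: the program name
stdouttest and the count of 100 lines are made up for illustration (the
failure above was reported at record 32), and it assumes the mpif.h
header that comes with mpich. Only process 0 prints, as in main.F90.

```fortran
      program stdouttest
      implicit none
      include 'mpif.h'
      integer :: ierr, mype, npes, i

      ! Start MPI and find out which process this is
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, mype, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, npes, ierr)

      ! Only process 0 writes to stdout (hypothetical count of 100 lines)
      if (mype .eq. 0) then
        do i = 1, 100
          write(*,*) 'Test line : ', i, ' processes : ', npes
        end do
      endif

      ! Wait for every process before shutting down
      call MPI_BARRIER(MPI_COMM_WORLD, ierr)
      call MPI_FINALIZE(ierr)
      end program stdouttest
```

Built the same way as hydrompi and started with the same mpirun -np 4
command, this should show whether plain list-directed writes to unit 6
fail on their own.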

Since the write statements at line 74 and just before it are only prints
to stdout, I commented them out, recompiled and relaunched. Now I get:

1 - MPI_SENDRECV_REPLACE : Null communicator
[1]  Aborting program !
[1] Aborting program!
p1_3714:  p4_error: : 197
rm_l_1_3715:  p4_error: interrupt SIGINT: 2
0 - MPI_SENDRECV_REPLACE : Null communicator
[0]  Aborting program !
rm_l_3_3725:  p4_error: net_recv read:  probable EOF on socket: 1
rm_l_2_3720:  p4_error: net_recv read:  probable EOF on socket: 1
bm_list_3710:  p4_error: net_recv read:  probable EOF on socket: 1

If I try rsh or rlogin from the machine to itself, they work.
Any suggestions?

PS: I have two NICs in the machine (only one is up, though). Could this be
part of the problem? Am I using IPC or TCP?

Thanks in advance.
Bye,
Gianluca Cecchi





More information about the Beowulf mailing list