[Beowulf] BLACS Errors?
Ashton Peters ape20 at student.canterbury.ac.nz
Wed Aug 4 19:14:19 PDT 2004
I am having trouble with BLACS calls within a very simple Fortran 90 program on a ten-node dual-Opteron Rocks Linux 3.2.0 cluster. We have the PGI CDK 5.1 installed. I have written a simple Fortran program to test broadcast sends and receives using BLACS. The full code of this program is attached to the end of this message.

I compile this code with:

    $ pgf90 -Mscalapack -o simple.opt simple.f

... and run it with:

    $ mpirun -np X simple.opt

The code will run fine with 2 or 3 processors, with any vector length (n) I choose. Below is the screen output from a successful 3 processor run:

    [ape20 at colossus fwdsolvers]$ pgf90 -Mscalapack -o simple.opt simple.f
    [ape20 at colossus fwdsolvers]$ mpirun -np 3 simple.opt
    ape20 at compute-0-0's password:
    ape20 at compute-0-1's password:
    Process 0 is alive at grid position (0,0)
    For this test n =1000
    Array sent from process 0
    Process 1 is alive at grid position (0,1)
    Array received at process 1
    Process 2 is alive at grid position (0,2)
    Array received at process 2
    [ape20 at colossus fwdsolvers]$

However, if I try to run with -np 4 or greater, I get the following screen output:

    [ape20 at colossus fwdsolvers]$ pgf90 -Mscalapack -o simple.opt simple.f
    [ape20 at colossus fwdsolvers]$ mpirun -np 4 simple.opt
    ape20 at compute-0-0's password:
    ape20 at compute-0-1's password:
    ape20 at compute-0-2's password:
    Process 0 is alive at grid position (0,0)
    For this test n =1000
    Array sent from process 0
    Process 2 is alive at grid position (0,2)
    Array received at process 2
    Process 1 is alive at grid position (0,1)
    Array received at process 1
    bm_list_28551: (7.738281) wakeup_slave: unable to interrupt slave 0 pid 28550
    Received disconnect from 10.255.255.252: Command terminated on signal 13.
    [ape20 at colossus fwdsolvers]$ rm_l_1_19376: (5.019531) net_send: could not write to fd=6, errno = 9
    rm_l_1_19376: p4_error: net_send write: -1
    p4_error: latest msg from perror: Bad file descriptor
    rm_l_2_10837: (2.453125) net_send: could not write to fd=6, errno = 9
    rm_l_2_10837: p4_error: net_send write: -1
    p4_error: latest msg from perror: Bad file descriptor
    [ape20 at colossus fwdsolvers]$

Does anyone have an idea what these error messages mean, and how I can fix them? I am a beginner with BLACS, so it is possible that my Fortran code has not initialized it correctly, but I have checked it against many tutorial examples and it seems OK.

Many thanks in advance,

Ashton Peters
Center for Bioengineering
University of Canterbury
Christchurch, New Zealand

----- FORTRAN CODE -----

      program SIMPLE
ccccc VERY SIMPLE BLACS TEST PROGRAM
ccccc Declare variables
      integer iam,nprocs,nprows,npcols,ctxt,myprow,mypcol
      integer junk(5000)
ccccc Total number of processes
      call BLACS_PINFO(iam,nprocs)
ccccc Define size of process grid (in this case a single row)
      nprows=1
      npcols=nprocs
ccccc Get the system context
      call BLACS_GET(0,0,ctxt)
ccccc Initialise the process grid
      call BLACS_GRIDINIT(ctxt,'Row',nprows,npcols)
      call BLACS_GRIDINFO(ctxt,nprows,npcols,myprow,mypcol)
ccccc Get each process to check in with grid coordinates
   10 format(a8,i2,a28,i1,a1,i1,a1)
      print 10,'Process',iam,
     &     'is alive at grid position (',myprow,',',mypcol,')'
ccccc Master generates integer array and broadcasts to all slaves
      if((myprow.eq.0).and.(mypcol.eq.0)) then
         n=1000
         call IGEBS2D(ctxt,'All',' ',1,1,n,1)
   20    format(a18,i4)
         print 20,'For this test n =',n
         do i=1,n
            junk(i)=i
         enddo
         call IGEBS2D(ctxt,'All',' ',n,1,junk,5000)
         print 30,'Array sent from process ',iam
ccccc End master code
      endif
ccccc Slaves receive info and check it is correct
      if((myprow.ne.0).or.(mypcol.ne.0)) then
         call IGEBR2D(ctxt,'All',' ',1,1,n,1,0,0)
         call IGEBR2D(ctxt,'All',' ',n,1,junk,2500,0,0)
   30    format(a27,i2)
         if((junk(1).eq.1).and.(junk(n).eq.n)) then
            print 30,'Array received at process ',iam
         else
            print 30,'Error receiving at process',iam
         endif
ccccc End slave code
      endif
ccccc End program
      end

----- END OF FORTRAN CODE -----
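One thing SIMPLE does not do, unlike most BLACS examples, is release its process grid and shut BLACS down before reaching end. Whether that explains the errors above is an assumption that the output alone does not prove. Still, the rm_l/bm_list and p4_error lines appear to come from MPICH's p4 layer, and a process that exits without finalizing MPI can leave its peers writing to sockets that have already been closed. That would fit errno 9 (EBADF, "Bad file descriptor" on Linux) and signal 13 (SIGPIPE). The sketch below shows the usual shutdown sequence in the same fixed-form style as simple.f. The program name TEARDOWN is hypothetical, and the BLACS_BARRIER call is an optional extra that is not in the original code.

```fortran
      program TEARDOWN
ccccc Hypothetical minimal BLACS skeleton: set up a 1 x nprocs grid,
ccccc then release it and shut BLACS (and MPI) down explicitly.
      integer iam,nprocs,nprows,npcols,ctxt,myprow,mypcol
ccccc Same set-up sequence as in SIMPLE
      call BLACS_PINFO(iam,nprocs)
      nprows=1
      npcols=nprocs
      call BLACS_GET(0,0,ctxt)
      call BLACS_GRIDINIT(ctxt,'Row',nprows,npcols)
      call BLACS_GRIDINFO(ctxt,nprows,npcols,myprow,mypcol)
ccccc (IGEBS2D / IGEBR2D traffic as in SIMPLE would go here)
ccccc Optional: make every process in the grid wait for the others
      call BLACS_BARRIER(ctxt,'All')
      print *,'Process',iam,'reached the end of the program'
ccccc Release the process grid created by BLACS_GRIDINIT
      call BLACS_GRIDEXIT(ctxt)
ccccc Shut BLACS down; an argument of 0 means the underlying
ccccc message-passing layer (MPI) is finalized as well
      call BLACS_EXIT(0)
      end
```

It would presumably be built and launched the same way as simple.f, for example pgf90 -Mscalapack -o teardown.opt teardown.f followed by mpirun -np 4 teardown.opt, where teardown.f is a hypothetical file name. If the p4 errors go away once the last three calls are added just before end in SIMPLE, that would point to the missing shutdown rather than to the broadcasts themselves.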
More information about the Beowulf mailing list
