[Beowulf] Re: python2.4 error when loose MPICH2 TI with Grid Engine

Reuti reuti at staff.uni-marburg.de
Sun Mar 2 01:45:06 PST 2008


Hi,

On 22.02.2008 at 09:23, Sangamesh B wrote:

> Dear Reuti & members of beowulf,
>
> I need to execute a parallel job through Grid Engine.
>
> MPICH2 is installed with the mpd process manager.
>
> I added a parallel environment MPICH2 to SGE:
>
> $ qconf -sp MPICH2
> pe_name           MPICH2
> slots             999
> user_lists        NONE
> xuser_lists       NONE
> start_proc_args   /share/apps/MPICH2/startmpi.sh -catch_rsh $pe_hostfile
> stop_proc_args    /share/apps/MPICH2/stopmpi.sh
> allocation_rule   $pe_slots
> control_slaves    FALSE
> job_is_first_task TRUE
> urgency_slots     min
>
>
> I added this PE to the default queue, all.q.
>
> mpdboot has been run, and the mpds are running on two nodes.
>
> The script for submitting this job through SGE is:
>
> $ cat subsamplempi.sh
> #!/bin/bash
>
> #$ -S /bin/bash
>
> #$ -cwd
>
> #$ -N Samplejob
>
> #$ -q all.q
>
> #$ -pe MPICH2 4
>
> #$ -e ERR_$JOB_NAME.$JOB_ID
>
> #$ -o OUT_$JOB_NAME.$JOB_ID
>
> date
>
> hostname
>
> /opt/MPI_LIBS/MPICH2-GNU/bin/mpirun -np $NSLOTS -machinefile $TMP_DIR/machines ./samplempi
>
> echo "Executed"
>
> exit 0
>
>
> The job is submitted, but it does not execute. The error and
> output files contain:
>
> cat ERR_Samplejob.192
> /usr/bin/env: python2.4: No such file or directory
>
> $ cat OUT_Samplejob.192
> -catch_rsh /opt/gridengine/default/spool/compute-0-0/active_jobs/192.1/pe_hostfile
> compute-0-0
> compute-0-0
> compute-0-0
> compute-0-0
> Fri Feb 22 12:57:18 IST 2008
> compute-0-0.local
> Executed
>
> So the problem seems to be with python2.4.
>
> $ which python2.4
> /opt/rocks/bin/python2.4
>
> I googled this error and then created a symbolic link:
>
> # ln -sf /opt/rocks/bin/python2.4 /bin/python2.4
>
> Even after this, the same error occurs.
>
> I guess the problem might be something else, i.e. Grid Engine might
> not be getting the link to the running mpd.
>
> Also, the procedure I followed to configure the PE might be wrong.
>
> So I hope you can clear up my doubts and help me resolve
> this error.
>
> 1. Is the PE configuration of MPICH2 + grid engine right?

If you want to integrate MPICH2 with MPD, it's similar to a PVM setup:
the daemons must be started in start_proc_args on every node, with a
dedicated port number per job. You don't say what your startmpi.sh is
doing.
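A minimal sketch of such a start_proc_args script, to make the idea concrete. It is hypothetical: it is not the startmpi.sh from the question, not an official MPICH2 or SGE script, and the file name, the port scheme (20000 + job id mod 10000) and the use of MPD_CON_EXT for a per-job console name are assumptions. It also assumes control_slaves TRUE in the PE, because SGE refuses qrsh -inherit when control_slaves is FALSE, as in the configuration above. The PE would then call it as e.g. "start_proc_args /path/to/start_mpd_ring.sh $pe_hostfile" (path hypothetical).

```bash
#!/bin/bash
# start_mpd_ring.sh -- HYPOTHETICAL start_proc_args script for an MPICH2/MPD
# tight integration. Assumes the PE has control_slaves TRUE and that SGE
# passes $pe_hostfile as the first argument. Not an official MPICH2/SGE script.

MPD=${MPD:-mpd}   # assumed location of MPICH2's mpd; adjust as needed

# Hypothetical per-job port scheme: 20000..29999, derived from the job id,
# so that rings of concurrent jobs on the same node use different ports.
job_port() {
    echo $(( 20000 + $1 % 10000 ))
}

# Print every host named in an SGE pe_hostfile exactly once (first column).
unique_hosts() {
    awk '{ print $1 }' "$1" | sort -u
}

start_ring() {
    local hostfile=$1 port master host
    port=$(job_port "$JOB_ID")
    master=$(hostname)
    # Per-job console name so two jobs of one user on a node don't clash
    # (assumes MPD honours MPD_CON_EXT).
    local conext="sge_${JOB_ID}"

    # Master daemon on the job's dedicated port, started via qrsh -inherit
    # and left in the background so SGE tracks it and can kill it.
    qrsh -inherit "$master" env MPD_CON_EXT="$conext" \
        "$MPD" --listenport="$port" &
    sleep 2   # crude wait until the master mpd is listening

    # One slave daemon per remaining host, joining the master's ring.
    # Short names are compared, since pe_hostfile may omit the domain.
    for host in $(unique_hosts "$hostfile"); do
        [ "${host%%.*}" = "${master%%.*}" ] && continue
        qrsh -inherit "$host" env MPD_CON_EXT="$conext" \
            "$MPD" --host="$master" --port="$port" &
    done
}

# Only act when called by SGE as start_proc_args with a readable pe_hostfile.
if [ -n "$JOB_ID" ] && [ -r "${1:-}" ]; then
    start_ring "$1"
fi
```

The job script would then export the same MPD_CON_EXT value before calling mpiexec, and a matching stop_proc_args would run mpdallexit to shut the ring down.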

> 2. Without tight integration, is there a way to run an MPICH2 (mpd)
> based job using Grid Engine?

Yes.
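One way to do that is a loose integration: the job script boots its own MPD ring with mpdboot over the hosts SGE granted and shuts it down at the end, while SGE itself does not control the remote daemons. The sketch below is hypothetical: the helper name, the mpd.hosts file name and MPD_CON_EXT are assumptions, and it assumes mpdboot accepts "host:ncpus" lines in its hosts file. It uses $TMPDIR, which SGE sets to the job's scratch directory; $TMP_DIR, as used in the script above, is not set by SGE.

```bash
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -N Samplejob
#$ -pe MPICH2 4
# HYPOTHETICAL loose-integration job script: the job boots its own MPD ring
# with mpdboot (over rsh/ssh, outside SGE's control), runs, then tears it down.
# Assumes mpdboot, mpiexec and mpdallexit from MPICH2 are on PATH.

# Turn an SGE pe_hostfile ("host slots queue range") into mpd.hosts lines
# "host:ncpus", summing slots for hosts listed more than once.
pe_to_mpd_hosts() {
    awk '{ slots[$1] += $2 } END { for (h in slots) print h ":" slots[h] }' "$1" | sort
}

if [ -n "$PE_HOSTFILE" ]; then
    export MPD_CON_EXT="sge_${JOB_ID}"   # assumed: separate console per job
    hosts="$TMPDIR/mpd.hosts"            # SGE's per-job scratch directory
    pe_to_mpd_hosts "$PE_HOSTFILE" > "$hosts"
    nhosts=$(( $(wc -l < "$hosts") ))    # arithmetic strips any padding
    mpdboot -n "$nhosts" -f "$hosts" || exit 1
    mpiexec -n "$NSLOTS" ./samplempi
    status=$?
    mpdallexit                           # stop this job's ring
    exit "$status"
fi
```

It is submitted with qsub as before. Because the daemons are started outside SGE, they are not accounted for and are not cleaned up if the job is killed, which is the usual trade-off of a loose integration.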

> 3. Between smpd-daemon-based and daemonless MPICH2 tight integration,
> which one is better?

It depends: if you have just one mpirun per job that will run for days,
I would go for the daemonless startup. But if you issue many mpirun
calls in your job script that each run for just seconds, I would go
for the daemon-based startup, as the mpirun will be distributed to
the slaves faster.

> 4. Can we do MVAPICH2 tight integration with SGE? Are there any
> differences in the process managers with respect to MVAPICH2?

Maybe, if the startup is similar to standard MPICH2.

-- Reuti


> Thanks & Best Regards,
> Sangamesh B


