[Beowulf] Re: python2.4 error when loose MPICH2 TI with Grid Engine
Reuti
reuti at staff.uni-marburg.de
Sun Mar 2 01:45:06 PST 2008
Hi,
Am 22.02.2008 um 09:23 schrieb Sangamesh B:
> Dear Reuti & members of beowulf,
>
> I need to execute a parallel job through Grid Engine.
>
> MPICH2 is installed with the mpd process manager.
>
> Added a parallel environment MPICH2 into SGE:
>
> $ qconf -sp MPICH2
> pe_name MPICH2
> slots 999
> user_lists NONE
> xuser_lists NONE
> start_proc_args /share/apps/MPICH2/startmpi.sh -catch_rsh $pe_hostfile
> stop_proc_args /share/apps/MPICH2/stopmpi.sh
> allocation_rule $pe_slots
> control_slaves FALSE
> job_is_first_task TRUE
> urgency_slots min
>
>
> Added this PE to the default queue: all.q.
>
> mpdboot is done. mpds are running on two nodes.
>
> The script for submitting this job through SGE is:
>
> $ cat subsamplempi.sh
> #!/bin/bash
> #$ -S /bin/bash
> #$ -cwd
> #$ -N Samplejob
> #$ -q all.q
> #$ -pe MPICH2 4
> #$ -e ERR_$JOB_NAME.$JOB_ID
> #$ -o OUT_$JOB_NAME.$JOB_ID
> date
> hostname
> /opt/MPI_LIBS/MPICH2-GNU/bin/mpirun -np $NSLOTS -machinefile $TMP_DIR/machines ./samplempi
> echo "Executed"
> exit 0
>
>
> The job gets submitted, but it does not execute. The error and
> output files contain:
>
> $ cat ERR_Samplejob.192
> /usr/bin/env: python2.4: No such file or directory
>
> $ cat OUT_Samplejob.192
> -catch_rsh /opt/gridengine/default/spool/compute-0-0/active_jobs/
> 192.1/pe_hostfile
> compute-0-0
> compute-0-0
> compute-0-0
> compute-0-0
> Fri Feb 22 12:57:18 IST 2008
> compute-0-0.local
> Executed
>
> So the problem is with python2.4.
>
> $ which python2.4
> /opt/rocks/bin/python2.4
>
> I googled this error and then created a symbolic link:
>
> # ln -sf /opt/rocks/bin/python2.4 /bin/python2.4
>
> Even after this, the same error occurs.
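One thing to note about the mechanics here: /usr/bin/env resolves the interpreter through PATH, so your symlink in /bin only helps if /bin is on the PATH that the process sees on the node where mpd actually runs (compute-0-0, judging from your output), not just on the frontend where you created the link. A minimal local demo of that mechanism (the scratch directory and fake interpreter are made up purely for illustration):

```shell
# Create a fake "python2.4" in a scratch directory
mkdir -p /tmp/fakebin
printf '#!/bin/sh\necho ok\n' > /tmp/fakebin/python2.4
chmod +x /tmp/fakebin/python2.4

# env finds the interpreter only when its directory is on PATH
PATH=/tmp/fakebin:$PATH /usr/bin/env python2.4   # prints: ok
```

So the first thing to verify is what `which python2.4` reports in a shell on each compute node, not on the frontend.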
>
> I guess the problem might be something else, i.e. Grid Engine might
> not be getting a connection to the running mpd.
>
> Or the procedure I followed to configure the PE might be wrong.
>
> So I hope you can clear up my doubts and help me resolve this
> error.
>
> 1. Is the PE configuration of MPICH2 + grid engine right?
If you want a tight integration of MPICH2 with mpd, it's similar to a
PVM setup: the daemons must be started in start_proc_args on every
node of the job, with a dedicated port number per job. You don't say
what your startmpi.sh is doing; judging from the -catch_rsh argument
it looks like the standard MPICH(1) startmpi.sh, which only prepares
a machines file and an rsh wrapper and won't start any mpds. Note
also that with control_slaves FALSE your PE is not a tight
integration at all, and allocation_rule $pe_slots places all slots on
a single node, which matches the four compute-0-0 lines in your
output.
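As a sketch of the file handling such a start script typically does (the mpd-ring startup itself is omitted; the sample input line and the host:slots machinefile format are assumptions based on standard SGE and MPICH2 conventions):

```shell
# Sketch: convert an SGE pe_hostfile (one "host slots queue processor"
# line per granted node) into a host:slots machinefile for MPICH2.
# The sample input below stands in for $pe_hostfile; a real script
# would also boot one mpd per listed host on a job-specific port.
cat > /tmp/pe_hostfile <<'EOF'
compute-0-0 4 all.q@compute-0-0 UNDEFINED
EOF

awk '{print $1 ":" $2}' /tmp/pe_hostfile > /tmp/machines
cat /tmp/machines   # prints: compute-0-0:4
```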
> 2. Without Tight integration, is there a way to run a MPICh2(mpd)
> based job using gridengine?
Yes.
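For a loose integration you can boot and tear down the mpd ring inside the job script itself. A rough sketch, not runnable outside a cluster (mpdboot/mpiexec/mpdallexit are the standard MPICH2 tools; the machinefile handling and binary name are assumptions):

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -pe MPICH2 4

# Build an mpd hostfile from the hosts SGE granted this job
awk '{print $1}' $PE_HOSTFILE | sort -u > $TMPDIR/mpd.hosts
NHOSTS=$(wc -l < $TMPDIR/mpd.hosts)

# Start the mpd ring, run the job, then shut the ring down again
mpdboot -n $NHOSTS -f $TMPDIR/mpd.hosts
mpiexec -n $NSLOTS ./samplempi
mpdallexit
```

The drawback of this loose setup is that SGE has no control over the slave processes, so accounting and clean job termination on the remote nodes are lost.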
> 3. In smpd-daemon based and daemonless MPICH2 tight integration,
> which one is better?
Depends: if you have just one mpirun per job which will run for days,
I would go for the daemonless startup. But if you issue many mpirun
calls in your job script which each run only for seconds, I would go
for the daemon-based startup, as mpirun will be distributed to the
slaves faster.
> 4. Can we do mvapich2 tight integration with SGE? Any differences
> with process managers wrt MVAPICH2?
Maybe, if the startup is similar to standard MPICH2.
-- Reuti
> Thanks & Best Regards,
> Sangamesh B