[Beowulf] Re: For CPMD users
Mark Kosmowski
mark.kosmowski at gmail.com
Thu Jan 31 07:40:01 PST 2008
Sangamesh:
I am by no means an expert with either clustering or CPMD, but am
learning both. I am using OpenMPI, not MPICH, but can relate some
things that I would look for.
1) First, have other CPMD parallel jobs worked correctly on the same
nodes with the same executable?
2) Does the cpmd executable work for this input file on a single
processor (i.e. not calling it as an mpich job)?
From 1 and 2 you can determine whether you have an input file issue or a
parallelization issue.
3) Does calling the "hostname" command using the same MPICH
configuration return the expected result? My cluster is three dual
Opteron machines - if they were named Node1, Node2 and Node3 and I ran
hostname using two processors on each of the three machines I would
expect to see: "Node1; Node1; Node2; Node2; Node3; Node3" where the
semi-colons are actually line breaks.
4) Can all of the nodes freely talk to one another (i.e. if using ssh,
can each node ssh correctly to every other node)?
5) Where does the cpmd output file terminate? If it can't find the
pseudopotentials, you may not be properly passing PP_LIBRARY_PATH to
the mpich call of cpmd.
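Checks 3 and 4 can be scripted roughly as below. This is only a
sketch: the function names are my own invention, and it assumes the
machinefile holds one "host" or "host:nprocs" entry per line and that
passwordless ssh is the intended transport between nodes.

```shell
#!/bin/sh
# Sketch only: helper names and the machinefile layout are assumptions.

# List unique host names from a machinefile, assuming one "host" or
# "host:nprocs" entry per line and "#" starting a comment.
hosts_from_machinefile() {
    sed -e 's/#.*//' -e 's/:.*//' -e 's/[[:space:]]//g' -e '/^$/d' "$1" | sort -u
}

# Check 3: summarize `hostname` output (one name per line on stdin)
# as "host: count", so a missing or doubled node stands out.
count_hosts() {
    sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Check 4: try ssh from every node to every other node without prompting.
# Prints one line per failing pair; prints nothing if all pairs work.
check_ssh_mesh() {
    hosts=$(hosts_from_machinefile "$1")
    for src in $hosts; do
        for dst in $hosts; do
            [ "$src" = "$dst" ] && continue
            ssh -n -o BatchMode=yes "$src" ssh -n -o BatchMode=yes "$dst" true \
                || echo "FAIL: $src -> $dst"
        done
    done
}
```

With your machinefile this would be used something like:

    mpirun -machinefile /export/M4 -np 4 hostname | count_hosts
    check_ssh_mesh /export/M4

On my three dual Opteron machines, running hostname on six processors
should give each node with a count of 2.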
Good luck,
Mark E. Kosmowski
>
> Hi CPMD users,
>
> With a CPMD parallel job, I'm getting a Segmentation Fault error.
>
> Let me explain what I did.
>
> Installed MPICH with Intel Compilers. Configure looks as follows:
>
> ./configure --prefix=/opt/MPI_LIBS/MPICH-Intel
> -cc=/opt/intel/cce/10.1.008/bin/icc
> -fc=/opt/intel/fce/10.1.008/bin/ifort --enable-f77 --with-device=ch_p4
> --with-arch=LINUX
>
>
> When I run a CPMD job with 1-4 processes, the job is getting killed and
> gives the following error:
>
> # mpirun -machinefile /export/M4 -np 4 ./cpmd.x
> /opt/APPLICATIONS/CPMD/singlemol.input > single4.out
> Killed by signal 2.
> forrtl: error (69): process interrupted (SIGINT)
> Killed by signal 2.
> Killed by signal 2.
>
> If only one process is used and without redirection, the following error
> occurred:
>
> p0_16857: p4_error: interrupt SIGSEGV: 11
>
> Can anybody explain what might be the cause for this?
>
> regards,
> Sangamesh