[Beowulf] running MPICH on AMD Opteron Dual Core Processor Cluster( 72 Cpu's)
Matt Allen malallen at indiana.edu
Wed Jan 3 08:52:40 PST 2007
Mark Hahn wrote:
> personally, I'm pretty convinced that MPI implementations should stay
> out of the jobstarter business, and go with straight agentless (ssh-based)
> job spawning.

I'm curious about your reasoning, Mark. We've had nightmare situations for years with ssh-based job spawning. The most common case is where sshd processes terminate on nodes without the child mpi processes exiting. Then we have orphaned mpi processes, owned by init, scattered throughout the cluster. If any of these processes are using limited resources (like Myrinet adapters), subsequent jobs can (more likely, will) exit immediately upon dispatch to the node.

We've found ways around this with prolog/epilog scripts, and scheduling policy, but the slickest solutions so far, in my opinion, have been mpiexec (admittedly not part of an MPI implementation) and lam/openmpi. Allowing the resource manager to completely handle job spawning has provided better post-job cleanup, and more complete job statistics (cpu-time, mostly) for us.

Do you not have to deal with these sorts of issues? If not, lay some wisdom on me; I could use it.

Matt

--
Matt Allen            | Systems Analyst
malallen at indiana.edu | Research and Technical Services
812-855-7318          | Indiana University
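
For illustration, the kind of check an epilog script might run to spot the orphaned processes described in this message could look roughly like the following. This is a minimal sketch and not the scripts used at Indiana University. It assumes a Linux /proc filesystem, that orphans are reparented to PID 1, and that the job owner has no other legitimate PID-1 children on the node. The names Proc, parse_stat_line, read_proc_table, and find_orphans are hypothetical.

```python
import os
from typing import List, NamedTuple, Tuple


class Proc(NamedTuple):
    pid: int
    ppid: int
    uid: int
    comm: str


def parse_stat_line(line: str) -> Tuple[int, str, int]:
    """Extract (pid, comm, ppid) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the first '(' and the last ')'.
    """
    lpar = line.index("(")
    rpar = line.rindex(")")
    pid = int(line[:lpar])
    comm = line[lpar + 1:rpar]
    fields = line[rpar + 1:].split()
    ppid = int(fields[1])  # fields[0] is the process state
    return pid, comm, ppid


def read_proc_table(proc_root: str = "/proc") -> List[Proc]:
    """Snapshot all processes visible under proc_root (Linux only)."""
    procs = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        path = os.path.join(proc_root, entry)
        try:
            uid = os.stat(path).st_uid  # owner of the /proc/<pid> directory
            with open(os.path.join(path, "stat")) as f:
                pid, comm, ppid = parse_stat_line(f.read())
        except (OSError, ValueError):
            continue  # process exited while we were scanning
        procs.append(Proc(pid, ppid, uid, comm))
    return procs


def find_orphans(procs: List[Proc], job_uid: int, init_pid: int = 1) -> List[Proc]:
    """Return processes owned by job_uid whose parent is init.

    Assumption: on a dedicated compute node, any such process left after
    the job ends is a leftover from that job.
    """
    return [p for p in procs if p.ppid == init_pid and p.uid == job_uid]
```

An epilog could call find_orphans(read_proc_table(), job_uid) after the job ends and then log or signal the returned PIDs. What to do with them (kill, hold the node, alert an admin) is a site policy decision that this sketch leaves out.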
