[Beowulf] MPI and Redhat9 NFS slow down

Brian Smith brian at cypher.acomp.usf.edu
Mon Aug 23 17:54:08 PDT 2004


Hey,

> I know the NFS automount is working fine because if I start the
> job with just one processor (mpirun -np 1), I don't experience the
> slow down.

Not necessarily.  With 'mpirun -np 1' the single process most likely
runs on the node you launched it from, so you might just be writing
locally.  Try 'mpirun -np 1 -nolocal' to force the process onto another
node and see if the problem occurs there.

How much data is being written to this NFS automount, and why are you
automounting anyway?  An 8-node cluster can keep its NFS shares
permanently mounted without any issues (my 48- and 42-node clusters do
this without any problems).

If these processes are writing lots of data to the NFS mount during the
run, this can substantially degrade your system performance, since NFS
will be handling not only multiple requests (albeit not that many) but
possibly a lot of data as well.
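
One rough way to put a number on this is to time the same write against
a local directory and against the NFS mount from the same node.  Below
is a minimal sketch, assuming a recent Python 3 is available on the
node; the name time_writes, the nfs_probe_ file prefix and the default
sizes are illustrative choices of mine, not part of MPICH, CMAQ or any
NFS tool.  It only measures one sequential writer, so it will not
reproduce contention between several MPI ranks.

```python
import os
import sys
import tempfile
import time


def time_writes(directory, total_mb=16, chunk_kb=64):
    """Write total_mb of zeros to a scratch file in `directory`,
    force it to stable storage, and return the elapsed seconds."""
    chunk = b"\0" * (chunk_kb * 1024)
    n_chunks = (total_mb * 1024) // chunk_kb
    # Hypothetical scratch-file prefix; the file is removed afterwards.
    fd, path = tempfile.mkstemp(dir=directory, prefix="nfs_probe_")
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_chunks):
                f.write(chunk)
            f.flush()
            # fsync so the timing includes pushing data to the server,
            # not just filling the client's page cache.
            os.fsync(f.fileno())
        return time.perf_counter() - start
    finally:
        os.remove(path)


if __name__ == "__main__":
    # Usage: python probe.py /tmp /path/to/nfs/mount
    for d in sys.argv[1:]:
        secs = time_writes(d)
        print(f"{d}: {secs:.2f} s for 16 MB ({16 / secs:.1f} MB/s)")
```

If the NFS figure is far below the local one even for a single writer,
the mount options or the network are the first things to look at; if it
only collapses when several ranks write at once, the server is the more
likely bottleneck.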

Also, what type of interconnect are you using?  10/100, GigE, Myrinet,
InfiniBand?  What are the specs on the head node?

Answers to these questions would be very helpful.

--
Brian R Smith
Systems Administrator
University of South Florida
Research Computing Core Facility
Tampa, FL
http://rccf.acomp.usf.edu


On Mon, 2004-08-23 at 18:52, Jack Chen wrote:
> Hi all,
> 
> I'm not sure if this is the right place to post this question.  If it
> is not, please tell me where's the best place to get help on this,
> thanks..
> 
> We recently built an 8-node PC Linux cluster running RedHat 9 (kernel:
> 2.4.20-8smp #1 SMP).  We use this system to run EPA's CMAQ
> photochemical grid model.  I have installed the latest MPICH 1.2.6
> with Portland Group Compiler (5.2-1) using ssh.  Everything worked
> fine with the MPI example programs (cpi, pi3p, etc.) and 'make testing'.
> However, when I try to run any program that writes output to other
> NFS-mounted drives, I get a very long delay.  I'm not sure where the problem
> is.  I know the NFS automount is working fine because if I start the
> job with just one processor (mpirun -np 1), I don't experience the
> slow down.
> 
> For example: 
> If I start the job on master node using 4 processors (mpirun -np 4)
> and write to the master node (master2 0),
> PIxxx file:
> master2 0 /master2/home/chenj/CMAQ_v4.3/Run/cctm/CCTM_e2a
> node103 1 /master2/home/chenj/CMAQ_v4.3/Run/cctm/CCTM_e2a
> node103 1 /master2/home/chenj/CMAQ_v4.3/Run/cctm/CCTM_e2a
> node104 1 /master2/home/chenj/CMAQ_v4.3/Run/cctm/CCTM_e2a
> 
> the run takes 168 sec
> 
> If I start the same job but write the output to any other NFS-mounted
> drive besides the master node, the job is extremely slow.  In
> this case the same job took 10962 sec.
> 
> I have tried mounting the drive with different parameters (rw,soft
> and rw,hard,bg,intr,noac) and increased the number of nfsd daemons from
> 8 to 16 on the NFS server, but nothing changed.
> 
> If you have any idea on what is going on, please help!
> 
> Any help or suggestions are greatly appreciated.
> 
> Jack
> 
>  Jack Chen
>  Laboratory for Atmospheric Research
>  Dept.of Civil & Environmental Engineering
>  Washington State University
>  Pullman, WA 99164-2910
>  509.335.5738
>  509.335.7632 (FAX)
> 
>  
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf



