[Beowulf] OpenMP wierdness on dual AMD 2350 box w/ SL5.2 x86_64
Nathan Moore
ntmoore at gmail.com
Wed Nov 26 11:34:30 PST 2008
on the web,
http://course1.winona.edu/nmoore/clouds_example_OpenMP_f90.html
On Wed, Nov 26, 2008 at 1:22 PM, Bill Broadley <bill at cse.ucdavis.edu> wrote:
> I'd be happy to take a look, but email formatting of f90 sometimes seems to
> cause issues. Could you send it as an attachment or put it on a webpage?
>
> Nathan Moore wrote:
> > After the help last week on openmp, I got inspired and bought a dual-quad
> > opteron machine for the department to show 8-way scaling for my students
> > ("Hey, its much cheaper than something new in the optics lab", my dept.
> > chair laughed).
> >
> > I've been working on said machine over the past few days and found
> something
> > really weird in an OpenMP example program I descrobed to the list.
> >
> > The machine is a dual-proc AMD Opteron 2350, Tyan n3600T (S2937)
> mainboard,
> > w/ 8GB ram. Initially, I installed the i386 version of Scientific Linux
> > 5.2, but then realized that only half of the RAM was usable, and
> > re-installed SL5.2 x86_64 this morning.
> >
> > The example program is appended to the end of this email. Again, it is a
> > 2-d finite-difference solution to the laplace equation (the context being
> to
> > "predict" lightning strikes based on the potential between the ground and
> > some clouds overhead).
> >
> > The program scales beautifully up to OMP_NUM_THREADS~6 or 7, but when I
> set
> > the number of threads to 8, something weird happens, and instead of
> taking
> > the normal 241 iterations to converge, the program converges after 1
> step.
> > This reeks of numerical instability to me, but my programming experience
> in
> > x86_64 is very limited.
> >
> > I'm using gfortran, with the simple compile string,
> > gfortran clouds_example_OpenMP.f90 -m64 -fopenmp
> >
> > Any insight into what obvious mistake I'm making would be wonderful!
> >
> > The stability of the output seems erratic to me. Sometimes when
> > OMP_NUM_THREADS=7 the result converges normally after 241 iterations and
> at
> > other times, the result converges after 1 iteration (indicating some sort
> of
> > problem with hardware???)
> >
> > This didn't occur yesterday when the machine was running SL5.2, i386.
> >
> >
> >
> >
> > Simulation Output:
> >
> > [nmoore at aykroyd clouds]$ OMP_NUM_THREADS=1
> > [nmoore at aykroyd clouds]$ export OMP_NUM_THREADS
> > [nmoore at aykroyd clouds]$ ./a.out
> > Hello World from thread 0
> > There are 1 threads
> > ...
> > convergence criteria is \Delta V < 0.250000003725290
> > iterations necessary, 241
> > initialization time, 0
> > simulation time, 57
> >
> >
> >
> >
> > [nmoore at aykroyd clouds]$ OMP_NUM_THREADS=2
> > [nmoore at aykroyd clouds]$ export OMP_NUM_THREADS
> > [nmoore at aykroyd clouds]$ ./a.out
> > Hello World from thread 0
> > Hello World from thread 1
> > There are 2 threads
> > ...
> > convergence criteria is \Delta V < 0.250000003725290
> > iterations necessary, 241
> > initialization time, 0
> > simulation time, 28
> >
> >
> >
> >
> > [nmoore at aykroyd clouds]$ OMP_NUM_THREADS=4
> > [nmoore at aykroyd clouds]$ export OMP_NUM_THREADS
> > [nmoore at aykroyd clouds]$ ./a.out
> > Hello World from thread 3
> > Hello World from thread 1
> > Hello World from thread 0
> > Hello World from thread 2
> > There are 4 threads
> > ...
> > convergence criteria is \Delta V < 0.250000003725290
> > iterations necessary, 241
> > initialization time, 0
> > simulation time, 14
> >
> >
> >
> >
> > [nmoore at aykroyd clouds]$ OMP_NUM_THREADS=8
> > [nmoore at aykroyd clouds]$ export OMP_NUM_THREADS
> > [nmoore at aykroyd clouds]$ ./a.out
> > Hello World from thread 2
> > ...
> > convergence criteria is \Delta V < 0.250000003725290
> > iterations necessary, 1
> > initialization time, 0
> > simulation time, 0
> >
> > Code listing:
> >
> > nmoore at aykroyd clouds]$ cat clouds_example_OpenMP.f90
> > !
> > !
> > use omp_lib
> > !
> > IMPLICIT NONE
> > integer,parameter::Nx=2000
> > integer,parameter::Ny=2000
> > real*8 v(Nx,Ny), dv(Nx,Ny)
> > integer boundary(Nx,Ny)
> > integer i,j,converged,i1,i2
> > integer t0,t1,t2
> > real*8 convergence_v, v_cloud, v_ground, max_dv
> > real*8 bump_P,old_v
> > real*8 Lx,Ly,dx,dy,v_y
> > !
> > real*8 lightning_rod_center, lightning_rod_height
> > !
> > real*8 house_center, house_height, house_width
> > integer num_iterations
> > !
> > integer:: id, nthreads
> > !$omp parallel private(id)
> > id = omp_get_thread_num()
> > write (*,*) 'Hello World from thread', id
> > !$omp barrier
> > if ( id == 0 ) then
> > nthreads = omp_get_num_threads()
> > write (*,*) 'There are', nthreads, 'threads'
> > end if
> > !$omp end parallel
> >
> > ! initial time
> > t0 = secnds(0.0)
> >
> > dx =0.1d0 ! differential lengths, m
> > dy =0.1d0
> > Lx = Nx*dx ! system sizes, m
> > Ly = Ny*dy
> >
> > print *,"\nSimulation has bounds:\n\tX: 0,",Lx,"\n\tY: 0,",Ly
> > print *,"\tNx = ",Nx,"\n\tNy = ",Ny
> > print *,"\tdx = ",dx,"\n\tdy = ",dy
> >
> > v_cloud = -10000.d0 ! volts
> > v_ground = 0.d0
> >
> > ! initialize the the boundary conditions
> > !
> > ! first, set the solution function (v), to look like a
> > ! parallel-plate capacitor
> > !
> > ! note that there is one large parallel section and several
> > ! parallel do's inside that region
> > !$OMP PARALLEL
> > !
> > !$OMP DO
> > !$OMP& SHARED(Nx,Ny,boundary,v_cloud,v_ground,Ly,dy,v)
> > !$OMP& PRIVATE(i,j)
> > do j=1,Ny
> > do i=1,Nx
> > boundary(i,j)=0
> > v(i,j) = v_ground + (v_cloud-v_ground)*(j*dy/Ly)
> > end do
> > end do
> > !$OMP END DO
> > !
> > !$OMP DO
> > !$OMP& SHARED(Nx,Ny,boundary)
> > !$OMP& PRIVATE(i)
> > do i=1,Nx
> > boundary(i,1)=1 ! we need to ensure that the edges of
> > boundary(i,Ny)=1 ! the domain are held as boundary
> > end do
> > !$OMP END DO
> > !
> > !$OMP DO
> > !$OMP& SHARED(boundary,Nx)
> > !$OMP& PRIVATE(j)
> > do j=1,Ny
> > boundary(1,j)=1
> > boundary(Nx,j)=1
> > end do
> > !$OMP END DO
> > !$OMP END PARALLEL
> >
> >
> > ! set up an interesting feature on the lower boundary
> > ! do this in parallel with SECTIONS directive
> > !
> > !$OMP PARALLEL
> > !$OMP& SHARED(v,boundary,Nx,Ny,dx,dy,Lx,Ly,lightning_rod_height)
> > !$OMP&
> PRIVATE(lightning_rod_center,house_center,house_height,house_width))
> > !$OMP SECTIONS
> >
> > !$OMP SECTION
> > ! Lightning_rod
> > lightning_rod_center = Lx*0.6d0
> > lightning_rod_height = 5.0d0
> > call
> >
> create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary)
> >
> > !$OMP SECTION
> > lightning_rod_center = Lx*0.5d0
> > call
> >
> create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary)
> >
> > !$OMP SECTION
> > lightning_rod_center = Lx*0.7d0
> > call
> >
> create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary)
> >
> > !$OMP SECTION
> > ! house
> > house_center = 0.4d0*Lx
> > house_height = 5.0d0
> > house_width = 5.0d0
> > call
> >
> create_house(v_ground,house_center,house_height,house_width,dx,dy,Nx,Ny,v,boundary)
> >
> > !$OMP END SECTIONS
> > !$OMP END PARALLEL
> >
> > ! initialization done
> > t1 = secnds(0.0)
> >
> >
> >
> >
> >
> > ! main solution iteration
> > !
> > ! repeat the recursion relation until the maximum change
> > ! from update to update is less than a convergence epsilon,
> > convergence_v = (0.05)*dabs(v_ground-v_cloud)/(1.d0*Ny)
> > print *,"\nconvergence criteria is \Delta V < ",convergence_v
> > num_iterations = 0
> >
> > !
> > ! iteration implemented with a goto or a do-while
> > converged=0
> > do while( converged .eq. 0)
> >
> > converged = 1
> > num_iterations = num_iterations + 1
> > !$OMP PARALLEL
> > !$OMP DO
> > !$OMP& PRIVATE(i,j)
> > !$OMP& SHARED(Ny,Nx,dv,v,boundary))
> > do j=2,(Ny-1)
> > do i=2,(Nx-1)
> > dv(i,j) =
> > 0.25d0*(v(i-1,j)+v(i+1,j)+v(i,j+1)+v(i,j-1)) - v(i,j)
> > dv(i,j) = dv(i,j)*(1.d0-DFLOAT(boundary(i,j)))
> > end do
> > end do
> > !$OMP END DO
> >
> > max_dv = 0.d0
> > !$OMP DO
> > !$OMP& PRIVATE(i,j)
> > !$OMP& SHARED(NX,NY,dv,v))
> > !$OMP& REDUCTION(MAX:max_dv)
> > do j=2,(Ny-1)
> > do i=2,(Nx-1)
> > v(i,j) = v(i,j) + dv(i,j)
> > if(dv(i,j) .gt. max_dv) then
> > max_dv = dv(i,j)
> > endif
> > end do
> > end do
> > !$OMP END DO
> > !$OMP END PARALLEL
> >
> > if(max_dv .ge. convergence_v) then
> > converged = 0
> > endif
> >
> > end do
> >
> >
> >
> >
> >
> >
> >
> > ! simulation finished
> > t2 = secnds(0.0)
> >
> > print *," iterations necessary, ",num_iterations
> > print *," initialization time, ",t1-t0
> > print *," simulation time, ",t2-t1
> >
> >
> > open(unit=10,file="v_output.dat")
> > write(10,*) "# x\ty\tv(x,y)"
> > do j=1,Ny
> > !do i=1,Nx
> > ! skipping the full array print to save execution time
> > ! the printed data file is normally sent to gnuplot w/ splot
> > i=10
> > write (10,*) i*dx,j*dy,v(i,j)
> > !enddo
> > write (10,*)" "
> > end do
> > close(10)
> >
> >
> > stop
> > end
> >
> >
> >
> >
> > subroutine
> >
> create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary)
> > IMPLICIT NONE
> > integer Nx, Ny,j,boundary(Nx,Ny)
> > integer j_limit
> > integer index_lightning_rod_center
> > real*8 v(Nx,Ny)
> > real*8 lightning_rod_center,lightning_rod_height
> > real*8 dx, dy, v_ground
> >
> > index_lightning_rod_center = lightning_rod_center/dx
> > j_limit = lightning_rod_height/dy
> > do j=1,j_limit
> > v(index_lightning_rod_center,j) = v_ground
> > boundary(index_lightning_rod_center,j) = 1
> > end do
> >
> > print *,"Created a lightning rod of height ",lightning_rod_height
> > print *,"\ty_index ",j_limit
> > print *,"\tx-position ",lightning_rod_center
> > print *,"\tx_index ",index_lightning_rod_center
> >
> >
> > end subroutine
> >
> >
> >
> >
> >
> >
> >
> > subroutine
> >
> create_house(v_ground,house_center,house_height,house_width,dx,dy,Nx,Ny,v,boundary)
> > IMPLICIT NONE
> > integer Nx, Ny, boundary(Nx,Ny)
> > real*8 v(Nx,Ny)
> > real*8 v_ground, dx, dy
> > integer i,j,i_limit,j_limit, index_house_center
> > real*8 house_center,house_height,house_width
> >
> > index_house_center = house_center/dx
> > i_limit = 0.5d0*house_width/dx
> > j_limit = house_height/dy
> > do j=1,j_limit
> > do i=(index_house_center-i_limit),(index_house_center+i_limit)
> > v(i,j) = v_ground
> > boundary(i,j) = 1
> > end do
> > end do
> >
> > print *,"Created a house of height ",house_height
> > print *,"\ty_index ",j_limit
> > print *,"\twidth ",house_width
> > print *,"\thouse bounds:
> > ",index_house_center-i_limit,index_house_center+i_limit
> >
> > end subroutine
> >
> >
> >
> > ------------------------------------------------------------------------
> >
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org
> > To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
>
--
- - - - - - - - - - - - - - - - - - - - -
Nathan Moore
Assistant Professor, Physics
Winona State University
AIM: nmoorewsu
- - - - - - - - - - - - - - - - - - - - -
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.beowulf.org/pipermail/beowulf/attachments/20081126/40611227/attachment.html>
More information about the Beowulf
mailing list