[Beowulf] OpenMP wierdness on dual AMD 2350 box w/ SL5.2 x86_64
Nathan Moore
ntmoore at gmail.com
Wed Nov 26 10:32:22 PST 2008
After the help last week on openmp, I got inspired and bought a dual-quad
opteron machine for the department to show 8-way scaling for my students
("Hey, its much cheaper than something new in the optics lab", my dept.
chair laughed).
I've been working on said machine over the past few days and found something
really weird in an OpenMP example program I descrobed to the list.
The machine is a dual-proc AMD Opteron 2350, Tyan n3600T (S2937) mainboard,
w/ 8GB ram. Initially, I installed the i386 version of Scientific Linux
5.2, but then realized that only half of the RAM was usable, and
re-installed SL5.2 x86_64 this morning.
The example program is appended to the end of this email. Again, it is a
2-d finite-difference solution to the laplace equation (the context being to
"predict" lightning strikes based on the potential between the ground and
some clouds overhead).
The program scales beautifully up to OMP_NUM_THREADS~6 or 7, but when I set
the number of threads to 8, something weird happens, and instead of taking
the normal 241 iterations to converge, the program converges after 1 step.
This reeks of numerical instability to me, but my programming experience in
x86_64 is very limited.
I'm using gfortran, with the simple compile string,
gfortran clouds_example_OpenMP.f90 -m64 -fopenmp
Any insight into what obvious mistake I'm making would be wonderful!
The stability of the output seems erratic to me. Sometimes when
OMP_NUM_THREADS=7 the result converges normally after 241 iterations and at
other times, the result converges after 1 iteration (indicating some sort of
problem with hardware???)
This didn't occur yesterday when the machine was running SL5.2, i386.
Simulation Output:
[nmoore at aykroyd clouds]$ OMP_NUM_THREADS=1
[nmoore at aykroyd clouds]$ export OMP_NUM_THREADS
[nmoore at aykroyd clouds]$ ./a.out
Hello World from thread 0
There are 1 threads
...
convergence criteria is \Delta V < 0.250000003725290
iterations necessary, 241
initialization time, 0
simulation time, 57
[nmoore at aykroyd clouds]$ OMP_NUM_THREADS=2
[nmoore at aykroyd clouds]$ export OMP_NUM_THREADS
[nmoore at aykroyd clouds]$ ./a.out
Hello World from thread 0
Hello World from thread 1
There are 2 threads
...
convergence criteria is \Delta V < 0.250000003725290
iterations necessary, 241
initialization time, 0
simulation time, 28
[nmoore at aykroyd clouds]$ OMP_NUM_THREADS=4
[nmoore at aykroyd clouds]$ export OMP_NUM_THREADS
[nmoore at aykroyd clouds]$ ./a.out
Hello World from thread 3
Hello World from thread 1
Hello World from thread 0
Hello World from thread 2
There are 4 threads
...
convergence criteria is \Delta V < 0.250000003725290
iterations necessary, 241
initialization time, 0
simulation time, 14
[nmoore at aykroyd clouds]$ OMP_NUM_THREADS=8
[nmoore at aykroyd clouds]$ export OMP_NUM_THREADS
[nmoore at aykroyd clouds]$ ./a.out
Hello World from thread 2
...
convergence criteria is \Delta V < 0.250000003725290
iterations necessary, 1
initialization time, 0
simulation time, 0
Code listing:
nmoore at aykroyd clouds]$ cat clouds_example_OpenMP.f90
!
!
use omp_lib
!
IMPLICIT NONE
integer,parameter::Nx=2000
integer,parameter::Ny=2000
real*8 v(Nx,Ny), dv(Nx,Ny)
integer boundary(Nx,Ny)
integer i,j,converged,i1,i2
integer t0,t1,t2
real*8 convergence_v, v_cloud, v_ground, max_dv
real*8 bump_P,old_v
real*8 Lx,Ly,dx,dy,v_y
!
real*8 lightning_rod_center, lightning_rod_height
!
real*8 house_center, house_height, house_width
integer num_iterations
!
integer:: id, nthreads
!$omp parallel private(id)
id = omp_get_thread_num()
write (*,*) 'Hello World from thread', id
!$omp barrier
if ( id == 0 ) then
nthreads = omp_get_num_threads()
write (*,*) 'There are', nthreads, 'threads'
end if
!$omp end parallel
! initial time
t0 = secnds(0.0)
dx =0.1d0 ! differential lengths, m
dy =0.1d0
Lx = Nx*dx ! system sizes, m
Ly = Ny*dy
print *,"\nSimulation has bounds:\n\tX: 0,",Lx,"\n\tY: 0,",Ly
print *,"\tNx = ",Nx,"\n\tNy = ",Ny
print *,"\tdx = ",dx,"\n\tdy = ",dy
v_cloud = -10000.d0 ! volts
v_ground = 0.d0
! initialize the the boundary conditions
!
! first, set the solution function (v), to look like a
! parallel-plate capacitor
!
! note that there is one large parallel section and several
! parallel do's inside that region
!$OMP PARALLEL
!
!$OMP DO
!$OMP& SHARED(Nx,Ny,boundary,v_cloud,v_ground,Ly,dy,v)
!$OMP& PRIVATE(i,j)
do j=1,Ny
do i=1,Nx
boundary(i,j)=0
v(i,j) = v_ground + (v_cloud-v_ground)*(j*dy/Ly)
end do
end do
!$OMP END DO
!
!$OMP DO
!$OMP& SHARED(Nx,Ny,boundary)
!$OMP& PRIVATE(i)
do i=1,Nx
boundary(i,1)=1 ! we need to ensure that the edges of
boundary(i,Ny)=1 ! the domain are held as boundary
end do
!$OMP END DO
!
!$OMP DO
!$OMP& SHARED(boundary,Nx)
!$OMP& PRIVATE(j)
do j=1,Ny
boundary(1,j)=1
boundary(Nx,j)=1
end do
!$OMP END DO
!$OMP END PARALLEL
! set up an interesting feature on the lower boundary
! do this in parallel with SECTIONS directive
!
!$OMP PARALLEL
!$OMP& SHARED(v,boundary,Nx,Ny,dx,dy,Lx,Ly,lightning_rod_height)
!$OMP& PRIVATE(lightning_rod_center,house_center,house_height,house_width))
!$OMP SECTIONS
!$OMP SECTION
! Lightning_rod
lightning_rod_center = Lx*0.6d0
lightning_rod_height = 5.0d0
call
create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary)
!$OMP SECTION
lightning_rod_center = Lx*0.5d0
call
create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary)
!$OMP SECTION
lightning_rod_center = Lx*0.7d0
call
create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary)
!$OMP SECTION
! house
house_center = 0.4d0*Lx
house_height = 5.0d0
house_width = 5.0d0
call
create_house(v_ground,house_center,house_height,house_width,dx,dy,Nx,Ny,v,boundary)
!$OMP END SECTIONS
!$OMP END PARALLEL
! initialization done
t1 = secnds(0.0)
! main solution iteration
!
! repeat the recursion relation until the maximum change
! from update to update is less than a convergence epsilon,
convergence_v = (0.05)*dabs(v_ground-v_cloud)/(1.d0*Ny)
print *,"\nconvergence criteria is \Delta V < ",convergence_v
num_iterations = 0
!
! iteration implemented with a goto or a do-while
converged=0
do while( converged .eq. 0)
converged = 1
num_iterations = num_iterations + 1
!$OMP PARALLEL
!$OMP DO
!$OMP& PRIVATE(i,j)
!$OMP& SHARED(Ny,Nx,dv,v,boundary))
do j=2,(Ny-1)
do i=2,(Nx-1)
dv(i,j) =
0.25d0*(v(i-1,j)+v(i+1,j)+v(i,j+1)+v(i,j-1)) - v(i,j)
dv(i,j) = dv(i,j)*(1.d0-DFLOAT(boundary(i,j)))
end do
end do
!$OMP END DO
max_dv = 0.d0
!$OMP DO
!$OMP& PRIVATE(i,j)
!$OMP& SHARED(NX,NY,dv,v))
!$OMP& REDUCTION(MAX:max_dv)
do j=2,(Ny-1)
do i=2,(Nx-1)
v(i,j) = v(i,j) + dv(i,j)
if(dv(i,j) .gt. max_dv) then
max_dv = dv(i,j)
endif
end do
end do
!$OMP END DO
!$OMP END PARALLEL
if(max_dv .ge. convergence_v) then
converged = 0
endif
end do
! simulation finished
t2 = secnds(0.0)
print *," iterations necessary, ",num_iterations
print *," initialization time, ",t1-t0
print *," simulation time, ",t2-t1
open(unit=10,file="v_output.dat")
write(10,*) "# x\ty\tv(x,y)"
do j=1,Ny
!do i=1,Nx
! skipping the full array print to save execution time
! the printed data file is normally sent to gnuplot w/ splot
i=10
write (10,*) i*dx,j*dy,v(i,j)
!enddo
write (10,*)" "
end do
close(10)
stop
end
subroutine
create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary)
IMPLICIT NONE
integer Nx, Ny,j,boundary(Nx,Ny)
integer j_limit
integer index_lightning_rod_center
real*8 v(Nx,Ny)
real*8 lightning_rod_center,lightning_rod_height
real*8 dx, dy, v_ground
index_lightning_rod_center = lightning_rod_center/dx
j_limit = lightning_rod_height/dy
do j=1,j_limit
v(index_lightning_rod_center,j) = v_ground
boundary(index_lightning_rod_center,j) = 1
end do
print *,"Created a lightning rod of height ",lightning_rod_height
print *,"\ty_index ",j_limit
print *,"\tx-position ",lightning_rod_center
print *,"\tx_index ",index_lightning_rod_center
end subroutine
subroutine
create_house(v_ground,house_center,house_height,house_width,dx,dy,Nx,Ny,v,boundary)
IMPLICIT NONE
integer Nx, Ny, boundary(Nx,Ny)
real*8 v(Nx,Ny)
real*8 v_ground, dx, dy
integer i,j,i_limit,j_limit, index_house_center
real*8 house_center,house_height,house_width
index_house_center = house_center/dx
i_limit = 0.5d0*house_width/dx
j_limit = house_height/dy
do j=1,j_limit
do i=(index_house_center-i_limit),(index_house_center+i_limit)
v(i,j) = v_ground
boundary(i,j) = 1
end do
end do
print *,"Created a house of height ",house_height
print *,"\ty_index ",j_limit
print *,"\twidth ",house_width
print *,"\thouse bounds:
",index_house_center-i_limit,index_house_center+i_limit
end subroutine
--
- - - - - - - - - - - - - - - - - - - - -
Nathan Moore
Assistant Professor, Physics
Winona State University
AIM: nmoorewsu
- - - - - - - - - - - - - - - - - - - - -
--
- - - - - - - - - - - - - - - - - - - - -
Nathan Moore
Assistant Professor, Physics
Winona State University
AIM: nmoorewsu
- - - - - - - - - - - - - - - - - - - - -
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20081126/3d2c4fa8/attachment.html
More information about the Beowulf
mailing list