[Beowulf] OpenMP on AMD dual core processors

Thu Nov 20 20:55:24 PST 2008

Nathan Moore wrote:

> Any suggestions?  I figured that this would be a simple example to 
> parallelize.  Is there a better example for OpenMP parallelization?  
> Also, is there something obvious I'm missing in the example below? 

A few thoughts ...

Initialize your data in parallel as well.  No reason not to.  But 
optimize that code a bit.  You don't need

        v_y = v_ground + (v_cloud-v_ground)*(j*dy/Ly)
        boundary(i,j)=0
        v(i,j) = v_y

when

        v(i,j) = v_ground + (v_cloud-v_ground)*(j*dy/Ly)
        boundary(i,j)=0

will eliminate the explicit temporary variable.  Also, the i.eq.0 test 
in the if-then construct is guaranteed never to be hit, and neither is 
the j.eq.0 test.

You can (and should) replace that if-then construct with a set of loops 
of the form

	do j=1,Ny
	 boundary(Nx,j) = 1
	end do      	
	do i=1,Nx
	 boundary(i,Ny) = 1
	end do
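
A minimal sketch of the parallel initialization described above.  It 
assumes v and boundary are dimensioned (Nx,Ny) and that boundary is an 
integer array; neither is confirmed by the original code.  The module 
and subroutine names are made up for illustration.  The j loop is 
outermost because Fortran stores arrays column by column.

```fortran
module init_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Fill v with a linear profile in j, clear the boundary mask, then
  ! mark the i = Nx and j = Ny edges as boundary points.
  ! Assumption: both arrays are dimensioned (1:Nx, 1:Ny).
  subroutine init_fields(v, boundary, Nx, Ny, dy, Ly, v_ground, v_cloud)
    integer, intent(in) :: Nx, Ny
    real(dp), intent(in) :: dy, Ly, v_ground, v_cloud
    real(dp), intent(out) :: v(Nx, Ny)
    integer, intent(out) :: boundary(Nx, Ny)
    integer :: i, j

    ! Each thread gets a block of columns; no two threads touch the
    ! same element, so nothing here needs synchronization.
    !$omp parallel do private(i)
    do j = 1, Ny
      do i = 1, Nx
        v(i, j) = v_ground + (v_cloud - v_ground) * (j * dy / Ly)
        boundary(i, j) = 0
      end do
    end do
    !$omp end parallel do

    ! Edge marking replaces the if-then test inside the main loop.
    do j = 1, Ny
      boundary(Nx, j) = 1
    end do
    do i = 1, Nx
      boundary(i, Ny) = 1
    end do
  end subroutine init_fields
end module init_sketch
```

Build with OpenMP enabled (e.g. gfortran -fopenmp).  Without the flag 
the !$omp lines are plain comments and the loops run serially, which is 
handy for checking that both builds give the same answer.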

Also, what sticks out to me is that old_v may be treated as "shared" 
rather than "private".  In a parallel do only the loop indices are 
private by default; other variables are shared unless you say 
otherwise, so explicitly mark old_v as private.  And dv, for that 
matter.
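
To make the private marking concrete, here is a hedged sketch of one 
sweep.  It assumes (Nx,Ny) arrays and writes results into a separate 
vnew array so that no thread reads a point another thread is updating. 
The names sweep_private, vnew and change are illustrative and not taken 
from the original code.  default(none) forces the sharing of every 
variable to be spelled out, so a forgotten temporary becomes a compile 
error instead of a race.

```fortran
module private_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! One relaxation sweep over the interior points.
  ! old_v and dv are per-thread scratch values, hence private.
  subroutine sweep_private(v, vnew, change, Nx, Ny)
    integer, intent(in) :: Nx, Ny
    real(dp), intent(in) :: v(Nx, Ny)
    real(dp), intent(inout) :: vnew(Nx, Ny)
    real(dp), intent(inout) :: change(Nx, Ny)
    real(dp) :: old_v, dv
    integer :: i, j

    !$omp parallel do default(none) &
    !$omp   shared(v, vnew, change, Nx, Ny) private(i, j, old_v, dv)
    do j = 2, Ny - 1
      do i = 2, Nx - 1
        old_v = v(i, j)
        vnew(i, j) = 0.25_dp * (v(i-1, j) + v(i+1, j) &
                              + v(i, j+1) + v(i, j-1))
        dv = abs(old_v - vnew(i, j))
        ! Store the change per point; no shared scalar is written.
        change(i, j) = dv
      end do
    end do
    !$omp end parallel do
  end subroutine sweep_private
end module private_sketch
```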

Note also that this inner loop is attempting to do a convergence test. 
You are setting a globally shared value from within an inner loop.  This 
is not a good thing to do: every thread writes to the same variable, so 
those accesses either have to be serialized or they contend with each 
other, and either way it hurts scaling.

I would suggest a slightly different inner loop and convergence test 
(note ... this relies on something I haven't tried in Fortran, so 
adjustment may be needed):

real*8 vnew(Nx,Ny),dv(Nx,Ny),mask

do i=1,Nx
  do j=1,Ny
     ! notice that the if-then construct is gone ...
     ! mask is 1.0 on free points and 0.0 on boundaries, so
     ! vnew eq 0.0 for boundaries
     ! (merge is used because dabs() of a logical is not standard Fortran)
     mask = merge(1.0d0, 0.0d0, boundary(i,j).eq.0)
     vnew(i,j) = 0.25d0*(v(i-1,j)+v(i+1,j)+v(i,j+1)+v(i,j-1))*mask
     dv(i,j) = (dabs(v(i,j)-vnew(i,j)) - convergence_v)*mask
  end do
end do

! now all you need is a "linear scan" to find positive elements in
! dv.  You can approach these as sum reductions, and do them in
! parallel
do i=1,Nx
  sum=0.0
  do j=1,Ny
   ! max() keeps only the positive part of dv(i,j)
   sum = sum + max(dv(i,j), 0.0d0)
  end do
  if (sum .gt. 0.0) converged = 0
end do
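
The two loops above can be folded into one routine with an OpenMP sum 
reduction.  This is a sketch under assumptions, not tested against the 
original program: arrays are (Nx,Ny), only interior points are updated, 
and the names masked_sweep and excess are invented here.  One deliberate 
difference: merge keeps the old value at boundary points instead of 
zeroing it, so fixed voltages survive the sweep.

```fortran
module masked_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Branch-free sweep plus convergence check.
  ! converged is .true. when no free (boundary == 0) interior point
  ! changed by more than convergence_v.
  subroutine masked_sweep(v, vnew, boundary, Nx, Ny, convergence_v, &
                          converged)
    integer, intent(in) :: Nx, Ny
    real(dp), intent(in) :: v(Nx, Ny), convergence_v
    real(dp), intent(inout) :: vnew(Nx, Ny)
    integer, intent(in) :: boundary(Nx, Ny)
    logical, intent(out) :: converged
    real(dp) :: avg, dv, excess
    integer :: i, j

    excess = 0.0_dp
    ! Each thread accumulates its own copy of excess; OpenMP adds the
    ! copies together when the loop ends.
    !$omp parallel do private(i, avg, dv) reduction(+:excess)
    do j = 2, Ny - 1
      do i = 2, Nx - 1
        avg = 0.25_dp * (v(i-1, j) + v(i+1, j) &
                       + v(i, j+1) + v(i, j-1))
        ! Free points take the average, boundary points keep v(i,j).
        vnew(i, j) = merge(avg, v(i, j), boundary(i, j) == 0)
        ! dv is zero at boundary points because vnew equals v there.
        dv = abs(v(i, j) - vnew(i, j)) - convergence_v
        excess = excess + max(dv, 0.0_dp)
      end do
    end do
    !$omp end parallel do

    converged = (excess <= 0.0_dp)
  end subroutine masked_sweep
end module masked_sketch
```

The only shared scalar touched inside the loop is the reduction 
variable, and OpenMP gives each thread a private copy of that until the 
end of the loop.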

The basic idea is to replace the inner loop conditionals and remove as 
many of the shared variables as possible.

Also cf. the examples here:  http://www.linux-mag.com/id/4609  specifically 
the Riemann zeta function (fairly trivial).
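
For reference, a zeta-function partial sum in the same spirit might 
look like the sketch below.  This is my own minimal version, not the 
code from the linked article; the names zeta_partial and acc are 
hypothetical.

```fortran
module zeta_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Partial sum of zeta(s): sum over k = 1..n of 1/k**s.
  function zeta_partial(s, n) result(total)
    real(dp), intent(in) :: s
    integer, intent(in) :: n
    real(dp) :: total, acc
    integer :: k

    acc = 0.0_dp
    ! Iterations are independent, so a sum reduction is all it takes.
    !$omp parallel do reduction(+:acc)
    do k = 1, n
      acc = acc + 1.0_dp / real(k, dp)**s
    end do
    !$omp end parallel do
    total = acc
  end function zeta_partial
end module zeta_sketch
```

Calling zeta_partial(2.0_dp, n) should approach pi**2/6 as n grows, 
with an error of roughly 1/n.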

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615