[Beowulf] OpenMP on AMD dual core processors

Fri Nov 21 07:01:35 PST 2008

Joe Landman wrote:
> Nathan Moore wrote:
> 
>> Any suggestions?  I figured that this would be a simple example to
>> parallelize.  Is there a better example for OpenMP parallelization? 
>> Also, is there something obvious I'm missing in the example below? 
> 
> A few thoughts ...
> 
> Initialize your data in parallel as well.  No reason not to.  But
> optimize that code a bit.  You don't need
> 
>     v_y = v_ground + (v_cloud-v_ground)*(j*dy/Ly)
>     boundary(i,j)=0
>     v(i,j) = v_y
> 
> when
> 
>     v(i,j) = v_ground + (v_cloud-v_ground)*(j*dy/Ly)
>     boundary(i,j)=0
> 
> will eliminate the explicit temporary variable.  Also the i.eq.0 test is
> guaranteed never to be hit in the if-then construct, as with the j.eq.0.
> 
> You can (and should) replace that if-then construct with a set of loops
> of the form
> 
>     do j=1,Ny
>      boundary(Nx,j) = 1
>     end do         
>     do i=1,Nx
>      boundary(i,Ny) = 1
>     end do
> 
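For reference, the parallel initialization suggested above could look roughly like the sketch below. The grid size, the values of v_ground, v_cloud and Ly, and the 1..Nx by 1..Ny array bounds are my assumptions for illustration, not taken from Nathan's program. I also put j in the outer loop, since Fortran stores arrays column by column.

```fortran
program init_parallel
  implicit none
  ! Hypothetical sizes and physical values, chosen only for illustration.
  integer, parameter :: Nx = 64, Ny = 64
  real(8), parameter :: v_ground = 0.0d0, v_cloud = 1.0d0
  real(8), parameter :: Ly = 1.0d0
  real(8), parameter :: dy = Ly / Ny
  real(8) :: v(Nx,Ny)
  integer :: boundary(Nx,Ny)
  integer :: i, j

  ! Every (i,j) is written by exactly one iteration, so the loop nest
  ! has no dependencies. j is private as the parallel loop index;
  ! i is listed explicitly for clarity.
  !$omp parallel do private(i)
  do j = 1, Ny
     do i = 1, Nx
        v(i,j) = v_ground + (v_cloud - v_ground) * (j * dy / Ly)
        boundary(i,j) = 0
     end do
  end do
  !$omp end parallel do

  ! Edge flags set by two short loops instead of an if-test in the nest.
  do j = 1, Ny
     boundary(Nx,j) = 1
  end do
  do i = 1, Nx
     boundary(i,Ny) = 1
  end do

  print *, 'v at j=1 and j=Ny:', v(1,1), v(1,Ny)
  print *, 'flagged edge cells:', count(boundary == 1)
end program init_parallel
```

With gfortran this builds with -fopenmp; without that flag the directives are treated as comments and the program runs serially.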
> Also, what sticks out to me is that old_v may be viewed as "shared"
> versus "private".  I know OpenMP is supposed to do the right thing here,
>  but you might need to explicitly mark old_v as private.  And dv for
> that matter.
> 
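On the shared-versus-private point: in a parallel do, scalars assigned inside the loop are shared by default unless they are loop indices or are listed in a private clause. A minimal sketch of marking old_v and dv private follows. The array, its size and the update rule are made up for illustration and are not Nathan's code.

```fortran
program private_scalars
  implicit none
  integer, parameter :: N = 1000   ! hypothetical problem size
  real(8) :: v(N), old_v, dv
  integer :: i, nchanged

  v = 1.0d0
  nchanged = 0

  ! old_v and dv are per-iteration scratch values, so each thread needs
  ! its own copy. Without the private clause they would be shared and
  ! threads could overwrite each other's values.
  !$omp parallel do private(old_v, dv) reduction(+:nchanged)
  do i = 1, N
     old_v = v(i)
     v(i)  = 0.5d0 * old_v          ! stand-in for the real update
     dv    = abs(v(i) - old_v)
     if (dv > 0.0d0) nchanged = nchanged + 1
  end do
  !$omp end parallel do

  print *, 'elements changed:', nchanged
end program private_scalars
```

The counter goes through a reduction clause rather than a shared variable, for the same reason.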
> Note also that this inner loop is attempting to do a convergence test.
> You are looking to set a globally shared value from within an inner
> loop.  This is not a good thing to do.  This means accesses to that
> globally shared variable are going to be locked.
> 
> I would suggest a slightly different inner loop and convergence test:
> (note ... this relies on something I haven't tried in Fortran, so
> adjustment may be needed)
> 
> 
> real*8 vnew(Nx,Ny),dv(Nx,Ny)
> 
> do i=1,Nx
>  do j=1,Ny
>     ! notice that the if-then construct is gone ...
>     ! vnew eq 0.0 for boundaries
>     vnew(i,j) = 0.25*(v(i-1,j)+v(i+1,j)+v(i,j+1)+v(i,j-1))* &
>         merge(1.0d0, 0.0d0, boundary(i,j).eq.0)
>     dv(i,j) = (dabs(v(i,j)-vnew(i,j)) - convergence_v )* &
>         merge(1.0d0, 0.0d0, boundary(i,j).eq.0)
>  end do
> end do

If this were done with MPI, one would have to be careful of the
boundaries on the matrix as it's partitioned for computation. Is OpenMP
intelligent enough to hold off computation on the tiles south and east
of the first until the first is done, and so forth?
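As far as I know, a parallel do does not track dependencies between iterations, so it would not hold anything off by itself. With separate input and output arrays, as in the quoted loop, that ordering should not matter within one sweep: each point reads only v and writes only its own vnew(i,j). A rough sketch under my own assumptions (a one-cell halo of fixed values, a made-up grid size and edge value, and merge() for the mask, which leaves boundary points unchanged rather than zeroing them):

```fortran
program jacobi_sweep
  implicit none
  integer, parameter :: Nx = 32, Ny = 32      ! hypothetical grid size
  real(8) :: v(0:Nx+1, 0:Ny+1), vnew(0:Nx+1, 0:Ny+1)
  integer :: boundary(0:Nx+1, 0:Ny+1)
  integer :: i, j, sweep

  ! Assumed layout: a one-cell halo of fixed values around the interior,
  ! so v(i-1,j) and friends never index out of bounds.
  boundary = 1
  boundary(1:Nx, 1:Ny) = 0
  v = 0.0d0
  v(:, Ny+1) = 1.0d0                          ! fixed value on one edge
  vnew = v

  do sweep = 1, 100
     !$omp parallel do private(i)
     do j = 1, Ny
        do i = 1, Nx
           ! Reads only v, writes only vnew(i,j): iterations are independent.
           vnew(i,j) = merge(0.25d0 * (v(i-1,j) + v(i+1,j) + v(i,j+1) + v(i,j-1)), &
                             v(i,j), boundary(i,j) == 0)
        end do
     end do
     !$omp end parallel do
     ! The parallel do ends with an implicit barrier, so every vnew
     ! value is complete before the copy back into v.
     v(1:Nx, 1:Ny) = vnew(1:Nx, 1:Ny)
  end do

  print *, 'v next to fixed edge, far side:', v(Nx/2, Ny), v(Nx/2, 1)
end program jacobi_sweep
```

An in-place update that reads and writes the same array would carry dependencies between neighbouring points, and those would have to be handled by hand.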

> ! now all you need is a "linear scan" to find positive elements in
> ! dv.  You can approach these as sum reductions, and do them in
> ! parallel
> do i=1,Nx
>  sum=0.0
>  do j=1,Ny
>   sum = sum + merge(dv(i,j), 0.0d0, dv(i,j) .gt. 0.0)
>  end do
>  if (sum .gt. 0.0) converged = 0
> end do
> 
> The basic idea is to replace the inner loop conditionals and remove as
> many of the shared variables as possible.

Yup, keep things pipelined.
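One way to run that scan in parallel without touching a shared flag inside the loop is a reduction clause. The sketch below uses a max reduction instead of the sum in the quoted code, which is my substitution. The tolerance, grid size and test data are invented for illustration.

```fortran
program convergence_scan
  implicit none
  integer, parameter :: Nx = 16, Ny = 16              ! hypothetical grid size
  real(8), parameter :: convergence_v = 1.0d-6        ! hypothetical tolerance
  real(8) :: v(Nx,Ny), vnew(Nx,Ny), maxdv
  integer :: i, j
  logical :: converged

  ! Made-up data: the two arrays differ by 1.0d-3 at a single point.
  v = 1.0d0
  vnew = v
  vnew(3,5) = v(3,5) + 1.0d-3

  maxdv = 0.0d0
  ! Each thread tracks its own maximum and OpenMP combines them at the
  ! end, so no thread writes a shared flag inside the loop.
  !$omp parallel do private(i) reduction(max:maxdv)
  do j = 1, Ny
     do i = 1, Nx
        maxdv = max(maxdv, abs(vnew(i,j) - v(i,j)))
     end do
  end do
  !$omp end parallel do

  ! The convergence decision is made once, outside the parallel loop.
  converged = (maxdv <= convergence_v)
  print *, 'largest change:', maxdv, ' converged:', converged
end program convergence_scan
```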

> Also cf. the examples here:  http://www.linux-mag.com/id/4609  specifically
> the Riemann zeta function (fairly trivial).
> 

-- 
Geoffrey D. Jacobs