[Beowulf] OpenMP on AMD dual core processors

Fri Nov 21 04:36:43 PST 2008

Fortran isn't one of my better languages, but I did manage to tweak your code
into something that I believe works the same and is OpenMP-friendly.

I put a copy at:
    http://cse.ucdavis.edu/bill/OMPdemo.f

When I used the PathScale compiler on your code, it said:
"told.f", line 27: Warning: Referenced scalar variable OLD_V is SHARED by default
"told.f", line 29: Warning: Referenced scalar variable DV is SHARED by default
"told.f", line 31: Warning: Referenced scalar variable CONVERGED is SHARED by default
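
For readers unfamiliar with these warnings: a scalar that every thread writes
inside a parallel loop, but that is shared by default, is a data race. A
minimal sketch of the usual fix follows, written in C because the original
told.f is not shown here. The names old_v, dv and converged echo the warnings;
the function relax_1d and its loop body are hypothetical and only illustrate
the data-sharing clauses, not the original computation.

```c
#include <math.h>

/* Hypothetical 1-D relaxation sweep; the loop body is illustrative only.
   Reads src, writes dst, returns 1 if every interior point moved < tol. */
int relax_1d(const double *src, double *dst, int n, double tol)
{
    int converged = 1;

    /* default(none) forces every variable's sharing to be spelled out.
       converged is combined across threads with a logical-AND reduction. */
    #pragma omp parallel for default(none) shared(src, dst, n, tol) \
            reduction(&&: converged)
    for (int i = 1; i < n - 1; i++) {
        /* Declared inside the loop, so each thread has its own copy. */
        double old_v = src[i];
        dst[i] = 0.5 * (src[i - 1] + src[i + 1]);
        double dv = fabs(dst[i] - old_v);
        if (dv >= tol)
            converged = 0;
    }
    return converged;
}
```

In Fortran the same idea would be expressed with PRIVATE(old_v, dv) and a
REDUCTION(.and.: converged) clause on the parallel do directive. Compiled
without OpenMP support, the pragma is ignored and the loop simply runs
serially.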

I rewrote your code to get rid of those warnings.  I didn't know some of the
constants you mentioned (dy and Ly), so I just wrote my own initialization.
I skipped the boundary conditions by restricting the start and end of the
loops.

Your code seemed to be interpolating between the current iteration (i-1 and
j-1) and the last iteration (i+1 and j+1); I'm not sure whether that was
intentional.  In any case, I just processed the array v into v2, and if it
didn't converge I processed the v2 array back into v.  To make each loop
iteration independent, I made converge a 1D array that stores the sum of each
row's error.  After each array was processed, I walked the 1D array to see
whether we had converged.  I exit when all pixels are below the convergence
value.
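
The two-array scheme described above can be sketched roughly as follows. This
is a C illustration under stated assumptions, not the code in OMPdemo.f: the
four-neighbour average, the use of absolute differences as the "error", the
per-row test against tol, and the names jacobi_sweep, relax, row_err and
max_iter are all hypothetical. Both arrays are assumed to start with the same
boundary values, since the sweeps only touch interior points.

```c
#include <math.h>
#include <stddef.h>
#include <string.h>

/* One sweep: read src, write dst, store each row's summed change.
   Rows are independent, so the outer loop can be split across threads. */
void jacobi_sweep(const double *src, double *dst, double *row_err, int n)
{
    #pragma omp parallel for
    for (int i = 1; i < n - 1; i++) {
        double err = 0.0;               /* private: declared in the loop */
        for (int j = 1; j < n - 1; j++) {
            double nv = 0.25 * (src[(i - 1) * n + j] + src[(i + 1) * n + j]
                              + src[i * n + j - 1] + src[i * n + j + 1]);
            err += fabs(nv - src[i * n + j]);
            dst[i * n + j] = nv;
        }
        row_err[i] = err;               /* each thread writes its own rows */
    }
}

/* Alternate v -> v2 -> v until every interior row's summed change is
   below tol. Returns the number of sweeps, or -1 if max_iter was reached.
   The newest values are always left in v. */
int relax(double *v, double *v2, double *row_err, int n, double tol,
          int max_iter)
{
    double *src = v, *dst = v2;
    int result = -1;

    for (int it = 1; it <= max_iter; it++) {
        jacobi_sweep(src, dst, row_err, n);

        /* Serial walk over the 1-D error array. */
        int converged = 1;
        for (int i = 1; i < n - 1; i++)
            if (row_err[i] >= tol)
                converged = 0;

        double *tmp = src; src = dst; dst = tmp;   /* newest is now src */
        if (converged) {
            result = it;
            break;
        }
    }
    if (src != v)
        memcpy(v, src, (size_t)n * n * sizeof *v);
    return result;
}
```

Because a row's summed change is at least as large as any single pixel's
change in that row, requiring every row sum to be below tol also means every
pixel is below tol. Keeping the convergence check outside the parallel loop
avoids any shared scalar being written by several threads.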

It scales rather well on a dual-socket Barcelona (AMD quad-core).  My version
iterates a 1000x1000 array with a range of values from 0-200 over 1214
iterations to within a convergence of 0.02.

CPUs   Time    Scaling
======================
1      54.51
2      27.75   1.96 faster
4      14.14   3.85 faster
8       7.75   7.03 faster
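
The scaling column is the one-CPU time divided by the N-CPU time. A small C
sketch of that arithmetic, which also reports parallel efficiency (speedup
divided by CPU count), follows. The function names are hypothetical and the
timings are copied from the table above.

```c
#include <stdio.h>

/* Speedup relative to the one-CPU run. */
double speedup(double t1, double tn)
{
    return t1 / tn;
}

/* Fraction of ideal linear scaling achieved on ncpus CPUs. */
double efficiency(double t1, double tn, int ncpus)
{
    return t1 / (tn * ncpus);
}

/* Print speedup and efficiency for the timings in the table. */
void print_scaling(void)
{
    const int cpus[] = {1, 2, 4, 8};
    const double t[] = {54.51, 27.75, 14.14, 7.75};

    for (int k = 0; k < 4; k++)
        printf("%d CPUs: speedup %.3f, efficiency %.3f\n",
               cpus[k], speedup(t[0], t[k]),
               efficiency(t[0], t[k], cpus[k]));
}
```

By this measure the 8-CPU run reaches roughly 88% of ideal linear scaling.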

Hopefully my code is doing what you intended.

Alas, with gfortran (4.3.1 or 4.3.2), I get a segmentation fault as soon as I
run it.  The same happens if I compile with -g and run it under the debugger.
I'm probably doing something stupid.