Beowulf & FFT
Robert G. Brown
rgb at phy.duke.edu
Mon Jul 24 14:05:54 PDT 2000
On Mon, 24 Jul 2000, Martin Siegert wrote:
> The problem is that you have to do a matrix transpose between the two
> loops. FFTW does this using MPI_Alltoall. As Tony pointed out this may not
> be the best choice. So I've been busy writing a matrix transpose routine
> myself - something that is trivial for serial programs is totally nontrivial
> when the matrix is distributed over several processors. If somebody can
> give me a hint or reference how this is done for best performance, I'll
> be thankful forever :-) [right now I'm trying MPI_Type_vector to define
I don't know if it will be of use to you, but on the off chance of
having somebody thankful forever (which is a very long time, after
all;-) check out:
This is the "Convolution" section of "Designing and Building Parallel
Programs", by Ian Foster, an online book. It covers 2d FFT's in fair
detail, including a comparison between transposition and pipelining
algorithms. It suggests that pipelining might be better when N is
"small", as it appears to be for you.
contains a discussion of tranposition algorithms appropriate for
hypercubical architectures, which also might be better for small N, if
you can rearrange your systems into a suitable hypercube.
You'll likely want the paper copy of the book if you find this useful.
I suspect that you already have found a lot of what it tells you though.
A very nice book and useful reference to bookmark if you are a
beowulfer, in any event.
> Actually, I can: I have to average over random initial conditions. At least
> in principle - I find that usually 4 samples are already enough, even a single
> sample already gives you the growth exponent quite accurately. That's why
> I hoped that I could run a single run in parallel - then I could go to
> larger system sizes and later times. If it can't be done, I'll do exactly
> as you said.
Good luck -- your problem sounds like you can probably find a region and
parallel algorithm where you get >>some<< parallel speedup for a
relatively small number of systems, although you may not get linear
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
More information about the Beowulf