> Here is the idea: There should be benchmarks and speedup data for different
> types of cluster applications:
> 1. Embarrassingly parallel (e.g., Monte-Carlo simulations).
>    In this case the benchmark will be dominated by the CPUs, the
>    interconnect is unimportant, the speedup curve will show linear
>    scaling for an (almost) unlimited number of processors.
> 2. Applications with "nearest neighbour" communications
>    (e.g., finite-difference methods for PDEs). In this case there is
>    significant communication between processors, however, since the
>    communication is local (i.e., processor n only talks with n+1 and
>    n-1) the scaling of the communication time with the # of processors
>    is not so bad (constant + probably a small linear piece).
>    In this case you should see a maximum in the speedup curve, the location
>    of which depends on your interconnect.
> 3. Applications with pairwise (all-to-all) communications
>    (e.g., parallel FFT). In this case the time for communication scales
>    proportional to the square of the # of processors. The benchmark will
>    be dominated by the speed of the interconnect, i.e., the speedup curve
>    will show minimal speedups (or even speedups < 1) for fast ethernet.

Umm, doesn't that describe the NAS Parallel Benchmarks?
IIRC they include most of those categories, and have these advantages:

1) They already exist (you can download the sources)
2) There are already numbers for them on many machines
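
For what it's worth, the three scaling regimes in the quoted text can be
sketched with a toy model: T(p) = t_serial/p + comm(p), speedup = T(1)/T(p).
The cost functions and all constants below are made-up assumptions chosen
only to show the shape of the curves; they are not measurements and have
nothing to do with how the NAS benchmarks are computed.

```python
# Toy speedup model for the three application classes in the quoted text.
# All constants are hypothetical and chosen only for illustration.

def speedup(p, t_serial, comm):
    """Return T(1) / T(p), where T(p) = t_serial / p + comm(p)."""
    if p == 1:
        return 1.0  # a single processor does no communication
    return t_serial / (t_serial / p + comm(p))

def comm_none(p):
    # 1. Embarrassingly parallel: no communication cost.
    return 0.0

def comm_neighbour(p):
    # 2. Nearest neighbour: constant plus a small linear piece.
    return 0.5 + 0.01 * p

def comm_all_to_all(p):
    # 3. All-to-all: grows with the square of the processor count.
    return 0.05 * p * p

if __name__ == "__main__":
    t_serial = 100.0  # hypothetical single-processor run time
    print(f"{'p':>4} {'none':>8} {'neighbour':>10} {'all-to-all':>11}")
    for p in (1, 2, 4, 8, 16, 32, 64, 128, 256):
        row = [speedup(p, t_serial, f)
               for f in (comm_none, comm_neighbour, comm_all_to_all)]
        print(f"{p:>4} {row[0]:>8.2f} {row[1]:>10.2f} {row[2]:>11.2f}")
```

With these particular constants the nearest-neighbour curve peaks near
p = 100 and the all-to-all curve drops below 1 for large p; a faster
interconnect corresponds to smaller constants in the comm functions, which
moves the peak to the right.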

