[Beowulf] MPI application benchmarks

Mon May 7 15:05:54 PDT 2007

On Mon, May 07, 2007 at 09:56:10AM +0200, Toon Knapen wrote:
> Mark Hahn wrote:
> >>Sigh. I thought I could avoid that response. Our own code (due to the no.
> >>of users who all believe that their code is the most important and
> >>therefore must be benchmarked) is so massive that any potential RFP
> >>respondent would have to work a year to run the code. Thus, we have to
> >
> >sure.  the suggestion is only useful if the cluster is dedicated to a 
> >single purpose or two.  for anything else, I really think that 
> >microbenchmarks are the only way to go.  after all, your code probably
> >doesn't do anything which is truly unique, but rather is some 
> >combination of a theoretical microbenchmark "basis set".  no, I don't
> >know how to establish the factor weights, or whether this approach 
> >really provides a good predictor.  but isn't it the obvious way,
> >even the only tractable way?
> >
> 
> 
> Agreed. On one hand you need micro-benchmarks. OTOH you need your users 
> to specify what the sensitive points of their applications are. First of 
> all, I suppose their applications are parallel, but are they BW-bound or 
> latency-bound? How much time do the applications spend on communication? 
> Are the apps capable of running in mixed mode (MPI combined with 
> multithreading), ...
> 
> Why don't you make a list of multiple-choice questions in the style 
> described above and ask your users to fill it in? This also solves the 
> 'weighting factor' problem, because the users that respond to your 
> questions _care_ about the machine being suitable while the others care less.

Several reasons why this does not work quite the way we would like:
- it is surprising (or not ...) how many users simply do not know
  how to characterize their application. The only way is to get a
  copy of those applications that chew up most of the walltime,
  compile them, try to understand what the application is doing,
  and then classify it yourself. That takes a lot of time ... unless
  it is a well-known application like GROMACS.
- we are starting work on the benchmark now. The RFP will be issued
  many months down the road. The equipment will be purchased many
  more months down the road, ... By the time users get on the
  facilities, the users have changed, their applications have changed,
  etc., etc. Thus, "our own applications" today are not (necessarily)
  relevant tomorrow when the equipment is purchased.

Thus, setting up a benchmark suite that covers the whole spectrum
of "interesting" applications and then applying weight factors at the
time when you make the decision is more practical and sensible than the
"use your own applications" approach. (You do have to know at least
approximately which categories the applications you care most about
fall into, but you do not have to assemble those applications into a
benchmark suite.)
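
To make the weighting step concrete, here is a minimal sketch of how
category weights could be applied at decision time. Everything in it is
an assumption for illustration: the category names, the runtimes, the
weights, and the choice of a weighted geometric mean of speedups over a
reference machine are all hypothetical, not something we have settled on.

```python
import math


def weighted_score(results, baseline, weights):
    """Weighted geometric mean of per-category speedups over a baseline.

    results, baseline: dicts mapping category -> runtime in seconds
    (lower is better). weights: dict mapping category -> non-negative
    weight; weights are normalized here, so they need not sum to 1.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    log_sum = 0.0
    for category, weight in weights.items():
        speedup = baseline[category] / results[category]
        log_sum += (weight / total) * math.log(speedup)
    return math.exp(log_sum)


# Hypothetical runtimes (seconds) for one benchmark per category.
baseline = {"latency_bound": 100.0, "bandwidth_bound": 200.0, "compute_bound": 300.0}
vendor_a = {"latency_bound": 50.0, "bandwidth_bound": 200.0, "compute_bound": 300.0}
vendor_b = {"latency_bound": 100.0, "bandwidth_bound": 100.0, "compute_bound": 300.0}

# Hypothetical weights, chosen when the decision is made, reflecting
# roughly which categories the current workload falls into.
weights = {"latency_bound": 0.6, "bandwidth_bound": 0.3, "compute_bound": 0.1}

for name, results in (("vendor_a", vendor_a), ("vendor_b", vendor_b)):
    print(name, round(weighted_score(results, baseline, weights), 3))
```

The point of the design is that the benchmark runs are collected once,
and only the weights dictionary changes if the user mix shifts between
the RFP and the purchase.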

Cheers,
Martin