Cluster Monitoring software?

Robert G. Brown rgb at
Mon Nov 6 09:56:01 PST 2000

On Mon, 6 Nov 2000, Sadiqs wrote:

> i need some help with flops please. 
> how do you go around calculating flops? :)))))

Well, you can easily see how I calculate ONE measure of >>bogus<< FLOPS
in the open source code of cpu-rate, available on the website of:

  Brahma

It is something like (with lots of detail left out):

 double *d;             /* To test "double" floats */
 d = (double *) malloc((size_t) (size*sizeof(double)));
 total_time = 0.0;
 total_time2 = 0.0;
 /* Loop headers are missing above; k and samples are assumed names. */
 for(k = 0; k < samples; k++){
     start = gettod();
     for(i = 0; i < size; i++){
       d[i] = 1.0;
       d[i] = (1.0 + d[i])*(1.5 - d[i])/d[i];
     }
     delta = gettod() - start;
     total_time += delta;
     total_time2 += delta*delta;
 }

(and then subtract the empty loop time and convert the times into rates,
with a suitable definition for gettod()).  Each d[i] = (1.0 + d[i])*(1.5
- d[i])/d[i]; is counted as four floating point operations, one each of
add, subtract, multiply and divide.  The addressing arithmetic in the
d-vector itself is more or less ignored, as there is always some
addressing overhead in a loop and so it is "part" of the cost of a
flop, sort of, as far as I'm concerned.  This particular combination is
chosen (with d[i] initialized to 1.0) so that it is numerically stable
over lots of samples and yet not knowable to the compiler (so that the
final divide is actually done and not represented as an inversion and
multiply, which is generally a bit faster).
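
The procedure just described (time the loop, subtract the empty loop
time, count four operations per element, convert to a rate) can be
sketched as a small self-contained C fragment.  This is only a sketch
under stated assumptions, not the actual cpu-rate source: gettod() is
defined here with gettimeofday(), the helper names time_passes() and
bogomflops() are hypothetical, and the vector is declared volatile
purely so that an optimizing compiler cannot delete the loops.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/time.h>

/* One possible gettod(): wall-clock time in seconds (assumed definition). */
static double gettod(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double) tv.tv_sec + 1.0e-6 * (double) tv.tv_usec;
}

/*
 * Time `samples` passes over a vector of `size` doubles.  With do_flops
 * zero, only the loop and the store are timed (the "empty" loop).
 * Returns total elapsed seconds, or -1.0 if malloc fails.
 */
static double time_passes(size_t size, int samples, int do_flops)
{
    double total_time = 0.0;
    volatile double *d;   /* volatile: assumption, keeps the loops alive */

    assert(size > 0 && samples > 0);
    d = malloc(size * sizeof(double));
    if (d == NULL)
        return -1.0;

    for (int k = 0; k < samples; k++) {
        double start = gettod();
        if (do_flops) {
            for (size_t i = 0; i < size; i++) {
                d[i] = 1.0;
                d[i] = (1.0 + d[i]) * (1.5 - d[i]) / d[i];  /* 4 flops */
            }
        } else {
            for (size_t i = 0; i < size; i++)
                d[i] = 1.0;
        }
        total_time += gettod() - start;
    }
    free((void *) d);
    return total_time;
}

/* Bogus MFLOPS: 4 operations per element, empty loop time subtracted. */
static double bogomflops(size_t size, int samples)
{
    double full = time_passes(size, samples, 1);
    double empty = time_passes(size, samples, 0);
    double net = full - empty;

    if (full < 0.0 || empty < 0.0 || net <= 0.0)
        return 0.0;       /* allocation failed or timing noise won */
    return 4.0 * (double) size * (double) samples / net / 1.0e6;
}
```

Calling bogomflops(1024, 10000) from a main() would give one such bogus
number for an 8 KB vector.  A return of 0.0 means the measurement was
too short for the clock resolution, so increase samples.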

However, note well that the rate this returns is BOGUS.  These are
BOGOMFLOPS, just like the "mips" your kernel talks about at boot time
are "bogomips".  This is very important to realize.  It is certainly
entertaining and possibly useful to know how fast your computer can do
arithmetic under certain "ideal" conditions, but >>for this code<< that
rate can vary by a factor of >>six<< as a function of vector length.
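
One way to see that variation on a given machine is to repeat the
measurement over a range of vector lengths.  The following is a rough
sketch only, with assumptions labeled: now(), rate_for_bytes() and
sweep() are hypothetical names, the lengths simply double from 8 bytes
upward, and the volatile qualifier is there only to stop the compiler
from removing the loops.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/time.h>

/* Wall-clock seconds (assumed timer; any high-resolution clock would do). */
static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double) tv.tv_sec + 1.0e-6 * (double) tv.tv_usec;
}

/*
 * Bogus MFLOPS for a vector occupying `bytes` bytes.  The number of
 * passes is chosen so that roughly `total_elems` elements are touched
 * whatever the length.  Returns 0.0 if nothing could be measured.
 */
static double rate_for_bytes(size_t bytes, size_t total_elems)
{
    size_t size = bytes / sizeof(double);
    size_t samples, i, k;
    double t0, full, empty, net;
    volatile double *d;

    if (size == 0)
        return 0.0;
    samples = total_elems / size;
    if (samples == 0)
        samples = 1;
    d = malloc(size * sizeof(double));
    if (d == NULL)
        return 0.0;

    t0 = now();                       /* full loop: store plus 4 flops */
    for (k = 0; k < samples; k++)
        for (i = 0; i < size; i++) {
            d[i] = 1.0;
            d[i] = (1.0 + d[i]) * (1.5 - d[i]) / d[i];
        }
    full = now() - t0;

    t0 = now();                       /* empty loop: store only */
    for (k = 0; k < samples; k++)
        for (i = 0; i < size; i++)
            d[i] = 1.0;
    empty = now() - t0;

    free((void *) d);
    net = full - empty;
    if (net <= 0.0)
        return 0.0;
    return 4.0 * (double) size * (double) samples / net / 1.0e6;
}

/* Fill rates[] for lengths 8, 16, 32, ... up to max_bytes; return count. */
static int sweep(size_t max_bytes, double *rates, int max_rates)
{
    int n = 0;
    size_t bytes;

    assert(rates != NULL);
    for (bytes = sizeof(double); bytes <= max_bytes && n < max_rates; bytes *= 2)
        rates[n++] = rate_for_bytes(bytes, (size_t) 1 << 20);
    return n;
}
```

Printing rates[] against vector length for a sweep up to 16 megabytes
should show the rate changing as the vector outgrows each level of
cache, although the exact shape depends on the machine and compiler
flags.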

To understand this, I urge that you read the CPU Vector Performance
Summary on the Brahma website.  In it, I present a
comparison of five reasonably current systems -- a (dual) 933 MHz PIII
with RDRAM, two Athlons (original and Thunderbird) with PC133, a Compaq
667 MHz XP1000, and a (dual) 466 MHz Celeron with PC66.  The figure at
the top of that page is the measured double precision bogomflop rating
of these CPUs executing the general code above for vector lengths
ranging from 8 bytes to 16 megabytes.  To quote the conclusion of this very
short summary paper:

  If nothing else, it [the figure above in the summary -- rgb] will
  provide a fairly objective answer to the universally asked question:
  "How fast is my CPU?" Pick a number, any number, between the maximum
  in L1 and minimum running out of main memory (which spans almost an
  order of magnitude in speeds!).  That's roughly how meaningful a
  single cited speed rating for a CPU is.

...not that this number cannot be used constructively, either in research
proposals or in program or beowulf design.  Still, the >>figure itself<<
is far >>more<< useful, as it lets one see the tremendous variation in
effective CPU floating point performance for a certain class of core
loop code.  

Hope this helps.


Robert G. Brown	             
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at
