[Beowulf] Linpack benchmark
Robert G. Brown
rgb at phy.duke.edu
Mon Nov 14 06:06:14 PST 2005
On Fri, 11 Nov 2005, balamurugan wrote:
>> gettimeofday() is definitely a very good suggestion.
> 5. There were no errors or warnings during compilation and the executable ran
> successfully, but the result was not the one expected and had
> negative values. From my analysis, I understand that there could be a problem
> in linking the C file and the Fortran code.
> 6. Is there any way to get around this problem?
gettimeofday() without the TSC (the CPU's time stamp counter) typically
has a precision of only around 2 usec (yes, I measured), for all that its
returned struct timeval has a microsecond field. However, I'm pretty sure
that gettimeofday now works on top of the TSC automatically at the library
level if the TSC is there, in which case the underlying clock is accurate
to somewhere in the 20-100 nsec range. In other words, I don't believe
that you will get nsec resolution from any clock available on the system
without using ntp-like call-latency predictor/corrector algorithms in the
timing, but sub-usec timings are definitely possible.
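The 2 usec figure above came from measurement, but the method isn't given, so what follows is only one plausible way to get a comparable number yourself. It is a minimal C sketch, assuming a POSIX system with clock_gettime() and CLOCK_MONOTONIC (my choice of comparison clock, not something named in this thread); the function names are hypothetical. Each function spins until the clock value changes and keeps the smallest positive step it sees, which folds tick size and call overhead into a single number.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/time.h>
#include <time.h>

/* Smallest positive step (ns) between back-to-back gettimeofday() calls.
   struct timeval only has tv_sec and tv_usec, so the result is always a
   multiple of 1000 ns. Returns -1 if no positive step was observed. */
long gtod_min_step_ns(int samples)
{
    assert(samples > 0);
    long best = -1;
    for (int i = 0; i < samples; i++) {
        struct timeval a, b;
        gettimeofday(&a, NULL);
        do {                                  /* spin until the value changes */
            gettimeofday(&b, NULL);
        } while (b.tv_sec == a.tv_sec && b.tv_usec == a.tv_usec);
        long d = (long)(b.tv_sec - a.tv_sec) * 1000000000L
               + (long)(b.tv_usec - a.tv_usec) * 1000L;
        if (d > 0 && (best < 0 || d < best))  /* ignore backward clock steps */
            best = d;
    }
    return best;
}

/* Same measurement for clock_gettime(CLOCK_MONOTONIC), whose
   struct timespec carries a tv_nsec field. */
long mono_min_step_ns(int samples)
{
    assert(samples > 0);
    long best = -1;
    for (int i = 0; i < samples; i++) {
        struct timespec a, b;
        clock_gettime(CLOCK_MONOTONIC, &a);
        do {
            clock_gettime(CLOCK_MONOTONIC, &b);
        } while (b.tv_sec == a.tv_sec && b.tv_nsec == a.tv_nsec);
        long d = (long)(b.tv_sec - a.tv_sec) * 1000000000L
               + (long)(b.tv_nsec - a.tv_nsec);
        if (d > 0 && (best < 0 || d < best))
            best = d;
    }
    return best;
}
```

Calling both with a few thousand samples on the same machine gives a direct comparison. Because struct timeval stops at microseconds, the gettimeofday() result can never drop below 1000 ns, whereas the CLOCK_MONOTONIC result is free to show sub-usec behavior; the actual values are entirely machine-dependent.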
However, this sounds like a "bug" and not something that should be (or
maybe even "can be") worked around at the userspace level. It will only
get worse as the number of CPUs increases, too. It is GOOD to have
cycle counters, but either they need to synchronize at the hardware
level at boot time (so there is just one value of the counter across the
system, even though each CPU tracks its own), or there needs to be some
way of forcing processor affinity for the cycle counter call. That
affinity might apply only to the cycle counter call itself (since the real
applications you are benchmarking may be multiprocessor, or free to
migrate by intention), taking any latency hits on obtaining the value if
the task has migrated. Ugly and likely highly error-prone.
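The second option (pinning only the counter read) might look roughly like the following on Linux. This is a sketch under several assumptions: sched_getaffinity()/sched_setaffinity() and the CPU_* macros are the glibc interfaces and need _GNU_SOURCE; __rdtsc() is the GCC/Clang intrinsic from <x86intrin.h>; on non-x86 builds I substitute CLOCK_MONOTONIC purely so the code still compiles. read_counter_on_cpu() and first_allowed_cpu() are hypothetical helper names. Each reading costs extra system calls and possibly a migration, which is exactly the latency hit mentioned above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* Cycle counter on x86; CLOCK_MONOTONIC nanoseconds elsewhere (a stand-in). */
static uint64_t read_counter(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return (uint64_t)t.tv_sec * 1000000000u + (uint64_t)t.tv_nsec;
#endif
}

/* Lowest-numbered CPU the calling thread may run on, or -1 on error. */
int first_allowed_cpu(void)
{
    cpu_set_t mask;
    if (sched_getaffinity(0, sizeof mask, &mask) != 0)
        return -1;
    for (int c = 0; c < CPU_SETSIZE; c++)
        if (CPU_ISSET(c, &mask))
            return c;
    return -1;
}

/* Read the counter while pinned to `cpu`, then restore the caller's
   original affinity so the code under test stays free to migrate.
   Returns 0 on success, -1 if the affinity could not be changed. */
int read_counter_on_cpu(int cpu, uint64_t *out)
{
    cpu_set_t saved, pin;

    assert(out != NULL && cpu >= 0 && cpu < CPU_SETSIZE);
    if (sched_getaffinity(0, sizeof saved, &saved) != 0)
        return -1;
    CPU_ZERO(&pin);
    CPU_SET(cpu, &pin);
    /* On Linux the calling thread is moved onto `cpu` as part of this call. */
    if (sched_setaffinity(0, sizeof pin, &pin) != 0)
        return -1;
    *out = read_counter();
    sched_setaffinity(0, sizeof saved, &saved);   /* undo the pin */
    return 0;
}
```

Comparing two pinned readings only makes sense when both were taken on the same CPU; readings pinned to different CPUs still carry whatever offset exists between their counters, which is the original problem.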
Of these two, it is pretty clear that forcing cycle counter identity
across all running CPUs is by far the more desirable course of action, but
that is ideally a preboot task for the AMD engineers and not something
that anybody should have to do in software (if one CAN do anything in
I'm looking to see if one can WRITE to the TSC post-boot. If one can,
then there might be a fix for the kernel team: install an ntp-like
handshaking/sync process at a controlled point in the boot so that all
CPUs emerge from a Linux boot with their counters in sync. Assuming the
most basic and necessary synchronization (one that almost certainly has
to be there anyway for the CPUs to be able to function together), a
call to any of the TSCs would then return the same value regardless of
whether the task has migrated.
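The ntp-like handshake described above boils down to the standard NTP offset arithmetic, applied between a reference CPU and each other CPU. The sketch below covers only that arithmetic, assuming each exchange yields four counter readings (t0 and t3 on the reference CPU, t1 and t2 on the other CPU) and that both counters tick at the same rate; struct exchange and the function names are hypothetical. If the two one-way delays are equal the estimate is exact; otherwise its error is at most half the round-trip delay, which is why the exchange with the smallest round trip is preferred. Actually applying the correction would still depend on being able to write the TSC.

```c
#include <assert.h>
#include <stdint.h>

/* One request/reply exchange, timestamps in counter ticks. */
struct exchange {
    int64_t t0;  /* reference counter when the request is sent   */
    int64_t t1;  /* other CPU's counter when the request arrives */
    int64_t t2;  /* other CPU's counter when the reply is sent   */
    int64_t t3;  /* reference counter when the reply arrives     */
};

/* Estimated (other - reference) offset: ((t1 - t0) + (t2 - t3)) / 2. */
int64_t offset_estimate(const struct exchange *e)
{
    return ((e->t1 - e->t0) + (e->t2 - e->t3)) / 2;
}

/* Round-trip delay, excluding the other CPU's turnaround time.
   The true offset lies within +/- round_trip/2 of the estimate. */
int64_t round_trip(const struct exchange *e)
{
    return (e->t3 - e->t0) - (e->t2 - e->t1);
}

/* Pick the exchange with the smallest round trip (tightest error bound). */
const struct exchange *best_exchange(const struct exchange *ex, int n)
{
    assert(n > 0);
    const struct exchange *best = &ex[0];
    for (int i = 1; i < n; i++)
        if (round_trip(&ex[i]) < round_trip(best))
            best = &ex[i];
    return best;
}
```

As a worked example: if the other counter runs 500 ticks ahead and each one-way trip takes 100 ticks, the exchange {t0 = 1000, t1 = 1600, t2 = 1620, t3 = 1220} gives an offset estimate of 500 and a round trip of 200. With delays of 100 out and 300 back ({1000, 1600, 1620, 1420}) the estimate drops to 400, still within half of that exchange's 400-tick round trip.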
Ultimately, as I think Greg pointed out, benchmarking isn't for the faint
of heart: it isn't easy to get really accurate benchmarks, especially
microbenchmarks. I'll have to look into this stuff myself to see how it
impacts my own microbenchmark code. I haven't observed any problem with
it so far on dual Opterons, but my microbenchmarks are probably relatively
unlikely to migrate between processors once started, so it isn't clear
that I would. I would also EXPECT the difference in cycle counter value
between the processors to be fairly small, depending on how the counters
come alive at boot time, but that's really an empirical thing.
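Since the size of the cross-CPU counter difference is an empirical question, here is one way to bracket it. It is only a Linux sketch under the same assumptions as any affinity-based approach: glibc affinity calls with _GNU_SOURCE, the GCC/Clang __rdtsc() intrinsic, CLOCK_MONOTONIC as a stand-in off x86, and counters that tick at the same rate. The idea borrows the same ntp-style reasoning: read on CPU a, migrate to CPU b and read, migrate back and read again. If each counter only moves forward, the offset of b relative to a must lie between the two differences. bracket_offset() and pin_to() are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* Cycle counter on x86; CLOCK_MONOTONIC nanoseconds elsewhere (a stand-in). */
static uint64_t read_counter(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return (uint64_t)t.tv_sec * 1000000000u + (uint64_t)t.tv_nsec;
#endif
}

/* Restrict the calling thread to a single CPU (Linux-specific). */
static int pin_to(int cpu)
{
    cpu_set_t m;
    CPU_ZERO(&m);
    CPU_SET(cpu, &m);
    return sched_setaffinity(0, sizeof m, &m);
}

/* Bracket the counter offset (cpu_b minus cpu_a) from three readings taken
   on cpu_a, then cpu_b, then cpu_a again. If each counter only moves
   forward, the true offset lies in [*lo, *hi]. The caller's original
   affinity is restored. Returns 0 on success, -1 on failure. */
int bracket_offset(int cpu_a, int cpu_b, int64_t *lo, int64_t *hi)
{
    cpu_set_t saved;
    uint64_t ta1 = 0, tb = 0, ta2 = 0;
    int rc = -1;

    assert(lo != NULL && hi != NULL);
    assert(cpu_a >= 0 && cpu_a < CPU_SETSIZE);
    assert(cpu_b >= 0 && cpu_b < CPU_SETSIZE);
    if (sched_getaffinity(0, sizeof saved, &saved) != 0)
        return -1;

    if (pin_to(cpu_a) == 0) {
        ta1 = read_counter();
        if (pin_to(cpu_b) == 0) {
            tb = read_counter();
            if (pin_to(cpu_a) == 0) {
                ta2 = read_counter();
                rc = 0;
            }
        }
    }
    sched_setaffinity(0, sizeof saved, &saved);   /* undo the pinning */

    if (rc == 0) {
        *lo = (int64_t)(tb - ta2);  /* lower bound: b was read before the second a */
        *hi = (int64_t)(tb - ta1);  /* upper bound: b was read after the first a   */
    }
    return rc;
}
```

The bracket is only as tight as two migrations are fast, so in practice one would repeat the measurement many times and keep the narrowest interval; an interval that excludes zero is direct evidence of unsynchronized counters.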
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu