[Beowulf] Time measurement

Sat Aug 6 08:04:36 PDT 2005

At 08:02 AM 8/5/2005 -0400, Robert G. Brown wrote:
>Vincent Diepeveen writes:
>
>> Always measure the wall-clock time of execution. IMHO that is the only
>> thing that really counts, and it includes all overhead.
>>
>> Take into account that on most clusters the wall clock of node A is
>> not the same as the wall clock of node B.
>>
>> If you start a job on the head node and spawn it from there to the
>> rest of the cluster, what obviously counts is the time from when you
>> started it until it finished executing.
>>
>> Because in reality, what matters is how quickly you got the job done.
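A minimal sketch of the whole-job measurement described above, assuming a POSIX system. One clock on the launching (head) node is read before the job starts and again after it exits, so skew between the clocks of different nodes never enters the result. `CLOCK_MONOTONIC` is used because, unlike the time of day, it does not jump when the system clock is set. The helper name `time_command()` is hypothetical, and the job is whatever command line is passed in.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Wall-clock seconds from CLOCK_MONOTONIC (does not jump if the
 * system time of day is reset while the job runs). */
static double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical helper: run argv[0] with its arguments, wait for it to
 * exit, and return the elapsed wall-clock time in seconds.  Program
 * loading, I/O and communication overhead are all included.  The
 * child's wait status is stored in *status.  Returns -1.0 if fork()
 * or waitpid() fails. */
double time_command(char *const argv[], int *status)
{
    double start = wall_seconds();
    pid_t pid = fork();
    if (pid < 0)
        return -1.0;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                 /* only reached if exec failed */
    }
    if (waitpid(pid, status, 0) < 0)
        return -1.0;
    return wall_seconds() - start;
}
```

For a hypothetical program `./prog`, `char *job[] = {"./prog", NULL}; int st; double s = time_command(job, &st);` yields roughly the same quantity as the "real" figure printed by `/usr/bin/time`.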
>
>Unless you are trying to time how fast the computer executes actual
>instructions in a particular context.
>
>Remember, there are lots of reasons to time things.  One is certainly to
>time how fast you get a "job" done, where a job is a complex entity with
>all sorts of overhead and inefficiencies.  However, there are others,
>such as wanting to know how fast a computer can generate random numbers
>with a particular algorithm inside a generic loop WITHOUT having the
>results affected by the fact that your computer at the time of the test
>was running a cron job or dealing with a broadcast storm generated by
>some ill-managed system down the wire.  Or how fast it can do a simple
>divide operation in a given context, again doing one's very best NOT to
>include random delays introduced by an interrupt and/or context switch
>that happens to occur right in the middle of the timing interval.  In
>these microbenchmark contexts it is actually a PITA to "prepare" the
>system in such a way as to make interruptions like this unlikely in the
>timing interval(s), which is why writing a "good" microbenchmark
>program is nontrivial.
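One way to approach the random-number example, sketched under explicit assumptions: the generator is a xorshift64 stand-in (not any particular library's RNG, and hypothetical as the algorithm under test), the loop result is written to a `volatile` sink so the compiler cannot delete the loop, and the trial is repeated several times keeping the fastest. That last choice is a common heuristic rather than a rule: a run hit by a cron job, an interrupt or a context switch can only come out slower, never faster.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Stand-in generator (Marsaglia xorshift64); substitute the
 * algorithm actually being measured. */
static uint64_t xorshift64(uint64_t *s)
{
    uint64_t x = *s;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *s = x;
}

/* Keeps the loop's result observable so it is not optimized away. */
static volatile uint64_t sink;

/* Nanoseconds per generated number (n > 0), taking the fastest of
 * `trials` runs of n calls each; slower runs are assumed to have been
 * disturbed by something outside the loop. */
double rng_ns_per_call(unsigned long n, int trials)
{
    double best = -1.0;
    uint64_t state = 88172645463325252ULL;   /* any nonzero seed */
    for (int t = 0; t < trials; t++) {
        uint64_t acc = 0;
        double t0 = now_ns();
        for (unsigned long i = 0; i < n; i++)
            acc += xorshift64(&state);
        double t1 = now_ns();
        sink = acc;
        double per_call = (t1 - t0) / (double)n;
        if (best < 0.0 || per_call < best)
            best = per_call;
    }
    return best;
}
```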

As soon as you allow scientists to measure their results without a wall
clock, the problems really will grow higher than Mount Everest.

The stopwatch is what counts!

>Even when timing jobs one has to remember that BECAUSE of the
>uncertainty in the "state" of the computer during the timing interval,
>your recorded times may or may not be terribly accurate predictors of
>actual performance in a different global context or state.  It is
>important to do a NUMBER of measurements if possible, ideally with some
>degree of knowledge and control of system state, and at least eyeball
>the statistics of the results.  
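For the advice to take a NUMBER of measurements and look at their statistics, a minimal sketch of summarizing repeated timings, assuming the samples have already been collected in one consistent unit. It reports minimum, maximum, mean, median and sample variance (take `sqrt()` of the variance for the standard deviation; it is left out here so the example needs no math library). `timing_stats` and `summarize()` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    double min, max, mean, median, variance;
} timing_stats;

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Summarize n >= 1 timing samples; returns 0 on success, -1 on error.
 * The median comes from a sorted copy, so the caller's array keeps
 * its original order.  variance uses the (n - 1) sample formula. */
int summarize(const double *x, size_t n, timing_stats *out)
{
    if (n == 0)
        return -1;
    double *s = malloc(n * sizeof *s);
    if (s == NULL)
        return -1;
    memcpy(s, x, n * sizeof *s);
    qsort(s, n, sizeof *s, cmp_double);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += s[i];
    double mean = sum / (double)n;

    double ss = 0.0;                 /* sum of squared deviations */
    for (size_t i = 0; i < n; i++)
        ss += (s[i] - mean) * (s[i] - mean);

    out->min = s[0];
    out->max = s[n - 1];
    out->mean = mean;
    out->median = (n % 2) ? s[n / 2] : 0.5 * (s[n / 2 - 1] + s[n / 2]);
    out->variance = (n > 1) ? ss / (double)(n - 1) : 0.0;
    free(s);
    return 0;
}
```

With samples {1, 2, 3, 4, 100} the mean is 22 but the median is 3; a gap like that between mean and median is a quick sign of outliers worth a closer look.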
>
>Doing this has revealed lots of interesting things (on this list, even)
>over the years, such as huge delays "randomly" inserted in TCP streams,
>anomalies in the rate at which processors perform particular
>instructions (for example, multiplying by a power of two in C code is
>often a TERRIBLE predictor for how fast a processor multiplies even in an
>instruction form such as
>
>  a[i] = 2.0*b[i];
>
>because modern processors optimize such operations and perform them much
>faster than
>
>  a[i] = 3.14159*b[i];
>
>), and anomalies in the performance of all sorts of network adapters (some of
>which work(ed) fine for short traffic bursts but collapsed on the floor
>screaming if fed a system-saturating stream of small packets).
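A sketch of how the two multiply loops above could be timed side by side. Assumptions: the array length (`N_ELEMS`) and repetition count (`REPS`) are arbitrary, each constant is written as a literal so the compiler sees it just as it would in real code, and the fastest of several repetitions is kept. A compiler is allowed to turn `2.0*b[i]` into `b[i] + b[i]` (the result is identical), so any difference may come from code generation rather than from the multiplier itself; looking at the generated assembly (`gcc -S`) helps separate the two. The function names are hypothetical, and no timings are quoted because they depend entirely on the machine and compiler flags.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

#define N_ELEMS 4096   /* arbitrary array length */
#define REPS    200    /* repetitions; the fastest one is kept */

static double a[N_ELEMS], b[N_ELEMS];
static volatile double sink;

static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* The two loops from the text, constants written as literals. */
void times_two(void)
{
    for (size_t i = 0; i < N_ELEMS; i++)
        a[i] = 2.0 * b[i];
}

void times_pi(void)
{
    for (size_t i = 0; i < N_ELEMS; i++)
        a[i] = 3.14159 * b[i];
}

/* Fastest-of-REPS time per element, in nanoseconds, for one loop. */
double best_ns_per_elem(void (*loop)(void))
{
    double best = -1.0;
    for (size_t i = 0; i < N_ELEMS; i++)
        b[i] = 1.0 + (double)i * 1e-3;
    for (int r = 0; r < REPS; r++) {
        double t0 = now_ns();
        loop();
        double t1 = now_ns();
        sink = a[(size_t)r % N_ELEMS];  /* read a[] so stores are not dead */
        double per = (t1 - t0) / (double)N_ELEMS;
        if (best < 0.0 || per < best)
            best = per;
    }
    return best;
}
```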
>Generating an actual histogram of timings is good.  Look for outliers
>(if any) -- these are an indication that something highly nonlinear and
>state dependent is going on in your code (or on your system) and is
>often a place to focus optimization energy.
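A minimal sketch of the histogram-and-outliers step, assuming the timings are already in an array. Samples are binned into equal-width bins over a caller-chosen range and drawn as a text bar chart, and samples above a cutoff are counted. The cutoff is left to the caller; something like three times the median is one arbitrary choice, not a standard rule. `histogram()`, `print_histogram()` and `count_above()` are hypothetical names.

```c
#include <stddef.h>
#include <stdio.h>

/* Fill bins[0..nbins-1] with counts of x[] over [lo, hi]; a value
 * equal to hi goes into the last bin, values outside are ignored. */
void histogram(const double *x, size_t n, double lo, double hi,
               size_t *bins, size_t nbins)
{
    for (size_t k = 0; k < nbins; k++)
        bins[k] = 0;
    if (nbins == 0 || !(hi > lo))
        return;
    double width = (hi - lo) / (double)nbins;
    for (size_t i = 0; i < n; i++) {
        if (x[i] < lo || x[i] > hi)
            continue;
        size_t k = (size_t)((x[i] - lo) / width);
        if (k >= nbins)
            k = nbins - 1;
        bins[k]++;
    }
}

/* One '#' per sample in each bin, labelled with the bin's lower edge. */
void print_histogram(const size_t *bins, size_t nbins, double lo, double hi)
{
    double width = (hi - lo) / (double)nbins;
    for (size_t k = 0; k < nbins; k++) {
        printf("%12.3f | ", lo + (double)k * width);
        for (size_t j = 0; j < bins[k]; j++)
            putchar('#');
        putchar('\n');
    }
}

/* Number of samples strictly greater than cutoff. */
size_t count_above(const double *x, size_t n, double cutoff)
{
    size_t c = 0;
    for (size_t i = 0; i < n; i++)
        if (x[i] > cutoff)
            c++;
    return c;
}
```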
>
>With all that said, yes, using wall time is a very good thing to do,
>ideally measured with the system CPU timer itself.  gettimeofday() has
>in the past had a call resolution of about 2 usec (2000 nanoseconds),
>depending on how it is implemented (I think it is moving or has moved
>towards being implemented on top of the CPU timer where possible).  The
>CPU timer can yield a time resolution of 40-70 nanoseconds per timing
>call pair, which of course can be improved with good statistics and/or
>inlining the timing assembler and avoiding subroutine calls.
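Timer resolution and call cost can be checked on a particular machine by timing many back-to-back call pairs. A sketch assuming a POSIX system that provides both `gettimeofday()` and `clock_gettime()`; it does not read the CPU's cycle counter directly (on x86 that needs inline assembly or a compiler intrinsic, and is processor-specific). The function names are hypothetical, and no numbers are quoted since results depend on the kernel and hardware.

```c
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

static double mono_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Average nanoseconds per gettimeofday() call pair over n pairs. */
double gettimeofday_pair_ns(size_t n)
{
    struct timeval t0, t1;
    double start = mono_ns();
    for (size_t i = 0; i < n; i++) {
        gettimeofday(&t0, NULL);
        gettimeofday(&t1, NULL);
    }
    return (mono_ns() - start) / (double)n;
}

/* Average nanoseconds per clock_gettime() call pair over n pairs. */
double clock_gettime_pair_ns(size_t n)
{
    struct timespec t0, t1;
    double start = mono_ns();
    for (size_t i = 0; i < n; i++) {
        clock_gettime(CLOCK_MONOTONIC, &t0);
        clock_gettime(CLOCK_MONOTONIC, &t1);
    }
    return (mono_ns() - start) / (double)n;
}

/* Smallest nonzero step between consecutive gettimeofday() readings,
 * in microseconds; -1 if the clock never advanced in n readings. */
long gettimeofday_min_step_us(size_t n)
{
    struct timeval prev, cur;
    long best = -1;
    gettimeofday(&prev, NULL);
    for (size_t i = 0; i < n; i++) {
        gettimeofday(&cur, NULL);
        long d = (long)(cur.tv_sec - prev.tv_sec) * 1000000L
               + (long)(cur.tv_usec - prev.tv_usec);
        if (d > 0 && (best < 0 || d < best))
            best = d;
        prev = cur;
    }
    return best;
}
```

The pair averages include a little loop overhead, and `gettimeofday_min_step_us()` cannot report anything finer than 1 usec, because that is the unit of `struct timeval`.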
>
>A final good thing to do is to remember profiling.  Even "jobs" --
>perhaps especially "jobs" -- benefit from profiling.  The times won't be
>terribly accurate because of job instrumentation and so on, but getting
>a good idea of where your job is spending most of its time can be a
>surprising and rewarding thing to do.  Surprising because it might not
>be where you think it is; rewarding because once you know where it is
>doing a lot of work you may be able to rearrange it for improved
>performance.
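The usual route is a real profiler (for example gprof: build with `gcc -pg`, run the program, then `gprof ./prog gmon.out`). Where that is inconvenient, a crude form of the same idea is to accumulate wall-clock time per region of the job by hand. A sketch with hypothetical names (`region_begin`, `region_end`, `region_report`), assuming a small fixed set of regions and a single thread; it adds its own overhead, which is exactly the instrumentation caveat mentioned above.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Hypothetical fixed set of job phases; adjust to the program at hand. */
enum region { R_SETUP, R_COMPUTE, R_IO, R_COUNT };
static const char *const region_name[R_COUNT] = {"setup", "compute", "io"};

static double region_total[R_COUNT];  /* accumulated seconds per region */
static double region_start[R_COUNT];  /* start time of the open interval */

static double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

void region_begin(enum region r)
{
    region_start[r] = wall_seconds();
}

void region_end(enum region r)
{
    region_total[r] += wall_seconds() - region_start[r];
}

/* Print each region's time and its share of all instrumented time. */
void region_report(FILE *out)
{
    double sum = 0.0;
    for (int r = 0; r < R_COUNT; r++)
        sum += region_total[r];
    for (int r = 0; r < R_COUNT; r++)
        fprintf(out, "%-8s %12.6f s %6.1f%%\n", region_name[r],
                region_total[r],
                sum > 0.0 ? 100.0 * region_total[r] / sum : 0.0);
}
```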
>
>    rgb
>
>> 
>> Vincent
>> 
>> At 08:13 AM 8/1/2005 -0700, ThanhVu  H. Nguyen - Gmail wrote:
>>>Hi, just wondering what the standard way to measure execution time
>>>is?
>>>
>>>Two methods I thought about are:
>>>
>>>1) /usr/bin/time prog: this includes all the communication, I/O,
>>>loading overhead, etc.
>>>
>>>2) include start_time, end_time code in the program: this won't
>>>include the communication, I/O, loading, etc. overhead.
>>>
>>>Which method is usually used?  Thanks.
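The two approaches in the question can also be combined inside the program. A minimal sketch, assuming POSIX `clock_gettime()`: it reads both the wall clock (`CLOCK_MONOTONIC`) and the process CPU-time clock (`CLOCK_PROCESS_CPUTIME_ID`) around the same region. For a single-threaded process, wall time well above CPU time means the process spent that time waiting (on I/O, communication, or other processes) rather than computing. The `stamp` type and its helpers are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* A pair of readings taken at (nearly) the same moment. */
typedef struct {
    struct timespec wall, cpu;
} stamp;

static double ts_diff(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec)
         + (double)(b.tv_nsec - a.tv_nsec) * 1e-9;
}

stamp stamp_now(void)
{
    stamp s;
    clock_gettime(CLOCK_MONOTONIC, &s.wall);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &s.cpu);
    return s;
}

/* Elapsed wall-clock and CPU seconds between two stamps. */
void stamp_elapsed(stamp start, stamp end, double *wall_s, double *cpu_s)
{
    *wall_s = ts_diff(start.wall, end.wall);
    *cpu_s = ts_diff(start.cpu, end.cpu);
}
```

Around a region that sleeps or blocks waiting for messages, the wall figure grows while the CPU figure barely moves.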
>>>
>>>tvn,
>>>
>>>ThanhVu H. Nguyen
>>>_______________________________________________
>>>Beowulf mailing list, Beowulf at beowulf.org
>>>To change your subscription (digest mode or unsubscribe) visit
>>>http://www.beowulf.org/mailman/listinfo/beowulf
>>>
>>>