[Beowulf] Execution time measurements

David Mathog mathog at caltech.edu
Mon May 23 12:32:33 PDT 2011

Mikhail Kuzminsky sent this to me and asked that it be posted:


Mon, 23 May 2011 09:40:13 -0700, message from "David Mathog"
<mathog at caltech.edu>:
> > On Fri, May 20, 2011 at 02:26:31PM -0400, Mark Hahn forwarded a message:
> > > When I run two identical copies of the same batch job
> > > simultaneously, the execution time of *each* job is
> > > LOWER than for a single job run!
> Disk caching could cause that. Normally if the data read in isn't too
> big you see an effect where:
> run 1: 30 sec <-- 50% disk IO/ 50% CPU
> run 2: 15 sec <-- ~100% CPU
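The run-1/run-2 numbers quoted above are the Linux page cache at work. A minimal sketch of the effect (the file name and size are illustrative, and the cache flush needs root, so that line is left commented out):

```shell
# Demonstrate the page-cache effect: the first read of a file after the
# cache is flushed must hit the disk; repeating the read is served from RAM.
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1M count=64 2>/dev/null
sync
# echo 3 > /proc/sys/vm/drop_caches   # as root: force the next read to be cold
time cat "$f" > /dev/null             # "run 1": disk I/O dominates when cold
time cat "$f" > /dev/null             # "run 2": cached, essentially CPU-bound
rm -f "$f"
```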

I believe the jobs are CPU-bound: top says they use 100% of the CPU,
and there is no swap activity.

iostat /dev/sda3 (where the I/O is performed) typically reports something like:

Linux (c6ws1)   05/25/2011

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           1.12   0.00     0.03     0.01    0.00  98.84

Device:    tps  Blk_read/s  Blk_wrtn/s  Blk_read   Blk_wrtn
sda3      0.01        0.01        8.47     20720   16845881

> Of course, you also need to be sure that run 1 isn't interfering with
> run 2. They might, for instance, save/retrieve intermediate values
> to the same filename, so that they really cannot be run safely at the
> same time. That is, they run faster together, but they run incorrectly.

The file names used for I/O are unique.
I also thought about CPU frequency variation, but I think that empty
output from

  lsmod | grep freq

is enough to show that the CPU frequency is fixed.


OK, so not disk caching.  

Regarding the frequencies, it is better to run

  grep MHz /proc/cpuinfo

while the processes are running: empty lsmod output only shows that no
cpufreq driver is loaded as a module, not that the clock cannot change.
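For example, a short sampling loop run alongside the jobs (the sample count and interval are arbitrary):

```shell
# Sample each core's reported clock a few times while the jobs run,
# to rule out dynamic frequency scaling.
for i in 1 2 3; do
    grep MHz /proc/cpuinfo
    echo "--- sample $i"
    sleep 1
done
```

If the MHz values stay constant across samples and cores, frequency scaling is not the explanation.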

Did you verify that the results for each of the two simultaneous runs
are both correct?  Ideally, tweak some parameter so they are slightly
different from each other.
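As a sketch of that check, with a trivial stand-in "job" (a checksum over generated numbers) in place of the real batch program:

```shell
# Placeholder for the real job; the argument is the tweaked parameter.
job() { seq 1 "$1" | md5sum; }

job 100000 > out.serial        # baseline, run alone
job 100000 > out.a &           # two simultaneous runs, with
job 100001 > out.b &           #   slightly different parameters
wait
cmp out.serial out.a && echo "run A correct"       # must match the baseline
cmp -s out.a out.b || echo "run B differs, as expected"
rm -f out.serial out.a out.b
```

If the simultaneous run matches the serial baseline, and the tweaked run differs, the jobs are genuinely independent.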


David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
