[Beowulf] Re: Performance characterising a HPC application

stephen mulcahy smulcahy at aplpi.com
Fri Mar 30 02:51:07 PDT 2007


[Resend: I think my first attempt was rejected for being too large, so
I've stripped it down to PingPong, Bcast and Reduce.]

Hi,

As a follow-up to my previous mail, I've gone ahead and run the Intel
MPI Benchmarks (IMB v3.0) on this cluster and got the results below.
I'd be curious to know how they compare with other, similar clusters.

Also, I'm trying to work out which parts of the IMB results matter most
for me. My understanding is that PingPong is a good measure of the
point-to-point latency (the small-message times) and bandwidth (the
large-message Mbytes/sec figures) between individual nodes in the
cluster.
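The Mbytes/sec column in the PingPong output further down is message size divided by time, and the 0-byte time is the usual latency estimate. A minimal sketch of recovering both figures from the table, assuming that IMB's t[usec] is the one-way time per message and that its Mbytes/sec uses 2^20 bytes per Mbyte (an assumption, though it reproduces the reported column). The helpers `imb_mbytes_per_sec` and `slope_bandwidth` are hypothetical, not part of IMB.

```python
# A minimal sketch, not IMB's own code. Assumptions: t[usec] is the one-way
# message time IMB reports for PingPong, and its "Mbytes/sec" column uses
# 1 Mbyte = 2**20 bytes (this reproduces the numbers in the table).

MB = 2 ** 20  # bytes per IMB "Mbyte"


def imb_mbytes_per_sec(nbytes, t_usec):
    """Bandwidth for one PingPong row, in IMB's Mbytes/sec."""
    if nbytes == 0 or t_usec <= 0:
        return 0.0
    return nbytes / (t_usec * 1e-6) / MB


def slope_bandwidth(rows):
    """Least-squares slope of t against message size, as Mbytes/sec.

    rows: (nbytes, t_usec) pairs. The slope (usec per byte) is the
    per-byte cost in a t(n) = alpha + n / beta model; beta = 1 / slope.
    """
    k = len(rows)
    mean_n = sum(n for n, _ in rows) / k
    mean_t = sum(t for _, t in rows) / k
    cov = sum((n - mean_n) * (t - mean_t) for n, t in rows)
    var = sum((n - mean_n) ** 2 for n, _ in rows)
    usec_per_byte = cov / var
    return 1.0 / (usec_per_byte * 1e-6) / MB


# Rows copied from the PingPong table: (#bytes, t[usec]).
latency_usec = 27.63  # the 0-byte row: an estimate of alpha
large = [(524288, 998.77), (1048576, 2008.35),
         (2097152, 3407.78), (4194304, 6583.70)]

print(round(imb_mbytes_per_sec(4194304, 6583.70), 2))  # 607.56, as in the table
print(f"alpha ~ {latency_usec} usec, beta ~ {slope_bandwidth(large):.0f} Mbytes/sec")
```

Fitting the slope over several large messages, rather than reading off one row, removes the fixed per-message startup cost, so it gives a somewhat cleaner estimate of the link's asymptotic bandwidth.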

Am I correct in thinking that Bcast and Reduce are good indicators of
how the cluster performs when sending data from the head node to the
compute nodes (Bcast) and collecting and combining results back on the
head node (Reduce)? My guess is that the other benchmarks are less
relevant to me, since they measure various patterns of inter-node
traffic rather than the one-to-many pattern exhibited by my application.
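One way to judge whether the Bcast and Reduce numbers are reasonable is to predict them from point-to-point figures. The sketch below assumes a binomial-tree algorithm, in which a collective over P processes takes ceil(log2 P) rounds of one n-byte message each, and a latency/bandwidth model t(n) = alpha + n/beta. The function names and the alpha/beta values are hypothetical. Many MPI implementations switch to other algorithms for large messages (for example, scatter followed by allgather for Bcast), and Reduce also pays for the arithmetic, so treat the result only as a rough yardstick.

```python
import math

# Hypothetical cost model, not necessarily what the MPI library does:
# a binomial-tree Bcast (or Reduce) over nprocs processes needs
# ceil(log2(nprocs)) rounds, each costing one point-to-point message
# t(n) = alpha + n / beta. Reduction arithmetic is ignored.

MB = 2 ** 20  # bytes per Mbyte, matching IMB's Mbytes/sec


def tree_rounds(nprocs):
    """Rounds for a binomial tree to reach every process."""
    return math.ceil(math.log2(nprocs)) if nprocs > 1 else 0


def predicted_tree_usec(nbytes, nprocs, alpha_usec, beta_mbytes_per_sec):
    """Predicted collective time in microseconds under the model above."""
    per_message = alpha_usec + nbytes / (beta_mbytes_per_sec * MB) * 1e6
    return tree_rounds(nprocs) * per_message


# alpha and beta are rough assumed values read off the PingPong table.
for nbytes in (1, 512, 65536, 4194304):
    t = predicted_tree_usec(nbytes, nprocs=80, alpha_usec=28.0,
                            beta_mbytes_per_sec=600.0)
    print(f"{nbytes:>8} bytes: ~{t:.0f} usec predicted")
```

Comparing these predictions with the measured t_avg columns is one way to spot the message sizes where the collectives depart most from what the point-to-point numbers alone would suggest.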

All comments welcome.

#---------------------------------------------------
#    Intel (R) MPI Benchmark Suite V3.0, MPI-1 part
#---------------------------------------------------
# Date                  : Wed Mar 28 11:59:55 2007
# Machine               : x86_64
# System                : Linux
# Release               : 2.6.17-2-amd64
# Version               : #1 SMP Wed Sep 13 17:49:33 CEST 2006
# MPI Version           : 2.0
# MPI Thread Environment: MPI_THREAD_SINGLE

#
# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM
#
#

# List of Benchmarks to run:

# PingPong
# PingPing
# Sendrecv
# Exchange
# Allreduce
# Reduce
# Reduce_scatter
# Allgather
# Allgatherv
# Alltoall
# Alltoallv
# Bcast
# Barrier

#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
# ( 78 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000        27.63         0.00
            1         1000        28.56         0.03
            2         1000        28.65         0.07
            4         1000        28.72         0.13
            8         1000        27.03         0.28
           16         1000        28.63         0.53
           32         1000        28.79         1.06
           64         1000        28.72         2.13
          128         1000        28.61         4.27
          256         1000        28.75         8.49
          512         1000        27.90        17.50
         1024         1000        27.55        35.45
         2048         1000        29.70        65.77
         4096         1000        86.92        44.94
         8192         1000        87.85        88.93
        16384         1000        91.98       169.88
        32768         1000       105.01       297.60
        65536          640       149.88       417.00
       131072          320       312.52       399.98
       262144          160       547.92       456.27
       524288           80       998.77       500.62
      1048576           40      2008.35       497.92
      2097152           20      3407.78       586.89
      4194304           10      6583.70       607.56

#----------------------------------------------------------------
# Benchmarking Reduce
# #processes = 80
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.08         0.10         0.09
            4         1000      1126.87      1137.98      1134.28
            8         1000      1208.77      1219.90      1216.20
           16         1000      1209.49      1223.14      1217.30
           32         1000      1172.07      1183.32      1179.02
           64         1000      1226.63      1240.10      1235.16
          128         1000      1126.54      1138.57      1134.23
          256         1000      1007.31      1019.88      1015.72
          512         1000       467.01       482.19       477.09
         1024         1000       501.13       517.38       511.16
         2048         1000       584.22       604.44       596.47
         4096         1000       998.39      1021.67      1014.32
         8192         1000      2017.85      2050.07      2040.97
        16384         1000      2518.01      2558.84      2547.05
        32768         1000      5182.68      5280.99      5255.73
        65536          640      6670.40      6864.10      6801.06
       131072          320     11229.11     11828.30     11651.15
       262144          160     14239.53     14334.41     14297.16
       524288           80     25720.76     26278.30     26028.27
      1048576           40     46440.23     47101.22     46791.35
      2097152           20     41616.45     43945.90     42856.80
      4194304           10     62286.81     67221.62     64907.90


#----------------------------------------------------------------
# Benchmarking Bcast
# #processes = 80
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.06         0.10         0.07
            1         1000      2559.63      2642.39      2577.71
            2         1000      2577.59      2660.47      2592.78
            4         1000      2541.65      2624.19      2561.87
            8         1000      2538.07      2579.05      2556.78
           16         1000      2540.35      2581.47      2558.62
           32         1000      2539.32      2580.52      2557.10
           64         1000      2539.23      2580.50      2555.27
          128         1000      2539.12      2581.62      2553.66
          256         1000      2585.36      2627.14      2588.85
          512         1000       141.55       142.39       141.99
         1024         1000       208.98       210.17       209.78
         2048         1000       227.81       228.95       228.45
         4096         1000       293.99       318.13       306.15
         8192         1000       486.10       487.18       486.73
        16384         1000       799.65       801.21       800.74
        32768         1000      1483.64      1486.06      1485.42
        65536          640      3192.47      3199.19      3197.68
       131072          320      6314.70      6341.48      6335.46
       262144          160     12469.44     12554.68     12532.73
       524288           80      9770.92     10413.19     10318.85
      1048576           40     18792.78     20762.40     20533.59
      2097152           20     33849.45     42141.25     41535.32
      4194304           10     65966.61     81472.99     79850.54
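To line these tables up against results posted from other clusters, it can help to load them programmatically. A minimal sketch of a parser for pasted IMB tables, assuming the layout shown here: a header line starting with `#bytes`, followed by whitespace-separated numeric rows. `parse_imb_table` is a hypothetical helper, not an IMB tool.

```python
def parse_imb_table(text):
    """Return the numeric rows of an IMB result table as dicts.

    Keys are the column headers from the '#bytes ...' line, e.g.
    '#bytes', '#repetitions', 't_avg[usec]'. Comment lines starting with
    '#' and anything that does not parse as numbers are skipped.
    """
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "#bytes":
            header = fields  # start of a new table
            continue
        if header is None or fields[0].startswith("#") or len(fields) != len(header):
            continue
        try:
            values = [float(v) for v in fields]
        except ValueError:
            continue  # e.g. a stray line of text with the right field count
        rows.append(dict(zip(header, values)))
    return rows


# Two rows copied from the Bcast table above.
SAMPLE = """\
#----------------------------------------------------------------
# Benchmarking Bcast
# #processes = 80
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
          256         1000      2585.36      2627.14      2588.85
          512         1000       141.55       142.39       141.99
"""

for row in parse_imb_table(SAMPLE):
    print(int(row["#bytes"]), row["t_avg[usec]"])
```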

-stephen

-- 
Stephen Mulcahy, Applepie Solutions Ltd, Innovation in Business Center,
   GMIT, Dublin Rd, Galway, Ireland.      http://www.aplpi.com