[Beowulf] Re: Performance characterising a HPC application

stephen mulcahy smulcahy at aplpi.com
Wed Mar 28 08:05:04 PDT 2007


Hi,

As a follow-up to my previous mail, I've gone ahead and run the Intel 
MPI Benchmarks (v3.0) on this cluster and got the results below. I'd be 
curious to know how they compare with other similar clusters.

Also, I'm trying to determine which parts of the IMB results matter 
most for me. My understanding is that PingPong is a good measure of 
latency and bandwidth between a pair of nodes in the cluster.

Am I correct in thinking that Bcast and Reduce are good indicators of 
how well the cluster handles sending data from the head node to the 
compute nodes and collecting results back? My guess is that the other 
benchmarks are less relevant to me, since they measure various kinds of 
inter-node traffic rather than the one-to-many pattern exhibited by my 
application.

All comments welcome.

#---------------------------------------------------
#    Intel (R) MPI Benchmark Suite V3.0, MPI-1 part
#---------------------------------------------------
# Date                  : Wed Mar 28 11:59:55 2007
# Machine               : x86_64
# System                : Linux
# Release               : 2.6.17-2-amd64
# Version               : #1 SMP Wed Sep 13 17:49:33 CEST 2006
# MPI Version           : 2.0
# MPI Thread Environment: MPI_THREAD_SINGLE

#
# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM
#
#

# List of Benchmarks to run:

# PingPong
# PingPing
# Sendrecv
# Exchange
# Allreduce
# Reduce
# Reduce_scatter
# Allgather
# Allgatherv
# Alltoall
# Alltoallv
# Bcast
# Barrier

#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
# ( 78 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
        #bytes #repetitions      t[usec]   Mbytes/sec
             0         1000        27.63         0.00
             1         1000        28.56         0.03
             2         1000        28.65         0.07
             4         1000        28.72         0.13
             8         1000        27.03         0.28
            16         1000        28.63         0.53
            32         1000        28.79         1.06
            64         1000        28.72         2.13
           128         1000        28.61         4.27
           256         1000        28.75         8.49
           512         1000        27.90        17.50
          1024         1000        27.55        35.45
          2048         1000        29.70        65.77
          4096         1000        86.92        44.94
          8192         1000        87.85        88.93
         16384         1000        91.98       169.88
         32768         1000       105.01       297.60
         65536          640       149.88       417.00
        131072          320       312.52       399.98
        262144          160       547.92       456.27
        524288           80       998.77       500.62
       1048576           40      2008.35       497.92
       2097152           20      3407.78       586.89
       4194304           10      6583.70       607.56
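
As a sanity check on how to read the table above: the Mbytes/sec column appears to be the message size divided by t[usec], with 1 Mbyte taken as 2^20 bytes. That convention is inferred from the numbers, not stated in the output. A minimal Python sketch under that assumption, using a few rows copied from the PingPong table (the function and variable names are only illustrative):

```python
# Rows copied from the PingPong table above: (bytes, t[usec], reported Mbytes/sec).
PINGPONG_ROWS = [
    (0, 27.63, 0.00),
    (1024, 27.55, 35.45),
    (1048576, 2008.35, 497.92),
    (4194304, 6583.70, 607.56),
]

BYTES_PER_MBYTE = 2 ** 20  # assumption: 1 Mbyte is counted as 1048576 bytes


def mbytes_per_sec(nbytes, t_usec):
    """Throughput implied by moving nbytes in t_usec microseconds."""
    return (nbytes / BYTES_PER_MBYTE) / (t_usec * 1e-6)


def summarize(rows):
    """Return (zero-byte latency in usec, peak reported bandwidth in Mbytes/sec)."""
    latency = next(t for nbytes, t, _ in rows if nbytes == 0)
    peak = max(bw for _, _, bw in rows)
    return latency, peak


if __name__ == "__main__":
    # Print computed vs. reported throughput for each sample row.
    for nbytes, t_usec, reported in PINGPONG_ROWS:
        print(nbytes, round(mbytes_per_sec(nbytes, t_usec), 2), reported)
    print(summarize(PINGPONG_ROWS))
```

Read this way, the zero-byte time (about 28 usec) is the latency figure and the 4 MB row gives the peak bandwidth (about 608 Mbytes/sec).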

#---------------------------------------------------
# Benchmarking PingPing
# #processes = 2
# ( 78 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
        #bytes #repetitions      t[usec]   Mbytes/sec
             0         1000        56.25         0.00
             1         1000        57.33         0.02
             2         1000        57.46         0.03
             4         1000        77.31         0.05
             8         1000        57.20         0.13
            16         1000        57.07         0.27
            32         1000        57.23         0.53
            64         1000        57.42         1.06
           128         1000        57.69         2.12
           256         1000        57.60         4.24
           512         1000        58.98         8.28
          1024         1000        58.29        16.75
          2048         1000        59.40        32.88
          4096         1000       145.10        26.92
          8192         1000       145.29        53.77
         16384         1000       162.19        96.34
         32768         1000       179.61       173.99
         65536          640       245.40       254.69
        131072          320       380.17       328.80
        262144          160       661.36       378.01
        524288           80      1433.83       348.72
       1048576           40      2932.20       341.04
       2097152           20      5780.60       345.98
       4194304           10     11424.02       350.14

#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 80
#-----------------------------------------------------------------------------
        #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
             0         1000       102.46       103.03       102.74         0.00
             1         1000        33.54        34.44        34.00         0.06
             2         1000        35.91        36.81        36.31         0.10
             4         1000        46.81        47.49        47.14         0.16
             8         1000        33.95        34.78        34.40         0.44
            16         1000        33.68        34.22        33.89         0.89
            32         1000        34.47        35.15        34.87         1.74
            64         1000        34.15        34.90        34.56         3.50
           128         1000        34.59        35.34        34.91         6.91
           256         1000        35.60        36.25        35.98        13.47
           512         1000        50.06        51.01        50.56        19.14
          1024         1000        42.24        43.27        42.71        45.14
          2048         1000        51.47        52.81        51.97        73.97
          4096         1000       117.48       120.68       119.42        64.74
          8192         1000       129.47       131.17       130.36       119.12
         16384         1000       166.94       168.56       167.92       185.40
         32768         1000       324.22       326.33       325.21       191.53
         65536          640      1164.16      1168.75      1166.25       106.95
        131072          320      2095.51      2114.12      2104.03       118.25
        262144          160      3784.71      3839.86      3810.15       130.21
        524288           80      6294.95      6453.04      6375.89       154.97
       1048576           40     11421.92     11935.05     11662.59       167.57
       2097152           20     21297.20     22613.39     21929.60       176.89
       4194304           10     39541.48     44568.59     41939.00       179.50

#-----------------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 80
#-----------------------------------------------------------------------------
        #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
             0         1000        93.97        96.90        95.43         0.00
             1         1000        98.09       101.16        99.57         0.04
             2         1000        99.51       102.63       101.02         0.07
             4         1000        98.09       101.09        99.62         0.15
             8         1000        99.81       103.00       101.34         0.30
            16         1000        97.72       100.84        99.18         0.61
            32         1000        99.55       102.56       101.00         1.19
            64         1000       100.49       103.38       101.87         2.36
           128         1000       100.86       103.96       102.39         4.70
           256         1000       102.44       105.48       103.91         9.26
           512         1000       114.76       118.16       116.47        16.53
          1024         1000       134.88       138.95       136.93        28.11
          2048         1000       179.37       184.94       182.18        42.24
          4096         1000       227.07       230.26       228.71        67.86
          8192         1000       345.83       350.16       347.97        89.24
         16384         1000       451.09       454.61       453.02       137.48
         32768         1000       751.66       754.99       753.41       165.56
         65536          640      1729.67      1740.25      1734.85       143.66
        131072          320      3271.37      3313.93      3290.97       150.88
        262144          160      6534.62      6699.24      6618.95       149.27
        524288           80     19247.82     22501.36     21195.91        88.88
       1048576           40     36944.68     53998.23     45728.51        74.08
       2097152           20     57913.85     85633.50     71276.85        93.42
       4194304           10     92781.81    135353.09    112809.08       118.21
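
When comparing the Sendrecv and Exchange figures with PingPong, note that their Mbytes/sec columns appear to count more than one message per process. The reported values match 2 x bytes / t_max for Sendrecv and 4 x bytes / t_max for Exchange, again with 1 Mbyte = 2^20 bytes. This is inferred from the numbers in the two tables above, so treat the following Python sketch as a check under that assumption (the names are illustrative only):

```python
BYTES_PER_MBYTE = 2 ** 20  # assumption: 1 Mbyte is counted as 1048576 bytes

# Messages counted per process in each sample (inferred from the tables above).
MESSAGES_PER_SAMPLE = {"Sendrecv": 2, "Exchange": 4}


def reported_mbytes_per_sec(benchmark, nbytes, t_max_usec):
    """Throughput as the tables seem to report it, based on t_max."""
    factor = MESSAGES_PER_SAMPLE[benchmark]
    return factor * nbytes / BYTES_PER_MBYTE / (t_max_usec * 1e-6)


if __name__ == "__main__":
    # 4 MB rows from the Sendrecv and Exchange tables.
    print(round(reported_mbytes_per_sec("Sendrecv", 4194304, 44568.59), 2))
    print(round(reported_mbytes_per_sec("Exchange", 4194304, 135353.09), 2))
```

If that reading is right, the per-message rate at 4 MB with all 80 processes active is roughly 90 Mbytes/sec for Sendrecv and 30 Mbytes/sec for Exchange, well below the two-process PingPong figure.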

#----------------------------------------------------------------
# Benchmarking Allreduce
# #processes = 80
#----------------------------------------------------------------
        #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
             0         1000         0.05         0.10         0.06
             4         1000       406.90       407.23       407.10
             8         1000       407.26       435.61       413.58
            16         1000       412.81       437.32       434.08
            32         1000       413.75       414.20       414.05
            64         1000       415.72       416.09       415.94
           128         1000       426.47       426.82       426.67
           256         1000       455.54       455.88       455.70
           512         1000       493.87       494.30       494.09
          1024         1000       573.35       573.80       573.58
          2048         1000       812.28       812.77       812.51
          4096         1000      1202.61      1203.18      1202.90
          8192         1000      1961.11      1962.48      1961.79
         16384         1000      5923.31      5924.10      5923.76
         32768         1000      6192.97      6193.86      6193.40
         65536          640      6857.42      6859.05      6858.07
        131072          320      8440.67      8443.51      8442.00
        262144          160      9365.96      9370.85      9368.97
        524288           80     20703.72     20729.39     20718.58
       1048576           40     27468.00     27519.45     27500.07
       2097152           20     45600.34     45712.20     45663.97
       4194304           10     90308.81     91164.30     90731.45

#----------------------------------------------------------------
# Benchmarking Reduce
# #processes = 80
#----------------------------------------------------------------
        #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
             0         1000         0.08         0.10         0.09
             4         1000      1126.87      1137.98      1134.28
             8         1000      1208.77      1219.90      1216.20
            16         1000      1209.49      1223.14      1217.30
            32         1000      1172.07      1183.32      1179.02
            64         1000      1226.63      1240.10      1235.16
           128         1000      1126.54      1138.57      1134.23
           256         1000      1007.31      1019.88      1015.72
           512         1000       467.01       482.19       477.09
          1024         1000       501.13       517.38       511.16
          2048         1000       584.22       604.44       596.47
          4096         1000       998.39      1021.67      1014.32
          8192         1000      2017.85      2050.07      2040.97
         16384         1000      2518.01      2558.84      2547.05
         32768         1000      5182.68      5280.99      5255.73
         65536          640      6670.40      6864.10      6801.06
        131072          320     11229.11     11828.30     11651.15
        262144          160     14239.53     14334.41     14297.16
        524288           80     25720.76     26278.30     26028.27
       1048576           40     46440.23     47101.22     46791.35
       2097152           20     41616.45     43945.90     42856.80
       4194304           10     62286.81     67221.62     64907.90

#----------------------------------------------------------------
# Benchmarking Reduce_scatter
# #processes = 80
#----------------------------------------------------------------
        #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
             0         1000         1.15         1.17         1.15
             4         1000         2.79       116.45        37.74
             8         1000         2.88       146.80        28.30
            16         1000         2.89       174.36        31.96
            32         1000         3.23       302.12        45.65
            64         1000         3.12       408.95        93.14
           128         1000         3.07       576.51       232.83
           256         1000         3.10       923.28       738.29
           512         1000      1090.20      1092.25      1091.05
          1024         1000      1212.27      1217.47      1214.91
          2048         1000      1300.73      1306.18      1302.84
          4096         1000      1474.26      1476.34      1475.58
          8192         1000      1935.20      1936.17      1935.86
         16384         1000      2562.77      2563.99      2563.63
         32768         1000      3874.37      3876.27      3875.87
         65536          640     73380.90     73387.62     73385.88
        131072          320    350385.89    350418.67    350407.98
        262144          160     12655.13     12828.11     12671.15
        524288           80     23142.21     23554.27     23348.91
       1048576           40     41440.35     41724.03     41584.49
       2097152           20     58607.35     59571.41     59087.99
       4194304           10     94975.40    100692.30     97813.92

#----------------------------------------------------------------
# Benchmarking Allgather
# #processes = 80
#----------------------------------------------------------------
        #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
             0         1000         0.04         0.07         0.04
             1         1000       545.25       545.80       545.57
             2         1000       547.91       548.36       548.17
             4         1000       551.94       552.33       552.10
             8         1000       561.78       562.39       562.17
            16         1000       575.88       576.23       576.08
            32         1000       628.04       628.39       628.24
            64         1000       752.22       752.61       752.39
           128         1000      1294.73      1295.54      1295.22
           256         1000      1586.71      1587.68      1587.17
           512         1000      2452.51      2453.68      2453.00
          1024         1000      3224.73      3226.33      3225.46
          2048         1000      6266.51      6269.77      6268.09
          4096         1000      9120.61      9124.76      9122.54
          8192         1000     12218.66     12224.11     12221.36
         16384         1000     17071.62     17074.78     17072.80
         32768         1000     49022.41     49034.87     49028.81
         65536          640     87679.30     87708.18     87689.16
        131072          320    208738.13    208869.04    208797.07
        262144          160    666085.23    668277.54    666775.03
        524288           80   1022642.06   1027948.96   1025793.19
       1048576           40   1756476.15   1765711.88   1760951.56
       2097152           20   2952938.39   3003734.70   2978818.70
       4194304           10   5375798.80   5563639.71   5463218.50

#----------------------------------------------------------------
# Benchmarking Allgatherv
# #processes = 80
#----------------------------------------------------------------
        #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
             0         1000         0.62         0.66         0.63
             1         1000       824.94       825.93       825.64
             2         1000       816.39       817.29       817.12
             4         1000       823.29       824.36       824.04
             8         1000       850.35       851.38       851.11
            16         1000       938.29       939.63       939.25
            32         1000      3308.89      3313.03      3311.31
            64         1000      6124.38      6131.41      6128.28
           128         1000      8544.21      8553.72      8549.35
           256         1000     10692.58     10704.33     10698.90
           512         1000     15584.33     15601.42     15593.43
          1024         1000     26275.79     26304.07     26290.89
          2048         1000     42962.14     43008.67     42987.01
          4096         1000     77686.79     77770.12     77731.56
          8192         1000    152732.94    152891.89    152816.73
         16384         1000    320425.46    320757.80    320600.28
         32768         1000    631404.59    632062.11    631743.42
         65536          640   1254384.65   1256389.27   1255416.87
        131072          320   2487610.17   2495430.69   2491640.65
        262144          160   4942483.65   4973375.58   4958312.07
        524288           80   9811217.50   9934000.07   9873954.26
       1048576           40  19361965.20  19851499.22  19611739.34
       2097152           20  37751459.36  39706360.45  38748283.59
       4194304           10  71663110.69  79475067.78  75644859.70

#----------------------------------------------------------------
# Benchmarking Alltoall
# #processes = 80
#----------------------------------------------------------------
        #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
             0         1000         0.04         0.07         0.04
             1         1000       840.42       842.01       841.73
             2         1000       835.52       836.01       835.80
             4         1000       844.12       844.63       844.39
             8         1000       861.18       861.77       861.44
            16         1000       876.76       877.40       877.11
            32         1000      1000.98      1001.49      1001.24
            64         1000      1260.97      1261.64      1261.23
           128         1000      1833.21      1834.15      1833.73
           256         1000      3002.95      3005.31      3004.92
           512         1000      3243.64      3247.10      3246.63
          1024         1000     71991.33     71995.38     71994.29
          2048         1000    139015.97    139024.61    139021.92
          4096         1000     21338.94     21340.67     21339.93
          8192         1000     34104.60     34107.67     34106.62
         16384         1000     52820.66     52824.18     52823.20
         32768         1000    104874.65    104882.13    104880.06
         65536          640    276360.84    276382.19    276375.25
        131072          320    484861.93    484902.27    484888.37
        262144          160    921113.57    921191.36    921158.10
        524288           80   1768213.71   1768460.08   1768363.28
       1048576           40   4508935.68   4509846.20   4509394.44
       2097152           20   9380809.45   9398098.46   9389680.55
       4194304           10  15565499.50  15594503.59  15581532.16

#----------------------------------------------------------------
# Benchmarking Alltoallv
# #processes = 80
#----------------------------------------------------------------
        #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
             0         1000         0.58         0.64         0.60
             1         1000      2763.69      2766.87      2766.36
             2         1000      2706.75      2709.65      2709.11
             4         1000      2704.92      2708.35      2707.77
             8         1000      2823.48      2826.69      2826.01
            16         1000      2988.69      2991.50      2991.14
            32         1000      2774.42      2777.30      2776.77
            64         1000      2786.69      2789.71      2789.10
           128         1000      3199.55      3202.31      3201.83
           256         1000      2876.58      2880.09      2879.65
           512         1000     55243.44     55249.64     55248.49
          1024         1000    188625.26    188640.15    188636.72
          2048         1000    187070.08    187289.79    187084.89
          4096         1000    262043.66    262061.23    262056.45
          8192         1000    359340.54    359370.18    359362.57
         16384         1000    345624.49    345687.29    345655.10
         32768         1000    418706.51    418986.49    418753.78
         65536          640   1577051.93   1577752.86   1577348.59
        131072          320   1792183.76   1794663.90   1793357.05
        262144          160   2095284.36   2100250.26   2097504.22
        524288           80   2629670.40   2636932.59   2633053.48
       1048576           40   4407978.18   4443426.53   4422000.89
       2097152           20   8467411.01   8555431.95   8533239.83
       4194304           10  16534436.20  16844729.71  16787259.88

#----------------------------------------------------------------
# Benchmarking Bcast
# #processes = 80
#----------------------------------------------------------------
        #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
             0         1000         0.06         0.10         0.07
             1         1000      2559.63      2642.39      2577.71
             2         1000      2577.59      2660.47      2592.78
             4         1000      2541.65      2624.19      2561.87
             8         1000      2538.07      2579.05      2556.78
            16         1000      2540.35      2581.47      2558.62
            32         1000      2539.32      2580.52      2557.10
            64         1000      2539.23      2580.50      2555.27
           128         1000      2539.12      2581.62      2553.66
           256         1000      2585.36      2627.14      2588.85
           512         1000       141.55       142.39       141.99
          1024         1000       208.98       210.17       209.78
          2048         1000       227.81       228.95       228.45
          4096         1000       293.99       318.13       306.15
          8192         1000       486.10       487.18       486.73
         16384         1000       799.65       801.21       800.74
         32768         1000      1483.64      1486.06      1485.42
         65536          640      3192.47      3199.19      3197.68
        131072          320      6314.70      6341.48      6335.46
        262144          160     12469.44     12554.68     12532.73
        524288           80      9770.92     10413.19     10318.85
       1048576           40     18792.78     20762.40     20533.59
       2097152           20     33849.45     42141.25     41535.32
       4194304           10     65966.61     81472.99     79850.54

#---------------------------------------------------
# Benchmarking Barrier
# #processes = 80
#---------------------------------------------------
#repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
          1000       750.75       770.57       763.23


-stephen

-- 
Stephen Mulcahy, Applepie Solutions Ltd, Innovation in Business Center,
    GMIT, Dublin Rd, Galway, Ireland.      http://www.aplpi.com


