[Beowulf] Re: Performance characterising a HPC application
stephen mulcahy
smulcahy at aplpi.com
Wed Mar 28 08:05:04 PDT 2007
Hi,
As a follow-up to my previous mail, I've gone ahead and run the Intel
MPI Benchmarks (IMB v3.0) on this cluster and got the results below.
I'd be curious to know how they compare to other similar clusters.
Also, I'm trying to determine which parts of the IMB results matter
most for me. My understanding is that PingPong is a good measure of
point-to-point latency and bandwidth between a pair of nodes in the
cluster. Am I correct in thinking that Bcast and Reduce are good
indicators of how well the cluster performs when sending data from the
head node to the compute nodes and collecting results back? My guess is
that the other benchmarks are less relevant to me, since they measure
other kinds of inter-node traffic rather than the one-to-many pattern
exhibited by my application.
All comments welcome.
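For anyone reading the PingPong table below: the t[usec] value in the
0-byte row is the usual latency figure, and the Mbytes/sec column can be
recomputed from the bytes and t[usec] columns. The following is only a
small sketch of that arithmetic, assuming 1 Mbyte = 2**20 bytes (which
is what the reported numbers appear to use); the function name is my own
and not part of IMB.

```python
# Sketch: recompute IMB PingPong throughput from the (bytes, t[usec]) columns.
# Assumption: "Mbytes/sec" means 2**20 bytes per second.

MBYTE = 2 ** 20


def mbytes_per_sec(nbytes, t_usec):
    """Throughput for one message of nbytes that takes t_usec microseconds."""
    if t_usec <= 0:
        return 0.0
    return (nbytes / MBYTE) / (t_usec * 1e-6)


# A few (bytes, t[usec], reported Mbytes/sec) rows copied from the
# PingPong table in this mail.
rows = [
    (1024, 27.55, 35.45),
    (65536, 149.88, 417.00),
    (4194304, 6583.70, 607.56),
]

for nbytes, t_usec, reported in rows:
    # Print the recomputed value next to the value IMB reported.
    print(nbytes, round(mbytes_per_sec(nbytes, t_usec), 2), reported)
```

The same check works for the PingPing table. The Sendrecv and Exchange
figures below appear to count 2 and 4 messages per process respectively
and to use t_max, so they need the byte count scaled accordingly.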
#---------------------------------------------------
# Intel (R) MPI Benchmark Suite V3.0, MPI-1 part
#---------------------------------------------------
# Date : Wed Mar 28 11:59:55 2007
# Machine : x86_64
# System : Linux
# Release : 2.6.17-2-amd64
# Version : #1 SMP Wed Sep 13 17:49:33 CEST 2006
# MPI Version : 2.0
# MPI Thread Environment: MPI_THREAD_SINGLE
#
# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#
# List of Benchmarks to run:
# PingPong
# PingPing
# Sendrecv
# Exchange
# Allreduce
# Reduce
# Reduce_scatter
# Allgather
# Allgatherv
# Alltoall
# Alltoallv
# Bcast
# Barrier
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
# ( 78 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
#bytes #repetitions t[usec] Mbytes/sec
0 1000 27.63 0.00
1 1000 28.56 0.03
2 1000 28.65 0.07
4 1000 28.72 0.13
8 1000 27.03 0.28
16 1000 28.63 0.53
32 1000 28.79 1.06
64 1000 28.72 2.13
128 1000 28.61 4.27
256 1000 28.75 8.49
512 1000 27.90 17.50
1024 1000 27.55 35.45
2048 1000 29.70 65.77
4096 1000 86.92 44.94
8192 1000 87.85 88.93
16384 1000 91.98 169.88
32768 1000 105.01 297.60
65536 640 149.88 417.00
131072 320 312.52 399.98
262144 160 547.92 456.27
524288 80 998.77 500.62
1048576 40 2008.35 497.92
2097152 20 3407.78 586.89
4194304 10 6583.70 607.56
#---------------------------------------------------
# Benchmarking PingPing
# #processes = 2
# ( 78 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
#bytes #repetitions t[usec] Mbytes/sec
0 1000 56.25 0.00
1 1000 57.33 0.02
2 1000 57.46 0.03
4 1000 77.31 0.05
8 1000 57.20 0.13
16 1000 57.07 0.27
32 1000 57.23 0.53
64 1000 57.42 1.06
128 1000 57.69 2.12
256 1000 57.60 4.24
512 1000 58.98 8.28
1024 1000 58.29 16.75
2048 1000 59.40 32.88
4096 1000 145.10 26.92
8192 1000 145.29 53.77
16384 1000 162.19 96.34
32768 1000 179.61 173.99
65536 640 245.40 254.69
131072 320 380.17 328.80
262144 160 661.36 378.01
524288 80 1433.83 348.72
1048576 40 2932.20 341.04
2097152 20 5780.60 345.98
4194304 10 11424.02 350.14
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 80
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 102.46 103.03 102.74 0.00
1 1000 33.54 34.44 34.00 0.06
2 1000 35.91 36.81 36.31 0.10
4 1000 46.81 47.49 47.14 0.16
8 1000 33.95 34.78 34.40 0.44
16 1000 33.68 34.22 33.89 0.89
32 1000 34.47 35.15 34.87 1.74
64 1000 34.15 34.90 34.56 3.50
128 1000 34.59 35.34 34.91 6.91
256 1000 35.60 36.25 35.98 13.47
512 1000 50.06 51.01 50.56 19.14
1024 1000 42.24 43.27 42.71 45.14
2048 1000 51.47 52.81 51.97 73.97
4096 1000 117.48 120.68 119.42 64.74
8192 1000 129.47 131.17 130.36 119.12
16384 1000 166.94 168.56 167.92 185.40
32768 1000 324.22 326.33 325.21 191.53
65536 640 1164.16 1168.75 1166.25 106.95
131072 320 2095.51 2114.12 2104.03 118.25
262144 160 3784.71 3839.86 3810.15 130.21
524288 80 6294.95 6453.04 6375.89 154.97
1048576 40 11421.92 11935.05 11662.59 167.57
2097152 20 21297.20 22613.39 21929.60 176.89
4194304 10 39541.48 44568.59 41939.00 179.50
#-----------------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 80
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 93.97 96.90 95.43 0.00
1 1000 98.09 101.16 99.57 0.04
2 1000 99.51 102.63 101.02 0.07
4 1000 98.09 101.09 99.62 0.15
8 1000 99.81 103.00 101.34 0.30
16 1000 97.72 100.84 99.18 0.61
32 1000 99.55 102.56 101.00 1.19
64 1000 100.49 103.38 101.87 2.36
128 1000 100.86 103.96 102.39 4.70
256 1000 102.44 105.48 103.91 9.26
512 1000 114.76 118.16 116.47 16.53
1024 1000 134.88 138.95 136.93 28.11
2048 1000 179.37 184.94 182.18 42.24
4096 1000 227.07 230.26 228.71 67.86
8192 1000 345.83 350.16 347.97 89.24
16384 1000 451.09 454.61 453.02 137.48
32768 1000 751.66 754.99 753.41 165.56
65536 640 1729.67 1740.25 1734.85 143.66
131072 320 3271.37 3313.93 3290.97 150.88
262144 160 6534.62 6699.24 6618.95 149.27
524288 80 19247.82 22501.36 21195.91 88.88
1048576 40 36944.68 53998.23 45728.51 74.08
2097152 20 57913.85 85633.50 71276.85 93.42
4194304 10 92781.81 135353.09 112809.08 118.21
#----------------------------------------------------------------
# Benchmarking Allreduce
# #processes = 80
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.05 0.10 0.06
4 1000 406.90 407.23 407.10
8 1000 407.26 435.61 413.58
16 1000 412.81 437.32 434.08
32 1000 413.75 414.20 414.05
64 1000 415.72 416.09 415.94
128 1000 426.47 426.82 426.67
256 1000 455.54 455.88 455.70
512 1000 493.87 494.30 494.09
1024 1000 573.35 573.80 573.58
2048 1000 812.28 812.77 812.51
4096 1000 1202.61 1203.18 1202.90
8192 1000 1961.11 1962.48 1961.79
16384 1000 5923.31 5924.10 5923.76
32768 1000 6192.97 6193.86 6193.40
65536 640 6857.42 6859.05 6858.07
131072 320 8440.67 8443.51 8442.00
262144 160 9365.96 9370.85 9368.97
524288 80 20703.72 20729.39 20718.58
1048576 40 27468.00 27519.45 27500.07
2097152 20 45600.34 45712.20 45663.97
4194304 10 90308.81 91164.30 90731.45
#----------------------------------------------------------------
# Benchmarking Reduce
# #processes = 80
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.08 0.10 0.09
4 1000 1126.87 1137.98 1134.28
8 1000 1208.77 1219.90 1216.20
16 1000 1209.49 1223.14 1217.30
32 1000 1172.07 1183.32 1179.02
64 1000 1226.63 1240.10 1235.16
128 1000 1126.54 1138.57 1134.23
256 1000 1007.31 1019.88 1015.72
512 1000 467.01 482.19 477.09
1024 1000 501.13 517.38 511.16
2048 1000 584.22 604.44 596.47
4096 1000 998.39 1021.67 1014.32
8192 1000 2017.85 2050.07 2040.97
16384 1000 2518.01 2558.84 2547.05
32768 1000 5182.68 5280.99 5255.73
65536 640 6670.40 6864.10 6801.06
131072 320 11229.11 11828.30 11651.15
262144 160 14239.53 14334.41 14297.16
524288 80 25720.76 26278.30 26028.27
1048576 40 46440.23 47101.22 46791.35
2097152 20 41616.45 43945.90 42856.80
4194304 10 62286.81 67221.62 64907.90
#----------------------------------------------------------------
# Benchmarking Reduce_scatter
# #processes = 80
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 1.15 1.17 1.15
4 1000 2.79 116.45 37.74
8 1000 2.88 146.80 28.30
16 1000 2.89 174.36 31.96
32 1000 3.23 302.12 45.65
64 1000 3.12 408.95 93.14
128 1000 3.07 576.51 232.83
256 1000 3.10 923.28 738.29
512 1000 1090.20 1092.25 1091.05
1024 1000 1212.27 1217.47 1214.91
2048 1000 1300.73 1306.18 1302.84
4096 1000 1474.26 1476.34 1475.58
8192 1000 1935.20 1936.17 1935.86
16384 1000 2562.77 2563.99 2563.63
32768 1000 3874.37 3876.27 3875.87
65536 640 73380.90 73387.62 73385.88
131072 320 350385.89 350418.67 350407.98
262144 160 12655.13 12828.11 12671.15
524288 80 23142.21 23554.27 23348.91
1048576 40 41440.35 41724.03 41584.49
2097152 20 58607.35 59571.41 59087.99
4194304 10 94975.40 100692.30 97813.92
#----------------------------------------------------------------
# Benchmarking Allgather
# #processes = 80
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.04 0.07 0.04
1 1000 545.25 545.80 545.57
2 1000 547.91 548.36 548.17
4 1000 551.94 552.33 552.10
8 1000 561.78 562.39 562.17
16 1000 575.88 576.23 576.08
32 1000 628.04 628.39 628.24
64 1000 752.22 752.61 752.39
128 1000 1294.73 1295.54 1295.22
256 1000 1586.71 1587.68 1587.17
512 1000 2452.51 2453.68 2453.00
1024 1000 3224.73 3226.33 3225.46
2048 1000 6266.51 6269.77 6268.09
4096 1000 9120.61 9124.76 9122.54
8192 1000 12218.66 12224.11 12221.36
16384 1000 17071.62 17074.78 17072.80
32768 1000 49022.41 49034.87 49028.81
65536 640 87679.30 87708.18 87689.16
131072 320 208738.13 208869.04 208797.07
262144 160 666085.23 668277.54 666775.03
524288 80 1022642.06 1027948.96 1025793.19
1048576 40 1756476.15 1765711.88 1760951.56
2097152 20 2952938.39 3003734.70 2978818.70
4194304 10 5375798.80 5563639.71 5463218.50
#----------------------------------------------------------------
# Benchmarking Allgatherv
# #processes = 80
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.62 0.66 0.63
1 1000 824.94 825.93 825.64
2 1000 816.39 817.29 817.12
4 1000 823.29 824.36 824.04
8 1000 850.35 851.38 851.11
16 1000 938.29 939.63 939.25
32 1000 3308.89 3313.03 3311.31
64 1000 6124.38 6131.41 6128.28
128 1000 8544.21 8553.72 8549.35
256 1000 10692.58 10704.33 10698.90
512 1000 15584.33 15601.42 15593.43
1024 1000 26275.79 26304.07 26290.89
2048 1000 42962.14 43008.67 42987.01
4096 1000 77686.79 77770.12 77731.56
8192 1000 152732.94 152891.89 152816.73
16384 1000 320425.46 320757.80 320600.28
32768 1000 631404.59 632062.11 631743.42
65536 640 1254384.65 1256389.27 1255416.87
131072 320 2487610.17 2495430.69 2491640.65
262144 160 4942483.65 4973375.58 4958312.07
524288 80 9811217.50 9934000.07 9873954.26
1048576 40 19361965.20 19851499.22 19611739.34
2097152 20 37751459.36 39706360.45 38748283.59
4194304 10 71663110.69 79475067.78 75644859.70
#----------------------------------------------------------------
# Benchmarking Alltoall
# #processes = 80
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.04 0.07 0.04
1 1000 840.42 842.01 841.73
2 1000 835.52 836.01 835.80
4 1000 844.12 844.63 844.39
8 1000 861.18 861.77 861.44
16 1000 876.76 877.40 877.11
32 1000 1000.98 1001.49 1001.24
64 1000 1260.97 1261.64 1261.23
128 1000 1833.21 1834.15 1833.73
256 1000 3002.95 3005.31 3004.92
512 1000 3243.64 3247.10 3246.63
1024 1000 71991.33 71995.38 71994.29
2048 1000 139015.97 139024.61 139021.92
4096 1000 21338.94 21340.67 21339.93
8192 1000 34104.60 34107.67 34106.62
16384 1000 52820.66 52824.18 52823.20
32768 1000 104874.65 104882.13 104880.06
65536 640 276360.84 276382.19 276375.25
131072 320 484861.93 484902.27 484888.37
262144 160 921113.57 921191.36 921158.10
524288 80 1768213.71 1768460.08 1768363.28
1048576 40 4508935.68 4509846.20 4509394.44
2097152 20 9380809.45 9398098.46 9389680.55
4194304 10 15565499.50 15594503.59 15581532.16
#----------------------------------------------------------------
# Benchmarking Alltoallv
# #processes = 80
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.58 0.64 0.60
1 1000 2763.69 2766.87 2766.36
2 1000 2706.75 2709.65 2709.11
4 1000 2704.92 2708.35 2707.77
8 1000 2823.48 2826.69 2826.01
16 1000 2988.69 2991.50 2991.14
32 1000 2774.42 2777.30 2776.77
64 1000 2786.69 2789.71 2789.10
128 1000 3199.55 3202.31 3201.83
256 1000 2876.58 2880.09 2879.65
512 1000 55243.44 55249.64 55248.49
1024 1000 188625.26 188640.15 188636.72
2048 1000 187070.08 187289.79 187084.89
4096 1000 262043.66 262061.23 262056.45
8192 1000 359340.54 359370.18 359362.57
16384 1000 345624.49 345687.29 345655.10
32768 1000 418706.51 418986.49 418753.78
65536 640 1577051.93 1577752.86 1577348.59
131072 320 1792183.76 1794663.90 1793357.05
262144 160 2095284.36 2100250.26 2097504.22
524288 80 2629670.40 2636932.59 2633053.48
1048576 40 4407978.18 4443426.53 4422000.89
2097152 20 8467411.01 8555431.95 8533239.83
4194304 10 16534436.20 16844729.71 16787259.88
#----------------------------------------------------------------
# Benchmarking Bcast
# #processes = 80
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.06 0.10 0.07
1 1000 2559.63 2642.39 2577.71
2 1000 2577.59 2660.47 2592.78
4 1000 2541.65 2624.19 2561.87
8 1000 2538.07 2579.05 2556.78
16 1000 2540.35 2581.47 2558.62
32 1000 2539.32 2580.52 2557.10
64 1000 2539.23 2580.50 2555.27
128 1000 2539.12 2581.62 2553.66
256 1000 2585.36 2627.14 2588.85
512 1000 141.55 142.39 141.99
1024 1000 208.98 210.17 209.78
2048 1000 227.81 228.95 228.45
4096 1000 293.99 318.13 306.15
8192 1000 486.10 487.18 486.73
16384 1000 799.65 801.21 800.74
32768 1000 1483.64 1486.06 1485.42
65536 640 3192.47 3199.19 3197.68
131072 320 6314.70 6341.48 6335.46
262144 160 12469.44 12554.68 12532.73
524288 80 9770.92 10413.19 10318.85
1048576 40 18792.78 20762.40 20533.59
2097152 20 33849.45 42141.25 41535.32
4194304 10 65966.61 81472.99 79850.54
#---------------------------------------------------
# Benchmarking Barrier
# #processes = 80
#---------------------------------------------------
#repetitions t_min[usec] t_max[usec] t_avg[usec]
1000 750.75 770.57 763.23
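
One rough way to put the Bcast numbers above next to PingPong is to
estimate what a tree-shaped broadcast would cost if every step took one
PingPong time. This is only a sketch under a loud assumption: that the
MPI library broadcasts along a binomial tree with ceil(log2(P)) rounds.
I don't know which algorithm this MPI actually uses, and the estimate
ignores processes sharing a node, so treat it as a yardstick rather than
a prediction. The function name is made up for illustration.

```python
import math


def tree_bcast_estimate_usec(nprocs, one_way_usec):
    """Estimated broadcast time: ceil(log2(nprocs)) rounds of one message each.

    Assumption: binomial-tree broadcast, each round costing one PingPong
    t[usec]. The real MPI library may use a different algorithm.
    """
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    rounds = math.ceil(math.log2(nprocs))
    return rounds * one_way_usec


# (bytes, PingPong t[usec], Bcast t_avg[usec]) copied from the tables above.
samples = [
    (1, 28.56, 2577.71),
    (512, 27.90, 141.99),
]

for nbytes, pingpong_usec, bcast_usec in samples:
    estimate = tree_bcast_estimate_usec(80, pingpong_usec)
    # Print the estimate, the measured value, and their ratio.
    print(nbytes, round(estimate, 1), bcast_usec, round(bcast_usec / estimate, 1))
```

A measured time far above this kind of estimate for some message sizes
but not others may be worth a closer look, although the estimate is too
crude to say what the cause is.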
-stephen
--
Stephen Mulcahy, Applepie Solutions Ltd, Innovation in Business Center,
GMIT, Dublin Rd, Galway, Ireland. http://www.aplpi.com