[Beowulf] WRF model on linux cluster: Mpi problem
Federico Ceccarelli
federico.ceccarelli at techcom.it
Mon Jul 4 12:13:39 PDT 2005
Hi,
I ran the Pallas benchmark after removing openMosix; here are the
PingPong and PingPing results for two processes.
What do you think of them?
Why does the bandwidth rise and fall so many times as the message size
(#bytes) grows?
Thanks again,
federico
#---------------------------------------------------
# Intel (R) MPI Benchmark Suite V2.3, MPI-1 part
#---------------------------------------------------
# Date : Mon Jul 4 15:20:32 2005
# Machine : i686
# System : Linux
# Release : 2.4.26-om1
# Version : #3 Wed Feb 23 04:32:26 CET 2005
#
# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#
# List of Benchmarks to run:
# PingPong
# PingPing
# Sendrecv
# Exchange
# Allreduce
# Reduce
# Reduce_scatter
# Allgather
# Allgatherv
# Alltoall
# Bcast
# Barrier
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
# ( 6 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
      #bytes  #repetitions       t[usec]    Mbytes/sec
           0          1000        109.00          0.00
           1          1000        109.43          0.01
           2          1000        138.81          0.01
           4          1000        238.29          0.02
           8          1000        246.77          0.03
          16          1000        246.26          0.06
          32          1000        273.79          0.11
          64          1000        250.73          0.24
         128          1000        250.98          0.49
         256          1000        250.73          0.97
         512          1000        250.74          1.95
        1024          1000        250.23          3.90
        2048          1000        251.99          7.75
        4096          1000        256.01         15.26
        8192          1000        500.27         15.62
       16384          1000        785.51         19.89
       32768          1000      15087.75          2.07
       65536           640      33256.60          1.88
      131072           320       5399.92         23.15
      262144           160      95577.23          2.62
      524288            80     102396.36          4.88
     1048576            40     529898.21          1.89
     2097152            20      89600.72         22.32
     4194304            10     794578.55          5.03
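
(For reference, a minimal sketch in C of the pattern IMB's PingPong
times, assuming exactly two ranks: rank 0 sends and rank 1 echoes the
message back, so t[usec] is half the averaged round trip and
Mbytes/sec is (bytes/2^20)/(t/10^6). This is only an illustration with
an arbitrary message size, not the IMB source.)

/*
 * PingPong sketch: rank 0 sends, rank 1 echoes; the reported time is
 * half the averaged round trip.  Size and loop count are arbitrary.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, i, nbytes = 4096, reps = 1000;   /* arbitrary example size */
    char *buf;
    double t0, t1, oneway;
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    buf = malloc(nbytes);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, nbytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, nbytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &st);
        } else if (rank == 1) {
            MPI_Recv(buf, nbytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &st);
            MPI_Send(buf, nbytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0) {
        oneway = (t1 - t0) / reps / 2.0;        /* one-way time in sec */
        printf("%d bytes: %.2f usec  %.2f Mbytes/sec\n", nbytes,
               oneway * 1e6, (nbytes / 1048576.0) / oneway);
    }
    free(buf);
    MPI_Finalize();
    return 0;
}
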
#---------------------------------------------------
# Benchmarking PingPing
# #processes = 2
# ( 6 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
      #bytes  #repetitions       t[usec]    Mbytes/sec
           0          1000         94.45          0.00
           1          1000         94.92          0.01
           2          1000         94.07          0.02
           4          1000         95.82          0.04
           8          1000         95.33          0.08
          16          1000        105.89          0.14
          32          1000        117.57          0.26
          64          1000        120.45          0.51
         128          1000        124.39          0.98
         256          1000        136.02          1.79
         512          1000        171.28          2.85
        1024          1000        185.80          5.26
        2048          1000        238.80          8.18
        4096          1000        256.54         15.23
        8192          1000        381.98         20.45
       16384          1000      13932.86          1.12
       32768          1000      42027.47          0.74
       65536           640      45166.66          1.38
      131072           320       9002.89         13.88
      262144           160     194274.79          1.29
      524288            80     773914.26          0.65
     1048576            40      85866.48         11.65
     2097152            20     839526.30          2.38
     4194304            10     310144.00         12.90
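
(PingPing differs in that both ranks transmit at the same time, so the
link carries traffic in both directions at once and the per-message
time is not halved. A sketch of that pattern, under the same caveats:
arbitrary size, two ranks assumed, not the IMB source.)

/*
 * PingPing sketch: each rank posts a non-blocking send and then
 * receives, so the two messages cross on the wire.  Unlike PingPong,
 * the measured per-message time is not divided by two.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, peer, i, nbytes = 4096, reps = 1000; /* arbitrary size */
    char *sbuf, *rbuf;
    double t0, t1, t;
    MPI_Request req;
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = 1 - rank;                    /* assumes exactly two ranks */
    sbuf = malloc(nbytes);
    rbuf = malloc(nbytes);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < reps; i++) {
        MPI_Isend(sbuf, nbytes, MPI_BYTE, peer, 0, MPI_COMM_WORLD, &req);
        MPI_Recv(rbuf, nbytes, MPI_BYTE, peer, 0, MPI_COMM_WORLD, &st);
        MPI_Wait(&req, &st);
    }
    t1 = MPI_Wtime();

    if (rank == 0) {
        t = (t1 - t0) / reps;                    /* per message, sec */
        printf("%d bytes: %.2f usec  %.2f Mbytes/sec\n", nbytes,
               t * 1e6, (nbytes / 1048576.0) / t);
    }
    free(sbuf);
    free(rbuf);
    MPI_Finalize();
    return 0;
}
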
On Mon, 2005-07-04 at 08:48 +0100, John Hearns wrote:
> On Fri, 2005-07-01 at 09:38 +0200, Federico Ceccarelli wrote:
> > yes,
> >
> > I will remove openMosix.
> > I patched the kernel with openMosix because I also use the cluster
> > for other, smaller applications, so the load balancing was useful
> > to me.
> >
> > I have already tried to switch off openMosix with
> >
> > > service openmosix stop
> From my small amount of openMosix experience, that should work.
>
> Have you used the little graphical tool to display the loads on each
> node? (can't remember the name).
>
> Anyway, I go along with the earlier advice to look at the network card
> performance.
> Do an lspci -vv on all nodes to check that your riser cards are running
> at full speed.
>
> What I would do is break this problem down.
> Start by running the Pallas benchmark on one node, then two, then
> four, etc. See if a pattern develops.
> Do the same with your model, if it is possible to cut down the
> problem size: run on one node (two processors), then two, then four.