[Beowulf] WRF model on linux cluster: Mpi problem

Vincent Diepeveen diep at xs4all.nl
Thu Jun 30 05:52:09 PDT 2005


At 02:34 PM 6/30/2005 +0200, Federico Ceccarelli wrote:
>
>Thanks for your answer Vincent,
>
>My network cards are Intel PRO/1000, Gigabit.
>
>Yes, I did a 72h (real-time) simulation that lasted 20h on 4 CPUs... same
>behaviour...
>
>I'm thinking about a bandwidth problem...
>
>....maybe due to a hardware failure in some network card or in the switch
>(3Com Baseline Switch 2824).
>
>Or the PCI risers for the network cards (I have a 2U rack, so I cannot mount
>the network cards directly in the PCI slots)...

Because gigabit cards have such horrible one-way ping-pong latencies compared
to the high-end cards (Myrinet, Dolphin, Quadrics and, relatively speaking,
Infiniband too), the PCI bus is not your biggest problem here.

The card itself is so limited that PCI is not the bottleneck at all.

There are plenty of benchmarks out there for this. You should start with a
one-way ping-pong test; with large messages the same test also answers your
bandwidth question.
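
For instance, a minimal sketch along these lines (my own illustration, nothing
from WRF itself; it assumes any MPI implementation such as MPICH, built with
mpicc pingpong.c -o pingpong and started with mpirun -np 2 across two different
nodes) reports one-way latency for a small message and bandwidth for a large one:

/*
 * pingpong.c - minimal MPI ping-pong sketch (illustration only, not
 * part of WRF). Rank 0 bounces a buffer off rank 1; half the average
 * round-trip time is the one-way latency. An 8-byte message measures
 * latency, a 1MB message measures bandwidth.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, i, s, iters = 1000;
    size_t sizes[2] = { 8, 1 << 20 };

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (s = 0; s < 2; s++) {
        char *buf = calloc(1, sizes[s]);
        double t0, oneway;

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();

        for (i = 0; i < iters; i++) {
            if (rank == 0) {        /* send first, then wait for the echo */
                MPI_Send(buf, (int)sizes[s], MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, (int)sizes[s], MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) { /* echo everything back to rank 0 */
                MPI_Recv(buf, (int)sizes[s], MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, (int)sizes[s], MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }

        if (rank == 0) {
            oneway = (MPI_Wtime() - t0) / (2.0 * iters);
            printf("%8lu bytes: one-way %.2f us, %.1f MB/s\n",
                   (unsigned long)sizes[s], oneway * 1e6,
                   sizes[s] / oneway / 1e6);
        }
        free(buf);
    }

    MPI_Finalize();
    return 0;
}

On plain gigabit through a switch I would expect one-way latencies somewhere in
the tens of microseconds and large-message bandwidth near wire speed (roughly
100 MB/s); numbers far off from that point at a bad card, riser or switch port.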

By the way, the reason I don't run openMosix or similar single-system-image
software is that it has such an ugly effect on latencies; the way it pages
shared-memory communication between nodes is really slow and bad for this type
of software. There is also something called OpenSSI, which is being developed
pretty actively. It has the same problem.

Vincent

>Did you experience problems with the PCI risers?
>
>Can you suggest a bandwidth benchmark?
>
>thanks again...
>
>federico
> 
>On Thu, 30-06-2005 at 12:44 +0200, Vincent Diepeveen wrote:
>> Hello Federico,
>> 
>> I hope you can find contacts among colleagues.
>> 
>> A few questions.
>>   a) what kind of interconnect does the cluster have (network cards, and
>>      which type)?
>>   b) if you run a simulation that takes a few hours instead of a few
>>      seconds, do you get the same difference in speed?
>> 
>> I see the program is pretty big for open-source computational software, about
>> 1.9MB of Fortran code, so it is a bit time consuming to figure out for someone
>> who isn't a meteorological expert.
>> 
>> E:\wrf>dir *.f* /s /p
>> ..
>>      Total Files Listed:
>>              141 File(s)      1,972,938 bytes
>> 
>> Best regards,
>> Vincent
>> 
>> At 06:56 PM 6/29/2005 +0200, federico.ceccarelli wrote:
>> >
>> >Hi!
>> >
>> >I would like to get in touch with people running numerical meteorological
>> >models on a Linux cluster (16 CPUs), distributed memory (1GB per node),
>> >diskless nodes, Gigabit LAN, MPICH and openMosix.
>> >
>> >I'm trying to run the WRF model, but the MPI version parallelized on 4, 8,
>> >or 16 nodes runs slower than the single-node one! It runs correctly, but so
>> >slowly...
>> >
>> >When I run wrf.exe on a single processor, the CPU time for every timestep
>> >is about 10s for my configuration.
>> >
>> >When I switch to np=4, 8 or 16, the CPU time for a single step is sometimes
>> >faster (as it always should be; for example, 3s on 4 CPUs), but often it is
>> >slower and slower (60s and more!). The overall time of the simulation is
>> >longer than for the single-node run...
>> >
>> >Has anyone experienced the same problem?
>> >
>> >thanks in advance to everybody...
>> >
>> >federico
>> >
>> >
>> >
>> >Dr. Federico Ceccarelli (PhD)
>> >-----------------------------
>> >     TechCom snc
>> >Via di Sottoripa 1-18
>> >16124 Genova - Italia
>> >Tel: +39 010 860 5664
>> >Fax: +39 010 860 5691
>> >http://www.techcom.it
>> >
>> >_______________________________________________
>> >Beowulf mailing list, Beowulf at beowulf.org
>> >To change your subscription (digest mode or unsubscribe) visit
>> >http://www.beowulf.org/mailman/listinfo/beowulf


