[Beowulf] WRF model on linux cluster: Mpi problem

Federico Ceccarelli federico.ceccarelli at techcom.it
Thu Jun 30 05:34:31 PDT 2005


Thanks for your answer, Vincent,

my network cards are Intel Pro 1000, Gigabit.

Yes, I ran a 72h simulation (72h of real time) that took 20h on 4 CPUs... same
behaviour...

I suspect a bandwidth problem...

...maybe due to a hardware failure in one of the network cards, or in the
switch (3Com Baseline Switch 2824).

Or in the PCI risers for the network cards (I have a 2U rack, so I
cannot mount the network cards directly in the PCI slots)...

Have you had problems with PCI risers?

Can you suggest a bandwidth benchmark?
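
To make clear what I would like to measure, here is a minimal sketch of a TCP throughput test in Python. It is only an illustration: the names `measure_throughput` and `_sink`, the chunk size and the default transfer size are my own assumptions, and as written it runs sender and receiver in one process over the loopback address, so it shows the method rather than the real link speed. For a node-to-node figure the receiving half would have to run on one node and the sending half on another. A dedicated tool such as netperf would be more thorough.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per send/recv call (arbitrary choice)


def _sink(server_sock, result):
    """Accept one connection and count bytes until the sender closes it."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            total += len(data)
    result["bytes"] = total


def measure_throughput(host="127.0.0.1", port=0, total_bytes=16 * 1024 * 1024):
    """Send total_bytes over TCP to a local receiver thread.

    Returns (bytes_received, throughput in MB/s, with 1 MB = 10**6 bytes).
    port=0 lets the operating system pick a free port.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    port = server.getsockname()[1]

    result = {}
    receiver = threading.Thread(target=_sink, args=(server, result))
    receiver.start()

    payload = b"\0" * CHUNK
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as client:
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            client.sendall(payload[:n])
            sent += n
    # Stop the clock only after the receiver has drained everything.
    receiver.join()
    elapsed = time.perf_counter() - start
    server.close()
    return result["bytes"], result["bytes"] / elapsed / 1e6


if __name__ == "__main__":
    received, mb_per_s = measure_throughput()
    print(f"received {received} bytes at {mb_per_s:.1f} MB/s")
```

Gigabit Ethernet carries at most 125 MB/s of raw data, so a node-to-node result far below that would point at a card, riser or switch problem.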

thanks again...

federico
 
On Thu, 30-06-2005 at 12:44 +0200, Vincent Diepeveen wrote:
> Hello Federico,
> 
> Hope you can find contacts with colleagues.
> 
> A few questions.
>   a) what kind of interconnect does the cluster have (which network cards,
>      and which type)?
>   b) if you run a simulation that eats a few hours instead of a few seconds,
>      do you see the same difference in speed?
> 
> I see the program is pretty big for open source calculating software, about
> 1.9MB of Fortran code, so it is a bit time-consuming to figure out for
> someone who isn't a meteorological expert.
> 
> E:\wrf>dir *.f* /s /p
> ..
>      Total Files Listed:
>              141 File(s)      1,972,938 bytes
> 
> Best regards,
> Vincent
> 
> At 06:56 PM 6/29/2005 +0200, federico.ceccarelli wrote:
> >
> >Hi!
> >
> >I would like to get in touch with people running numerical meteorological
> >models on a Linux cluster (16 CPUs), distributed memory (1GB per node),
> >diskless nodes, Gigabit LAN, MPICH and openMosix.
> >
> >I'm trying to run the WRF model, but the MPI version parallelized on 4, 8, or 16
> >nodes runs slower than the single-node one! It runs correctly but so slowly...
> >
> >When I run wrf.exe on a single processor the cpu time for every timestep is
> >about 10s for my configuration.
> >
> >When I switch to np=4, 8 or 16, the CPU time for a single step is sometimes
> >shorter (as it always should be, for example 3s on 4 CPUs), but often it is
> >much longer (60s and more!). The overall time of the simulation is
> >longer than for the single-node run...
> >
> >Has anyone experienced the same problem?
> >
> >thanks in advance to everybody...
> >
> >federico
> >
> >
> >
> >Dr. Federico Ceccarelli (PhD)
> >-----------------------------
> >     TechCom snc
> >Via di Sottoripa 1-18
> >16124 Genova - Italia
> >Tel: +39 010 860 5664
> >Fax: +39 010 860 5691
> >http://www.techcom.it
> >
> >_______________________________________________
> >Beowulf mailing list, Beowulf at beowulf.org
> >To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
> >
> >
> 
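
The per-timestep figures quoted above (about 10s on one processor, about 3s in the good case and 60s or more in the bad case with np=4) can be turned into speedup and parallel efficiency. A minimal sketch; the function names are illustrative and are not part of WRF or MPICH:

```python
def speedup(t_serial, t_parallel):
    """Ratio of single-processor step time to parallel step time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, nprocs):
    """Speedup divided by the number of processors (1.0 is ideal)."""
    return speedup(t_serial, t_parallel) / nprocs


if __name__ == "__main__":
    t_serial = 10.0  # seconds per timestep on one processor, as reported
    nprocs = 4
    # Step times reported for np=4: the good case and the bad case.
    for label, t_parallel in (("good step", 3.0), ("bad step", 60.0)):
        s = speedup(t_serial, t_parallel)
        e = efficiency(t_serial, t_parallel, nprocs)
        print(f"{label}: speedup {s:.2f}x, efficiency {e:.0%}")
```

The good steps give a speedup of about 3.3 (83% efficiency), while the bad steps give about 0.17, that is, roughly six times slower than the single-processor run.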



