[Beowulf] WRF model on linux cluster: Mpi problem
Michael Will mwill at penguincomputing.com
Thu Jun 30 12:10:38 PDT 2005
Vincent is on target here:

If your application already uses MPI as a middleware assuming distributed
memory, then you should definitely use a beowulf style setup rather than
openmosix with its pseudo-shared memory model.

Look at rocks 4.0.0 http://www.rocksclusters.org/Rocks/ which is free and
based on CentOS 4, which again is a free version of RHEL4.

Michael

Vincent Diepeveen wrote:

>At 02:34 PM 6/30/2005 +0200, Federico Ceccarelli wrote:
>
>>Thanks for your answer Vincent,
>>
>>my network cards are Intel Pro 1000, Gigabit.
>>
>>Yes, I did a 72h (real time) simulation that lasted 20h on 4 cpus...same
>>behaviour...
>>
>>I'm thinking about a bandwidth problem...
>>
>>....maybe due to hardware failure of some network card, or switch (3com
>>-Baseline switch 2824).
>>
>>Or the pci-risers for the network card (I have a 2 unit rack so that I
>>cannot mount network cards directly on the pci slot)...
>
>Because the gigabit cards have such horrible one way ping pong latencies as
>compared to the highend cards (myri, dolphin, quadrics and, relatively seen,
>also infiniband), the pci bus is not your biggest problem here.
>
>The specifications of the card are so so so restricted that the pci is not
>the problem at all.
>
>There are many tests out there to test things. You should try some one-way
>pingpong test.
>
>By the way, the reason for me to not run openmosix nor similar single image
>software systems is because it has such an ugly effect on the latencies, and
>the way it pages shared memory communication between nodes is real ugly
>slow and bad for this type of software. There is also something called
>OpenSSI which is pretty actively being developed. It has the same problem.
>
>Vincent
>
>>Did you experience problems with pci-risers?
>>
>>Can you suggest a bandwidth benchmark?
>>
>>thanks again...
>>
>>federico
>>
>>On Thu, 30-06-2005 at 12:44 +0200, Vincent Diepeveen wrote:
>>
>>>Hello Federico,
>>>
>>>Hope you can find contacts to colleagues.
>>>
>>>A few questions.
>>> a) what kind of interconnects does the cluster have (networkcards and
>>>    which type?)
>>> b) if you run a simulation that eats a few hours instead of a few
>>>    seconds, do you get the same speed outcome difference?
>>>
>>>I see the program is pretty big for open source calculating software, about
>>>1.9MB fortran code, so a bit time consuming to figure out for someone who
>>>isn't a meteorological expert.
>>>
>>>E:\wrf>dir *.f* /s /p
>>>..
>>>     Total Files Listed:
>>>         141 File(s)  1,972,938 bytes
>>>
>>>Best regards,
>>>Vincent
>>>
>>>At 06:56 PM 6/29/2005 +0200, federico.ceccarelli wrote:
>>>
>>>>Hi!
>>>>
>>>>I would like to get in touch with people running numerical meteorological
>>>>models on a linux cluster (16cpu), distributed memory (1Gb every node),
>>>>diskless nodes, Gigabit lan, mpich and openmosix.
>>>>
>>>>I'm trying to run the WRF model but the mpi version parallelized on 4, 8,
>>>>or 16 nodes runs slower than the single node one! It runs correctly but so
>>>>slow...
>>>>
>>>>When I run wrf.exe on a single processor the cpu time for every timestep
>>>>is about 10s for my configuration.
>>>>
>>>>When I switch to np=4, 8 or 16 the cpu time for a single step is sometimes
>>>>faster (as it should always be, for example 3sec for 4 cpu) but often it
>>>>is slower and slower (60sec and more!). The overall time of the simulation
>>>>is bigger than for the single node run...
>>>>
>>>>Has anyone experienced the same problem?
>>>>
>>>>thanks in advance to everybody...
>>>>
>>>>federico
>>>>
>>>>Dr. Federico Ceccarelli (PhD)
>>>>-----------------------------
>>>>    TechCom snc
>>>>Via di Sottoripa 1-18
>>>>16124 Genova - Italia
>>>>Tel: +39 010 860 5664
>>>>Fax: +39 010 860 5691
>>>>http://www.techcom.it
>>>>
>>>>_______________________________________________
>>>>Beowulf mailing list, Beowulf at beowulf.org
>>>>To change your subscription (digest mode or unsubscribe) visit
>>>>http://www.beowulf.org/mailman/listinfo/beowulf

-- 
Michael Will
Penguin Computing Corp.
Sales Engineer
415-954-2887
415-954-2899 fx
mwill at penguincomputing.com
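
The ping-pong test suggested in the thread can be approximated with a few lines of Python. The following is only a minimal sketch over plain TCP sockets, not an MPI benchmark and not anything the posters used: the function names (`recv_exact`, `echo_server`, `pingpong`, `run_loopback_demo`), the 64-byte message size and the round count are all illustrative assumptions. It bounces a small message back and forth and reports the mean round-trip time; halving that figure gives a rough one-way latency estimate.

```python
import socket
import threading
import time


def recv_exact(conn, nbytes):
    """Read exactly nbytes from a socket, or raise if the peer closes early."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = conn.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def echo_server(listener, msg_size, rounds):
    """Accept one client and send each received message straight back."""
    conn, _ = listener.accept()
    with conn:
        # Disable Nagle's algorithm so small messages are not delayed.
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(rounds):
            conn.sendall(recv_exact(conn, msg_size))


def pingpong(host, port, msg_size, rounds):
    """Return the mean round-trip time in seconds for msg_size-byte messages."""
    payload = b"x" * msg_size
    with socket.create_connection((host, port)) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.perf_counter()
        for _ in range(rounds):
            sock.sendall(payload)
            recv_exact(sock, msg_size)
        elapsed = time.perf_counter() - start
    return elapsed / rounds


def run_loopback_demo(msg_size=64, rounds=1000):
    """Run server and client in one process over the loopback interface."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(
        target=echo_server, args=(listener, msg_size, rounds)
    )
    server.start()
    try:
        rtt = pingpong("127.0.0.1", port, msg_size, rounds)
    finally:
        server.join()
        listener.close()
    return rtt


if __name__ == "__main__":
    rtt = run_loopback_demo()
    print(f"mean round trip: {rtt * 1e6:.1f} us, "
          f"one-way estimate: {rtt * 1e6 / 2:.1f} us")
```

The loopback demo only checks that the script works. To say anything about the Gigabit cards, switch or risers, `echo_server` would have to run on one node and `pingpong` on another, pointed at that node's address. Repeating the run with a much larger `msg_size` and computing `2 * msg_size / rtt` gives a crude bandwidth figure to compare against what the hardware should deliver.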
