Beowulf & Fluid Mechanics
Josip Loncaric josip at icase.edu
Mon Jul 17 13:05:43 PDT 2000
Greg Lindahl wrote:
>
> > To me, the most interesting conclusions based on Brian's tests concern
> > MPI implementations.
>
> However, the MVICH result is more odd. MVICH doesn't have to go through the
> kernel to talk to Giganet, right? And as a point of comparison, Myrinet on a
> dual Alpha has an smp penalty of as little as 3% on some of my real
> applications (the FSL weather codes). Myrinet and Giganet are pretty much
> the same, from the programmer point of view.

The MVICH we had was version 0.03, based on MPICH 1.1.2. Thanks to VIA,
MVICH avoids a buffer copy but it still needs to talk to the kernel at
times (I think).

Comparing MVICH to MPI/Pro suggests that the method of accessing hardware
matters a great deal (at least on Intel platforms). Even with Fast
Ethernet, both LAM and MPICH impose about a 25% performance penalty on SMP
nodes. With faster networks (e.g. Giganet) this penalty grows to about 40%
(MVICH). In both cases, MPI/Pro has virtually *no* SMP performance
penalty, but MPI/Pro latency figures are poor compared to LAM/MPICH/MVICH.
This is very odd, and it suggests that while polling produces low latency,
it does so at the expense of wasting a significant portion of CPU cycles.

> Obviously I should get a copy of Brian's test before I make the bald-faced
> claim that I'm about to make, but: perhaps the SMP penalty you're seeing
> from MVICH comes from the fact that it's beating on main memory or the PCI
> bus in a tight loop. If it instead did a little in-processor busy loop to
> not poll more than once every few microseconds, the main memory or PCI
> traffic would be significantly lessened, but the latency wouldn't change
> much.

You may be right, but improving memory or PCI bandwidth need not make a
sufficiently huge difference. Some other bottleneck may be involved (e.g.
a single threaded portion of the kernel).

> SMP effects are an extremely interesting swamp to dive into. I know that
> Compaq SC benchmarks (4-proc Alphas with Quadrics -- but only 2.5 cpus worth
> of main memory bandwidth) can show some *really* interesting performance
> losses for multiple CPUs. I have tried to avoid multi-processor machines for
> this reason, but the 2nd Intel cpu is so cheap that it's hard to dodge using
> them.

Yup. However, on our machines the price advantage of SMP is only 11% while
the performance penalty is at least 25% (using LAM and Fast Ethernet). This
would favor uniprocessor nodes, which are easier to administer and more
robust anyway, but unfortunately we do not have enough space in our
computer room for so many boxes. We are buying more SMP nodes because they
pack more CPUs into the same space...

Sincerely,
Josip

--
Dr. Josip Loncaric, Senior Staff Scientist        mailto:josip at icase.edu
ICASE, Mail Stop 132C           PGP key at http://www.icase.edu./~josip/
NASA Langley Research Center             mailto:j.loncaric at larc.nasa.gov
Hampton, VA 23681-2199, USA    Tel. +1 757 864-2192  Fax +1 757 864-6134
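
The price/performance trade-off in the closing paragraph can be worked through as a rough sketch. It assumes, which the message does not state explicitly, that the 11% price advantage and the 25% penalty are both per-CPU figures relative to a uniprocessor node. The function name and the normalisation to 1.0 are illustrative only.

```python
def cost_per_performance(price_advantage, smp_penalty):
    """Relative cost per unit of delivered performance for one SMP CPU.

    A CPU in a uniprocessor node is the baseline and scores 1.0.
    Assumption: both arguments are per-CPU fractions (0.11 means 11%).
    """
    relative_price = 1.0 - price_advantage  # SMP CPU costs this much of baseline
    relative_speed = 1.0 - smp_penalty      # SMP CPU delivers this much of baseline
    return relative_price / relative_speed


# Figures quoted in the message for LAM over Fast Ethernet.
ratio = cost_per_performance(price_advantage=0.11, smp_penalty=0.25)
print(f"SMP cost per unit performance vs. uniprocessor: {ratio:.2f}")  # about 1.19
```

Under these assumptions an SMP CPU costs roughly 19% more per unit of delivered performance, and SMP would only break even if the penalty dropped to the same 11% as the price advantage. Floor space is not part of this calculation.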
