[Beowulf] mem consumption strategy for HPC apps?
Robert G. Brown rgb at phy.duke.edu
Mon Apr 18 04:13:44 PDT 2005
On Sun, 17 Apr 2005, Toon Knapen wrote:

> Mark Hahn wrote:
>
> >>What is the ideal way to manage memory consumption in HPC applications?
> >
> > run the right jobs on the right machines.
>
> But because memory is scarce one needs to have a good memory consumption
> strategy. And memory is scarce otherwise out-of-core solvers (like for
> instance used in NASTRAN) would not be necessary.
....
> direct solvers treating 1 million dofs (and a decent bandwith of course)
> need a _lot_ of memory. Thus: out-of-core solvers are necessary.
....
> I'm assuming at least 1G also. But most have 4G per node and up. But
> again this is for direct solvers of big systems.
....
> The question is just: any out-of-core uses blocking and treats block per
> block. But how big should blocks ideally be. Can I take a block-size
> that is almost equal to my physical memory and thus relying on the rest
> of the app being swapped out (taking into account that bigger block size
> improves performance)?
....
> but a typical timeslice is much shorter than 100 seconds. Additionally
> you're not taking into account the time you loose the time you need to
> rebuild your cache.
....
> OK, thanks. This was one of my main questions. So as you said before:
> the OS swapping an HPC app is a non-fatal error.

These responses are not inconsistent with Mark's or Greg's remarks. Let me reiterate:

a) The "best" thing to do is to match your job to your node's capabilities or vice versa so it runs in core when split up. This is just ONE of MANY "best practice" or "design constraint" issues to be faced though, as it is equally important to match up networking requirements, memory access speeds, and a lot more. There is some sense in the HPC community of how to go about doing these things, and fortunately for MOST cluster owner/builders, it isn't impossibly difficult to achieve a profitable operating regime (near linear speedup in number of nodes) if not an optimal one.

b) IF your job is TOO BIG to fit in core on the nodes, then there IS NO "BEST PRACTICE". There is only a choice. Either:

Scale your job back so it fits in core. Seriously. Most of us who run things that COULD fill a universe of RAM recognize that we just plain have to live in the real world and run things at a size that fits. Fortunately we live in a Moore's Law inflationary universe, so one can gradually do more.

or

Bite the bullet and do all the very hard work required to make your job run efficiently with whatever hardware you have at the scale you desire. As you obviously recognize, this is not easy and involves knowing lots of things about your system. Ultimately it comes down to partitioning your job so that it runs IN core again, whether it does so by using the disk directly, by letting the VM subsystem manage the in core/out of core swapping/paging for you, or by splitting the job up across N nodes (a common enough solution) so that its partitioned pieces fit in core on each node, relying on IPCs instead of memory reads from disk.

Really that's it. And in the second of these cases, although people may be able to help you with SPECIFIC issues you encounter, it is pointless to discuss the "general case" because there ain't no such animal. Solutions for your problem are likely very different from a solution for somebody partitioning a lattice, which might in turn be different from one for somebody partitioning a large volume filled with a varying number and position of particles.
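
For a first back-of-the-envelope pass at "partition it so it runs in core again", something like the Python sketch below may help. It is only arithmetic, not a recipe: the function names and the `usable_fraction` parameter (the share of a node's RAM assumed to be left over after the OS, buffers, and the rest of the application) are made up for illustration, and the real footprint of a direct solver depends on details such as fill-in that this ignores.

```python
import math

GiB = 2**30


def usable_bytes(ram_per_node_bytes, usable_fraction=0.8):
    """RAM assumed to be available to the job on one node.

    usable_fraction is an illustrative assumption, not a measured value.
    """
    usable = int(ram_per_node_bytes * usable_fraction)
    if usable <= 0:
        raise ValueError("no usable memory with these parameters")
    return usable


def nodes_needed(problem_bytes, ram_per_node_bytes, usable_fraction=0.8):
    """Smallest node count such that an even split fits in core on each node."""
    usable = usable_bytes(ram_per_node_bytes, usable_fraction)
    return max(1, math.ceil(problem_bytes / usable))


def block_plan(problem_bytes, ram_per_node_bytes, usable_fraction=0.8):
    """Single-node out-of-core plan: (block_bytes, n_blocks).

    Blocks are as large as possible while still fitting in the usable RAM,
    so nothing else has to be swapped out to make room for a block.
    """
    usable = usable_bytes(ram_per_node_bytes, usable_fraction)
    n_blocks = max(1, math.ceil(problem_bytes / usable))
    block_bytes = math.ceil(problem_bytes / n_blocks)
    return block_bytes, n_blocks


if __name__ == "__main__":
    # Hypothetical numbers: a 64 GiB working set on 4 GiB nodes.
    print(nodes_needed(64 * GiB, 4 * GiB, usable_fraction=0.75))
    print(block_plan(10 * GiB, 4 * GiB, usable_fraction=0.75))
```

The point of keeping the block below the usable share of RAM, rather than near the full physical size, is that the block is then sized to avoid paging instead of relying on it.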

An efficient solution is likely going to be expensive and require learning a lot about the operating system, the various parallel programming paradigms and available support libraries, the compute, memory and network hardware, and may require some nonlinear thinking, as there exist projects such as trapeze to consider (using other nodes' RAM as a virtual extension of your own, paying the ~5 us latency hit for a modern network plus a memory access hit, but saving BIG time over a ~ms scale disk latency hit).

Anyway, good luck.

   rgb

-- 
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
