[Beowulf] mem consumption strategy for HPC apps?
toon.knapen at fft.be
Wed Apr 13 23:59:34 PDT 2005
What is the ideal way to manage memory consumption in HPC applications?
For HPC applications, performance is everything. We also all know the
famous performance-memory tradeoff, which says that performance can be
improved by consuming more memory and vice versa. Therefore HPC
applications want to consume all available memory.
But the performance-memory tradeoff as stated above assumes infinite
memory and infinite memory bandwidth. Because memory is finite,
consuming more memory than is physically available results in swapping
by the OS and therefore a big performance hit. And since bandwidth is
also finite and latency is not zero, we have caches. But then we also
need to be careful not to lose time to cache thrashing.
Knowing this, we could say that HPC applications generally want to eat
all available memory but no more. All available memory here means all
physical memory minus the physical memory consumed by the system and its
basic services, because we suppose that HPC applications do not share
their processor with other applications (so that each has the whole
cache to itself). Well, this is true for single-processor machines. On
multiprocessor machines (SMP, NUMA) only a part of the physical memory
can be consumed.
Because the application does not know how much physical memory it is
allowed to eat, it might be best for the user to specify it when
launching the application.
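
To make the idea of a user-specified limit concrete, here is a minimal
sketch in Python of how an application could accept it at launch. The
option name --mem-limit, the helper parse_size and the choice of binary
units (1G = 2**30 bytes) are all assumptions made for illustration, not
something taken from an existing application.

```python
import argparse
import re

# Assumed convention: binary multiples, so "7.5G" means 7.5 * 2**30 bytes.
_UNITS = {"": 1, "K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}


def parse_size(text):
    """Convert a string such as '7.5G' or '512MB' into a byte count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?)B?\s*", text.upper())
    if match is None:
        raise ValueError(f"cannot parse memory size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def build_parser():
    parser = argparse.ArgumentParser(description="hypothetical HPC solver")
    # --mem-limit is an assumed option name used only for this sketch.
    parser.add_argument("--mem-limit", type=parse_size, required=True,
                        help="physical memory the run may use, e.g. 7.5G")
    return parser


# Parse an explicit argument list here so the sketch does not depend on
# how the script was launched.
args = build_parser().parse_args(["--mem-limit", "7.5G"])
print(f"memory budget: {args.mem_limit} bytes")
```

Because parse_size raises ValueError on malformed input, argparse turns
a bad --mem-limit value into a normal usage error at startup rather than
a failure later in the run.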
But suppose now that a single-processor machine has 8GB of physical
memory. Taking into account that the OS and its services will never take
more than 500MB, the user might tell the HPC application that it can eat
up to 7.5GB of the physical memory.
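
The arithmetic behind that 7.5GB figure can be sketched as follows. The
function names are hypothetical, the equal split across several
instances on a multiprocessor machine is only an assumption, and reading
the total through os.sysconf is known to work on Linux but may not be
available on every platform.

```python
import os


def physical_memory_bytes():
    """Total physical RAM as reported by sysconf (Linux; may differ elsewhere)."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def memory_budget(total_bytes, reserve_bytes, instances=1):
    """Bytes one application instance may use.

    total_bytes   -- physical memory of the machine
    reserve_bytes -- memory left to the OS and its basic services
    instances     -- assumed number of HPC processes sharing the machine,
                     each given an equal share (an illustrative policy)
    """
    if instances < 1:
        raise ValueError("instances must be at least 1")
    if reserve_bytes >= total_bytes:
        raise ValueError("reserve leaves no memory for the application")
    return (total_bytes - reserve_bytes) // instances


# The example from the text, in decimal units: 8GB total, 500MB reserved.
budget = memory_budget(8 * 10**9, 500 * 10**6)
print(budget / 10**9, "GB")
```

On a real machine the first argument could come from
physical_memory_bytes() instead of a hard-coded value.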
But what does this number mean to the HPC application that is trying to
optimise its performance? Should it try never to consume more than
7.5GB at any time, or should it only try never to consume more than
7.5GB in intensive loops (e.g. in the solver)? In the latter case, can
we rely on the OS swapping out the inactive parts of our application to
make space for the solver, or would it be better for the application to
put all data structures that are not used in the solver on disk itself,
to make sure?
OTOH, if we want to limit the total memory consumption to 7.5GB, would
it be best to allocate a memory pool of 7.5GB and, if the pool is full,
abort the application (possibly after it has been running for days)?
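
One possible middle ground between the two options above, sketched here
purely as an illustration and not as a recommendation from this post, is
to account for allocations against the limit and to abort only when
nothing is left that could be written to disk. The class MemoryBudget,
the function reserve_with_spill and the spill callback are hypothetical
names; the sketch only does the bookkeeping and leaves the actual disk
I/O to the callback.

```python
class MemoryBudget:
    """Tracks the bytes an application has reserved against a fixed limit."""

    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.used = 0

    def reserve(self, nbytes):
        """Account for nbytes, or raise MemoryError if the limit would be exceeded."""
        if self.used + nbytes > self.limit:
            raise MemoryError(
                f"request of {nbytes} bytes exceeds budget "
                f"({self.used} of {self.limit} bytes in use)")
        self.used += nbytes

    def release(self, nbytes):
        """Return nbytes to the budget."""
        self.used = max(0, self.used - nbytes)


def reserve_with_spill(budget, nbytes, spill):
    """Reserve nbytes, asking `spill` to free memory first when over budget.

    `spill(needed)` is a hypothetical callback that writes inactive data
    to disk and returns how many bytes it freed (0 when nothing is left).
    """
    while budget.used + nbytes > budget.limit:
        needed = budget.used + nbytes - budget.limit
        freed = spill(needed)
        if freed <= 0:
            # Last resort: behaves like the plain "abort when full" pool.
            raise MemoryError("budget exhausted and nothing left to spill")
        budget.release(freed)
    budget.reserve(nbytes)


# Toy usage: 100-byte budget, one inactive 30-byte structure that can be spilled.
inactive_sizes = [30]


def spill_inactive(needed):
    return inactive_sizes.pop() if inactive_sizes else 0


budget = MemoryBudget(100)
budget.reserve(80)                               # pre-solver data
reserve_with_spill(budget, 40, spill_inactive)   # solver workspace
print(budget.used)
```

Calling budget.reserve directly gives the strict pool behaviour (fail as
soon as the limit is hit), while reserve_with_spill postpones the abort
until the application has nothing inactive left to move to disk.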