[Beowulf] Large Dell, odd IO delays

Kilian Cavalotti kilian.cavalotti.work at gmail.com
Wed Feb 14 15:44:01 PST 2018

On Wed, Feb 14, 2018 at 2:26 PM, David Mathog <mathog at caltech.edu> wrote:
> Checked the hugepage settings and found a difference there.  The two systems
> that don't do this have  /sys/kernel/mm/redhat_transparent_hugepage/defrag
> always madvise [never]
> whereas the system with the issue has:
> [always] madvise never

THP defragmentation is definitely something that has bitten us in the
past, when under memory pressure, and we now default to [madvise]
pretty much everywhere (we're too timid to disable it entirely).

A good way to see if that's really the issue is to run "echo never >
/sys/kernel/mm/redhat_transparent_hugepage/defrag" (as root) while the
problem is happening, and watch the affected processes with htop, for
instance.
The effect is usually immediate: if THP defrag really is the cause,
CPU usage for your stalling process should drop right away and things
should go back to normal.
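
The check and the toggle described above can be scripted. What follows
is only a minimal sketch, assuming a POSIX shell and root privileges for
the write. The helper names thp_active and thp_set and the THP_DEFRAG
override are illustrative, not part of any tool. The default path is the
one quoted in this thread; kernels without the Red Hat backport normally
expose the same control under /sys/kernel/mm/transparent_hugepage/defrag.

```shell
#!/bin/sh
# Sketch only: helper names and the THP_DEFRAG variable are hypothetical.

# Path from this thread; override it for kernels that use
# /sys/kernel/mm/transparent_hugepage/defrag instead.
THP_DEFRAG=${THP_DEFRAG:-/sys/kernel/mm/redhat_transparent_hugepage/defrag}

# Print the active mode, i.e. the word shown in [brackets],
# from a line such as "[always] madvise never".
thp_active() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

# Write a new mode (always, madvise or never) to the given file.
# On the real sysfs file this needs root.
thp_set() {
    echo "$2" > "$1"
}

# Report the current mode if the file exists on this machine.
if [ -r "$THP_DEFRAG" ]; then
    echo "current THP defrag mode: $(thp_active "$THP_DEFRAG")"
fi
```

Saving the old mode first, for example old=$(thp_active "$THP_DEFRAG"),
then calling thp_set "$THP_DEFRAG" never while the stall is in progress,
makes it easy to put the original setting back afterwards with
thp_set "$THP_DEFRAG" "$old".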

