[Beowulf] Large Dell, odd IO delays

Wed Feb 14 23:04:07 PST 2018

Hmmm...  I will also chip in with my favourite tip.
Look at the sysctl for min_free_kbytes. It is often set very low.
Increase this substantially. It will do no harm to your system (unless you
set it to an absurd value!)

You should also be looking at the vm dirty ratios (vm.dirty_ratio,
vm.dirty_background_ratio) etc.
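
To see where a node currently stands before changing anything, something like
the sketch below could be used. It only reads the values, it does not set
them. The helper name read_vm_sysctls and the choice of keys are my own
assumptions for illustration; the files themselves live under /proc/sys/vm on
Linux.

```python
from pathlib import Path

# Keys chosen for illustration: the one named above plus the two dirty ratios.
VM_KEYS = ("min_free_kbytes", "dirty_ratio", "dirty_background_ratio")


def read_vm_sysctls(base="/proc/sys/vm", keys=VM_KEYS):
    """Return {key: int or None} for each sysctl file found under base.

    A key maps to None when its file is missing, unreadable, or not an integer.
    """
    values = {}
    for key in keys:
        try:
            values[key] = int((Path(base) / key).read_text().strip())
        except (OSError, ValueError):
            values[key] = None
    return values


if __name__ == "__main__":
    for key, value in read_vm_sysctls().items():
        print(f"vm.{key} = {value}")
```

Running this on each node and comparing the output is a quick way to spot a
machine whose settings differ from the rest.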

On 15 February 2018 at 00:44, Kilian Cavalotti <kilian.cavalotti.work at gmail.com> wrote:

> On Wed, Feb 14, 2018 at 2:26 PM, David Mathog <mathog at caltech.edu> wrote:
> > Checked the hugepage settings and found a difference there.  The two systems
> > that don't do this have  /sys/kernel/mm/redhat_transparent_hugepage/defrag
> >
> > always madvise [never]
> >
> > whereas the system with the issue has:
> >
> > [always] madvise never
>
> THP defragmentation is definitely something that has bitten us in the
> past, when under memory pressure, and we now default to [madvise]
> pretty much everywhere (we're too timid to disable it entirely).
>
> A good way to see if that's really the issue is to "echo never >
> /sys/kernel/mm/redhat_transparent_hugepage/defrag" while the problem
> is happening, while simultaneously monitoring the processes with htop,
> for instance.
> It's usually pretty instant:  if the issue is really with THP defrag,
> then CPU usage for your stalling process should drop pretty much
> immediately and things go back to normal.
>
> Cheers,
> --
> Kilian
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
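
For comparing the THP defrag setting across nodes, as in the quoted messages,
a small sketch along these lines might help. It picks out the bracketed
(active) option from the sysfs file. The function names are hypothetical, and
the default path is the Red Hat one quoted above; on mainline kernels the file
is /sys/kernel/mm/transparent_hugepage/defrag instead.

```python
from pathlib import Path

# Path as quoted in the thread (RHEL-style); adjust for other kernels.
DEFRAG_PATH = "/sys/kernel/mm/redhat_transparent_hugepage/defrag"


def selected_option(text):
    """Return the bracketed option from text such as '[always] madvise never'.

    Returns None if no token is wrapped in square brackets.
    """
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def thp_defrag_mode(path=DEFRAG_PATH):
    """Return the active defrag mode, or None if the file cannot be read."""
    try:
        return selected_option(Path(path).read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("THP defrag mode:", thp_defrag_mode())
```

With the two settings shown in the thread, selected_option gives "never" for
the healthy systems and "always" for the one with the stalls.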