[Beowulf] big read triggers migration and slow memory IO?

Fri Jul 10 07:59:23 PDT 2015

Every dog has its day! ;)

Prentice

On 07/09/2015 05:59 PM, James Cuff wrote:
> Awesome!!!
>
> With my job title most folks think I'm essentially technically 
> neutered these days.
>
> Good to see there is still some life in this old dog :-)
>
> Best,
>
> J.
>
> On Thursday, July 9, 2015, mathog <mathog at caltech.edu> wrote:
>
>     On 09-Jul-2015 11:54, James Cuff wrote:
>
>         http://blog.jcuff.net/2015/04/of-huge-pages-and-huge-performance-hits.html
>
>
>     Well, that seems to be it, but not quite with the same symptoms
>     you observed.  khugepaged never showed up, and "perf top" never
>     revealed _spin_lock_irqsave.  Instead this is what "perf top"
>     shows in my tests:
>
>     (hugepage=always, when migration/# process observed)
>      89.97%  [kernel]       [k] compaction_alloc
>       1.21%  [kernel]       [k] compact_zone
>       1.18%  [kernel]       [k] get_pageblock_flags_group
>       0.75%  [kernel]       [k] __reset_isolation_suitable
>       0.57%  [kernel]       [k] clear_page_c_e
>
>     (hugepage=always, when events/# process observed)
>      85.97%  [kernel]       [k] compaction_alloc
>       0.84%  [kernel]       [k] compact_zone
>       0.65%  [kernel]       [k] get_pageblock_flags_group
>       0.64%  perf           [.] 0x000000000005cff7
>
>     (hugepage=never)
>      29.86%  [kernel]       [k] clear_page_c_e
>      21.88%  [kernel]       [k] copy_user_generic_string
>      12.46%  [kernel]       [k] __alloc_pages_nodemask
>       5.70%  [kernel]       [k] page_fault
>
>     This is good, because "perf top" shows that the underlying issue
>     is compaction_alloc and compact_zone in both cases, even though top
>     shows migration/# in one case and events/# when locked to a cpu.
>
>     Switching hugepage always->never seems to make things work right
>     away.  Switching hugepage never->always seems to take a while to
>     break.  To get it to start failing, many of the big files involved
>     must be copied to /dev/null again, even though they were presumably
>     already in the file cache.
>
>     Searched for "compaction_alloc" and "compact_zone" and found a
>     suggestion here
>
>     https://structureddata.github.io/2012/06/18/linux-6-transparent-huge-pages-and-hadoop-workloads/
>
>     to do:
>
>     echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
>
>     (transparent_hugepage is a link to redhat_transparent_hugepage).
>     Re-enabled hugepage and reproduced the painfully slow IO, then set
>     defrag to "never" and the IO was fast again, even though hugepage
>     was still enabled.
>
>     So on my machine the problem seems to be with hugepage defrag
>     specifically.  Disabling just that is sufficient to resolve the
>     issue; it isn't necessary to turn off hugepage entirely.  Will let
>     it run that way for a while and see if anything else shows up.
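
To confirm which hugepage and defrag modes are actually in effect after a change like the one above, the sysfs files can be read back. The following is a minimal sketch, not part of the original report: it assumes the files use the usual bracketed-choice format (for example "[always] never"), and the helper names active_choice and thp_settings are made up for illustration. The two directory paths are the one quoted above plus the transparent_hugepage link that the message says points at it.

```python
from pathlib import Path

# Assumed sysfs locations: the message above says transparent_hugepage is a
# link to redhat_transparent_hugepage on this CentOS 6 machine.
THP_DIRS = (
    Path("/sys/kernel/mm/transparent_hugepage"),
    Path("/sys/kernel/mm/redhat_transparent_hugepage"),
)


def active_choice(text):
    """Return the bracketed token of a choice string like '[always] never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None  # no bracketed token found


def thp_settings(thp_dirs=THP_DIRS):
    """Return the active 'enabled' and 'defrag' modes from the first directory found."""
    for base in thp_dirs:
        if base.is_dir():
            result = {}
            for name in ("enabled", "defrag"):
                path = base / name
                result[name] = active_choice(path.read_text()) if path.is_file() else None
            return result
    return {}  # no transparent hugepage directory on this system


if __name__ == "__main__":
    print(thp_settings())
```

If the configuration described above is in place, the "defrag" entry should read "never" while "enabled" still shows hugepage turned on.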
>
>     For future reference:
>
>     CentOS release 6.6 (Final)
>     kernel 2.6.32-504.23.4.el6.x86_64
>     Dell Inc. PowerEdge T620/03GCPM, BIOS 2.2.2 01/16/2014
>     48 Intel Xeon CPU E5-2695 v2 @ 2.40GHz  (in /proc/cpuinfo)
>     RAM 529231456 kB (in /proc/meminfo)
>
>     Thanks all!
>
>     David Mathog
>     mathog at caltech.edu
>     Manager, Sequence Analysis Facility, Biology Division, Caltech
>
