<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    Every dog has its day! ;) <br>

    <pre class="moz-signature" cols="72">Prentice</pre>

    <div class="moz-cite-prefix">On 07/09/2015 05:59 PM, James Cuff

      wrote:<br>

    </div>

    <blockquote

cite="mid:CADTFW2VLypuaPtS2QDfq09KHCzvhUKDsYi=bSkN614PE_e2BBw@mail.gmail.com"

      type="cite">Awesome!!!

      <div><br>

      </div>

      <div>With my job title most folks think I'm essentially

        technically neutered these days.  </div>

      <div><br>

      </div>

      <div>Good to see there is still some life in this old dog :-)<br>

        <br>

        Best,</div>

      <div><br>

      </div>

      <div>J. </div>

      <div><br>

        On Thursday, July 9, 2015, mathog &lt;<a moz-do-not-send="true"

          href="mailto:mathog@caltech.edu">mathog@caltech.edu</a>&gt;

        wrote:<br>

        <blockquote class="gmail_quote" style="margin:0 0 0

          .8ex;border-left:1px #ccc solid;padding-left:1ex">On

          09-Jul-2015 11:54, James Cuff wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <a moz-do-not-send="true"

href="http://blog.jcuff.net/2015/04/of-huge-pages-and-huge-performance-hits.html"

              target="_blank">http://blog.jcuff.net/2015/04/of-huge-pages-and-huge-performance-hits.html</a><br>

          </blockquote>

          <br>

          Well, that seems to be it, but not quite with the same

          symptoms you observed.  khugepaged never showed up, and "perf

          top" never revealed _spin_lock_irqsave.  Instead this is what

          "perf top" shows in my tests:<br>

          <br>

          (hugepage=always, when migration/# process observed)<br>

           89.97%  [kernel]       [k] compaction_alloc<br>

            1.21%  [kernel]       [k] compact_zone<br>

            1.18%  [kernel]       [k] get_pageblock_flags_group<br>

            0.75%  [kernel]       [k] __reset_isolation_suitable<br>

            0.57%  [kernel]       [k] clear_page_c_e<br>

          <br>

          (hugepage=always, when events/# process observed)<br>

           85.97%  [kernel]       [k] compaction_alloc<br>

            0.84%  [kernel]       [k] compact_zone<br>

            0.65%  [kernel]       [k] get_pageblock_flags_group<br>

            0.64%  perf           [.] 0x000000000005cff7<br>

          <br>

          (hugepage=never)<br>

           29.86%  [kernel]       [k] clear_page_c_e<br>

           21.88%  [kernel]       [k] copy_user_generic_string<br>

           12.46%  [kernel]       [k] __alloc_pages_nodemask<br>

            5.70%  [kernel]       [k] page_fault<br>

          <br>

          This is good, because "perf top" shows that the underlying
          issue is compaction_alloc and compact_zone, even though what
          top shows is migration/# in one case and, when locked to a
          cpu, events/#.<br>
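          <br>
          A rough way to put one number on that, sketched in Python: add
          up the share of samples whose symbol name starts with
          "compact". The sample text is the first "perf top" listing
          above; the helper names (parse_perf_top, share) are made up
          for illustration and are not part of perf.<br>
<pre>
```python
import re

# First "perf top" listing from the message above (hugepage=always).
SAMPLE = """\
 89.97%  [kernel]       [k] compaction_alloc
  1.21%  [kernel]       [k] compact_zone
  1.18%  [kernel]       [k] get_pageblock_flags_group
  0.75%  [kernel]       [k] __reset_isolation_suitable
  0.57%  [kernel]       [k] clear_page_c_e
"""

# percent, object (e.g. [kernel]), one-character type flag, symbol name
LINE = re.compile(r"^\s*([\d.]+)%\s+(\S+)\s+\[(.)\]\s+(\S+)\s*$")


def parse_perf_top(text):
    """Map each symbol name to its sample percentage."""
    rows = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            rows[match.group(4)] = float(match.group(1))
    return rows


def share(rows, prefix):
    """Total percentage of all symbols whose name starts with prefix."""
    return sum(pct for sym, pct in rows.items() if sym.startswith(prefix))


if __name__ == "__main__":
    print(round(share(parse_perf_top(SAMPLE), "compact"), 2))
```
</pre>
          With this listing the total covers compaction_alloc and
          compact_zone only; the hugepage=never listing contains no
          symbol with that prefix, so the same call returns 0 there.<br>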

          <br>

          Switching hugepage always->never seems to make things work
          right away.  Switching hugepage never->always seems to take
          a while to break.  To get it to start failing, many of
          the big files involved must be copied to /dev/null again, even
          though they were presumably already in the file cache.<br>

          <br>

          Searched for "compaction_alloc" and "compact_zone" and found a

          suggestion here<br>

          <br>

          <a moz-do-not-send="true"

href="https://structureddata.github.io/2012/06/18/linux-6-transparent-huge-pages-and-hadoop-workloads/"

            target="_blank">https://structureddata.github.io/2012/06/18/linux-6-transparent-huge-pages-and-hadoop-workloads/</a><br>

          <br>

          to do:<br>

          <br>

          echo never >

          /sys/kernel/mm/redhat_transparent_hugepage/defrag<br>

          <br>

          (transparent_hugepage is a link to

          redhat_transparent_hugepage).<br>

          Re-enabled hugepage and reproduced the painfully slow IO, then
          set defrag to "never" and the IO was fast again, even though
          hugepage was still enabled.<br>

          <br>

          So on my machine the problem seems to be with hugepage defrag
          specifically.  Disabling just that is sufficient to resolve
          the issue; it isn't necessary to take out all of hugepage.
          Will let it run that way for a while and see if anything else
          shows up.<br>
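          <br>
          To check which value is active before and after such a change,
          here is a minimal Python sketch. It relies on the THP sysfs
          files listing the allowed values with the active one in square
          brackets, e.g. "always madvise [never]". The directory is the
          one named above; on kernels without the Red Hat naming it is
          normally /sys/kernel/mm/transparent_hugepage. The function
          names are hypothetical.<br>
<pre>
```python
import re

# Directory as given in the message above (RHEL/CentOS 6 naming).
# Assumption: adjust to /sys/kernel/mm/transparent_hugepage elsewhere.
THP_DIR = "/sys/kernel/mm/redhat_transparent_hugepage"


def active_choice(text):
    """Return the bracketed (active) word from a line such as
    'always madvise [never]', or None if nothing is bracketed."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def read_thp(name, base=THP_DIR):
    """Read one THP knob ('enabled' or 'defrag') and return its active value."""
    with open(f"{base}/{name}") as handle:
        return active_choice(handle.read())


if __name__ == "__main__":
    for knob in ("enabled", "defrag"):
        try:
            print(knob, "=", read_thp(knob))
        except OSError as err:
            # File missing or unreadable, e.g. kernel built without THP.
            print(knob, "unreadable:", err)
```
</pre>
          Reading only needs ordinary user rights; writing a new value,
          as in the echo command above, needs root.<br>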

          <br>

          For future reference:<br>

          <br>

          CentOS release 6.6 (Final)<br>

          kernel 2.6.32-504.23.4.el6.x86_64<br>

          Dell Inc. PowerEdge T620/03GCPM, BIOS 2.2.2 01/16/2014<br>

          48 Intel Xeon CPU E5-2695 v2 @ 2.40GHz  (in /proc/cpuinfo)<br>

          RAM 529231456 kB (in /proc/meminfo)<br>

          <br>

          Thanks all!<br>

          <br>

          David Mathog<br>

          <a moz-do-not-send="true">mathog@caltech.edu</a><br>

          Manager, Sequence Analysis Facility, Biology Division, Caltech<br>

        </blockquote>

      </div>

      <br>

      <br>

      -- <br>

      (Via iPhone)<br>

      <br>


    </blockquote>

    <br>

  </body>

</html>