[Beowulf] Grid Engine multi-core thread binding enhancement -pre-alpha release

Rayson Ho raysonlogin at gmail.com
Mon Jul 11 13:34:34 PDT 2011


We are (beta) releasing a drop-in package for SGE6.2u5, SGE6.2u5p1,
and SGE6.2u5p2 for thread-binding:

http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html

Mainly tested on Intel boxes -- it would be great if AMD Magny-Cours
server owners could help with testing! (Play it safe -- set up a 1- or
2-node test cluster using non-standard SGE TCP ports.)

Thanks!
Rayson



On Mon, Apr 18, 2011 at 2:26 PM, Rayson Ho <raysonlogin at gmail.com> wrote:
> For those who had issues with an earlier version, please try the latest
> loadcheck v4:
>
> http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html
>
> I compiled the binary on Oracle Linux, which is compatible with RHEL
> 5.x, Scientific Linux, and CentOS 5.x. I tested the binary on the
> standard Red Hat kernel, Oracle's enhanced "Unbreakable Enterprise
> Kernel", Fedora 13, and Ubuntu 10.04 LTS.
>
> Optimizing for AMD's NUMA machine characteristics is on the ToDo list.
>
> Rayson
>
>
>
> On Wed, Apr 13, 2011 at 2:15 PM, Prakashan Korambath <ppk at ats.ucla.edu> wrote:
>> Hi Rayson,
>>
>> Do you have a statically linked version? Thanks.
>>
>> ./loadcheck: /lib64/libc.so.6: version `GLIBC_2.7' not found (required by
>> ./loadcheck)
>>
>> Prakashan
>>
>>
>>
>> On 04/13/2011 09:21 AM, Rayson Ho wrote:
>>>
>>> Carlos,
>>>
>>> I notice that you have "lx24-amd64" instead of "lx26-amd64" for the
>>> arch string, so I believe you are running the loadcheck from standard
>>> Oracle Grid Engine, Sun Grid Engine, or one of the forks instead of
>>> the one from the Open Grid Scheduler page.
>>>
>>> The existing Grid Engine (including the latest Open Grid Scheduler
>>> releases, SGE 6.2u5p1 & SGE 6.2u5p2, or Univa's fork) uses PLPA, and
>>> its output is known to be wrong on Magny-Cours.
>>>
>>> (i.e. SGE 6.2u5p1 & SGE 6.2u5p2 from:
>>> http://sourceforge.net/projects/gridscheduler/files/ )
>>>
>>>
>>> Chansup on the Grid Engine mailing list (it's the general-purpose Grid
>>> Engine mailing list for now) tested the version I uploaded last night,
>>> and it seems to work on a dual-socket AMD Magny-Cours machine. It prints:
>>>
>>> m_topology      SCCCCCCCCCCCCSCCCCCCCCCCCC
>>>
>>> However, I am still fixing the processor and core ID mapping code:
>>>
>>> http://gridengine.org/pipermail/users/2011-April/000629.html
>>> http://gridengine.org/pipermail/users/2011-April/000628.html
>>>
>>> I compiled the hwloc-enabled loadcheck on kernel 2.6.34 & glibc 2.12,
>>> so it may not work on machines running older kernel or glibc versions.
>>> You can download it from:
>>>
>>> http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html
>>>
>>> Rayson
>>>
>>>
>>>
>>> On Wed, Apr 13, 2011 at 3:03 AM, Carlos Fernandez Sanchez
>>> <carlosf at cesga.es>  wrote:
>>>>
>>>> This is the output of a 2-socket, 12 cores/socket (Magny-Cours) AMD
>>>> system (and it seems to be wrong!):
>>>>
>>>> arch            lx24-amd64
>>>> num_proc        24
>>>> m_socket        2
>>>> m_core          12
>>>> m_topology      SCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTT
>>>> load_short      0.29
>>>> load_medium     0.13
>>>> load_long       0.04
>>>> mem_free        26257.382812M
>>>> swap_free       8191.992188M
>>>> virtual_free    34449.375000M
>>>> mem_total       32238.328125M
>>>> swap_total      8191.992188M
>>>> virtual_total   40430.320312M
>>>> mem_used        5980.945312M
>>>> swap_used       0.000000M
>>>> virtual_used    5980.945312M
>>>> cpu             0.0%
>>>>
>>>>
>>>> Carlos Fernandez Sanchez
>>>> Systems Manager
>>>> CESGA
>>>> Avda. de Vigo s/n. Campus Vida
>>>> Tel.: (+34) 981569810, ext. 232
>>>> 15705 - Santiago de Compostela
>>>> SPAIN
>>>>
>>>> --------------------------------------------------
>>>> From: "Rayson Ho" <raysonlogin at gmail.com>
>>>> Sent: Tuesday, April 12, 2011 10:31 PM
>>>> To: "Beowulf List" <Beowulf at beowulf.org>
>>>> Subject: [Beowulf] Grid Engine multi-core thread binding enhancement
>>>> -pre-alpha release
>>>>
>>>>> If you are using the "Job to Core Binding" feature in SGE and running
>>>>> SGE on newer hardware, then please give the new hwloc-enabled
>>>>> loadcheck a try.
>>>>>
>>>>> http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html
>>>>>
>>>>> The current hardware topology discovery library (Portable Linux
>>>>> Processor Affinity - PLPA) used by SGE was deprecated in 2009, and new
>>>>> hardware topology may not be detected correctly by PLPA.
>>>>>
>>>>> If you are running SGE on AMD Magny-Cours servers, please post your
>>>>> loadcheck output, as it is known to be wrong when handled by PLPA.
>>>>>
>>>>> The Open Grid Scheduler is migrating to hwloc -- we will ship hwloc
>>>>> support in later releases of Grid Engine / Grid Scheduler.
>>>>>
>>>>> http://gridscheduler.sourceforge.net/
>>>>>
>>>>> Thanks!!
>>>>> Rayson
>>>>> _______________________________________________
>>>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>>>>> To change your subscription (digest mode or unsubscribe) visit
>>>>> http://www.beowulf.org/mailman/listinfo/beowulf
>>>>
>>>>
>>
>
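
For readers comparing the two m_topology strings quoted in this thread, here is a minimal Python sketch that counts what such a string describes. It assumes the usual reading of the letters (S starts a socket, C starts a core on the current socket, T is a hardware thread on the current core); the function names parse_m_topology and summarize are hypothetical and are not part of loadcheck or Grid Engine. A core with no T letters is counted as one processing unit, which is also an assumption.

```python
def parse_m_topology(topology):
    """Return one list per socket holding the thread count of each core.

    Assumed letter meanings (not taken from the Grid Engine source):
    'S' = socket, 'C' = core, 'T' = hardware thread.
    """
    sockets = []
    for position, letter in enumerate(topology):
        if letter == "S":
            sockets.append([])            # start a new socket
        elif letter == "C":
            if not sockets:
                raise ValueError(f"core at position {position} precedes any socket")
            sockets[-1].append(0)         # new core, no threads seen yet
        elif letter == "T":
            if not sockets or not sockets[-1]:
                raise ValueError(f"thread at position {position} precedes any core")
            sockets[-1][-1] += 1          # one more thread on the current core
        else:
            raise ValueError(f"unexpected letter {letter!r} at position {position}")
    return sockets


def summarize(topology):
    """Count sockets, cores, and processing units in an m_topology string."""
    sockets = parse_m_topology(topology)
    cores = sum(len(socket) for socket in sockets)
    # A core listed without threads is treated as a single processing unit.
    units = sum(max(threads, 1) for socket in sockets for threads in socket)
    return {"sockets": len(sockets), "cores": cores, "processing_units": units}


if __name__ == "__main__":
    # The two strings reported in this thread for dual-socket Magny-Cours boxes.
    plpa_string = "SCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTT"
    hwloc_string = "SCCCCCCCCCCCCSCCCCCCCCCCCC"
    for label, value in (("PLPA", plpa_string), ("hwloc", hwloc_string)):
        print(label, summarize(value))
```

Run as a script, it prints a summary for each string, which makes the difference easy to see: the PLPA-based string groups the processors as two-thread cores, while the hwloc-based string lists every processor as its own core.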


