[Beowulf] [External] Re: Intel Cluster Checker
Prentice Bisbal
pbisbal at pppl.gov
Thu Apr 30 11:49:41 PDT 2020
When you launch your clck jobs, do you launch them with Slurm, or do you
use a nodefile? When I use a nodefile, I get an error saying that it can't
call mpirun on one of the nodes, or something like that. I'd provide the
exact error message, but I don't have access to it at the moment.
Prentice
On 4/30/20 11:49 AM, Black, Brady P wrote:
> Hi - Intel Cluster Checker person chiming in.
>
> To answer your question about Cluster Checker (CLCK) runtime, Prentice: it depends on which set of tests, or framework definition (FWD), you use and on the number of servers. The default FWD is health_base, which should run in a matter of seconds. It was designed to run quickly and serve as a sanity check before running jobs. Other FWDs are designed for cluster hand-off and validation, so they take much longer: they run a multitude of benchmarks on individual nodes (stream/dgemm/sgemm/...) and across the cluster (hpcg/hpl/pairwise imb/...), looking for outliers. Those can take anywhere from 90+ minutes to multiple hours, depending on the system configuration and size. There are also in-between tests, such as health_extended_user or mpi_prereq_user.
>
> A couple of tips: "clck -X list" is a great way to see which framework definitions exist, and "clck -X <name_of_fwd>" gives more detail on what a specific FWD checks.
>
> Thanks for using Cluster Checker and providing feedback. Happy to help further.
>
> -Brady
>
>> -----Original Message-----
>> From: Beowulf <beowulf-bounces at beowulf.org> On Behalf Of Michael Di
>> Domenico
>> Sent: Thursday, April 30, 2020 10:23
>> Cc: Beowulf Mailing List <beowulf at beowulf.org>
>> Subject: Re: [Beowulf] Intel Cluster Checker
>>
>> i played with it about a year ago since i get it as part of the Intel compiler
>> bundle we pay for. it was overly complicated to install and run and didn't
>> seem worthwhile. kind of like getting a piece of IKEA furniture but then
>> trying to use a Phillips screwdriver to build it instead of the little wrench.
>> otherwise, when i dug into what it was actually doing, it didn't seem to be
>> doing anything magical. it was just doing it 'the Intel way', which in my
>> experience is generally very strange.
>>
>>
>>
>> On Wed, Apr 29, 2020 at 4:07 PM Prentice Bisbal via Beowulf
>> <beowulf at beowulf.org> wrote:
>>> Beowulfers,
>>>
>>> Have any of you used the Intel Cluster Checker? I've been tasked with
>>> using it, and I think I have it running, but the documentation isn't
>>> very good. I was wondering how long a typical run on some cluster
>>> nodes should take.
>>>
>>> Prentice
>>>
>>> _______________________________________________
>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>>> To change your subscription (digest mode or unsubscribe) visit
>>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf