[Beowulf] Definition of HPC
Max R. Dechantsreiter
max at performancejones.com
Wed Apr 24 08:38:13 PDT 2013
On Wed, 24 Apr 2013, Mark Hahn wrote:
>> Sure, it's important - WITHIN a given job. Why should a
>> new job's performance depend on what ran before? (And in
>
> I can't see why anyone would want to throw away a performance improvement.
>
>> most cases, the impact is negative, because the cached
>> pages are not the ones needed by the new job.)
>
> prove it. scavenging clean pages is one of the most common
> kernel paths, constantly being used in normal operation.
> I see no reason to expect that scavenging overhead is noticeable,
> and especially that bulk reclamation is significantly faster than
> incremental.
Test it yourself.
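A crude way to do that (a rough sketch, not a benchmark: assumes a Linux
node where you can write to /proc/sys/vm/drop_caches as root, and a large
scratch file of your own choosing):

    #!/usr/bin/env python
    # Sketch: time a warm (cached) read, the drop itself, and a cold read.
    # Assumes Linux, root (to write /proc/sys/vm/drop_caches), and that
    # "testfile" is a placeholder for a large file on local disk.
    import time

    PATH = "testfile"      # hypothetical multi-GB file
    CHUNK = 1 << 20        # read in 1 MiB chunks

    def read_all(path):
        t0 = time.time()
        with open(path, "rb") as f:
            while f.read(CHUNK):
                pass
        return time.time() - t0

    print("warm read:   %.2f s" % read_all(PATH))

    t0 = time.time()
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")     # drop pagecache, dentries and inodes
    print("drop_caches: %.2f s" % (time.time() - t0))

    print("cold read:   %.2f s" % read_all(PATH))

Run it while the node is otherwise busy and it gives a rough feel for what
the bulk drop costs and what the next reader pays for it.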
> I guess that's a useful point to make here: drop_caches is not doing
> anything different than the kernel normally does. it's just doing the same
> thing, but in bulk and blindly, including pages that really
> will be used again and shouldn't be dropped.
>
> if the claim is that drop_caches creates less cpu cache pollution than
> the normal incremental scavenging, well, that would be interesting to see the
> numbers for. certainly possible, but would be surprising
> given the overhead for system calls in the first place, and given
> that the user-space codepath is, after all, *IO*, which tends to be fairly
> cpu-cache-unfriendly in the first place.
TLB misses.
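If it's numbers you want, perf will count them (a sketch; assumes perf is
installed and your CPU exposes the generic dTLB events; "./myapp" stands in
for whatever job you care about):

    #!/usr/bin/env python
    # Sketch: run a command under "perf stat" and let it report dTLB
    # loads and misses.  Assumes perf is installed; "./myapp" is a
    # placeholder for your own code.
    import subprocess, sys

    cmd = sys.argv[1:] or ["./myapp"]
    subprocess.call(["perf", "stat", "-e",
                     "dTLB-loads,dTLB-load-misses", "--"] + cmd)

Compare the miss counts for the same job on a node with a hot pagecache and
on one straight after a drop_caches.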
>>> for sites where a single job is rolled onto all nodes and runs for a long
>>> time, then is entirely removed, sure, it may make sense. rebooting entirely
>>> might even work better. I'm mainly concerned with clusters which run a
>>> wide mixture of jobs, probably with multiple jobs sharing a node at times.
>>
>> I would advise any user never to do that.
>
> don't be silly: nodes are fat, and there is waste for many workloads
Be civil.
> if you only allocate full nodes. this is a decision that must be made
> based on your workload mixture. my organization (like MANY others)
> handles very disparate workloads, and cannot easily switch to unshared
> nodes.
Do you run a scheduler? Any user ought to be able to specify exclusivity.
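With Slurm, for example, adding --exclusive to the sbatch or srun request
keeps other jobs off the node, and most other batch systems have an
equivalent knob; sharing versus not sharing is a policy choice, not
something the hardware forces on you.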
> nodes are also not getting thinner.
>
>>> who says determinism is a good thing? I assume, for instance, you turn off
>>> your CPU caches to obtain determinism, right? I'm not claiming that variance
>>> is good, but why do you assume that the normal functioning of the pagecache
>>> will cause it?
>>
>> Try it and see.
>
> are you just heckling, or do you have some measurements to contribute?
You show me yours, and maybe I'll show you mine.
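For anyone who does want to gather them, the experiment is cheap (a rough
sketch; "./myapp" is a placeholder, and dropping caches between runs needs
root):

    #!/usr/bin/env python
    # Sketch: run the same job several times and report run-to-run spread.
    # "./myapp" is a placeholder; uncomment the drop_caches line to start
    # each run with an empty pagecache (needs root) and compare spreads.
    import subprocess, time

    RUNS = 10
    times = []
    for i in range(RUNS):
        # open("/proc/sys/vm/drop_caches", "w").write("3\n")
        t0 = time.time()
        subprocess.call(["./myapp"])
        times.append(time.time() - t0)

    mean = sum(times) / len(times)
    var = sum((t - mean) ** 2 for t in times) / len(times)
    print("mean %.2f s   stddev %.2f s" % (mean, var ** 0.5))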
Max