[Beowulf] Definition of HPC
Max R. Dechantsreiter
max at performancejones.com
Wed Apr 24 08:38:13 PDT 2013
On Wed, 24 Apr 2013, Mark Hahn wrote:
>> Sure, it's important - WITHIN a given job. Why should a
>> new job's performance depend on what ran before? (And in
>
> I can't see why anyone would want to throw away a performance improvement.
>
>> most cases, the impact is negative, because the cached
>> pages are not the ones needed by the new job.)
>
> prove it. scavenging clean pages is one of the most common
> kernel paths, constantly being used in normal operation.
> I see no reason to expect that scavenging overhead is noticeable,
> and especially that bulk reclamation is significantly faster than
> incremental.
Test it yourself.
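A crude way to do that (a rough sketch, not a benchmark: assumes a Linux
node where you can write to /proc/sys/vm/drop_caches as root, and a large
scratch file of your own choosing):

    #!/usr/bin/env python
    # Sketch: time a warm (cached) read, the drop itself, and a cold read.
    # Assumes Linux, root (to write /proc/sys/vm/drop_caches), and that
    # "testfile" is a placeholder for a large file on local disk.
    import time

    PATH = "testfile"      # hypothetical multi-GB file
    CHUNK = 1 << 20        # read in 1 MiB chunks

    def read_all(path):
        t0 = time.time()
        with open(path, "rb") as f:
            while f.read(CHUNK):
                pass
        return time.time() - t0

    print("warm read:   %.2f s" % read_all(PATH))

    t0 = time.time()
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")     # drop pagecache, dentries and inodes
    print("drop_caches: %.2f s" % (time.time() - t0))

    print("cold read:   %.2f s" % read_all(PATH))

Run it while the node is otherwise busy and it gives a rough feel for what
the bulk drop costs and what the next reader pays for it.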
> I guess that's a useful point to make here: drop_caches is not doing
> anything different than the kernel normally does. it's just doing the same
> thing, but in bulk and blindly, including pages that really
> will be used again and shouldn't be dropped.
>
> if the claim is that drop_caches creates less cpu cache pollution than
> the normal incremental scavenging, well, that would be interesting to see the
> numbers for. certainly possible, but would be surprising
> given the overhead for system calls in the first place, and given
> that the user-space codepath is, after all, *IO*, which tends to be fairly
> cpu-cache-unfriendly in the first place.
TLB misses.
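If it's numbers you want, perf will count them (a sketch; assumes perf is
installed and your CPU exposes the generic dTLB events; "./myapp" stands in
for whatever job you care about):

    #!/usr/bin/env python
    # Sketch: run a command under "perf stat" and let it report dTLB
    # loads and misses.  Assumes perf is installed; "./myapp" is a
    # placeholder for your own code.
    import subprocess, sys

    cmd = sys.argv[1:] or ["./myapp"]
    subprocess.call(["perf", "stat", "-e",
                     "dTLB-loads,dTLB-load-misses", "--"] + cmd)

Compare the miss counts for the same job on a node with a hot pagecache and
on one straight after a drop_caches.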
>>> for sites where a single job is rolled onto all nodes and runs for a long
>>> time, then is entirely removed, sure, it may make sense. rebooting entirely
>>> might even work better. I'm mainly concerned with clusters which run a
>>> wide mixture of jobs, probably with multiple jobs sharing a node at times.
>>
>> I would advise any user never to do that.
>
> don't be silly: nodes are fat, and there is waste for many workloads
Be civil.
> if you only allocate full nodes. this is a decision that must be made
> based on your workload mixture. my organization (like MANY others)
> handles very disparate workloads, and cannot easily switch to unshared
> nodes.
Do you run a scheduler? Any user ought to be able to specify exclusivity.
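With Slurm, for example, adding --exclusive to the sbatch or srun request
keeps other jobs off the node, and most other batch systems have an
equivalent knob; sharing versus not sharing is a policy choice, not
something the hardware forces on you.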
> nodes are also not getting thinner.
>
>>> who says determinism is a good thing? I assume, for instance, you turn off
>>> your CPU caches to obtain determinism, right? I'm not claiming that variance
>>> is good, but why do you assume that the normal functioning of the pagecache
>>> will cause it?
>>
>> Try it and see.
>
> are you just heckling, or do you have some measurements to contribute?
You show me yours, and maybe I'll show you mine.
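For anyone who does want to gather them, the experiment is cheap (a rough
sketch; "./myapp" is a placeholder, and dropping caches between runs needs
root):

    #!/usr/bin/env python
    # Sketch: run the same job several times and report run-to-run spread.
    # "./myapp" is a placeholder; uncomment the drop_caches line to start
    # each run with an empty pagecache (needs root) and compare spreads.
    import subprocess, time

    RUNS = 10
    times = []
    for i in range(RUNS):
        # open("/proc/sys/vm/drop_caches", "w").write("3\n")
        t0 = time.time()
        subprocess.call(["./myapp"])
        times.append(time.time() - t0)

    mean = sum(times) / len(times)
    var = sum((t - mean) ** 2 for t in times) / len(times)
    print("mean %.2f s   stddev %.2f s" % (mean, var ** 0.5))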
Max