[Beowulf] Definition of HPC

Mon Apr 22 10:06:07 PDT 2013

On Thu, Apr 18, 2013 at 7:21 PM, Mark Hahn <hahn at mcmaster.ca> wrote:

>> Only for benchmarking?  We have done this for years on our production
>> clusters (and SGI provides a tool for this and more to clean up nodes).  We
>> have this in our epilogue so that we can clean out memory on our diskless
>> nodes and leave nothing stale sitting around that could impact the next
>> user's job.
>>
>
> understood, but how did you decide that was actually a good thing?
>
>
Mark,

Because it stopped the random out of memory conditions that we were having.

> if two jobs with similar file reference patterns run back to back, for
> instance, drop_caches will cause quite a bit of additional IO delay.
>
>
For our workloads, this is a highly unlikely scenario: nodes are not
shared and the workload is very diverse, so the chance that the next job
has any connection to the previous one is negligible.
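For readers unfamiliar with the mechanism under discussion: on Linux, drop_caches refers to the /proc/sys/vm/drop_caches interface. Writing 1 frees the page cache, 2 frees reclaimable slab objects such as dentries and inodes, and 3 frees both. Only clean objects are dropped, so a sync usually comes first. The Python below is a minimal sketch of such an epilogue step, assuming root privileges and a resource manager that runs an epilogue after each job. The helper name drop_caches and its path parameter are hypothetical, and this is not the SGI tool mentioned above.

```python
import os

# Linux interface for dropping clean caches (writable by root only):
# "1" = page cache, "2" = reclaimable slab (dentries/inodes), "3" = both.
DROP_CACHES = "/proc/sys/vm/drop_caches"


def drop_caches(level: int = 3, path: str = DROP_CACHES) -> None:
    """Flush dirty pages, then ask the kernel to drop clean caches.

    Hypothetical epilogue helper: the `path` parameter exists only so the
    function can be exercised against an ordinary file outside production.
    Intended to be called as root from the epilogue after the job exits.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    # drop_caches only discards clean objects, so write dirty data back first.
    os.sync()
    with open(path, "w") as f:
        f.write(str(level))
```

Note the trade-off raised in this thread: because this discards all clean page cache, a following job that reads the same files must fetch them again from storage.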

Craig

> I guess the rationale would also be much clearer for certain workloads,
> such as big-data reduction jobs, where things like executables would have
> to be re-fetched, but presumably much larger input data might never be
> re-referenced by following jobs.  it would have to be jobs that have a lot
> of intra- but not inter-job read-only file re-reference,
> and where clean-page scavenging is a noticeable cost.
>
> I'm guessing this may have been a much bigger deal on strongly NUMA
> machines of a certain era (high-memory ia64 SGI, older kernels).
>
> regards, mark.
>