[Beowulf] Definition of HPC
Craig Tierney - NOAA Affiliate
craig.tierney at noaa.gov
Mon Apr 22 10:06:07 PDT 2013
On Thu, Apr 18, 2013 at 7:21 PM, Mark Hahn <hahn at mcmaster.ca> wrote:
>> Only for benchmarking? We have done this for years on our production
>> clusters (and SGI provides a tool that does this and more to clean up
>> nodes). We have this in our epilogue so that we can clean out memory on
>> our diskless nodes so there is nothing stale sitting around that can
>> impact the next user's job.
> understood, but how did you decide that was actually a good thing?
Because it stopped the random out-of-memory conditions we were having.
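For anyone curious what such an epilogue step looks like, here is a minimal sketch (not the actual NOAA or SGI tooling, just the standard Linux mechanism, which assumes a kernel with /proc/sys/vm/drop_caches, i.e. 2.6.16 or later, and root privileges on the node):

```shell
#!/bin/sh
# Hypothetical job-epilogue fragment: flush and drop caches between jobs.
# sync first, because drop_caches only discards *clean* pages; dirty
# pages must be written back or they survive the drop.
sync
if [ -w /proc/sys/vm/drop_caches ]; then
    # 3 = free page cache (1) + reclaimable dentries and inodes (2)
    echo 3 > /proc/sys/vm/drop_caches
    result=dropped
else
    # Not running as root (e.g. when dry-running the script): do nothing.
    result=skipped
fi
echo "$result"
```

On a diskless node this leaves no stale cached data from the previous job, at the cost of re-fetching anything (executables, libraries) the next job would otherwise have found in cache.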
> if two jobs with similar file reference patterns run, for instance,
> drop_caches will cause quite a bit of additional IO delay.
For our workloads this is a highly unlikely scenario: nodes are not shared
and the workload is very diverse, so the chance that the next job has any
connection to the previous one is negligible.
> I guess the rationale would also be much clearer for certain workloads,
> such as big-data reduction jobs, where things like executables would have
> to be re-fetched, but presumably much larger input data might never be
> re-referenced by following jobs. it would have to be jobs that have a lot
> of intra- but not inter-job readonly file re-reference,
> and where clean-page scavenging is a noticeable cost.
> I'm guessing this may have been a much bigger deal on strongly NUMA
> machines of a certain era (high-memory ia64 SGI, older kernels).
> regards, mark.