[Beowulf] Avoiding/mitigating fragmentation of systems by small jobs?

John Hearns hearnsj at googlemail.com
Mon Jun 11 05:36:14 PDT 2018


Skylar Thomson wrote:
>Unfortunately we don't have a mechanism to limit
>network usage or local scratch usage, but the former is becoming less of a
>problem with faster edge networking, and we have an opt-in bookkeeping
>mechanism for the latter that isn't enforced but works well enough to keep
>people happy.

That is interesting to me. At ASML I worked on setting up Quality of
Service, i.e. bandwidth limits, for GPFS storage and MPI traffic.
GPFS does have QoS limits built in, but these are intended to limit the
background housekeeping tasks rather than to limit user processes.
Still, it does have the concept.
With MPI you can configure different QoS levels for different traffic.

More relevantly, I did have a close discussion with Parav Pandit, who is
working on the network QoS stuff.
I am sure there is something more up to date than this:
https://www.openfabrics.org/images/eventpresos/2016presentations/115rdmacont.pdf
Sadly this RDMA stuff needs a recent 4-series kernel. I guess the
discussion on whether or not you should go with a bleeding edge kernel is
for another time!
But yes, cgroups have configurable network limits with the latest kernels.
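
As a rough illustration of the cgroup side, here is a minimal Python sketch,
assuming a cgroup v2 hierarchy with the rdma controller enabled. Note that
this controller caps RDMA resources per device (HCA handles and objects)
rather than bandwidth. The device name mlx5_0, the cgroup path and both
helper function names are hypothetical placeholders, not taken from any
real site.

```python
from pathlib import Path


def format_rdma_limit(device, hca_handle="max", hca_object="max"):
    """Build one line in the format the cgroup v2 rdma.max file accepts.

    'device' is an RDMA device name (hypothetical here); each limit is
    either an integer or the string "max" for no limit.
    """
    return f"{device} hca_handle={hca_handle} hca_object={hca_object}"


def apply_rdma_limit(cgroup_dir, line):
    """Write the limit line into <cgroup_dir>/rdma.max.

    On a real system this needs root and the rdma controller enabled
    for that cgroup; the directory is passed in so nothing is assumed.
    """
    Path(cgroup_dir, "rdma.max").write_text(line + "\n")


if __name__ == "__main__":
    # Hypothetical example: cap a job's cgroup at 2 HCA handles
    # and 2000 HCA objects on device mlx5_0.
    print(format_rdma_limit("mlx5_0", hca_handle=2, hca_object=2000))
```

A batch system prolog could call apply_rdma_limit() on the per-job cgroup
directory, so that a small job cannot exhaust the adapter's resources on a
shared node.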

Also being cheeky, and I probably have mentioned them before, here is a
plug for Ellexus https://www.ellexus.com/
Worth mentioning I have no connection with them!