[Beowulf] Torrents for HPC
Bill Broadley
bill at cse.ucdavis.edu
Tue Jun 12 15:59:46 PDT 2012
On 06/12/2012 03:47 PM, Skylar Thompson wrote:
> We manage this by having users run this in the same Grid Engine
> parallel environment they run their job in. This means they're
> guaranteed to run the sync job on the same nodes their actual job runs
> on. The copied files change so slowly that even on 1GbE the network is
> rarely a bottleneck, since we only transfer files that have changed.
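The "only transfer files that have changed" idea can be sketched in a few lines. This is a minimal illustration, assuming a file counts as unchanged when its size and modification time match the copy on the node (roughly rsync's default quick check). `sync_changed` is a hypothetical helper, not the script Skylar's site actually runs.

```python
import os
import shutil


def sync_changed(src: str, dst: str) -> list[str]:
    """Copy files from src to dst only if missing there or if size/mtime differ.

    Hypothetical helper: returns the sorted relative paths that were copied.
    """
    copied = []
    for root, _dirs, files in os.walk(src):
        rel_root = os.path.relpath(root, src)
        for name in files:
            rel = os.path.normpath(os.path.join(rel_root, name))
            s = os.path.join(src, rel)
            d = os.path.join(dst, rel)
            st = os.stat(s)
            try:
                dt = os.stat(d)
                # Quick check: same size and same mtime means "unchanged".
                unchanged = dt.st_size == st.st_size and dt.st_mtime_ns == st.st_mtime_ns
            except FileNotFoundError:
                unchanged = False
            if unchanged:
                continue
            os.makedirs(os.path.dirname(d), exist_ok=True)
            # copy2 also copies the mtime, so the next pass sees a match.
            shutil.copy2(s, d)
            copied.append(rel)
    return sorted(copied)
```

A second run over an unchanged tree copies nothing, which is why a slow-changing software tree costs little bandwidth per job.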
Our problem is that we have many users, and we don't want 50,000 30-minute
jobs to turn into one giant job that defeats the priority system while it
runs. With an array job, a user can get 100% of the cluster if it's
idle and then quickly decay to their fair share when other, higher-priority
jobs run.
That way we can keep the cluster 100% utilized, while new jobs (from users
using less than their fair share) still get through the queue (which might
well be months long) quickly.
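To make the array-job vs. giant-job difference concrete, here is a toy model. It is a sketch under loudly stated assumptions, not Grid Engine's actual scheduler: a heavy user queues a fixed amount of work at t=0, a light user (under their fair share) arrives later with one task and simply wins the next node that frees up, and nothing is preempted. The function name and all numbers are hypothetical.

```python
import heapq


def light_user_start(nodes: int, task_len: int, task_count: int, arrival: int) -> int:
    """Minute at which a newly arrived light user's first task starts.

    Assumed toy model (not Grid Engine's real algorithm): the heavy user queues
    task_count tasks of task_len minutes at t=0, one node each. The light user
    arrives at `arrival` with one task and, being under their fair share, gets
    the next node that becomes free. Running tasks are never preempted.
    """
    free_at = [0] * nodes  # min-heap of times at which each node becomes free
    heapq.heapify(free_at)
    remaining = task_count
    while True:
        t = heapq.heappop(free_at)
        if t >= arrival:
            return t  # light user outranks the heavy user for this node
        if remaining > 0:
            remaining -= 1
            heapq.heappush(free_at, t + task_len)
        else:
            return arrival  # heavy work ran out; this node is idle at arrival


# Same total work (1,500,000 node-minutes) on 100 nodes, split two ways.
array_start = light_user_start(nodes=100, task_len=30, task_count=50_000, arrival=1000)
giant_start = light_user_start(nodes=100, task_len=15_000, task_count=100, arrival=1000)
print(array_start, giant_start)
```

In this model the light user waits 20 minutes behind 30-minute array tasks but 14,000 minutes (about 9.7 days) behind one job holding every node, even though the heavy user keeps the cluster fully busy in both cases.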