[Beowulf] Torrents for HPC
Jesse Becker
beckerjes at mail.nih.gov
Mon Jun 11 11:02:43 PDT 2012
On Mon, Jun 11, 2012 at 01:49:23PM -0400, Joshua Baker-LePain wrote:
>On Fri, 8 Jun 2012 at 5:06pm, Bill Broadley wrote
>
>> Do you think it's worth bundling up for others to use?
>>
>> This is how it works:
>> 1) User runs publish <directory> <name> before they start submitting
>> jobs.
>> 2) The publish command makes a torrent of that directory and starts
>> seeding that torrent.
>> 3) The user submits an arbitrary number of jobs that need that
>> directory. Inside the job they "$ subscribe <name>"
>> 4) The subscribe command launches one torrent client per node (not per
>> job) and blocks until the directory is completely downloaded
>> 5) /scratch/<user>/<name> has the user's data
>>
>> Not nearly as convenient as having a fast parallel filesystem, but seems
>> potentially useful for those who have large read only datasets, GigE and
>> NFS.
>>
>> Thoughts?
>
>I would definitely be interested in a tool like this. Our situation is
>about as you describe -- we don't have the budget or workload to justify
>any interconnect higher-end than GigE, but have folks who pound our
>central storage to get at DBs stored there.
I looked into doing something like this on a 50-node cluster to
synchronize several hundred GB of semi-static data used in /scratch.
I found that the time to build the torrent files--calculating checksums
and such--was *far* more time consuming than the actual file
distribution. This is on top of the rather severe IO hit on the "seed"
box as well.
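To illustrate where the time goes: a (v1) torrent's metadata holds a
SHA-1 hash for every fixed-size piece of the data, so building it means
reading every byte on the seed box. A minimal Python sketch, assuming a
single file and a 256 KiB piece size (the function name and the piece
size are made up for illustration, not taken from any particular
torrent tool):

```python
import hashlib

PIECE_SIZE = 256 * 1024  # assumed piece size; real tools pick this per torrent


def piece_hashes(path, piece_size=PIECE_SIZE):
    """Return the SHA-1 digest of each fixed-size piece of one file.

    Hypothetical helper for illustration: every byte of the file is read
    once, which is the I/O cost the seed pays before it can share anything.
    """
    digests = []
    with open(path, "rb") as f:
        while True:
            piece = f.read(piece_size)
            if not piece:  # end of file
                break
            digests.append(hashlib.sha1(piece).digest())
    return digests
```

For several hundred GB, that is several hundred GB of sequential reads
plus hashing before the first peer can fetch anything, and it has to be
repeated whenever the data changes.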
I fought with it for a while, but came to the conclusion that *for
_this_ data*, and how quickly it changed, torrents weren't the way to
go--largely because of the cost of creating the torrent in the first
place.
However, I do think that similar systems could be very useful, if
perhaps a bit less strict in their tests. The peer-to-peer model is
useful, and (in some cases) a simple size/date check could be enough to
determine when to (re)copy a file.
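A size/date check of that kind could look roughly like the following
Python sketch. It assumes that whatever does the copying preserves
modification times (as shutil.copy2 or rsync -t would); needs_copy is a
hypothetical name, not part of any existing tool:

```python
import os


def needs_copy(src, dst):
    """Return True if dst is missing or differs from src in size or mtime.

    Cheap metadata-only test: no file contents are read, so it is much
    less strict (and much faster) than comparing checksums.
    """
    try:
        dst_stat = os.stat(dst)
    except FileNotFoundError:
        return True
    src_stat = os.stat(src)
    # Compare whole seconds to tolerate filesystems with coarse timestamps.
    return (src_stat.st_size != dst_stat.st_size
            or int(src_stat.st_mtime) != int(dst_stat.st_mtime))
```

The trade-off is that a file changed in place without altering its size
or timestamp would go unnoticed.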
One thing torrents don't handle is file deletion, which opens up a
few new problems.
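As a sketch of what a sync tool would have to do on top of the
transfer itself, the following Python finds files that exist in a
node's copy but no longer exist in the source tree. It assumes both
trees are visible as local paths; the function names are hypothetical:

```python
import os


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    paths = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            paths.add(os.path.relpath(full, root))
    return paths


def stale_files(src_root, dst_root):
    """Return files present under dst_root but absent under src_root."""
    return sorted(relative_files(dst_root) - relative_files(src_root))
```

Anything returned by stale_files would be a candidate for removal from
the node's copy; whether to delete it automatically is a policy
decision.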
In the end, I moved to a distributed rsync tree, which worked for a
while but was slightly fragile. Eventually, we dropped the whole
thing when we purchased a sufficiently fast storage system.
--
Jesse Becker
NHGRI Linux support (Digicon Contractor)