[Beowulf] Torrents for HPC
Bill Broadley
bill at cse.ucdavis.edu
Fri Jun 8 17:06:19 PDT 2012
I've built Myrinet, SDR, DDR, and QDR clusters (no FDR yet), but I
still have users whose use cases and budgets only justify GigE.
I've set up a 160TB Hadoop cluster that is working well, but I haven't
found justification for the complexity/cost of Lustre. I have high
hopes for Ceph, but it seems not quite ready yet. I'd be happy to hear
otherwise.
A new user on one of my GigE clusters submits batches of 500 jobs that
need to randomly read a 30-60GB dataset. They aren't the only user of
said cluster so each job will be waiting in the queue with a mix of others.
As you might imagine, that hammers a central GigE-connected NFS server
pretty hard; a single GigE link tops out around 120MB/sec, and here
it's shared across 38 compute nodes/304 cores/608 threads.
I thought BitTorrent might be a good way to publish such a dataset to
the compute nodes (thus avoiding the GigE bottleneck). So I wrote a
small/simple BitTorrent client, made a 16GB example data set, and
measured the performance pushing it to 38 compute nodes:
http://cse.ucdavis.edu/bill/btbench-2.png
The slow ramp-up is partially because I'm launching the torrent
clients with a crude serial loop:
for i in $(< compute_nodes); do ssh $i launch_torrent.sh; done
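Backgrounding each ssh should flatten most of that ramp-up; a minimal
sketch, assuming compute_nodes is a file with one hostname per line:

    for i in $(< compute_nodes); do
        ssh "$i" launch_torrent.sh &   # background the ssh so launches overlap
    done
    wait                               # return once every node has been started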
I get approximately 2.5GB/sec sustained aggregate when writing to 38
compute nodes, roughly 20x what the single GigE link into the NFS
server could deliver. So 38 nodes * 16GB = 608GB to distribute @
2.5GB/sec = about 240 seconds.
The clients definitely see MUCH faster performance when accessing a
local copy instead of a small share of the bandwidth of a central
file server.
Do you think it's worth bundling up for others to use?
This is how it works (rough sketches of both commands follow the list):
1) User runs publish <directory> <name> before they start submitting
jobs.
2) The publish command makes a torrent of that directory and starts
seeding that torrent.
3) The user submits an arbitrary number of jobs that need that
directory. Inside the job they run "$ subscribe <name>"
4) The subscribe command launches one torrent client per node (not per
job) and blocks until the directory is completely downloaded
5) /scratch/<user>/<name> has the user's data
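Both commands are small shell wrappers. The sketches below approximate
them with stock tools (mktorrent, aria2c, flock) rather than my client;
the tracker URL and paths are placeholders, and they assume home
directories are NFS-mounted so the nodes can read the .torrent file:

    #!/bin/bash
    # publish <directory> <name>: make a torrent and seed it from this host.
    # Assumes a tracker is already running at headnode:6969 (placeholder).
    dir=$1; name=$2
    mkdir -p ~/torrents
    mktorrent -a http://headnode:6969/announce \
        -o ~/torrents/"$name".torrent "$dir"
    # -V hash-checks the existing files; --seed-ratio=0.0 seeds
    # indefinitely; left running in the background to serve the swarm
    aria2c -V --seed-ratio=0.0 --dir="$(dirname "$dir")" \
        ~/torrents/"$name".torrent &

    #!/bin/bash
    # subscribe <name>: fetch the data once per node, block until present.
    # An flock(1) lock ensures only the first job on each node downloads;
    # aria2c recreates the torrent's top-level directory under --dir.
    name=$1
    dest=/scratch/$USER/$name
    flag=$dest/.complete
    mkdir -p "$dest"
    exec 9>/tmp/subscribe-$USER-$name.lock
    if flock -n 9; then
        # first job on this node: download, then mark completion
        # --seed-time=0 makes aria2c exit as soon as the download finishes
        aria2c --seed-time=0 --dir="$dest" ~/torrents/"$name".torrent
        touch "$flag"
    else
        # another job on this node is already downloading; wait for it
        while [ ! -e "$flag" ]; do sleep 5; done
    fi

In practice you'd probably want finished nodes to keep seeding for a
while (a nonzero --seed-time) so late starters aren't served by the
head node alone.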
Not nearly as convenient as having a fast parallel filesystem, but it
seems potentially useful for those who have large read-only datasets,
GigE, and NFS.
Thoughts?