[Beowulf] transcode Similar Video Processing on Beowulf?

Wed Apr 16 07:16:36 PDT 2014

On 15/04/14 14:02, Gavin W. Burris wrote:
> Yes!  No doubt!  The "simple" queues presuppose a massive
> distributed system to take advantage of.  Bonus points if that
> system can interchangeably be an in-house cluster or major cloud
> provider.
> 
> I would be very interested to hear what your preferred tools and
> APIs are for your analysis system.  I can easily default to the job
> script and qsub workflow, but RESTful cloud APIs and simple queues
> seem to be next-level for some workflows.

That's what we've got - we run some stuff in-house for security
reasons and some stuff on Amazon EC2, though Rackspace Cloud or even MS
Azure could be used (they're just not as cheap). Managing image
generation and machine provisioning is half the battle, but there are
lots of open-source tools that help there (Packer, Puppet, etc).

The orchestration code for the whole system is written in Ruby (the
algorithms themselves are usually C/Python); the coordination system is
actually Amazon's Simple Workflow Service (SWF), which is something of a
cop-out, but we're not keen to reinvent the wheel. It provides
statekeeping and job queues in one package. Replacing it wouldn't be
trivial, though it wouldn't be a massive task either; the cost of using
it is tiny, and it made our life a lot easier. Everything is written in
terms of deciders, which make decisions based on the history of events
associated with a workflow (eg a "finished activity" event carries the
details of the activity being scheduled, starting and completing, its
output status, etc), and workers, which perform the activities. State is
maintained by passing JSON blobs around as messages. There'll be a
blog post or two explaining things on our website soonish, and I'll
post them across if there's interest.
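To make the decider/worker split concrete, here is a minimal Python sketch of decider logic. It is not the real system (that is Ruby talking to SWF, and no SWF API calls are used here): it assumes a hypothetical three-step pipeline (`fetch`, `analyse`, `publish`) and simplified event dicts. The event and decision type names mirror SWF's naming, but the dict layout and the `decide` function are illustrative assumptions. State handed from one step to the next is a JSON string, as described above.

```python
import json

# Hypothetical three-step pipeline; each step's JSON output is the next step's input.
PIPELINE = ["fetch", "analyse", "publish"]


def decide(history):
    """Return the decisions a decider would make for one workflow.

    `history` is a list of simplified event dicts, oldest first. The event and
    decision type names follow SWF's naming, but this dict layout is an
    illustrative assumption, not the SWF API.
    """
    def of_type(kind):
        return [e for e in history if e["type"] == kind]

    scheduled = of_type("ActivityTaskScheduled")
    completed = of_type("ActivityTaskCompleted")
    failed = of_type("ActivityTaskFailed")

    if failed:
        # Any failed step fails the whole workflow (no retries in this sketch).
        return [{"type": "FailWorkflowExecution",
                 "reason": "activity %s failed" % failed[-1]["activity"]}]

    if len(completed) == len(PIPELINE):
        # All steps done: complete, passing on the final JSON result.
        return [{"type": "CompleteWorkflowExecution",
                 "result": completed[-1]["result"]}]

    if len(scheduled) > len(completed):
        # A step is still in flight; nothing to decide yet.
        return []

    # Schedule the next step, feeding it the previous step's JSON output,
    # or the workflow's own input for the first step.
    payload = completed[-1]["result"] if completed else history[0]["input"]
    return [{"type": "ScheduleActivityTask",
             "activity": PIPELINE[len(completed)],
             "input": payload}]


if __name__ == "__main__":
    # Replay a short history: start, schedule "fetch", complete "fetch".
    history = [{"type": "WorkflowExecutionStarted",
                "input": json.dumps({"video": "in.mp4"})}]
    print(decide(history))  # schedules "fetch"
    history.append({"type": "ActivityTaskScheduled", "activity": "fetch"})
    history.append({"type": "ActivityTaskCompleted", "activity": "fetch",
                    "result": json.dumps({"path": "/scratch/in.mp4"})})
    print(decide(history))  # schedules "analyse" with fetch's result
```

In this sketch the decider holds no state beyond the event history it is handed, so it can be rerun from scratch on any machine, which fits the run-anywhere style described in the post.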

It's being used in production on a regular basis and has had quite a
lot of content processed through it so far; these tasks typically run
for 2-6 hours and involve ~1GB of data going in and a few megabytes
coming out. The APIs are all simple HTTPS RESTful ones, and storage can
be either cloud provider storage or local shared drive storage.
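On the worker side, the JSON-message style pairs naturally with storage being either cloud or a local shared drive. Below is a minimal Python sketch of one activity, assuming a hypothetical two-method storage interface (`fetch`/`store`), hypothetical `input_key`/`output_key` message fields, and only a local-directory backend; a cloud backend would implement the same two methods against the provider's object store. It illustrates the pattern (big input pulled down, small result pushed back) and is not how the system described here is actually written.

```python
import json
import shutil
import tempfile
from pathlib import Path


class LocalStorage:
    """Storage backend on a local or shared directory (e.g. an NFS mount).

    A cloud-storage backend would expose the same fetch/store methods; this
    two-method interface is a hypothetical one chosen for the sketch.
    """

    def __init__(self, root):
        self.root = Path(root)

    def fetch(self, key, dest):
        # Copy an input object down to the worker's scratch space.
        shutil.copyfile(self.root / key, dest)

    def store(self, src, key):
        # Copy a result back up, creating intermediate directories as needed.
        target = self.root / key
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, target)


def run_activity(message, storage, process):
    """Run one activity: parse the JSON task, move data, return a JSON result.

    `input_key`/`output_key` are hypothetical message fields; `process` stands
    in for the actual C/Python algorithm and takes (input_path, output_path).
    """
    task = json.loads(message)
    with tempfile.TemporaryDirectory() as scratch:
        local_in = Path(scratch) / "input"
        local_out = Path(scratch) / "output"
        storage.fetch(task["input_key"], local_in)    # large input in
        process(local_in, local_out)                  # long-running work
        storage.store(local_out, task["output_key"])  # small result out
    return json.dumps({"status": "ok", "output_key": task["output_key"]})


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        Path(root, "clip.raw").write_bytes(b"\0" * 1024)

        def summarise(src, dst):
            # Toy "algorithm": record the input size as a tiny JSON result.
            dst.write_text(json.dumps({"bytes": src.stat().st_size}))

        msg = json.dumps({"input_key": "clip.raw",
                          "output_key": "results/clip.json"})
        print(run_activity(msg, LocalStorage(root), summarise))
        print(Path(root, "results", "clip.json").read_text())
```

Keeping the storage behind a small interface is what lets the same worker code run in-house against a shared drive or in the cloud against object storage.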

Not very 'traditional HPC', but it does the job. There's an
interesting intersection between HPC and these more abstract,
run-anywhere systems, where per-job performance and interprocess
communication performance are less important, and robustness and
dynamic scalability play a major role.

-- 
Cheers,
James