[Beowulf] Digital Image Processing via HPC/Cluster/Beowulf - Basics
supaiku at gmail.com
Sat Nov 3 17:00:22 PDT 2012
You type fast indeed :p
Thank you for the detailed explanation.
I'll have to look more into the processing we're doing and its
requirements before proceeding :)
Your information has been extremely helpful :)
On Sun, Nov 4, 2012 at 7:42 AM, Mark Hahn <hahn at mcmaster.ca> wrote:
> Thanks, informative :p
>> I'll consider your advice.
>> If i read correctly, it seems the answer to the question about programming
>> was: yes, a program must be written to accommodate a cluster. Did i get
> it depends what you mean. if you have a program which is written
> so that it can be run from a script, then a cluster can immediately
> let you run lots of them. if you're expecting a cluster to speed
> up a single instance, then you'll probably be disappointed.
> in short, clustering doesn't speed up any of the computers in the cluster.
> it just makes it more convenient to get multiple computers working.
> if you want multiple computers to work on the same program, then someone
> has to make it happen: divide up the work so each computer gets a share,
> and put together the results.
> suppose you're trying to detect a particular face in all your images.
> you could have one machine searching an image, then going onto the next.
> basically, that one node is running a simple scheduler that runs jobs:
> lookforface face.png image0.png
> lookforface face.png image1.png
> lookforface face.png image2.png
> if you want, you can divide up the work - send every other image to a
> second machine. in general, this would mean that a scheduler reads from
> that same list and dispatches one line (job) at a time to any
> node that isn't already busy. when a job completes, that node gets
> another job, and eventually all the work is done.
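The dispatch loop described above can be sketched in a few lines. This is only a minimal illustration, not any particular scheduler: `run_jobs`, `fake_run` and the node names are hypothetical, and `fake_run` stands in for whatever actually launches a command on a node (for example over ssh).

```python
import queue
import threading

def run_jobs(jobs, nodes, run_on_node):
    """Hand out jobs one at a time to whichever node is not busy."""
    pending = queue.Queue()
    for job in jobs:
        pending.put(job)
    results = []
    lock = threading.Lock()

    def node_loop(node):
        # Each node keeps pulling the next line from the shared list
        # until the list is empty.
        while True:
            try:
                job = pending.get_nowait()
            except queue.Empty:
                return
            outcome = run_on_node(node, job)
            with lock:
                results.append((node, job, outcome))

    threads = [threading.Thread(target=node_loop, args=(n,)) for n in nodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

def fake_run(node, job):
    # Hypothetical stand-in for really running `job` on `node`.
    return "done"

jobs = [f"lookforface face.png image{i}.png" for i in range(6)]
results = run_jobs(jobs, ["node0", "node1"], fake_run)
```

Which node ends up with which image is not fixed in advance; a node that finishes early simply takes the next job.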
> "embarrassingly parallel" just means you have enough images to keep all your
> machines busy this way.
> if you don't have that many images, you might want to try to get more than
> one machine working on the same image. a simple way to do that would be
> to (imaginarily) divide each image into, say, quadrants, so 4 machines can
> work on the same image (each getting a quarter of the image - with some
> overlap so targets along the border don't get missed.) to be specific,
> your list of jobs could be like this:
> lookforface face.png image0.png 0
> lookforface face.png image0.png 1
> lookforface face.png image0.png 2
> lookforface face.png image0.png 3
> lookforface face.png image1.png 0
> lookforface face.png image1.png 1
> where 'lookforface' only looks for the face in the specified quadrant of
> the input image. the most obvious problem with this approach is that
> 1-quadrant search may take too little time relative to the overhead of
> setting up each job, which includes accessing face.png and image0.png,
> even if only a quadrant of the latter is used. in general, this kind of
> issue is called "load balance", and is really the single most fundamental
> issue in HPC.
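As a rough sketch of the quadrant-with-overlap idea, the region each job searches could be computed like this. The function name `quadrant_box`, the quadrant numbering (0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right) and the `overlap` parameter are assumptions for illustration; the original only says "some overlap". An overlap on the order of the face template size would be a natural choice.

```python
def quadrant_box(width, height, quadrant, overlap):
    """Return (left, top, right, bottom) for quadrant 0..3 of an image,
    grown by `overlap` pixels on every side and clipped to the image."""
    if quadrant not in (0, 1, 2, 3):
        raise ValueError("quadrant must be 0, 1, 2 or 3")
    col = quadrant % 2       # 0 = left half, 1 = right half
    row = quadrant // 2      # 0 = top half,  1 = bottom half
    half_w = width // 2
    half_h = height // 2
    left = col * half_w
    top = row * half_h
    right = half_w if col == 0 else width
    bottom = half_h if row == 0 else height
    return (
        max(0, left - overlap),
        max(0, top - overlap),
        min(width, right + overlap),
        min(height, bottom + overlap),
    )

boxes = [quadrant_box(100, 80, q, 10) for q in range(4)]
```

Each job would then search only inside its own box, so a target straddling the midline still falls inside at least one padded quadrant.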
> if you wanted to pursue this direction, you could optimize by reducing
> the cost of distributing the images. if image0.png is quite large,
> then access through a shared filesystem might be efficient (if the FS
> block size is comparable to 1/2 the width of one image row.) if image0.png
> is smaller, then you could distribute that information "manually" by
> a job which reads the image on one node and distributes quadrants to
> other nodes. the obvious way to do this would be via MPI, which is pretty
> friendly to matrices like decompressed images. this could even
> operate on pieces smaller than a quadrant - in fact, you could divide the
> work however finely you like. though as before, divide it too fine, and the
> per-chunk overhead dominates your cost, destroying efficiency.
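The cost of dividing too finely can be seen in a toy model. The assumptions here are mine, not from the post: all chunks are equal, each chunk carries a fixed `overhead_s` on top of its share of the work, and chunks are handed out in rounds across the nodes.

```python
import math

def parallel_efficiency(total_work_s, n_chunks, n_nodes, overhead_s):
    """Fraction of node time spent on useful work in the toy model."""
    chunk_s = total_work_s / n_chunks + overhead_s   # cost of one chunk
    rounds = math.ceil(n_chunks / n_nodes)           # chunks per node
    elapsed_s = rounds * chunk_s
    return total_work_s / (n_nodes * elapsed_s)

# 8 s of real work on 4 nodes, 0.1 s of overhead per chunk.
coarse = parallel_efficiency(8.0, 4, 4, 0.1)     # one chunk per node
fine = parallel_efficiency(8.0, 400, 4, 0.1)     # 100 chunks per node
```

With one chunk per node the overhead is a few percent; with 400 chunks the nodes spend most of their time on setup rather than on searching.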
> note that this refinement has merely changed who divides the work and how the
> data is communicated. in the simple case, work was divided
> at the command/job/scheduler level and data transmitted by file. the more
> fine-grained approach has subsumed some scheduling into your program, and
> is communicating the data explicitly over MPI.
> basically, someone has to divide up work, and data has to flow to where it is
> used. you could take this further: a single MPI program that runs on all
> nodes of the cluster at once and distributes work among MPI ranks. this
> would be the most programming effort, but would quite possibly be the most
> efficient. often, the amount of time needed to perform one unit of work
> is not constant - this can cause problems if your division of labor is too
> rigid. (consider the MPI-searches-4-quadrants approach: if one quadrant
> takes very little time, then the CPU associated with that quadrant will be
> twiddling its thumbs while the other quadrants get done.)
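The effect of a too-rigid division of labor can be illustrated by comparing a fixed assignment with "next free worker takes the next unit". Both functions and the cost list below are made up for illustration; real per-unit times would have to be measured.

```python
import heapq

def static_makespan(costs, n_workers):
    """Worker i is assigned units i, i+n, i+2n, ... in advance."""
    return max(sum(costs[i::n_workers]) for i in range(n_workers))

def dynamic_makespan(costs, n_workers):
    """Each unit goes to whichever worker becomes free first."""
    free_at = [0.0] * n_workers          # time at which each worker is idle
    heapq.heapify(free_at)
    for cost in costs:
        start = heapq.heappop(free_at)   # earliest-free worker
        heapq.heappush(free_at, start + cost)
    return max(free_at)

# Two expensive units among many cheap ones, 4 workers.
costs = [9, 1, 1, 1, 9, 1, 1, 1]
rigid = static_makespan(costs, 4)     # one worker gets both 9s
flexible = dynamic_makespan(costs, 4)
```

In the rigid case three workers sit idle while one works through both expensive units; handing out units on demand roughly halves the elapsed time in this example.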
> I have, of course, completely fabricated this whole workflow. it becomes
> more interesting when the work has other dimensions - for instance, if you
> are searching 1M images for any of 1k faces. or if you are really hot to
> use a convolution approach so will be fourier-transforming all the images
> before performing any matching. or if you want to use GPUs, etc.
> TL;DR it's a good thing I type fast ;)
> in any case, your first step should be to look at the time taken to get
> inputs to a node, and then how long it takes to do the computation.
> life is easy if setup is fast and compute is long. that stuff is far more
> important than choosing a particular scheduler or cluster package.
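One simple way to take that first measurement is to time the two phases separately for a single work unit. This is a sketch only: `time_phases` is a hypothetical helper, and the `fetch` and `compute` callables stand for however the inputs actually reach a node and whatever processing is actually done.

```python
import time

def time_phases(fetch, compute, item):
    """Run fetch(item) then compute(data); return the result and the
    wall-clock seconds spent in each phase."""
    t0 = time.perf_counter()
    data = fetch(item)
    t1 = time.perf_counter()
    result = compute(data)
    t2 = time.perf_counter()
    return result, t1 - t0, t2 - t1

# Placeholder phases; replace with a real file read and a real search.
def fetch_stub(name):
    return bytes(1000)

def compute_stub(data):
    return sum(data)

result, fetch_s, compute_s = time_phases(fetch_stub, compute_stub, "image0.png")
```

If `fetch_s` is not small compared with `compute_s`, moving the data is likely to limit how far the work scales.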
> regards, mark hahn.
> On 2012-11-4 at 6:11, "Mark Hahn" <hahn at mcmaster.ca> wrote:
>> I am currently researching the feasibility and process of establishing a
>>>> relatively small HPC cluster to speed up the processing of large amounts of
>>>> digital images.
>>> do you mean that smallness is a goal? or that you don't have a large
>>> After looking at a few HPC computing software solutions listed on the
>>>> Wikipedia comparison of cluster software page (
>>>> I still have
>>>> only a rough understanding of how the whole system works.
>>> there are several discrete functionalities:
>>> - shared filesystem (if any)
>>> - scheduling
>>> - intra-job communication (if any; eg MPI)
>>> - management/provisioning/monitoring of nodes
>>> IMO, anyone who claims to have "best practices" in this field is lying.
>>> there are particular components that have certain strengths, but none of
>>> them are great, and none universally appropriate. (it's also common
>>> to conflate or "integrate" the second and fourth items - for that matter,
>>> monitoring is often separated from provisioning.)
>>> 1. Do programs you wish to use via HPC platforms need to be written to
>>>> support HPC, and further, to support specific middleware using parallel
>>>> programming or something like that?
>>> "middleware" is generally a term from the enterprise computing
>>> it basically means "get someone else to take responsibility for hard
>>> and is a form of the classic commercial best practice of CYA. from an
>>> perspective, there's the application and everything else. if you really
>>> want, you can call the latter "middleware", but doing so is
>>> HPC covers a lot of ground. usually, people mean jobs will execute in a
>>> batch environment (started from a commandline/script). OTOH HPC
>>> means what you might call "personal supercomputing", where an interactive
>>> application runs in a usually-dedicated cluster (shared clusters tend to
>>> have scheduling response times that make interactive use problematic.)
>>> (shared clusters also give rise to the single most important value of
>>> clusters: that they can interleave bursty demand. if everyone in your
>>> department shares a cluster, it can be larger than any one group can
>>> afford, and therefore all groups will be able to burst to higher
>>> this is why large, shared clusters are so successful. and, for that matter,
>>> why cloud services are successful.)
>>> you can do HPC with very little overhead. you will generally want a shared
>>> filesystem - potentially just a NAS box or existing server. you may not
>>> bother with scheduling at all - let users pick which machine to run on,
>>> for instance. that sounds crazy, but if you're the only one using it, why
>>> bother with a scheduler? HPC can also be done without inter-job
>>> communication - if your jobs are single-node serial or threaded, for
>>> instance. and you may not need any sort of management/provisioning,
>>> depending on the stability of your nodes, environment, expected lifetime,
>>> in short, slap linux onto a few boxes, set up ssh keys or hostbased
>>> trust, have one or more of them NFS out some space, and you're cooking.
>>>> Can you run any program on top of the HPC cluster and have its workload
>>>> effectively distributed? --> How can this be done?
>>> this is a common newbie question. a naive program (probably serial or
>>> multithreaded) will see no benefit from a cluster. clusters are just
>>> old machines. the benefit comes if you want throughput (jobs per time) or
>>> specifically program for distributed computation (classically with MPI).
>>> it's common to use infiniband to accelerate this kind of job (as well as
>>> provide the fastest possible IO.)
>>> 2. For something like digital image processing, where a huge amount of
>>>> relatively large images (14MB each) are being processed, will network
>>> the main question is how much work a node will be doing per image.
>>> suppose you had an infinitely fast fileserver and gigabit connected nodes.
>>> transferring a 14MB image would take on the order of 100-150ms, so you would ideally spend
>>> about the same amount of time processing an image. but in this case, you
>>> should probably ask whether you can simply store images on the nodes in the
>>> first place. if you haven't thought about where the inputs are and how
>>> fast they can be gotten, then that will probably be your bottleneck.
>>> speed, or processing power be more of a limiting factor? Or would a
>>>> network suffice?
>>> how long does a prospective node take to complete one work unit,
>>> and how long does it take to transfer the files for one?
>>> your speedup will be limited by whatever resource saturates first
>>> (possibly your fileserver.)
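A back-of-envelope version of this reasoning, under assumptions that are mine rather than the poster's: 14MB is taken as 14e6 bytes, the link delivers its full nominal 1 Gbit/s, and a single fileserver link feeds all nodes one image at a time. At that rate one image takes about 0.11 s to move, and the fileserver link caps throughput at roughly 9 images per second however many nodes are added.

```python
def transfer_seconds(size_bytes, link_bits_per_s):
    """Ideal time to move one file over a link, ignoring protocol overhead."""
    return size_bytes * 8 / link_bits_per_s

def throughput_ceiling(n_nodes, compute_s, transfer_s):
    """Images per second: limited either by the nodes (each needs
    transfer + compute per image) or by the one shared server link."""
    node_limit = n_nodes / (compute_s + transfer_s)
    server_limit = 1.0 / transfer_s
    return min(node_limit, server_limit)

image_transfer_s = transfer_seconds(14e6, 1e9)      # about 0.112 s

# Hypothetical 1 s of processing per image.
small = throughput_ceiling(4, 1.0, image_transfer_s)    # node-limited
large = throughput_ceiling(20, 1.0, image_transfer_s)   # server-limited
```

In this model, going from 4 to 20 nodes does not give 5x the throughput, because the fileserver link saturates first.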
>>> 3. For a relatively easy HPC platform what would you recommend?
>>> they are all crap. you should try not to spend on crap you don't need,
>>> but ultimately it depends on how much expertise you have and/or how much
>>> you value your time. any idiot can build a cluster from scratch using
>>> fundamental open-source components, eventually. but if said idiot has to
>>> learn filesystems, scheduling, provisioning, etc from scratch, it could
>>> take quite a while. when you buy, you are buying crap, but it's crap
>>> that may save you some time.
>>> don't count on commercial support being more than crappy.
>>> you should probably consider using a cloud service - this is just
>>> outsourcing - more crap, but perhaps of value if, for instance, you don't
>>> want to get your hands dirty hosting machines (amazon), etc.
>>> anything commercial in this space tends to be expensive. the license to
>>> cover a crappy scheduler for a few hundred nodes, for instance will be
>>> close to an FTE-year. renting a node from a cloud provider for a year costs
>>> about as much as buying a new node each year, etc.
>>> Again, I hope this is an ok place to ask such a question, if not please
>>> this is the place. though there are some fringe sects of HPC who tend to
>>> subsist on more and/or different crap (such as clusters running windows.)
>>> beowulf tends towards the low-crap end of things (linux, open packages.)
>>> regards, mark hahn.
> operator may differ from spokesperson. hahn at mcmaster.ca