[Beowulf] Digital Image Processing via HPC/Cluster/Beowulf - Basics

Lux, Jim (337C) james.p.lux at jpl.nasa.gov
Sat Nov 3 06:50:22 PDT 2012


1.  Yes and no.  The application process needs to be "parallel aware", but for some applications that could just mean running multiple instances, one on each node, and farming the work out to them. This is called "embarrassingly parallel" (EP). A good example would be rendering animation frames: typically each frame doesn't depend on the frames around it, so you can just parcel the work out to the nodes at a frame granularity.  There are other applications which are more tightly coupled, where the computation process running on node N needs to know something about what's running on node N+1 and node N-1 very frequently.  For this, applications use some sort of standardized interprocess communication library (e.g. MPI), or perhaps a library that performs a high-level function (e.g. matrix inversion) and uses the interprocess communication underneath.
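The EP frame-farming idea can be sketched in a few lines with Python's standard library. This is only an illustration on one machine, not a cluster scheduler; `render_frame` is a hypothetical stand-in for the real per-frame work.

```python
# Embarrassingly parallel (EP) work farm, sketched with worker
# processes on one machine. Each frame is independent, so frames
# can simply be parceled out at frame granularity.
from concurrent.futures import ProcessPoolExecutor

def render_frame(frame_id):
    # Hypothetical stand-in for real per-frame work
    # (e.g. rendering or filtering one image).
    return frame_id * frame_id

if __name__ == "__main__":
    frames = range(8)
    with ProcessPoolExecutor(max_workers=4) as pool:
        # map() farms one frame to each idle worker; no worker
        # ever needs to talk to another worker.
        results = list(pool.map(render_frame, frames))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

On a cluster the same pattern holds, except the "workers" are nodes and the farming is done by a batch scheduler or a small dispatch script rather than a process pool.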

2.  Another "it depends". If the process is EP and each node is processing a different image, then your problem is one of sending and retrieving images, which isn't much different from a conventional file-server model.  If multiple processors/nodes are working on the same image, then the interconnect might be more important.  It all depends on the communication requirements.  Note that even EP applications can get themselves fouled up in network traffic (imagine booting 1000 nodes simultaneously, with all of them wanting to fetch the boot image from one server at once).
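To see why the shared-image case stresses the interconnect: when one image is split across nodes, each node's region depends on pixel values held by its neighbors, so boundary data must be exchanged every step. A minimal single-process sketch of such a neighbor-dependent update (a 1-D smoothing stencil; on a real cluster the boundary values would travel over the network, typically via MPI):

```python
# 1-D smoothing stencil: each cell's new value depends on its
# left and right neighbors. Split across nodes, the cells at each
# node's boundary would require a "halo exchange" with the
# neighboring node before every step.
def stencil_step(values):
    n = len(values)
    # Edges are clamped (reuse the border value) for simplicity.
    return [
        (values[max(i - 1, 0)] + values[i] + values[min(i + 1, n - 1)]) / 3.0
        for i in range(n)
    ]

data = [0.0, 0.0, 9.0, 0.0, 0.0]
print(stencil_step(data))  # [0.0, 3.0, 3.0, 3.0, 0.0]
```

The frequency of that boundary exchange, relative to the compute done per step, is what decides whether gigabit Ethernet is fine or a faster, lower-latency interconnect is needed.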


This is the place to ask..


From: CJ O'Reilly <supaiku at gmail.com<mailto:supaiku at gmail.com>>
Date: Wednesday, October 31, 2012 11:31 PM
To: "beowulf at beowulf.org<mailto:beowulf at beowulf.org>" <beowulf at beowulf.org<mailto:beowulf at beowulf.org>>
Subject: [Beowulf] Digital Image Processing via HPC/Cluster/Beowulf - Basics

Hello, I hope that this is a suitable place to ask this; if not, I would equally appreciate some advice on where to look in lieu of answers to my questions.
You may guess that I'm very new to this subject.

I am currently researching the feasibility and process of establishing a relatively small HPC cluster to speed up the processing of large amounts of digital images.

After looking at a few HPC computing software solutions listed on the Wikipedia comparison of cluster software page ( http://en.wikipedia.org/wiki/Comparison_of_cluster_software ) I still have only a rough understanding of how the whole system works.

I have a few questions:
1. Do programs you wish to use via HPC platforms need to be written to support HPC, and further, to support specific middleware using parallel programming or something like that?
OR
Can you run any program on top of the HPC cluster and have its workload effectively distributed? --> How can this be done?
2. For something like digital image processing, where a huge number of relatively large images (14 MB each) are being processed, will network speed or processing power be more of a limiting factor? Or would a gigabit network suffice?
3. For a relatively easy HPC platform what would you recommend?

Again, I hope this is an OK place to ask such a question; if not, please refer me to a more suitable source.

