[Beowulf] Young novice with a tight pocket book

William Dieter william.dieter at gmail.com
Fri Aug 12 20:36:36 PDT 2005

On 8/12/05, stuffnstuff <stuffnstuff7 at gmail.com> wrote:
> I use my computer for massive rendering jobs regularly and wish there were
> ways of giving it more speed. I am strongly considering putting together a
> cluster, but I don't know the basic requirements. I have read several guides
> on building a cluster, but none of them seem to give the software and
> connection requirements. 

There are a lot of variables, so it is hard to say in general.  You
will find that the most common answer given here is "it depends,"
though the important part is what it depends on.  Some questions that
come to mind: What software are you using, and how does it work?  Has
it already been optimized?  If it is a well-known package, you may
be able to find a cookbook solution, or another person who has tried
speeding it up, too.  If you have developed the rendering software or
have access to the source code, there may be opportunities to squeeze
more uniprocessor performance out of it.  Even recompiling with
optimization options tuned to your processor can improve performance
anywhere from 0% to 50% or more (and is really easy to try).
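Whether recompiling helps is easy to check by timing the same job
before and after.  A minimal sketch in Python, assuming you have two
builds of the renderer to compare; the function names here are
hypothetical, not part of any rendering package:

```python
import subprocess
import time


def time_command(cmd, repeats=3):
    """Run cmd (a list of arguments) `repeats` times and return the best
    wall-clock time in seconds.  The minimum is less sensitive to noise
    from other processes than the average."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # raises if the command fails
        best = min(best, time.perf_counter() - start)
    return best


def relative_speedup(baseline_seconds, tuned_seconds):
    """How many times faster the tuned build ran than the baseline."""
    return baseline_seconds / tuned_seconds
```

For example, with hypothetical binaries ./render-default and
./render-tuned, relative_speedup(time_command(["./render-default",
"scene"]), time_command(["./render-tuned", "scene"])) reports how much
the tuned build gained on that scene.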

If it is a package developed by someone else, does it have a parallel
version already?  If not, you may have to write some interface code
yourself, but if you can run multiple instances of the program at a
time (perhaps on different frames), then parallelizing it should be
pretty easy.  In that case, you can do it simply by writing shell
scripts that call rsh or ssh to start jobs remotely, or by using bproc
or OpenMOSIX to do the job distribution for you.
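A minimal sketch of the ssh approach in Python, assuming passwordless
ssh keys are set up on every node and that the renderer accepts a
comma-separated list of frames.  The host names, the render command
template, and the helper names are all hypothetical:

```python
import shlex
import subprocess


def assign_frames(hosts, first, last):
    """Deal frames first..last (inclusive) out to hosts round-robin."""
    plan = {host: [] for host in hosts}
    for i, frame in enumerate(range(first, last + 1)):
        plan[hosts[i % len(hosts)]].append(frame)
    return plan


def ssh_commands(plan, render_template):
    """Build one ssh command per host that has work.  render_template is
    a hypothetical renderer invocation containing '{frames}'."""
    cmds = []
    for host, frames in plan.items():
        if not frames:
            continue  # more hosts than frames: nothing to send this one
        frame_list = ",".join(str(f) for f in frames)
        remote = render_template.format(frames=shlex.quote(frame_list))
        cmds.append(["ssh", host, remote])
    return cmds


def run_all(cmds):
    """Start every remote job at once, then wait for all to finish.
    Returns the exit status of each ssh process."""
    procs = [subprocess.Popen(cmd) for cmd in cmds]
    return [p.wait() for p in procs]
```

Usage: run_all(ssh_commands(assign_frames(["node1", "node2"], 1, 100),
"render --frames {frames}")).  Dealing frames out round-robin rather
than in contiguous blocks means that if some stretches of the animation
are much harder to render, each node gets a mix of easy and hard frames.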

If a parallel version is not available and parallelization is more
complicated than running multiple instances of the program, you will
need to rewrite the code using an API like MPI, though that requires
good knowledge of what the package is doing and potentially a fair
amount of effort.

> Until I am corrected, I will be under the impression that I can run a
> cluster with a master node using XP Home and sub-nodes using Windows 98. Do
> I need to buy a new operating system? 

I suppose so, though I have to admit I know next to nothing about
configuring Windows for parallel computing.  To be legal, you would
have to have licenses for all the nodes, though that may not be an
issue if they already come with licenses.

OTOH, if your software will run under Linux, you can install one of
the many free Linux distributions.  In general, Linux has the
advantage that you can strip out the parts you don't need, freeing
more resources for your computation.  It may also be easier to find
people on this list to help you, since most people here run Linux of
one flavor or another.

> What do I need to physically connect
> the computers. If I can conjure up a few old computers that would barely be
> worth selling, is it worth building a cluster? How do I physically do all
> this? 

Again, it depends on what your software package is doing and how you
parallelize it.  If you can parallelize it by dividing the work into
chunks that are independent, and the individual jobs rarely or never
need to communicate, then you may get benefit from scrounged machines
in proportion to their speeds, using pretty much any networking
hardware you can find (100 Mbit Ethernet is really cheap, even if you
have to buy it).  For example, if PC A renders 3 times as many frames
per minute as PC B, you could expect to get 4 times as many frames per
minute using both PC A and PC B compared with just PC B (3 + 1 = 4).
You will notice I did not say "if PC A has 3x the clock speed of PC B."
Clock speed is not necessarily an indicator of performance, especially
if the chips are of different architectures.
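The arithmetic above can be written out as a small Python sketch,
assuming you have measured each node's rate in frames per minute on
your actual job.  The function names are hypothetical; the idea is to
give each node a share of the frames proportional to its rate, so that
all nodes finish at about the same time:

```python
def proportional_split(total_frames, rates):
    """Split total_frames among nodes in proportion to their measured
    rates (frames per minute).  Frames lost to rounding down go to the
    fastest nodes first."""
    total_rate = sum(rates.values())
    shares = {n: int(total_frames * r // total_rate) for n, r in rates.items()}
    leftover = total_frames - sum(shares.values())
    for n in sorted(rates, key=rates.get, reverse=True)[:leftover]:
        shares[n] += 1
    return shares


def ideal_speedup(rates, baseline):
    """Upper bound on speedup versus running on `baseline` alone,
    ignoring all communication and start-up costs."""
    return sum(rates.values()) / rates[baseline]
```

With rates {"A": 3, "B": 1}, proportional_split(100, rates) gives A 75
frames and B 25, and ideal_speedup(rates, "B") is 4.0, matching the
example.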

Also, in the above example, 4x faster is really the upper limit on
performance.  If the two PCs (nodes) have to communicate, they will
get less than 4x the speed of PC B alone because of the time spent
communicating.  If there is enough communication, the job can run even
slower than it would on PC B alone!  Generally, if you have lots of
communication, you need a faster network, and as you scale to more
nodes, you want them to be the same speed so as to maintain load
balance more easily.
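A crude model of the effect of communication, as a Python sketch: it
assumes communication time simply adds to the compute time and cannot
be overlapped with it, which real programs can sometimes partly avoid.
The function names and parameters are hypothetical:

```python
def run_time(work, rates, baseline, comm):
    """Estimated wall-clock minutes for `work` minutes of rendering
    (as measured on `baseline` alone) spread over all nodes, plus
    `comm` minutes of communication that is not overlapped."""
    ideal = sum(rates.values()) / rates[baseline]
    return work / ideal + comm


def effective_speedup(work, rates, baseline, comm):
    """Speedup versus `baseline` alone once communication is counted."""
    return work / run_time(work, rates, baseline, comm)
```

With the same rates {"A": 3, "B": 1} and 60 minutes of work on PC B,
run_time gives 15 minutes with no communication, but with 50 minutes
of communication it gives 65 minutes, slower than PC B alone.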

Economics are a factor, too, depending on how much money you have, how
much speedup you want, how you value your time, and how fast your
current PC is.  If a brand new PC would be 10x faster than your
current PC, and you place a high dollar value on your time, it is
probably more economical to buy a faster PC and be done with it.
OTOH, if you want to spend time learning about parallel computing,
setting up PCs, Linux, etc., and have a very limited budget, then a
scrounged cluster makes a lot of sense.  What you learn can translate
to other areas, so investing the time is probably worthwhile if you
have the interest.  Or, if your PC is pretty much state of the art and
you want a big speedup, you probably have to parallelize to get the
performance you want, though scrounged parts will probably not be fast
enough.  However, experience with the scrounged parts can give you the
confidence that if you spend the money on newer parts, you will get
the speedup you want.

Bill Dieter.
