1 GFLOP / Parallel Input-Output Systems / AI

Wed Sep 13 05:35:47 PDT 2000

On Tue, 12 Sep 2000, Sahil Rewari wrote:

> Hi,

> Just wanted to know that in order to achieve 1 GFLOP what would the
> hardware requirements for a Linux cluster be?

> I am about to begin on making my first cluster for research in mainly
> two fields which are parallel input / output systems and the further
> development of Artificial Intelligence (AI) systems.

> I stay in Mumbai (INDIA) and am finding it difficult to get likeminded
> people to work on these projects. Linux in general is used by very few
> people here. The inadequate availability of service and support
> professionals and training for Linux probably is the cause to this. My
> entire research work is done with the availability of resources on the
> Internet and through other people working on such projects. Are there
> any other people working on similar projects? Please do write to me at
> nesol at bol.net.in

As Greg already said, this depends strongly on your application and the
design of your cluster.  One answer might be as few as two machines (and
you might even be able to get there in one) if the bulk of your
application (the core loop) is tiny and can run entirely in L1 cache on
a (say) 800 or 900 MHz PIII CPU.  On the other hand, if it is big and
highly nonlocal in memory, it might take as many as 25 CPUs (e.g. 500 MHz
Celerons), and this is assuming that the application itself is
embarrassingly parallel so the parallel design of the cluster is mostly
irrelevant.
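
As a rough sketch of the arithmetic behind those two estimates (in
Python, purely for illustration): the function name and the sustained
per-CPU rates of 500 and 40 MFLOPS below are assumptions, back-computed
from the "two machines" and "25 CPUs" figures above, not measurements.
The only real way to get such rates is to benchmark your own code.

```python
import math


def cpus_needed(target_mflops, per_cpu_mflops):
    """CPUs needed to reach a target rate, assuming an embarrassingly
    parallel application so that per-CPU rates simply add up."""
    return math.ceil(target_mflops / per_cpu_mflops)


# Assumed sustained rates, implied by the estimates in the text:
# a core loop that lives in L1 cache vs. one that is memory bound.
print(cpus_needed(1000, 500))  # cache-resident loop
print(cpus_needed(1000, 40))   # big, nonlocal memory access
```

The point of the sketch is that the sustained per-CPU rate, not the
clock speed, drives the answer, and that rate can differ by more than
a factor of ten between applications on similar hardware.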

If the application is coarse to medium grained and spends some significant
fraction of its execution on interprocessor communications (IPCs),
then it is probable that you cannot just add up the FLOPS of N CPUs to
get N*FLOPS performance.  Indeed, with certain beowulf designs you may
well not be able to get to a GFLOP at all, with any number of CPUs.
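
To see how that can happen, here is a toy model (a sketch only, not
a description of any particular cluster).  It assumes each step does a
fixed amount of floating point work split evenly over N CPUs, plus a
communication time that grows linearly with N.  That linear cost, the
function names and all the numbers are illustrative assumptions.

```python
def aggregate_mflops(n_cpus, per_cpu_mflops, work_mflop, comm_s_per_cpu):
    """Delivered MFLOPS for one step of the toy model."""
    # Compute time shrinks as the work is divided over more CPUs...
    compute_s = work_mflop / (per_cpu_mflops * n_cpus)
    # ...but (by assumption) communication time grows with CPU count.
    comm_s = comm_s_per_cpu * n_cpus
    return work_mflop / (compute_s + comm_s)


def best_cluster(per_cpu_mflops, work_mflop, comm_s_per_cpu, max_cpus=1000):
    """Return (best MFLOPS, CPU count) over 1..max_cpus CPUs."""
    return max(
        (aggregate_mflops(n, per_cpu_mflops, work_mflop, comm_s_per_cpu), n)
        for n in range(1, max_cpus + 1)
    )


# Hypothetical numbers: 40 MFLOPS per CPU, 100 MFLOP of work per step,
# 10 ms of communication per step for every CPU in the cluster.
rate, n = best_cluster(40.0, 100.0, 0.01)
print(f"peak of {rate:.0f} MFLOPS at {n} CPUs")
```

With these made-up parameters the delivered rate peaks at a few hundred
MFLOPS around 16 CPUs and then falls as more CPUs are added, so the
cluster never reaches a GFLOP.  Setting the communication cost to zero
recovers the simple N*FLOPS sum.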

It sounds like you need to do a bit of reading on parallel program
design and speedup scaling and then run some benchmarks on candidate
systems to come up with a quantitative answer, if this really matters to
you.  On the other hand, if you are just interested in generating a big
number (or a justifiable number:-) for a grant proposal, then by all
means, just add up the MFLOPS for N nodes.  This number represents a
kind of peak (reached only by embarrassingly parallel or very coarse
grained applications): it is the "best" number your cluster might achieve.

To find some help in learning about parallel program design and scaling
and all that, look for links under:

 http://www.phy.duke.edu/brahma

where there are a number of papers and presentations that discuss
speedup scaling and application profiling.  To get a very nice set of
microbenchmarking tools to run on a prospective node, check out lmbench
at

 http://www.bitmover.com

lmbench is used by (among many others) Linus Torvalds and the kernel
development folks to tune and optimize the kernel; it is a well-designed
and reliable package that I hope to talk about some at the ALSC next
month.

HTH,

   rgb

> Any help / suggestions regarding the above will be highly appreciated.

> Thanks in Advance,
> 
> Regards,
> Sahil Rewari
> 

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu