[Beowulf] Re: scaling

Joe Landman landman at scalableinformatics.com
Mon Jan 16 19:00:08 PST 2006

Jim Lux wrote:

> I think that unforeseen scaling problems are probably the single biggest 
> challenge with HPC.

Yes.  Absolutely.  This is why in the Bioinformatics benchmark system, I 
have purposely scaled the benchmark sizes from tiny (which shouldn't be 
used for much more than proof that it runs correctly) to quite large.  I 
need to update the gargantuan mpiblast test.  So far I don't know if 
anyone was able to run that test successfully.

It is very hard to shake bugs out of a system when you don't pound on it 
in the same way that your program will.  We have in the past seen people 
post linpack/hpcc results and "passing" test reports for clusters that 
were unable to perform their primary function due to bugs/errors in the 
build of the system.

Nothing catches problems like real users really using the system.

>  If it's not the plethora of files, or crippling 
> interprocessor comm needs, it's something like timing races and implied 
> barriers (in a brute force master doling out identical work packets to 
> all the slaves ... they all finish at the same time).

Yup.  Way way back (2000-ish) in ct-blast we had these neat corner cases 
where we would have a nice distribution of sequence sizes, apart from 
the few gargantuan sequences.  We used some neat techniques to try to 
deal with the load imbalance (various sorting/sampling methods for the 
input sequences), and that helped somewhat, but better techniques were 
needed for the monster sequences.

You could see the load imbalance in the queuing system.  The long 
running jobs would keep running, and running, and running ...  You could 
predict, to a degree, which jobs would take a long time, since run time 
was a function of the input sequence length.  With a little bit of 
program intelligence, you could bubble these to the start of the queue.  
That would make the delay before the first results larger, but the load 
balance would be better.
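
A rough sketch of that trade-off, in Python.  This is not the ct-blast 
code; it is a toy simulation that assumes run time is simply proportional 
to sequence length and that idle workers pull the next job from the front 
of the queue.  The function name `simulate` and the sequence lengths are 
made up for illustration.

```python
import heapq

def simulate(job_lengths, n_workers):
    """Idle workers pull jobs in list order.

    Returns (makespan, time_of_first_result). Run time is assumed to
    equal the job length, which is a simplification.
    """
    free_at = [0] * n_workers      # min-heap: when each worker goes idle
    heapq.heapify(free_at)
    finish_times = []
    for length in job_lengths:
        start = heapq.heappop(free_at)   # earliest idle worker takes the job
        end = start + length
        finish_times.append(end)
        heapq.heappush(free_at, end)
    return max(finish_times), min(finish_times)

# Hypothetical input: four small sequences, then two monsters at the end.
lengths = [10, 10, 10, 10, 100, 80]

fifo = simulate(lengths, n_workers=2)
longest_first = simulate(sorted(lengths, reverse=True), n_workers=2)

print("queue order   (makespan, first result):", fifo)
print("longest first (makespan, first result):", longest_first)
```

With these made-up numbers, running the jobs in queue order finishes at 
t=120 with the first result at t=10, while putting the longest jobs first 
finishes at t=110 but delivers nothing until t=80.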


Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615
