What could be the performance of my cluster

Fri Apr 12 12:25:06 PDT 2002

Greg Lindahl wrote:
> 

Hi Greg,

> On Fri, Apr 12, 2002 at 11:15:52AM -0600, Craig Tierney wrote:
> 
> > Is the BLAST code something that spends lots
> > of time doing lots of little calculations,
> > or doing one big calculation?  How important is
> > the speed of access to the database?  What is
> > the memory footprint of the code when it runs
> > on the DS20E?
> 
> It depends.
> 
> What BLAST does is compare a set of sequences against a big database of
> sequences. The databases come in small, medium, and large (bigger than
> 2 GByte) sizes; the sequences can either be a single sequence (imagine
> a researcher looking up a single protein using a web interface) or a
> large set of them. If it's a large set, the problem is embarrassingly
> parallel.
> 
> The BLAST implementation used by most people isn't parallel. It can be
> fairly easily parallelized to divide the big database up into pieces.
> 
> People build fairly different clusters to run BLAST depending on their
> details. The guys at Celera Genomics didn't want to use a parallel
> version, and their database is bigger than 2 GBytes, so they bought
> Alphas. Most people have small enough databases to fit into 2 GBytes,
> but search against 1 sequence at a time, so they can't afford to read
> the entire database over NFS every time, and keep it on a local disk.
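
To make the "divide the big database up into pieces" idea concrete, here is
a minimal sketch, assuming the database is available as FASTA text. The
names parse_fasta and split_database are hypothetical helpers written for
this illustration; they are not part of BLAST or any BLAST distribution.
The sketch only balances the pieces by total residue count so that each
node gets a similar amount of work. It does not run a search.

```python
from typing import Iterator, List, Tuple

# A record is (header without the leading '>', concatenated sequence).
Record = Tuple[str, str]


def parse_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_database(records: List[Record], n_pieces: int) -> List[List[Record]]:
    """Split records into n_pieces of roughly equal total sequence length.

    Greedy heuristic (an assumption of this sketch, not BLAST's method):
    take the longest sequences first and always add the next one to the
    piece that is currently smallest.
    """
    if n_pieces < 1:
        raise ValueError("n_pieces must be at least 1")
    pieces: List[List[Record]] = [[] for _ in range(n_pieces)]
    sizes = [0] * n_pieces
    for rec in sorted(records, key=lambda r: len(r[1]), reverse=True):
        i = sizes.index(min(sizes))
        pieces[i].append(rec)
        sizes[i] += len(rec[1])
    return pieces


if __name__ == "__main__":
    sample = ">a\nAAAA\nAA\n>b\nCC\n>c\nGGG\n"
    for n, piece in enumerate(split_database(list(parse_fasta(sample)), 2)):
        print(n, [name for name, _ in piece])
```

Each piece would then be searched on a separate node and the hit lists
merged afterwards. One caveat: BLAST E-values depend on the size of the
database searched, so scores from separate pieces are not directly
comparable unless the search is told the size of the full database.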

Do you have some sample proteins and databases?

I would like to test some machines I have available to mess around a
little bit (HP PA-RISC series, Sun SPARC Fire, Itanium, PowerPC).

I would like to build a little benchmark around these datasets.

Regards,
 Robert Depenbrock

-- 
nic-hdl RD-RIPE
http://www.bay13.de/
e-mail: robert at bay13.de
Fingerprint: 1CEF 67DC 52D7 252A 3BCD  9BC4 2C0E AC87 6830 F5DD