What could be the performance of my cluster
Robert Depenbrock
robert at bay13.de
Fri Apr 12 12:25:06 PDT 2002
Hi Greg,

Greg Lindahl wrote:
>
> On Fri, Apr 12, 2002 at 11:15:52AM -0600, Craig Tierney wrote:
>
> > Is the BLAST code something that spends lots
> > of time doing lots of little calculations,
> > or doing one big calculation? How important is
> > the speed of access to the database? What is
> > the memory footprint of the code when it runs
> > on the DS20E?
>
> It depends.
>
> What BLAST does is compare a set of sequences against a big database of
> sequences. The databases come in small, medium, and large (bigger than
> 2 GByte) sizes; the sequences can either be a single sequence (imagine
> a researcher looking up a single protein using a web interface) or a
> large set of them. If it's a large set, the problem is embarrassingly
> parallel.
>
> The BLAST implementation used by most people isn't parallel. It can be
> fairly easily parallelized by dividing the big database into pieces.
>
> People build fairly different clusters to run BLAST depending on their
> details. The guys at Celera Genomics didn't want to use a parallel
> version, and their database is bigger than 2 GBytes, so they bought
> Alphas. Most people have databases small enough to fit into 2 GBytes,
> but search one sequence at a time against them, so they can't afford to
> read the entire database over NFS for every search; instead they keep it
> on a local disk.
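The two ways of splitting a BLAST workload described above (farming out a large set of query sequences, or dividing the big database into pieces) can be sketched roughly as follows. This is only an illustration under stated assumptions: the BLAST search itself is not shown, and the `Record`/`Hit` tuples and the function names are hypothetical. Hits from different database pieces are merged by bit score rather than E-value, because E-values depend on the size of the database that was searched, so per-piece E-values are not directly comparable to a search of the whole database.

```python
import heapq
from typing import List, Tuple

# Hypothetical data shapes, not BLAST's own formats.
Record = Tuple[str, str]      # (sequence id, residues)
Hit = Tuple[float, str, str]  # (bit score, query id, subject id)


def split_queries(queries: List[str], n_nodes: int) -> List[List[str]]:
    """Query-level parallelism: deal queries round-robin onto n_nodes lists.
    Each node searches its own list against the whole database, with no
    communication between nodes (the embarrassingly parallel case)."""
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    return [queries[i::n_nodes] for i in range(n_nodes)]


def shard_database(records: List[Record], n_shards: int) -> List[List[Record]]:
    """Database-level parallelism: place each record (longest first) on the
    shard holding the fewest residues so far, so every node scans a similar
    amount of sequence data."""
    if n_shards < 1:
        raise ValueError("n_shards must be at least 1")
    heap = [(0, i) for i in range(n_shards)]  # (residues assigned, shard index)
    shards: List[List[Record]] = [[] for _ in range(n_shards)]
    for rec in sorted(records, key=lambda r: len(r[1]), reverse=True):
        load, idx = heapq.heappop(heap)
        shards[idx].append(rec)
        heapq.heappush(heap, (load + len(rec[1]), idx))
    return shards


def merge_hits(per_shard: List[List[Hit]], top_n: int) -> List[Hit]:
    """Combine the hit lists returned by each shard and keep the best top_n
    by bit score, highest first."""
    all_hits = [h for hits in per_shard for h in hits]
    return heapq.nlargest(top_n, all_hits, key=lambda h: h[0])


if __name__ == "__main__":
    print(split_queries([f"q{i}" for i in range(7)], 3))
    records = [("a", "AAAAAA"), ("b", "AAAA"), ("c", "AAA"), ("d", "AA")]
    print([[r[0] for r in s] for s in shard_database(records, 2)])
```

With seven queries on three nodes, node 0 gets `q0, q3, q6` and the other two nodes get two queries each; the greedy sharding keeps the residue counts of the two database pieces close (8 vs. 7 in the toy example).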
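A back-of-the-envelope calculation shows why reading the whole database over NFS for each single-sequence search hurts. The numbers here are assumptions chosen for illustration, not measurements: a 2 GByte database, the theoretical 12.5 MB/s peak of 100 Mbit/s Ethernet, and an assumed 40 MB/s sustained local disk read rate. When several nodes pull from one NFS server at once, they share that server's link.

```python
DB_BYTES = 2e9             # a database near the 2 GByte size mentioned above
FAST_ETHERNET = 100e6 / 8  # 12.5 MB/s: theoretical peak of 100 Mbit/s Ethernet
LOCAL_DISK = 40e6          # ASSUMED sustained read rate of a local disk


def seconds_to_read(db_bytes: float, bytes_per_s: float, readers: int = 1) -> float:
    """Time for `readers` nodes to each pull the whole database through one
    shared link of bytes_per_s (readers=1 means a single node)."""
    return db_bytes * readers / bytes_per_s


if __name__ == "__main__":
    print(seconds_to_read(DB_BYTES, FAST_ETHERNET))      # 160.0
    print(seconds_to_read(DB_BYTES, FAST_ETHERNET, 16))  # 2560.0
    print(seconds_to_read(DB_BYTES, LOCAL_DISK))         # 50.0
```

Under these assumptions a single node spends about 160 s per search just moving data over NFS, and 16 nodes sharing one server would each wait roughly 2560 s, which is why a local copy of the database is the usual choice.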
Do you have some sample proteins and databases?
I would like to test them on some machines I have available and mess
around a little bit
(HP PA-RISC series, Sun Fire (SPARC), Itanium, PowerPC).
I would like to build a little benchmark around these datasets.
Regards,
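A small timing harness is one way such a cross-machine benchmark could be structured. This is a sketch under assumptions: `run_search` is a hypothetical placeholder for whatever launches one BLAST workload on the machine under test (for example a wrapper around a subprocess call), and `n_queries` is the number of query sequences in that workload. The first run is discarded as a warm-up so that the database files are likely already in the OS page cache, which roughly matches the "keep it on a local disk" setup; the median is reported because it is less sensitive to one-off outliers than the mean.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run_search: Callable[[], object], n_queries: int,
              repeats: int = 5) -> Dict[str, float]:
    """Time `repeats` runs of one search workload after an untimed warm-up
    and report the median wall time and the implied query throughput."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    run_search()  # warm-up run, not timed
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_search()
        times.append(time.perf_counter() - start)
    median = statistics.median(times)
    qps = n_queries / median if median > 0 else float("inf")
    return {"median_s": median, "queries_per_s": qps}


if __name__ == "__main__":
    # Dummy CPU-bound workload standing in for a real BLAST run.
    print(benchmark(lambda: sum(range(100_000)), n_queries=10, repeats=3))
```

Running the same workload and dataset through this harness on each machine gives directly comparable median times and queries-per-second figures.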
Robert Depenbrock
--
nic-hdl RD-RIPE
http://www.bay13.de/
e-mail: robert at bay13.de
Fingerprint: 1CEF 67DC 52D7 252A 3BCD 9BC4 2C0E AC87 6830 F5DD
More information about the Beowulf mailing list