BLAST and FASTA benchmarks

William R. Pearson wrp at alpha0.bioch.virginia.edu
Sat Apr 13 13:11:25 PDT 2002


There was a bit of misinformation about the difference between the
BLAST and FASTA programs for protein and DNA sequence comparison.

Both BLAST and FASTA search for local sequence similarity - indeed
they have exactly the same goals, though they use somewhat different
algorithms and statistical approaches.

The advantage of an ES40 or other large shared memory machine for
BLAST is that BLAST has been optimized for searching databases that
are large memory mapped files, and it runs multithreaded.  PVM and MPI
versions of BLAST are not available.  It is important to remember
that BLAST is extremely fast, and highly optimized to go through a
large amount of memory very quickly; it would be difficult to provide
an equally efficient distributed version - but, of course, a
distributed memory machine would be much cheaper.

PVM and MPI versions of FASTA are available.  FASTA actually is a
package of about a dozen programs that vary more than 100-fold in
speed.  It is easy to make efficient PVM/MPI versions of the slower
algorithms (Smith-Waterman, TFASTY, TFASTX); parallel versions of the
FASTA algorithm are less efficient.

How to benchmark BLAST and FASTA -

As Greg Lindahl pointed out, the appropriate platform for BLAST (less
so for FASTA) depends on the size of the database.  Very few databases
are larger than 2 Gb.  (I think the person who said he had an 80 Gb
database was mistaken - the largest publicly available sequence
database, Genbank, currently has 17 Gb of sequence data.)  In contrast,
protein sequence databases are much smaller, typically 50 - 500 Mb.

If you would like to try searching some protein or DNA sequence
databases, they are available from ftp.ncbi.nih.gov/blast/db.  nr.Z
and swissprot.Z are two representative protein sequence databases;
nt.Z and est_mouse.Z are representative DNA databases.  Simply select
10 - 100 sequences at random from these databases and run them against
the full size databases.
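
The random selection step could be sketched roughly as follows.  This
is only an illustration, not part of either package: it assumes the
database has already been uncompressed into a plain FASTA-format text
file, and the function names (read_fasta, sample_records, write_fasta)
are made up for this example.  It makes one pass over the file and
keeps a fixed-size random sample, so the whole database never has to
fit in memory.

```python
import random


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-format lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def sample_records(records, k, seed=None):
    """Pick up to k records uniformly at random in a single pass
    (reservoir sampling), without knowing the record count in advance."""
    rng = random.Random(seed)
    reservoir = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append(rec)
        else:
            # Record i replaces a kept record with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = rec
    return reservoir


def write_fasta(records, out, width=60):
    """Write (header, sequence) pairs to a file object, wrapping sequence lines."""
    for header, seq in records:
        out.write(">" + header + "\n")
        for start in range(0, len(seq), width):
            out.write(seq[start:start + width] + "\n")
```

For instance, with a hypothetical uncompressed file swissprot.fasta,
sample_records(read_fasta(open("swissprot.fasta")), 50) would return 50
query sequences, which write_fasta can save to a query file to run
against the full database with whichever search program is being
timed.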

Bill Pearson


