Parallel BLAST - help

Ting ting at
Tue Apr 16 15:08:59 PDT 2002

Hello, All,

  I have a three-node Beowulf cluster with an MPI environment up and
  running now, and I downloaded the FASTA files from NCBI onto the
  master node.  I successfully wrote code to break up the data, but
  unfortunately I have not been able to get working code that sends
  the data back from the nodes to the host (master). :-(

  Can anyone give me some suggestions, or point me to a web site
  where I can find working code to use?  It would help me a lot.

  Thank you very much.
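The question above asks how to split data and collect results back on the master. Below is a minimal Python sketch of that scatter/gather pattern. It assumes the mpi4py bindings are installed on every node, because the post does not say which language or MPI binding is in use. Rank 0 (the master) splits the FASTA data. `comm.scatter` then gives one share to each rank, and `comm.gather` collects every rank's results back on rank 0. The names `split_fasta`, `search_chunk` and `parallel_search` are hypothetical. `search_chunk` is only an exact-substring stand-in for running BLAST or FASTA on a share.

```python
# Hypothetical sketch: scatter FASTA records from the master (rank 0),
# search each share on its own node, gather the hits back to rank 0.
# Assumption: mpi4py is installed; without it the code runs serially.
try:
    from mpi4py import MPI
except ImportError:
    MPI = None


def split_fasta(text, n):
    """Parse FASTA text into records and deal them round-robin into n chunks."""
    records = []
    for line in text.splitlines():
        if line.startswith(">"):
            records.append([line])      # a header line starts a new record
        elif records:
            records[-1].append(line)    # sequence line belongs to the last record
    chunks = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        chunks[i % n].append("\n".join(rec))
    return chunks


def search_chunk(records, query):
    """Hypothetical stand-in for BLAST/FASTA: exact substring match.

    Returns the header text (without '>') of every record whose
    sequence contains `query`.
    """
    hits = []
    for rec in records:
        header, _, seq = rec.partition("\n")
        if query in seq.replace("\n", ""):
            hits.append(header[1:].strip())
    return hits


def parallel_search(fasta_text, query):
    """Root splits, every rank searches its share, root gathers and merges.

    Returns the sorted hit list on rank 0 and None on the other ranks.
    Only rank 0 needs fasta_text; other ranks may pass None.
    """
    if MPI is None:  # serial fallback: a single "rank" does all the work
        return sorted(search_chunk(split_fasta(fasta_text, 1)[0], query))
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    chunks = split_fasta(fasta_text, size) if rank == 0 else None
    my_chunk = comm.scatter(chunks, root=0)  # rank i receives chunks[i]
    my_hits = search_chunk(my_chunk, query)
    all_hits = comm.gather(my_hits, root=0)  # list of lists on root, None elsewhere
    if rank == 0:
        return sorted(hit for hits in all_hits for hit in hits)
    return None


if __name__ == "__main__":
    demo = ">seq1 demo\nACGTACGT\n>seq2 demo\nTTTTGGGG\n>seq3 demo\nGGACGTAA\n"
    result = parallel_search(demo, "ACGT")
    if result is not None:  # only the master prints
        print(result)
```

You would launch it with something like `mpirun -np 3 python parallel_search.py`; the launcher name and flags depend on the MPI implementation. In C, the corresponding collectives are `MPI_Scatterv` and `MPI_Gatherv`, which also handle shares of unequal size. This sketch ships the database itself through MPI. For a real database it is usually cheaper to copy the shares to node-local disk once, which is the overhead trade-off discussed in the reply quoted below.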


-----Original Message-----
From: Steve Gaudet
Sent: Monday, April 15, 2002 11:12 AM
To: 'William R. Pearson'; beowulf at
Subject: RE: Parallel BLAST

> -----Original Message-----
> From: William R. Pearson
> Sent: Sunday, April 14, 2002 10:32 PM
> To: beowulf at
> Subject: Parallel BLAST
>
> > Why is it that BLAST is not available for MPI/PVM?  I would think
> > clusters would be the perfect host for such an application.
> > Is it that there is no need because BLAST is already so fast, and
> > no one wants to break the database out onto node-resident disks?
> > Or is it that BLAST is kept running on single-processor or
> > shared-memory machines so that the DB is always in memory, ready
> > to roll without loading, and doing the same on a cluster is not
> > worth it because the trick is difficult to do on a node, given
> > the way clusters are currently built?  I assume the same is true
> > for FASTA?
>
> I suspect that BLAST is not available for MPI/PVM because (1) it is
> too fast, and (2) there is not much demand for it.
>
> 95% of the time, BLAST is almost an in-memory grep (the other 5% of
> the time it is working on the things it is looking for).  Sequence
> comparison is embarrassingly parallel, and very easily threaded.
> Distributing the sequence databases and collecting the results adds
> overhead that is large relative to the search itself (there probably
> aren't many distributed grep programs either).  FASTA is 5-10X slower
> than BLAST, and Smith-Waterman is another 5-20X slower than FASTA.
> For these slower programs the communication overhead is relatively
> low, so distributed systems work OK for FASTA and great for
> Smith-Waterman (where the overhead fraction is very small).
>
> Of course, it is a lot easier to compile a threaded program, and just
> run it, than it is to install and configure the MPI or PVM environment
> and the programs to run in it.  Bioinformatics software is often run
> by computer-savvy biologists, not high-performance computing folks,
> and not having to install and configure PVM/MPI is a big advantage.
>
> The NCBI probably does not make a PVM/MPI parallel BLAST because there
> is very little demand for it, and it does not meet their computational
> needs.
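Pearson's overhead argument can be made concrete with a simple cost model. This is an illustrative sketch and not something stated in the thread. It assumes the search work T_search divides evenly over p nodes, and that distributing the databases and collecting the results costs a fixed T_comm that does not shrink with p. The run time T_p and parallel efficiency E are then as follows, with the slowdown ratios taken from the message:

```latex
T_p = \frac{T_{\mathrm{search}}}{p} + T_{\mathrm{comm}},
\qquad
E = \frac{T_{\mathrm{search}}}{p\,T_p}
  = \frac{1}{1 + p\,T_{\mathrm{comm}}/T_{\mathrm{search}}}

T_{\mathrm{FASTA}} \approx (5 \text{ to } 10)\,T_{\mathrm{BLAST}},
\qquad
T_{\mathrm{SW}} \approx (5 \text{ to } 20)\,T_{\mathrm{FASTA}}
               \approx (25 \text{ to } 200)\,T_{\mathrm{BLAST}}
```

With the same T_comm, the overhead term p T_comm / T_search is 5 to 10 times smaller for FASTA than for BLAST, and 25 to 200 times smaller for Smith-Waterman. E therefore moves toward 1, which matches the "OK for FASTA, great for Smith-Waterman" conclusion.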

There's also a commercial version from Turbogenomics.


1) Ready to go, plug-n-play solution for parallel BLAST
2) Expertise and 20+ years of experience in parallel computing
3) Dynamic database splitting feature to take advantage of computers that
have less memory than the size of the database
4) Smart load balancing - achieve linear to superlinear speedup
5) No modification made to the NCBI BLAST algorithm to ensure identical
results with the non-parallel version
6) Easy drop-in update whenever NCBI releases newer versions of BLAST
7) Excellent support
8) 30-day money-back guarantee


Steve Gaudet
Linux Solutions Engineer

| Turbotek Computer Corp.    tel:603-666-3062 ext. 21             |
| 8025 South Willow St.      fax:603-666-4519                     |
| Building 2, Unit 105       toll free:800-573-5393               |
| Manchester, NH 03103       e-mail:sgaudet at  |
|                            web: |

Beowulf mailing list, Beowulf at