[Beowulf] RE: [Bioclusters] FPGA in bioinformatics clusters (again?)

Mike Davis jmdavis at mail2.vcu.edu
Mon Jan 16 18:05:33 PST 2006


Exactly, Joe. These codes were written (often in Perl) to solve smaller 
problems. Their authors may never have envisioned assembling 50k+ 
sequences. I would also point out that much of bioinformatics is 
performed on flat text file databases.

Those of us who also work with physicists and chemists know that their 
software and algorithms are tweaked for very good, if not excellent, 
performance. Much of that work was done (in Fortran) in the '70s and 
'80s. But the field of bioinformatics is so new that, as far as I know, 
no one has made those kinds of optimizations for many of the programs.

There's a lot of room for improvement. We looked at the Paracel 
solutions, but just couldn't justify the cost. Instead, we make use of a 
combination of machines. Embarrassingly parallel work like BLAST runs on 
the same clusters that run G03, GAMESS and FEMAP. The assemblies run on 
large SMP Suns with 32-96GB of RAM and large disk filesystems.

Mike Davis

Joe Landman wrote:

> Hi Craig:
>
> Craig Tierney wrote:
>
>> Mike Davis wrote:
>>
>>> But BLAST is only a small part, and arguably the easiest part, of 
>>> genomics work. The advantages of parallelization and/or SMP come 
>>> into play when attempting to assemble the genome. Phred/Phrap can do 
>>> the work but starts to slow even large machines when you're talking 
>>> 50k+ sequences (which it wants to be in one folder). A quiz for 
>>> the Unix geeks out there: what happens when a folder has 50,000 
>>> files in it? Can you say SLOOOOOOOOOWWWW?
>>>
>> First, pick the right filesystem.
>> Second, rewrite your code so you don't have 50k+ files in one directory.
>> There must be some straightforward way to solve the problem if
>> you have too many files in one directory.
>
>
> Lots of the informatics codes were not written with such input (or 
> database) scaling in mind.  For them, 10-100 files in a directory 
> isn't much of a problem.  It's when you start to scale up that the bugs 
> and surprises start.
>
>
> Joe
>
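
Craig's second suggestion above (don't keep 50k+ files in one directory)
can be sketched roughly as follows. This is a minimal illustration, not
anything from Phred/Phrap or the other tools named in the thread: the
function names, the two-character bucket width, and the choice of a
SHA-256 prefix are all assumptions made for the example. It also only
helps if the downstream program can be pointed at nested paths, which the
thread suggests Phrap does not do out of the box.

```python
import hashlib
import shutil
from pathlib import Path


def bucket_for(name: str, width: int = 2) -> str:
    """Return a short hex bucket name derived from the file name.

    Hypothetical helper: hashing spreads files roughly evenly across
    16**width subdirectories regardless of how the files are named.
    """
    return hashlib.sha256(name.encode("utf-8")).hexdigest()[:width]


def shard_files(src: Path, dst: Path, width: int = 2) -> int:
    """Move every regular file in src into dst/<bucket>/ and return the count.

    dst is assumed to live outside src. With width=2 there are at most
    256 buckets, so 50,000 files come to roughly 200 per directory.
    """
    moved = 0
    # Snapshot the listing first so we are not iterating a directory
    # while moving entries out of it.
    for entry in list(src.iterdir()):
        if not entry.is_file():
            continue
        bucket = dst / bucket_for(entry.name, width)
        bucket.mkdir(parents=True, exist_ok=True)
        shutil.move(str(entry), str(bucket / entry.name))
        moved += 1
    return moved
```

Because the bucket depends only on the file name, a reader can find a
given file later by calling bucket_for() on its name, with no index
needed.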



