[Beowulf] RE: [Bioclusters] FPGA in bioinformatics clusters (again?)

Joe Landman landman at scalableinformatics.com
Mon Jan 16 17:03:14 PST 2006


Hi Craig:

Craig Tierney wrote:
> Mike Davis wrote:
> 
>> But BLAST is only a small part, and arguably the easiest part, of 
>> genomics work. The advantages of parallelization and/or SMP come into 
>> play when attempting to assemble the genome. Phred/Phrap can do the 
>> work but starts to slow even large machines when you're talking 50k+ 
>> sequences (which it wants to be in one folder). A quiz for the Unix 
>> geeks out there: what happens when a folder has 50,000 files in it? 
>> Can you say SLOOOOOOOOOWWWW?
>>
> First, pick the right filesystem.
> Second, rewrite your code so you don't have 50k+ files in one directory.
> There must be some straightforward way to solve the problem if
> you have too many files in one directory.

Lots of the informatics codes were not written with such input (or 
database) scaling in mind.  For them, 10-100 files in a directory isn't 
much of a problem.  It's when you start to scale up that the bugs and 
surprises start.
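
As a rough illustration of the second suggestion quoted above (not 
having 50k+ files in one directory), here is a minimal sketch of one 
common approach: bucket files into subdirectories named after a hash 
prefix of the file name.  The function names, the choice of MD5, and the 
256-bucket layout are all assumptions made for illustration; none of 
this comes from Phred/Phrap, and it only helps for code you are able to 
change to look files up this way.

```python
import hashlib
import os
import shutil


def shard_path(root, filename, levels=1, width=2):
    """Return root/<hash prefix>/filename (hypothetical layout).

    The prefix is taken from the hex MD5 of the file name, so the same
    name always maps to the same bucket. One level of two hex digits
    gives 256 buckets.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, filename)


def shard_directory(flat_dir, sharded_root):
    """Move every regular file in flat_dir into its hashed bucket.

    Returns the number of files moved.
    """
    # Collect the names first so we are not moving files while scanning.
    with os.scandir(flat_dir) as entries:
        names = [entry.name for entry in entries if entry.is_file()]

    for name in names:
        dest = shard_path(sharded_root, name)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.move(os.path.join(flat_dir, name), dest)
    return len(names)
```

With one level of 256 buckets, 50,000 files come out to roughly 195 per 
directory on average.  A reader of the data calls shard_path() with the 
same root and file name to find where a given file landed.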


Joe

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615
