[Beowulf] RE: [Bioclusters] FPGA in bioinformatics clusters (again?)
Joe Landman
landman at scalableinformatics.com
Mon Jan 16 17:03:14 PST 2006
Hi Craig:
Craig Tierney wrote:
> Mike Davis wrote:
>
>> But BLAST is only a small part, and arguably the easiest part, of
>> genomics work. The advantages of parallelization and/or SMP come into
>> play when attempting to assemble the genome. Phred/Phrap can do the
>> work but starts to slow even large machines when you're talking 50k+
>> sequences (which it wants to be in one folder). A quiz for the Unix
>> geeks out there: what happens when a folder has 50,000 files in it?
>> Can you say SLOOOOOOOOOWWWW?
>>
> First, pick the right filesystem.
> Second, rewrite your code so you don't have 50k+ files in one directory.
> There must be some straightforward way to solve the problem if
> you have too many files in one directory.
Lots of the informatics codes were not written with such input (or
database) scaling in mind. For them, 10-100 files in a directory isn't
much of a problem. It's when you start to scale up that the bugs and
surprises start.
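For what it's worth, here is a rough sketch of the kind of restructuring
Craig is suggesting: spread the files over hashed subdirectories so no
single directory gets huge. This is only an illustration. The function
names (shard_dir, shard_files), the two-hex-digit bucket width, and the
choice of SHA-1 are my assumptions, and it does nothing for a code like
Phrap that insists on seeing all of its inputs in one folder.

```python
import hashlib
import shutil
from pathlib import Path


def shard_dir(name: str, width: int = 2) -> str:
    """Return a short, stable bucket name derived from the file name.

    Hypothetical helper: the first `width` hex digits of the SHA-1 of the
    name, so width=2 gives at most 256 buckets.
    """
    return hashlib.sha1(name.encode("utf-8")).hexdigest()[:width]


def shard_files(src: Path, dst: Path, width: int = 2) -> int:
    """Move every regular file in src into dst/<bucket>/ and return the count."""
    moved = 0
    # Snapshot the listing first so we are not iterating a directory
    # while moving files out of it.
    for path in list(src.iterdir()):
        if not path.is_file():
            continue
        bucket = dst / shard_dir(path.name, width)
        bucket.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(bucket / path.name))
        moved += 1
    return moved


def locate(dst: Path, name: str, width: int = 2) -> Path:
    """Recompute where a file with this name was placed by shard_files."""
    return dst / shard_dir(name, width) / name
```

With 256 buckets, 50,000 files works out to roughly 200 per directory,
which is back in the range these codes were written for. Because the
bucket is computed from the name alone, a lookup never needs to scan the
whole tree.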
Joe
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 734 786 8452
cell : +1 734 612 4615