[Beowulf] RE: [Bioclusters] FPGA in bioinformatics clusters (again?)
Mike Davis jmdavis at mail2.vcu.edu
Mon Jan 16 18:05:33 PST 2006
Exactly, Joe. These codes were written (often in Perl) to solve smaller problems. Their authors may never have envisioned 50k+ sequences to be assembled. I would also point out that much of bioinformatics is performed on flat text file databases.

Those of us who also work with physicists and chemists know that their software and algorithms are tweaked for very good, if not excellent, performance. Much of that work was done (in Fortran) in the '70s and '80s. But the field of bioinformatics is so new that, as far as I know, no one has made those types of optimizations for many of the programs. There's a lot of room for improvement.

We looked at the Paracel solutions, but just couldn't justify the cost. Instead, we make use of a combination of machines. Embarrassingly parallel work like BLAST runs on the same clusters that run G03, GAMESS and FEMAP. The assemblies run on large SMP Suns with 32-96GB of RAM and large disk filesystems.

Mike Davis

Joe Landman wrote:

> Hi Craig:
>
> Craig Tierney wrote:
>
>> Mike Davis wrote:
>>
>>> But BLAST is only a small part and arguably the easiest part of
>>> genomics work. The advantages of parallelization and/or smp come
>>> into play when attempting to assemble the genome. Phred/Phrap can do
>>> the work but starts to slow even large machines when you're talking
>>> 50k+ of sequences (which it wants to be in one folder). A quiz for
>>> the Unix geeks out there, what happens when a folder has 50,000
>>> files in it. Can you say SLOOOOOOOOOWWWW?
>>>
>> First, pick the right filesystem.
>> Second, rewrite your code so you don't have 50k+ files in one directory.
>> There must be some straightforward way to solve the problem if
>> you have too many files in one directory.
>
>
> Lots of the informatics codes were not written with such input (or
> database) scaling in mind. For them, 10-100 files in a directory
> isn't much of a problem. It's when you start to scale up that the bugs
> and surprises start.
>
>
> Joe
>
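One common way to act on the "don't have 50k+ files in one directory" suggestion quoted above is to spread files across subdirectories keyed by a hash of the file name. The following is only a minimal sketch of that idea, not anything the posters describe using: the names `shard_dir` and `shard_files`, the choice of SHA-256, and the two-character bucket width are all assumptions made for illustration. It also only helps for code you are able to change; as noted in the thread, Phrap wants its sequences in one folder.

```python
import hashlib
import os
import shutil


def shard_dir(name: str, width: int = 2) -> str:
    """Return a short, stable bucket name derived from a file name.

    Hypothetical helper: the first `width` hex digits of the SHA-256
    digest of the name. width=2 gives at most 256 buckets.
    """
    return hashlib.sha256(name.encode("utf-8")).hexdigest()[:width]


def shard_files(src: str, dst: str, width: int = 2) -> int:
    """Move every regular file in `src` into `dst/<bucket>/`.

    Returns the number of files moved. `dst` should not be a
    subdirectory of `src`.
    """
    # Collect names first so the directory is not modified mid-scan.
    with os.scandir(src) as entries:
        names = [entry.name for entry in entries if entry.is_file()]

    moved = 0
    for name in names:
        bucket = os.path.join(dst, shard_dir(name, width))
        os.makedirs(bucket, exist_ok=True)
        shutil.move(os.path.join(src, name), os.path.join(bucket, name))
        moved += 1
    return moved


def locate(dst: str, name: str, width: int = 2) -> str:
    """Return the path where `name` lives after sharding."""
    return os.path.join(dst, shard_dir(name, width), name)
```

With a two-character bucket, 50,000 files spread over at most 256 directories, which is roughly 200 files each if the hash distributes names evenly. Because the bucket is computed from the name alone, a file can be found later with `locate()` without listing any directory.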
