[Beowulf] RE: [Bioclusters] FPGA in bioinformatics clusters (again?)
Robert G. Brown
rgb at phy.duke.edu
Tue Jan 17 07:54:41 PST 2006
On Tue, 17 Jan 2006, Eugen Leitl wrote:
> On Mon, Jan 16, 2006 at 06:43:42PM -0500, Mike Davis wrote:
>
>> sequences (which it wants to be in one folder). A quiz for the Unix
>> geeks out there, what happens when a folder has 50,000 files in it. Can
>> you say SLOOOOOOOOOWWWW?
>
> Unix doesn't have folders. Are you a Mac person, perchance?
>
> You also seem to be using the wrong file system.
>
> If your application is needing 50 k files in one
> directory, your application should not be needing 50 k files
> in one directory. One trivial fix is to organize it
> into subdirectories, using parts of file name or hashes
> as prefix.
Yeah, exactly. Or hash in any of many other ways -- ideally on the
actual FUNCTIONAL differentiators of the "DB lookup" if you are likely
to have to process multiple files in one pass, all linked by some common
elements. There is no substitute for actually using computer science
and intelligent code design for creating optimally scalable code, and no
end of ways of doing something linearly and badly in code written by
people who don't really know how to code OR how to look up efficient
solutions that are written by people who do know how to code.
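
For concreteness, here is a minimal sketch of the "hash into
subdirectories" fix in Python. It is only an illustration under stated
assumptions, not anything from the application being discussed: the
names sharded_path and store, the choice of SHA-256, and the two-hex-
character prefix are all hypothetical. With these defaults, 50,000 files
spread over 256 subdirectories, roughly 200 apiece.

```python
import hashlib
from pathlib import Path


def sharded_path(root: Path, filename: str, levels: int = 1, width: int = 2) -> Path:
    """Map a file name to root/<prefix>/.../filename.

    The prefix directories are taken from a hex digest of the file name,
    so the same name always lands in the same subdirectory.
    (Hypothetical helper; levels and width are illustrative defaults.)
    """
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    # One directory level per `width`-character slice of the digest.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return root.joinpath(*parts, filename)


def store(root: Path, filename: str, data: bytes) -> Path:
    """Write data under its sharded path, creating directories as needed."""
    target = sharded_path(root, filename)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target
```

A lookup recomputes sharded_path(root, name) and opens the result
directly, so no directory ever has to be scanned in full. Hashing on a
field the application actually searches by, rather than on the bare
file name, keeps files that get processed together in the same
subdirectory.
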
DB lookup and efficient storage have long since moved into the
transcendental psychic regime with e.g. Google and other engines capable
of managing enormous databases. Google can find a single file out of a
zillion or so (note well my usage of precise numbers ;-) slightly before
you enter the search string, a thing that they manage by spinning their
servers so that the lookup engine travels slightly faster than the speed
of light.
rgb
--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu