[Beowulf] since we are talking about file systems ...
Jim Lux
James.P.Lux at jpl.nasa.gov
Sun Jan 22 12:17:36 PST 2006
At 10:23 AM 1/22/2006, Robert G. Brown wrote:
>On Sun, 22 Jan 2006, PS wrote:
>
>>Indexing is the key; observe how Google accesses millions of files in
>>split seconds; this could easily be achieved in a PC file system.
>
>I think that you mean the right thing, but you're saying it in a very
>confusing way.
>
>1) Google doesn't access millions of files in a split second, it AFAIK
>accesses relatively few files that are hashes (on its "index server")
>that lead to URLs in a split second WITHOUT actually traversing millions
>of alternatives (as you say, indexing is the key:-). File access
>latency on a physical disk makes the former all but impossible without
>highly specialized kernel hacks/hooks, ramdisks, caches, disk arrays,
>and so on. Even bandwidth would be a limitation if one assumes block
>I/O with a minimum block size of 4K -- 4K x 1M -> 4 Gigabytes/second
>(note BYTES, not bits) exceeds the bandwidth of pretty much any physical
>medium except maybe memory.
>
>2) It cannot "easily" be achieved in a PC file system, if by that you
>mean building an actual filesystem (at the kernel level) that supports
>this sort of access. There is a lot more to a scalable, robust,
>journalizeable filesystem than directory lookup capabilities. A lot of
>Google's speed comes from being able to use substantial parallelism on a
>distributed server environment with lots of data replication and
>redundancy, a thing that is impossible for a PC filesystem with a number
>of latency and bandwidth bottlenecks at different points in the dataflow
>pathways towards what is typically a single physical disk on a single
>e.g. PCI-whatever channel.
>
>I think that what you mean (correctly) is that this is something that
>"most" user/programmers would be better off trying to do in userspace on
>top of any general purpose, known reliable/robust/efficient PC
>filesystem, using hashes customized to the application. When I first
>read your reply, though, I read it very differently as saying that it
>would be easy to build a linux filesystem that actually permits millions
>of files per second to be accessed and that this is what Google does
>operationally.
This is almost certainly true. Typically, the user knows a bit about their
application, and can come up with a "good" way to hash or structure the
directories/filenames that will have decent performance with the underlying
OS filesystem. It's also easy to test with a small program that generates the
zillions of files needed.
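(Just as an illustration, and not actual code from any real system: here's a
minimal sketch, in Python, of what such a hashed layout plus a test-file
generator might look like. The names, the hash, and the two-level layout are
arbitrary choices for the example.)

import hashlib
import os

def record_path(root, key, levels=2):
    """Map a record key to a nested directory path via a hash, so no
    single directory ends up holding a huge number of entries."""
    h = hashlib.md5(key.encode()).hexdigest()
    # Use successive pairs of hex digits as directory names (256-way fanout per level).
    parts = [h[2 * i:2 * i + 2] for i in range(levels)]
    return os.path.join(root, *parts, key + ".dat")

def generate_test_files(root, count):
    """Create 'count' dummy records to see where the underlying filesystem slows down."""
    for i in range(count):
        path = record_path(root, "rec%07d" % i)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write("dummy record %d\n" % i)

if __name__ == "__main__":
    generate_test_files("testdb", 100000)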
However, if you are writing software for eventual distribution to others,
make sure you explain how you do it, and be aware that other file systems
may not see it the same way. Anecdote to illustrate: Back in the late 80s,
early 90s, I built a software system which provided a database of around
10,000 industrial real estate properties. The amount of information for
each property was highly variable (if, for no other reason than we stored
all the historical change information), so I stored all the data for each
property in its own (MS-DOS) file. All those files were stored
(originally) in a directory called something like "database". Early
testing worked really well, with a database of a few hundred records (i.e.,
files), but when we loaded up several thousand, it slowed to a crawl. A bit
of experimentation enabled me to figure out where the "breakpoints" in the
MS-DOS internal directory caches were, so we could come up with an
appropriate set of directories: do you do 100 directories of 100 files, or
10 directories of 1000 files, or 10 directories of 10 directories of 100
files, etc.? As it happens, we wound up with 100 directories, and hashed
based on the low-order digits of the property's id number (the numbers
being a legacy of the original manual system, and guaranteed unique: most
of the time <grin>).
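(For illustration only, in modern terms and with made-up names -- the
original was MS-DOS code, not Python -- the scheme boiled down to
something like this:)

import os

def property_path(root, prop_id):
    """Spread ~10,000 property files across 100 subdirectories,
    keyed on the two low-order digits of the property id."""
    subdir = "%02d" % (prop_id % 100)
    return os.path.join(root, subdir, "%d.dat" % prop_id)

# e.g. property 12345 lands in database/45/12345.dat
print(property_path("database", 12345))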
All was well, even through many revs. People would ask about it when they
did backups (what are all those directories for?).
Enter Novell Networking... Apparently, NetWare had its own scheme for
caching file directory information on its servers, with very different
properties from that of MS-DOS, AND the default installations never
contemplated the possibility that someone might have, gasp, 10,000 files
that they needed regular access to. Much wailing, gnashing of teeth, and
Novell CNEs who needed to go and talk to Novell customer support about how
to reconfigure the server (which essentially required reformatting and
rebuilding the server, a long and tedious process involving many 5 1/4"
floppies to back up the existing contents, etc.).
--- so, if you DO implement one of these hashing schemes, document it well.
BTW, you're still better off figuring out how to work WITH the OS's existing
architecture. I built a replacement file system for RSX-11M-PLUS back in
the late 70s that supported shared disks, and it was a royal pain to make
work. Things like cache concurrency, write-behind, and suchlike are
tricky to deal with.