[Beowulf] since we are talking about file systems ...
Jim Lux James.P.Lux at jpl.nasa.gov
Sun Jan 22 12:17:36 PST 2006
- Previous message: [Beowulf] since we are talking about file systems ...
- Next message: [Beowulf] since we are talking about file systems ...
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
At 10:23 AM 1/22/2006, Robert G. Brown wrote:
>On Sun, 22 Jan 2006, PS wrote:
>
>>Indexing is the key; observe how Google accesses millions of files in
>>split seconds; this could easily be achieved in a PC file system.
>
>I think that you mean the right thing, but you're saying it in a very
>confusing way.
>
>1) Google doesn't access millions of files in a split second, it AFAIK
>accesses relatively few files that are hashes (on its "index server")
>that lead to URLs in a split second WITHOUT actually traversing millions
>of alternatives (as you say, indexing is the key:-).  File access
>latency on a physical disk makes the former all but impossible without
>highly specialized kernel hacks/hooks, ramdisks, caches, disk arrays,
>and so on.  Even bandwidth would be a limitation if one assumes block
>I/O with a minimum block size of 4K -- 4K x 1M -> 4 Gigabytes/second
>(note BYTES, not bits) exceeds the bandwidth of pretty much any physical
>medium except maybe memory.
>
>2) It cannot "easily" be achieved in a PC file system, if by that you
>mean building an actual filesystem (at the kernel level) that supports
>this sort of access.  There is a lot more to a scalable, robust,
>journalizeable filesystem than directory lookup capabilities.  A lot of
>Google's speed comes from being able to use substantial parallelism on a
>distributed server environment with lots of data replication and
>redundancy, a thing that is impossible for a PC filesystem with a number
>of latency and bandwidth bottlenecks at different points in the dataflow
>pathways towards what is typically a single physical disk on a single
>e.g. PCI-whatever channel.
>
>I think that what you mean (correctly) is that this is something that
>"most" user/programmers would be better off trying to do in userspace on
>top of any general purpose, known reliable/robust/efficient PC
>filesystem, using hashes customized to the application.  When I first
>read your reply, though, I read it very differently as saying that it
>would be easy to build a linux filesystem that actually permits millions
>of files per second to be accessed and that this is what Google does
>operationally.

This is almost certainly true.  Typically, the user knows a bit about
their application, and can come up with a "good" way to hash or
structure the directories/filenames that will have decent performance
with the underlying OS filesystem.  It's also easy to test with some
programs that will generate the zillions of files needed.

However, if you are writing software for eventual distribution to
others, make sure you explain how you do it, and be aware that other
file systems may not see it the same way.

Anecdote to illustrate:

Back in the late 80s, early 90s, I built a software system which
provided a database of around 10,000 industrial real estate properties.
The amount of information for each property was highly variable (if for
no other reason than that we stored all the historical change
information), so I stored all the data for each property in its own
(MS-DOS) file.  All those files were stored (originally) in a directory
called something like "database".  Early testing worked real well, with
a database of a few hundred records (i.e. files), but when we loaded
several thousand up, it slowed to a crawl.

A bit of experimentation enabled me to figure out where the
"breakpoints" in the MS-DOS internal directory caches were, so we could
come up with an appropriate set of directories: do you do 100
directories of 100 files, or 10 directories of 1000 files, or 10
directories of 10 directories of 100 files, etc.  As it happens we wound
up with 100 directories, and hashed based on the low order digits of the
property's id number (the numbers being a legacy of the original manual
system, and guaranteed unique: most of the time<grin>).

All was well, even through many revs.
People would ask about it when they did backups (what are all those
directories for?).

Enter Novell Networking...

Apparently, Netware had its own scheme for caching file directory
information on their servers, with very different properties from that
in MS-DOS, AND the default installations never contemplated the
possibility that someone might have, gasp, 10,000 files that they needed
regular access to.  Much wailing, gnashing of teeth, and Novell CNEs who
needed to go and talk to Novell customer support about how to
reconfigure the server (which essentially required reformatting and
rebuilding the server, a long and tedious process involving many 5 1/4"
floppies to back up the existing contents, etc.).

So, if you DO implement one of these hashing schemes, document it well.

BTW, you're still better off figuring out how to work WITH the OS's
existing architecture.  I built a replacement file system for
RSX-11M-PLUS back in the late 70s that supported shared disks, and it
was a royal pain to make work.  Things like cache concurrency, write
behind, and such-like are tricky to deal with.
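
For anyone who wants to try the directory-hashing idea from the anecdote,
here is a minimal sketch in Python (the original system was MS-DOS
software, not Python).  It assumes integer record ids and buckets on the
low-order digits, as described above.  The names bucket_path and
populate, the ".dat" extension, and the default of 100 buckets are
illustrative assumptions, not details of the original system.

```python
import tempfile
from pathlib import Path


def bucket_path(root, record_id, buckets=100):
    """Return the file path for a record, bucketed by its low-order digits.

    With buckets=100, record 12345 lands in directory "45".
    (Hypothetical layout; the ".dat" suffix is an assumption.)
    """
    width = len(str(buckets - 1))      # zero-pad so "7" becomes "07"
    bucket = record_id % buckets       # low-order digits of the id
    return Path(root) / f"{bucket:0{width}d}" / f"{record_id}.dat"


def populate(root, record_ids, buckets=100):
    """Create one small file per record, for testing a layout.

    Returns the number of bucket directories that ended up in use.
    """
    for record_id in record_ids:
        path = bucket_path(root, record_id, buckets)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(f"record {record_id}\n")
    return sum(1 for p in Path(root).iterdir() if p.is_dir())
```

Calling populate(tempfile.mkdtemp(), range(10_000)) builds a throwaway
tree of 100 directories of 100 files each; changing the buckets argument
and timing lookups is one way to hunt for the "breakpoints" on whatever
filesystem you actually run on.  Whatever layout you settle on, the
bucket_path rule is the part to document.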
