[Beowulf] since we are talking about file systems ...

Vincent Diepeveen diep at xs4all.nl
Wed Jan 25 14:50:16 PST 2006

Of course as a big search expert i have some opinions here on searching the 

In computerchess we use an O(1) hash lookup technique of course.
It's called Zobrist. Basically calculate a number by xoring for every piece 
at every square a random number.

Hashtable in parallel search i organize in a similar way. For example with 4 
processors you simply split the key in 4.

Now you can do that with a slow modulo instruction, but please consider 
there is ways to get rid of slow floating point code here as well as even a 
slow modulo instruction.

You can also multiply the hashbits with a number and shift that right 32 

Like this :
    unsigned int
    procnr = ((((unsigned 
    hindex = (unsigned 
    hentry = &(globaltrans[procnr][hindex]);

where hashpos.lo is a 64 bits Zobrist index of what we want to obtain
nprocesses is the number of memory nodes we are searching at
abmod is the number of entries 1 node stores in RAM.

That simple math automatically gives the node 'procnr' at entry location 
'hindex' where you want the relevant information.

So there is no O( log n) necessary nor slow instructions. Lookup is 
completely integer based and in case of a WWW technique
a bunch of pc's, the more the better, will simply perform faster. It's 
embarrasingly parallel.

I'm not sure how google is doing their indexing, it's probably done in a 
similar way,
yet this is trivial to write embarassingly parallel for a WWW searcher.

Of course the total bandwidth you want to the internet is not so trivial to 
How google could get that bandwidth is still a mystery to me.

Overwriting goes in a similar way. We simply accept the fact that there is 
more information than we can index. That's in chess also the case,
a processor can search billions of positions a game (at my dual opteron dual 
core 2.4Ghz), yet i need 16 bytes for 1 position
and i have only 1 GB of RAM, so do the math the loading factor is far bigger 
than 1, which means for sure we are going to lose data..

You just find an overwrite strategy and optimize for that, old information 
simply slowly gets overwritten and overwritten a bit more than newer 
information gets overwritten, that's all.

If you lose information, no big deal.

The issue that google needed to adress i guess is dying nodes. In 
computerchess we prefer to not take those into account as that would 
complicate things even more.

So with real cheap material embarrassingly parallel hardware and very 
expensive bandwidth it's pretty easy to make a new google
from software viewpoint. Now try to get the users and then get on Wallstreet 

That last line is the hardest :)

Web indexing just like search in chess you can make just as hard as you want 
to but the basics of it is pretty simple.
Only O( 1 ) is acceptible for hashlookups :)

Please realize that a big supercomputer for a chessprogram similar amounts 
of accesses a second to hashtable get done like the total amount of queries 
in google a second.

Of course as much as possible gets done in RAM. That's a matter of 2 layers 
of caches. Effectively you hardly need queries from harddisk.

You can simply tune the system such that your latency doesn't go sit in 
harddisk queries. This is pretty complicated, in chess too.
We have a terabyte of data on disk we can query during search, so you have 
to tune it such that you are busy 99% of system
time in RAM and not with harddisk :)

Already years ago there was a Russian paper from an important Russian 
researcher published in computerchess showing the similarity between 
websearches over the internet and chess search.

There is of course big differences also. A big difference is that in chess 
we assume a node never dies. Google *has* to do that. In chess
there is more than just hashtables.

Important bottomline is to consider a lossy approach.
As soon as you want a lossless approach then things get real real 

All kind of database approaches which store everything also on disk in a 
lossless way before doing anything else
are therefore not so extremely interesting to compare with a lossy search 
for information or a lossy store of information.


----- Original Message ----- 
From: "Dan Stromberg" <strombrg at dcs.nac.uci.edu>
To: "PS" <pesch at attglobal.net>
Cc: <Beowulf at beowulf.org>; "Robert G. Brown" <rgb at phy.duke.edu>
Sent: Wednesday, January 25, 2006 2:40 AM
Subject: Re: [Beowulf] since we are talking about file systems ...

> You've struck a topic I've been very interested in lately:
> indexing/search.
> I've been researching available FLOSS index/search software lately, and
> have put some time into an indexing engine of my own, named pyindex.
> The results of what I've gathered so far are at:
> http://dcs.nac.uci.edu/~strombrg/Open-Source-desktop-search-applications.html
> And pyindex, while it's not complete, and I don't have a web page about
> it yet, has been able to index over 40,000 messages in well under an
> hour, and typically does searches in well under a minute - despite being
> in python, which is the purportedly "too high level language".  :)
> I may have to give up on bsddb, gdbm and similar lightweight database
> backends to get much better.  In fact, just switching from gdbm to
> dbhash (part of bsddb) sped it up a lot.
> If you have anything to add to that URL, I'd really be stoked to hear
> about it.
> On Wed, 2006-01-25 at 01:59 -0800, PS wrote:
>> You're absolutely correct, of course, in what you state: Google doesn't
>> access millions of files in a split second. But Google finds one file in
>> millions in a split second.
>> I still think, though, that indexing makes sense on a PC - with
>> practically any underlying file system. More than ten years ago I used a
>> large  (for that time) database of about 200 million files - which was
>> CDROM based; the file system was PC Dos and the indexing was done on top
>> of that in a sort of front end.  To find/access any one of the 200 mio
>> files on the slow CDROM took never longer than 0.8 seconds. So, we
>> didn't touch the "kernel" of dos at all and still got fast results. I
>> think a similar approach should work on a Linux system. In my opinion
>> this could and should be done outside of the kernel. What do you think?
>> Paul
>> Robert G. Brown wrote:
>> > On Sun, 22 Jan 2006, PS wrote:
>> >
>> >> Indexing is the key;  observe how Google accesses millions of files
>> >> in split seconds; this could easily be achieved in a PC file system.
>> >
>> >
>> > I think that you mean the right thing, but you're saying it in a very
>> > confusing way.
>> >
>> > 1) Google doesn't access millions of files in a split second, it AFAIK
>> > accesses relatively few files that are hashes (on its "index server")
>> > that lead to URLs in a split second WITHOUT actually traversing 
>> > millions
>> > of alternatives (as you say, indexing is the key:-).  File access
>> > latency on a physical disk makes the former all but impossible without
>> > highly specialized kernel hacks/hooks, ramdisks, caches, disk arrays,
>> > and so on.  Even bandwidth would be a limitation if one assumes block
>> > I/O with a minimum block size of 4K -- 4K x 1M -> 4 Gigabytes/second
>> > (note BYTES, not bits) exceeds the bandwidth of pretty much any 
>> > physical
>> > medium except maybe memory.
>> >
>> > 2) It cannot "easily" be achieved in a PC file system, if by that you
>> > mean building an actual filesystem (at the kernel level) that supports
>> > this sort of access.  There is a lot more to a scalable, robust,
>> > journalizeable filesystem than directory lookup capabilities.  A lot of
>> > Google's speed comes from being able to use substantial parallelism on 
>> > a
>> > distributed server environment with lots of data replication and
>> > redundancy, a thing that is impossible for a PC filesystem with a 
>> > number
>> > of latency and bandwidth bottlenecks at different points in the 
>> > dataflow
>> > pathways towards what is typically a single physical disk on a single
>> > e.g.  PCI-whatever channel.
>> >
>> > I think that what you mean (correctly) is that this is something that
>> > "most" user/programmers would be better off trying to do in userspace 
>> > on
>> > top of any general purpose, known reliable/robust/efficient PC
>> > filesystem, using hashes customized to the application.  When I first
>> > read your reply, though, I read it very differently as saying that it
>> > would be easy to build a linux filesystem that actually permits 
>> > millions
>> > of files per second to be accessed and that this is what Google does
>> > operationally.
>> >
>> >    rgb
>> >
>> >>
>> >>
>> >> Paul
>> >>
>> >>
>> >> Joe Landman wrote:
>> >>
>> >>> Methinks I lost lots of folks with my points ...
>> >>>
>> >>> Major thesis is that on well designed hardware/software/filesystems,
>> >>> 50000 files is not a problem for accesses (though from a management
>> >>> point of view it is a nightmare).  For poorly designed/implemented
>> >>> file systems it is a nightmare.
>> >>>
>> >>> Way back when in the glory days of SGI, I seem to remember xfs being
>> >>> tested with millions of files per directory (though don't hold me to
>> >>> that old memory).  Call this hearsay at this moment.
>> >>>
>> >>> A well designed and implemented file system shouldn't bog you down
>> >>> as you scale out in size, even if you shouldn't.  Its sort of like
>> >>> your car.  If you go beyond 70 MPH somewhere in the US that supports
>> >>> such speeds, your transmission shouldn't just drop out because you
>> >>> hit 71 MPH.
>> >>>
>> >>> Graceful degradation is a good thing.
>> >>>
>> >>> Joe
>> >>>
>> >>>
>> >>
>> >
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org
>> To change your subscription (digest mode or unsubscribe) visit 
>> http://www.beowulf.org/mailman/listinfo/beowulf
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit 
> http://www.beowulf.org/mailman/listinfo/beowulf

More information about the Beowulf mailing list