[Beowulf] dedupe filesystem
Lux, James P
james.p.lux at jpl.nasa.gov
Fri Jun 5 10:04:55 PDT 2009
So what we really want is a storage system that will swallow up drives
as they get bigger and bigger - so as your researchers create more and
more data, or stream in more and more satellite/accelerator data/logs
of phone calls (a la GCHQ) then your storage system is expanding at a
Many years ago I read an interesting paper talking about how modern user interfaces are hobbled by assumptions incorporated decades ago. When disk space was slow and precious, having users explicitly decide to save their file while editing was a good idea. (Don't even contemplate cassette tape on microcomputers.) Now, though, disk is cheap and fast and so are processors, so there's really no reason why you shouldn't store your word processing as a chain of keystrokes, with infinite undo available. Say I spent 8 hours a day doing nothing but typing at 100 wpm. That's 480 minutes * 500 characters/minute = 240,000 characters. Call it a measly 250,000 bytes per day. Heck, the 2GB of RAM in the MacBook I'm typing this on would hold 8000 days of work. In reality, a few GB would probably hold more characters (or mouse clicks, etc.) than I will type in my entire life.
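As a quick check on that arithmetic, here is a small sketch. It assumes 5 characters per word (the usual typing-test convention, and what the 500 characters/minute figure implies), one byte per character, and a decimal 2 GB; none of the constant names come from anything real.

```python
# Back-of-envelope check of the typing estimate.
WORDS_PER_MINUTE = 100
CHARS_PER_WORD = 5          # assumption: typing-test convention
HOURS_PER_DAY = 8
RAM_BYTES = 2 * 10**9       # assumption: 2 GB taken as decimal, 1 byte per character

chars_per_minute = WORDS_PER_MINUTE * CHARS_PER_WORD      # 500
bytes_per_day = HOURS_PER_DAY * 60 * chars_per_minute     # 480 minutes of typing
days_in_ram = RAM_BYTES / bytes_per_day

print(bytes_per_day)        # 240000
print(round(days_in_ram))   # 8333
```

Rounding the daily figure up to 250,000 bytes gives the 8000 days quoted above.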
In theory, then, given sufficient computational power (and that's what this list is all about) and the keystroke data on a small thumb drive, I should be able to reconstruct everything, in every version, I've ever created or will create. All it takes is a sufficiently powerful "rendering engine".
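A toy version of such a "rendering engine" might look like the sketch below. It is only an illustration: the log format (a list of single characters, with backspace as the only editing key) and the function name render are made up for the example, and a real editor would also have to record cursor movement, selections, mouse clicks and so on.

```python
BACKSPACE = "\b"  # assumption: the only editing key in this toy log

def render(keystrokes, upto=None):
    """Replay the first `upto` keystrokes (all of them if None) and return the text."""
    text = []
    for key in keystrokes[:upto]:
        if key == BACKSPACE:
            if text:            # backspace on an empty document is a no-op
                text.pop()
        else:
            text.append(key)
    return "".join(text)

# Type "helo", backspace once, then type "lo".
log = list("helo") + [BACKSPACE] + list("lo")

print(render(log))       # hello  (the current version)
print(render(log, 4))    # helo   (the document as it stood after 4 keystrokes)
```

Infinite undo falls out for free: any earlier version is just a replay of a shorter prefix of the log.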
I readily concede that much data that is stored on computers is NOT the direct result of someone typing. Imagery is probably the best example of huge data that isn't suitable for the "base version + all diffs" model.