[Beowulf] RE: Storage - the end of RAID?
Lux, Jim (337C)
james.p.lux at jpl.nasa.gov
Mon Nov 1 09:37:36 PDT 2010
> I think that the current thinking is that while disks have gotten very large, their I/O performance
> has not kept pace. With the cost of solid state storage coming down in price now, it makes a lot of
> sense to start replacing disks where we have single point bottlenecks in our I/O chain.
Your observation isn't surprising. Over time, density increases (on the disk, on silicon, heck, on punched cards), but the data still has to flow serially through some medium (even if you have a parallel data bus, it's not megabits wide), and there you're limited by EM propagation and all its ills. Speed over a wire, unfortunately, doesn't increase exponentially like density does. It doesn't even get the square-law benefit that areal density gets from shrinking feature spacing.
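To put rough numbers on the square-law point, here is a back-of-the-envelope sketch in Python. It assumes a disk at fixed rotational speed whose feature spacing shrinks by the same factor along the track and across tracks. The function name `scale_disk` and the starting figures (100 GB, 50 MB/s) are made up for illustration, not measurements of any real drive.

```python
def scale_disk(capacity_gb, bandwidth_mb_s, shrink):
    """Scale a hypothetical disk when feature spacing shrinks by `shrink`.

    Assumptions (illustrative only): fixed rotational speed, and the same
    shrink factor along the track and across tracks.
    """
    new_capacity_gb = capacity_gb * shrink ** 2    # areal density: square law
    new_bandwidth_mb_s = bandwidth_mb_s * shrink   # bits passing the head: linear
    # Time to stream the whole disk once, in hours (1 GB taken as 1000 MB).
    full_read_hours = (new_capacity_gb * 1000.0) / new_bandwidth_mb_s / 3600.0
    return new_capacity_gb, new_bandwidth_mb_s, full_read_hours


if __name__ == "__main__":
    for shrink in (1, 2, 4):
        cap, bw, hours = scale_disk(100.0, 50.0, shrink)
        print(f"shrink x{shrink}: {cap:.0f} GB, {bw:.0f} MB/s, "
              f"{hours:.2f} h to read it all")
```

Under those assumptions, every halving of the feature spacing quadruples the capacity but only doubles the streaming rate, so the time to read the whole disk doubles.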
The other issue that bites you pretty hard is power consumption. Fast data = more transitions = more units of charge moving from one place to another = more I^2R losses. And that's without looking at the more complex transmitting/receiving hardware needed as you move from pushbuttons and relays to things that have to worry about impedance discontinuities and adaptive equalization.
In many ways, the whole idea of distributed computing is equally applicable to distributed storage, problems and all. It's just a matter of scale, whether it's registers in the CPU, cache, some level of RAM, or bulk storage.
> So I look at the whole discussion as the realization that finally the rest of the world is catching up
> to us.
In many ways, Beowulfery has helped here. Especially in its early incarnations, the "between node" pipes were pretty slow compared to previous supercomputer designs, so people spent a lot of time figuring out how to structure algorithms so they had good locality of reference and were decoupled at fine time scales. It's sort of the inverse of the classic array processor, systolic array, or even SIMD machines.
I *like* having architectures generically based on message passing rather than shared memory. Programming is harder at first, because you need to explicitly recognize the non-deterministic behavior of the messages, but I think it makes the resulting design cleaner from an architectural standpoint. It really gets rid of the "global shared variable" thing that is a bane of multithreaded programming. (Of course, if you're coming from a tightly coupled environment with fast semaphores, you find it a pain.)
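A minimal sketch of what that looks like, in Python. Threads and a `queue.Queue` stand in here for nodes and the interconnect (on a real cluster this would be something like MPI), and the names `worker` and `parallel_sum` are hypothetical. Each worker touches only its own chunk and reports back with a message; there is no shared accumulator to lock.

```python
import queue
import threading


def worker(worker_id, chunk, outbox):
    """Sum a private chunk and report the result as a message."""
    # The message is the only communication with the outside world.
    outbox.put((worker_id, sum(chunk)))


def parallel_sum(data, n_workers=4):
    """Add up `data` by handing strided chunks to message-passing workers."""
    outbox = queue.Queue()
    chunks = [data[i::n_workers] for i in range(n_workers)]
    threads = [
        threading.Thread(target=worker, args=(i, chunk, outbox))
        for i, chunk in enumerate(chunks)
    ]
    for t in threads:
        t.start()

    total = 0
    # Replies arrive in whatever order the workers finish, so the collector
    # must not assume worker 0 answers first. Addition does not care.
    for _ in threads:
        _worker_id, partial = outbox.get()
        total += partial

    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    print(parallel_sum(list(range(1000))))
```

The collector loop is where the non-determinism has to be faced explicitly: it takes replies in arrival order and only combines them with an operation that is order-independent.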