[Beowulf] High Performance for Large Database
Robert G. Brown rgb at phy.duke.edu
Wed Oct 27 11:54:04 PDT 2004
- Previous message: [Beowulf] High Performance for Large Database
- Next message: [Beowulf] High Performance for Large Database
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Wed, 27 Oct 2004, Mark Hahn wrote:

> > NONtrivial parallelizations are things like distributing the execution
> > of actual SQL search statements across a cluster. Whether there is any
>
> it's easy to imagine that a stream of SQL queries could actually
> be handled in sort of an adaptive data refinement manner, where most
> of the thought goes in to managing division of the query labor (distributed
> indices searched in parallel, etc) , and in placement of data (especially
> ownership/locking of writable data). I have no idea whether Oracle-level DB's
> do this, but it wouldn't surprise me. the irony is that most of the thought
> that goes into advanced Beowulf applications is doing exactly this sort of
> labor/data division/balancing.
>
> I'd hazard a guess that the place to start putting parallelism in a DB
> is the underlying isam-like table layer...

As always, Google is your friend. Searching for "parallel database algorithms" turns up lots of current work; I'm sure a look at specific open source projects would turn up more (and maybe more relevant) work. Some of the tools I turned up in my earlier short query do exploit the kind of simple read-only data parallelism you described, though, and wrap it up all pretty.

For small read-only databases (backing e.g. a website), the very simplest approach is likely to put a full copy of the DB on each server and distribute the transactions themselves round robin. Use rsync to periodically update the images to accommodate distributed changes, if you permit distributed write and you dare (merging in changes is nontrivial). Or write an engine that uses idle time of the node engines themselves to distribute inserts to be scheduled.

Google itself is a pretty good example, actually: complex searches of a read-only and truly encyclopedic database. All I really know is that this is real computer science, and I am only an amateur at best.
I suspect that large-scale database parallelism is the subject of much current algorithmic research, as parts of the problem are likely NP-complete or at least NP-hard.

   rgb

-- 
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
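
For what it's worth, the "full copy on each server, transactions round robin" idea from the message above can be sketched in a few lines. This is only an illustration under loud assumptions: in-memory SQLite connections stand in for the per-server copies, and the `pages` table, `make_replica`, and `RoundRobinPool` are hypothetical names invented here, not anything from a real cluster tool. The periodic rsync step and any write merging are deliberately left out.

```python
import itertools
import sqlite3


def make_replica(rows):
    """Build one full, independent copy of the (tiny, read-only) database.

    Assumption: an in-memory SQLite DB stands in for a server's local copy.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany("INSERT INTO pages (id, title) VALUES (?, ?)", rows)
    conn.commit()
    return conn


class RoundRobinPool:
    """Hand each read-only query to the next replica in turn."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self._next = itertools.cycle(range(len(self.replicas)))
        self.served = [0] * len(self.replicas)  # per-replica query count

    def query(self, sql, params=()):
        i = next(self._next)
        self.served[i] += 1
        return self.replicas[i].execute(sql, params).fetchall()


# Hypothetical data: every replica holds the same three rows.
ROWS = [(1, "home"), (2, "about"), (3, "contact")]
pool = RoundRobinPool(make_replica(ROWS) for _ in range(3))

# Six identical lookups get spread evenly over the three replicas.
results = [
    pool.query("SELECT title FROM pages WHERE id = ?", (2,))
    for _ in range(6)
]
```

Because every replica holds the whole database, any of them can answer any query, which is what makes the dispatch trivially simple; the hard part (keeping the copies in sync once writes are allowed) is exactly what this sketch skips.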
