Beowulf Questions
Mark Hahn hahn at physics.mcmaster.ca
Mon Jan 6 16:33:23 PST 2003
> Anywho, I was thinking that the lib call was written in an asynchronous
> fashion, with various flags being set on the root node when a compute
> node completed its computation. Also, the only way the root would
> continue on with the application is when all nodes sent a response
> saying that they're done.

well, that means the master becomes a potential bottleneck.
also consider what happens if the master fails...

> verbatim and stored off to disk. That is, if the recovery node didn't
> finish its work already, of course. You'd also have to tell the original
> node that straightened itself out "Never mind," of course. (Said with

that's fine if each node has only trivial globally unique state.
but often, the reason you're using parallelism at all is because
you have a huge amount of global state, and each of N nodes owns 1/N of it.
can your program somehow survive when 1/N of its state disappears?

some codes don't have a lot of state. for instance, suppose you were
doing password cracking - a node's state is just its assigned subspace
within the set of possible cleartext passwords. if it dies, just hand
the space to some other node or distribute it among the survivors.

if your problem is like that, you're utterly and completely golden -
not only can you handle failures easily, but you can also run just fine
on a grid. like prime-cracking, seti@home, etc.
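
to make the low-state case concrete, here is a minimal sketch in Python of handing a dead node's subspace to the survivors. everything in it is an assumption made for illustration: the keyspace is modelled as the integers 0..keyspace-1, the node names are made up, the helper names (split_range, assign, reassign_on_failure) are hypothetical, and the dead node is assumed to have reported no progress, so its whole range gets redone.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # half-open interval [start, stop) of candidate indices


def split_range(start: int, stop: int, parts: int) -> List[Range]:
    """Split [start, stop) into `parts` contiguous, nearly equal pieces."""
    size, extra = divmod(stop - start, parts)
    pieces = []
    lo = start
    for i in range(parts):
        hi = lo + size + (1 if i < extra else 0)  # first `extra` pieces get one more
        pieces.append((lo, hi))
        lo = hi
    return pieces


def assign(keyspace: int, nodes: List[str]) -> Dict[str, List[Range]]:
    """Give each node one contiguous slice of the keyspace."""
    return {n: [r] for n, r in zip(nodes, split_range(0, keyspace, len(nodes)))}


def reassign_on_failure(
    assignment: Dict[str, List[Range]], dead: str
) -> Dict[str, List[Range]]:
    """Distribute the dead node's ranges among the surviving nodes.

    Assumes no partial progress was recorded for the dead node.
    """
    survivors = sorted(n for n in assignment if n != dead)
    if not survivors:
        raise RuntimeError("no surviving nodes to take over the work")
    new = {n: list(assignment[n]) for n in survivors}
    for start, stop in assignment[dead]:
        pieces = split_range(start, stop, len(survivors))
        for node, piece in zip(survivors, pieces):
            if piece[0] < piece[1]:  # skip empty pieces
                new[node].append(piece)
    return new


if __name__ == "__main__":
    plan = assign(100, ["n0", "n1", "n2", "n3"])
    print(reassign_on_failure(plan, "n1"))
```

the only state that has to survive a failure here is the table of who owns which range, which is why this kind of problem tolerates node loss so well. a code where each node holds 1/N of a large, mutable global state has no equivalent shortcut.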
More information about the Beowulf mailing list
