availability of Memory compression routine
Many of your questions may have already been answered in earlier discussions or in the FAQ. The search results page will indicate current discussions as well as past list serves, articles, and papers.
Robert G. Brown rgb at phy.duke.eduFri Jul 19 06:32:52 PDT 2002
- Previous message: availability of Memory compression routine
- Next message: E7500 Chipset, update
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Another suggestion: From the sound of things, the master is doing very little "work" except for farming out the base matrix for each node's next unit of work. You also don't indicate whether the work sequence is independent or dependent (does the second matrix sent out depend on the results sent back by all the nodes) or synchronous (so that all the nodes have to finish at a barrier before the next work slice can be begun). It could be that the base matrix isn't horribly difficult to compute and is independent for each slice OR that you have a big file on disk of base matrices and are reading them in and sending them out one at a time. In either case you should likely reconsider your code engineering. One secret to happy parallelization is to try to push your code structure towards the "ideal" of embarrassingly parallel. You should therefore sit down and think carefully about the architecture of the problem and whether it is possible to reformulate it (possibly in an "odd" way) so that it is more nearly EP. You haven't indicated what the actual physics is of the problem you're working on, but if the master is computing the matrices it is sending out (perhaps generating them from a distribution and a random number seed) and if it takes signficantly less time to compute a new matrix than to send it via the net, you can very likely restructure your problem so that it is EP, or nearly so. For example, if your slaves had onboard the RNG and initial matrix computation routines, all the master would need to do is send all the slaves a starting RNG seed (once, at the very beginning) and an index indicating where a given slave should start generating its next matrix. The master would basically do trivial bookkeeping -- send out just enough information for the slaves to do ALL the work and then kick back and await results. A second, even more effective way to make the problem EP exists if you have large, independent blocks of parameter space that you are interested in exploring. Simply divvy it up among the nodes and let each node run the job as a SERIAL task. Node 1 runs the job(s) you were going to do on Monday. Node 2 does the ones you were going to do on Tuesday. Etc. This gives nearly perfect parallel speedup unless (again) the data required to initialize a day's work is really, really big. This will also work even if a given work sequence for a given day has dependencies, so that the second matrix sent out depends on the results sent back from all the nodes from the first one. Remember, although PVM and MPI are indeed very cool, if you write code that REQUIRES their use you have in some sense already lost the parallel wars, moving from EP or extremely coarse grained parallelism (which scales well on nearly any network architecture) into medium to fine grain parallel, which requires fairly careful thought and co-engineering of task and beowulf to achieve the best possible scaling. Master/slave computations can often be engineered to be EP, and a proper account of their parallel speedup must include the serial time required to generate or read in the matrices you are working on. Just imagine that you have just one computer to do the computations on. Now somebody gives you a second, but it isn't on the network at all -- it has maybe a cd drive you can use to move code onto it and perhaps read in a big block of initialization data. Can you use it constructively? Can you crank up a SERIAL computation on it that does computations that are truly independent of those on the first, but still mean that you finish all the work required for your publication in half the time? If so, you win -- architect your parallelized task the same way, with the convenience of a network instead of a CD drive and you should be able to achieve near-linear speedup for LOTS of nodes. Even if the matrices are being read from disk, they had to get to disk somehow. If they were sent there from, e.g. an experiment with some sort of network copy, it would be a simple matter to arrange for them to get distributed directly to a scratch space on the nodes (disk is so cheap nowadays it is very likely that each node has over 30 GB of scratch disk anyway). This would significantly improve execution time -- obviously it is "bad" to send the matrices first to the master, which then sends them to the slaves (two network copies), instead of sending them directly to the slaves (one network copy). Again, the master's role is reduced to bookkeeping and collecting (hopefully much smaller) results. The only way you're likely to get stuck is if you're doing a single, monolithic computation with complete dependencies and synchronicity (so there is just a single thread of matrices you wish to explore and every new matrix is generated from the results computed by all the nodes in the previous step,) or if the matrices are e.g. on tapes being sent to you from Fermilab or the like, in which case you've got a nasty double bottleneck to consider -- the tape interface (slow) and the network (also slow). In the first the best you can do is speed up the network. In the second you're up against Amdahl's Law here, generalized to account for serial and parallel fractions of work. Reading from tape is a serial bottleneck that won't go away. A high end tape has a data transfer rate on the order of 6 MB/sec sustained, plus seek times which can easily be seconds or longer. Clearly the network (at 9-10 MB/sec) is no longer a bottleneck, so going to gigabit ethernet would not help overall. In any event, you definitely need to study your problem carefully, especially the bottlenecks, to see how to redesign it to run efficiently on large numbers of nodes. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
- Previous message: availability of Memory compression routine
- Next message: E7500 Chipset, update
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Beowulf mailing list
