[Beowulf] best archetecture / tradeoffs
Robert G. Brown rgb at phy.duke.edu
Sun Aug 28 06:49:44 PDT 2005
Mark Hahn writes:

(That my answer was off the Mark -- and sure, he's probably right...;-)

>> There are, however, ways around most of the problems, and there are at
>> this point "canned" diskless cluster installs out there where you just
>> install a few packages, run some utilities, and poof it builds you a
>> chroot vnfs that exports to a cluster while managing identity issues for
>
> canned is great if it does exactly what you want and you don't care to
> know what it's doing. but the existence of canned systems does NOT mean
> that it's hard!

...

>> Seriously, if you want these features (especially in perl) you have to
>> program them all in, and none of them are terrible easy to code. The
>
> I would have to say that the difficulty of this stuff is overblown.
> getting a node to net-boot onto a RO nfs root is pretty trivial if you
> know what you're doing, *or* if you look at a working example (LTSP, etc).
> writing a daemon to field job requests is basically cut-and-paste.
> combining that with a bit of SQL code is surprisingly simple.
> every design decision has its drawbacks (including using SQL), but you just
> have to try it to know whether it is worth it in the end. for instance,
> I chose SQL in part because it let me ignore challenges of a reliable
> data store, let me scale extremely well, and ultimately, all the job info
> had to be in a DB eventually any way...

Hmmm. I would personally invert this. Canned is great if you have no idea what you are doing and would like to learn without infinite pain (as are working examples:-). Doing it by reading man pages, HOWTOs, and guessing is (to mix up a tasty metaphor) sort of like learning to remove your own appendix using a steak knife and a mirror from Amazon prequels of medical textbooks..

I only say this because I've been running diskless or partly diskless systems off and on for oh, going on 20 years now and am horribly scarred by some of the experiences had along the way:-)

What I think IS a safe statement is that it has never been EASIER to run linux diskless than it is right now, particularly with one of the (e.g. warewulf) kits. As in with yum and a bit of chroot action, one can now build an exportable root in a minute or two from a script, and I can't BEGIN to tell you how painful this used to be back in pre-yum days. I did it the hard way as lately as maybe six years ago, and it was still a bloody PITA. I think it is a lot easier now and certainly is much better documented, but it is also certainly not trivial -- unless you use a kit/package, in which case you can skip a whole lot of the learning curve in the SHORT run because it is all encapsulated so (if nothing else) you can learn it quickly, efficiently, and in a functional context.

Remember, Mark, you're a bit of an uberprogrammer. Not everybody would find mixing a task-forking perl daemon with a "bit of SQL code" surprisingly simple...the kind of project one could knock off in a weekend;-) Not everybody would just say "screw all the nasty old crack-ridden schedulers out there, gonna write my own." and actually do it. I started to do it myself once upon a time, and simply didn't have time to finish it even without SQL -- and transitioned to C part way through because the perl started to get a bit "messy".

In fact, a lot of people who administer very successful clusters and even code a lot don't even KNOW SQL, and learning SQL and a particular interface (e.g. mysql) well enough to usefully code it into even a simple application is a task that (I recall) optimistically takes a week or two right there -- if you are an uberprogrammer, of course. Normal humans sometimes take semester-long courses in it to get to where they can use it with any sort of proficiency.
Or even two or three semester long courses.

So I think your "simple" really translates to "in a few months of learning and steady work" for most normal admin/programmers... unless they use one of the cluster packages to short circuit the process and leverage their own work and contributions for the greater good, in which case they could likely be functional in a week and pretty well tuned in two, focussing on the application instead of the cluster.

>> Reliability of ANY sort of networking task (run through perl or not) --
>> well, there are whole books devoted to this sort of thing and it is Not
>> Easy. Not easy to tell if a node crashes, not easy to tell if a network
>> connection is down or just slow, not easy to restart if it IS down,
>> especially if the node drops the connection because of some quirk in its
>> internal state that -- being remote -- you can't see. Not Easy.
>
> depends. I typically make sure my clusters have pretty reliable nodes
> and networking - it just doesn't seem to make sense to patch around
> unreliability. given that, I don't think this is hard at all.
>
>> OTOH, if you're using perl as a job distribution mechanism, you have
>> lots of room to improve without resorting to scyld, and you can always
>> try bproc (which is a GPL thing that is at the heart of scyld, or used
>
> actually, a stripped down rsh seems to be about the same speed as bproc.

OK, didn't know that (I've only benchmarked the out-of-the-box rsh, which is fast but not THAT fast), but it makes reasonable sense. However, I was referring him to bproc, scyld, warewulf, etc thinking of the higher level job control and cluster information that the "canned" solutions provide, not just the distribution interface.

Really, as I pointed out, for most sensible task partitionings with a good compute to startup/communications ratio, choice of remote shell is irrelevant anyway -- out at the <1% level in terms of its impact on efficiency.
Only when task startup takes as long as the task are you in trouble, and then the problem is obvious and the solution usually isn't to do more efficient startup, it is to reorganize the task to do more work PER node startup. A "worker daemon" approach can pay node startup costs a single time per GLOBAL task startup and thereafter just manage the IPCs required to initiate the next work unit and return the results -- no new shells, forks, stats, everything already resident in memory and just awaiting the next unit.

As far as load balancing and reliability go -- remember that there do exist parallelized tasks that aren't fault tolerant so that the failure of any node causes global failure of the job. There also exist task distribution schemes that (especially on heterogeneous clusters) will leave nodes idle. Both will happen unless the application is written carefully so that they don't, if it CAN be so written. In some of these cases fixing everything up edges off into computer science and a one-off solution based on the particular task at hand, which is all I was saying.

Still, I think you're right, I was a bit too casual with a lot of my remarks in my reply. I really wasn't trying to dis diskless (I actually like it too and have used diskless systems for a long time now when appropriate or necessary). I was just trying to inject a note of caution that "doing it yourself" is almost certainly a bit MORE difficult than just doing a kickstart install to a diskfull system (which actually does so WITH diskless configuration, but using canned tools), and that there were certain DIY issues with RO vs RW -- /var, /var/tmp, /tmp if shared RO root and things you couldn't comfortably do on the diskless systems that would bite you if you try shared RW root.
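
To put rough numbers on the startup-cost argument above, here is a minimal sketch in Python. It is not from the original post: the function names and the example timings (0.1 s startup, 60 s or 0.1 s of work per unit) are illustrative assumptions, and the model ignores communication cost entirely. It only compares paying startup once per work unit against paying it once per job, as a resident worker daemon would.

```python
def efficiency_per_unit_startup(n_units, t_work, t_startup):
    """Fraction of node time spent on useful work when EVERY work unit
    pays the startup cost (new remote shell, fork, etc.)."""
    useful = n_units * t_work
    return useful / (useful + n_units * t_startup)


def efficiency_worker_daemon(n_units, t_work, t_startup):
    """Same fraction when a resident worker pays the startup cost once
    per job and then just receives work units."""
    useful = n_units * t_work
    return useful / (useful + t_startup)


# Assumed, illustrative timings in seconds.
coarse = efficiency_per_unit_startup(1000, 60.0, 0.1)  # long work units
fine = efficiency_per_unit_startup(1000, 0.1, 0.1)     # startup as long as the work
daemon = efficiency_worker_daemon(1000, 0.1, 0.1)      # same fine units, resident worker

print(f"coarse units, startup per unit: {coarse:.4f}")
print(f"fine units, startup per unit:   {fine:.4f}")
print(f"fine units, worker daemon:      {daemon:.4f}")
```

With these assumed numbers the coarse-grained case loses well under 1% to startup, the fine-grained case loses half its time, and the worker daemon recovers almost all of it without making startup itself any faster.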

These things have been there a long time -- back with Sun SLC's and ELC's, back with lots of early ways to boot up linux diskless -- and YES there are solutions, but one should try hard not to reinvent these wheels because the process can be somewhat painful. There are now ways around most/all of the problems, and as you correctly note if he looks around at some examples he can make it work WITHOUT having to figure it out "the hard way".

Finally, what I should have said in my discussion of reliability was much more what you focussed on. If his task is really CPU intensive (bound) then there "should" exist master/slave task organizations that are more or less automagically load balancing (by simply keeping each node working at 100% all of the time) and "reliable" (where one has to do a bit of work here, but one can e.g. preserve job data sent to any node until it returns successfully and if the node goes down, redirect it to the next idle node and so on). This is still as task organization permits -- a BIT tricky if the work units all have to be done in some particular order or form some sort of transformation chain, but probably doable, using any of the example tasks that parallelize Mandelbrot set generation (e.g.) as templates.

So I apologize. Not a terribly good answer last time.

rgb

(P.S. -- my replies will probably only get wonkier with time over the next week. I start teaching again on Monday...;-)
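
The "preserve job data until it returns successfully, then redirect" bookkeeping described above can be sketched roughly as follows. This is a minimal illustration in Python, not code from the thread: `run_master`, `NodeDown`, and the `compute(node, unit)` callback are hypothetical names, the loop is sequential rather than parallel, and a real master would detect failure through timeouts or dropped connections rather than an exception. It assumes the work units are independent of each other.

```python
from collections import deque


class NodeDown(Exception):
    """Raised by the (hypothetical) compute callback when a node fails."""


def run_master(work_units, nodes, compute):
    """Hand out independent work units; a unit stays queued until some
    node returns its result, and is retried elsewhere if a node dies."""
    pending = deque(work_units)  # units with no result yet
    alive = list(nodes)          # nodes still believed to be up
    results = {}
    turn = 0
    while pending:
        if not alive:
            raise RuntimeError("no live nodes left")
        unit = pending[0]        # do NOT discard the unit before success
        node = alive[turn % len(alive)]
        try:
            results[unit] = compute(node, unit)
        except NodeDown:
            alive.remove(node)   # stop using the dead node, retry the unit
            continue
        pending.popleft()        # only now is the unit really done
        turn += 1
    return results


# Usage with a made-up node that always fails.
def flaky(node, unit):
    if node == "n2":
        raise NodeDown(node)
    return unit * unit


results = run_master(range(5), ["n1", "n2", "n3"], flaky)
```

The only design point that matters here is that a unit is removed from the pending queue after its result arrives, never when it is sent, so a node failure costs one retry rather than the whole job.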
