[Beowulf] Threaded code
Robert G. Brown rgb at phy.duke.edu
Tue Aug 17 10:44:51 PDT 2004
On Tue, 17 Aug 2004, Joe Landman wrote:

> Mark Hahn wrote:
>
> >>variables?  (do you have an NCPU=1 or something like that hanging around?)
> >
> >afaikt, when threads are enabled, atlas compiles in the number of threads,
> >based on what it detects on the machine doing the compilation.  so, for
> >instance, if you happened to compile atlas on this machine with the uni
> >kernel, (or some other uni) you'd get no threads.  this is a bit
> >counterintuitive to anyone used to OMP_NUM_THREADS, but it certainly
> >makes sense for atlas.
>
> Ok, I haven't used atlas in a while.  Are you saying that it hardcodes
> the number of processors into the code itself?  Wouldn't this
> effectively render binary RPMs of Atlas completely useless?  Would also
> make building static binaries (don't know if it is possible with Atlas
> libs) a waste of time if you need a portable binary.

The whole point of ATLAS is that it is Automatically Tuned.  The
fundamental flaw in the tuning process is that it is entirely (AFAIK)
build-time tuning -- it literally does a search in a high-dimensional
parameter space for build-time parameters that yield optimal
performance, and then compiles them right into the library.

It isn't designed to be portable at all -- quite the contrary.  It is
designed to be built on EACH system on which it is to be used, and if
you happen to have an "identical" system onto which you want to copy the
result, well, maybe it is identical and maybe it isn't; that's up to you.

So sure, ATLAS is packaged up, but it really needs to be built and
packaged on a per-system-archetype basis, and doing even this sort of
voids the warranty (so to speak) that the installed library is truly
optimal on any system but the original RPM build system, as even small
differences can push one across the carefully adjusted superlinear
speedup/slowdown thresholds preset in the library.
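
As a rough illustration of the empirical search described above, here is a minimal sketch that times a blocked matrix multiply at a few candidate block sizes and keeps the fastest. It is not ATLAS's actual search; the function names, the candidate list, and the pure-Python kernel are all assumptions made for illustration. The point is only that the "best" parameter is whatever the timing says on the machine where the search runs.

```python
import time


def blocked_matmul(a, b, n, bs):
    """Multiply two n x n matrices (lists of lists) using bs x bs blocks."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                for i in range(ii, min(ii + bs, n)):
                    ci, ai = c[i], a[i]
                    for k in range(kk, min(kk + bs, n)):
                        aik, bk = ai[k], b[k]
                        for j in range(jj, min(jj + bs, n)):
                            ci[j] += aik * bk[j]
    return c


def tune_block_size(n=48, candidates=(4, 8, 16, 48), repeats=2):
    """Time each candidate block size on this machine; return (best, timings)."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for bs in candidates:
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            blocked_matmul(a, b, n, bs)
            best = min(best, time.perf_counter() - t0)
        timings[bs] = best  # keep the fastest of the repeats
    return min(timings, key=timings.get), timings


if __name__ == "__main__":
    best_bs, timings = tune_block_size()
    print("fastest block size on this machine:", best_bs)
```

A build-time tuner would run a search like this once and freeze the winner into the compiled library; running the same search on a different machine may pick a different winner, which is why a copied binary is not guaranteed to be optimal.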

I'd like to see ATLAS redesigned so that it Automatically Tunes at
runtime, not build time, so that it becomes moderately portable.  Not
enough to actually do the work, mind you ;-) but I think that it is
possible at only a small cost in overall efficiency, and I even have an
idea how to go about doing it.  I've been thinking about proposing it as
a project for some of the upper level CPS students here -- there is an
independent study course where this sort of thing is tackled and this is
an ideal project for the course.

> I have had codes that spent very little time in the parallel sections in
> the past.  Simply adding another processor/thread does not automagically
> halve the run-time.  You would need to use some of the more advanced
> query tools to see what is going on.

Hmmm, given that "top" or "ps" are more advanced query tools, I agree.
However, it shouldn't be horribly difficult.

It isn't clear to me that ATLAS is multithreaded anyway.  Does anybody
know for sure?  It has been a while since I looked at the code.  So the
only way you might see an SMP speedup is probably to run two instances
of the application and observe that they complete in the same time as
one, not to run one instance of the application and see that it
completes in half the time it takes on a single-CPU system.

   rgb

>
> Joe
>
> >regards, mark hahn.

-- 
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
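
The check suggested in the message (run two instances at once and compare against one) could be sketched roughly as follows. The workload here is a hypothetical CPU-bound loop standing in for the real application, and `time_instances` is an illustrative helper, not part of any existing tool. The measured times include interpreter startup, so treat the ratio as a rough indication only.

```python
import subprocess
import sys
import time

# Hypothetical CPU-bound stand-in for the real (single-threaded) application.
WORK = "s = 0\nfor i in range(2_000_000):\n    s += i * i\n"


def time_instances(count):
    """Start `count` copies of the workload at once; return wall time until all exit."""
    t0 = time.perf_counter()
    procs = [subprocess.Popen([sys.executable, "-c", WORK]) for _ in range(count)]
    for p in procs:
        p.wait()
    return time.perf_counter() - t0


if __name__ == "__main__":
    one = time_instances(1)
    two = time_instances(2)
    # A ratio near 1 suggests both CPUs were used; near 2 suggests the
    # two copies effectively ran one after the other.
    print(f"1 instance: {one:.2f}s, 2 instances: {two:.2f}s, ratio {two / one:.2f}")
```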
