[Beowulf] s_update() missing from AFAPI ?
Tim Mattox tmattox at gmail.com
Sat Oct 16 15:15:14 PDT 2004
Hello Andrew (and the Beowulf list as well),

You ask some very good questions, and you asked the person who should know the answers. Hopefully my answers below make sense.

On Sat, 16 Oct 2004 15:01:34 -0400, Andrew Piskorski <atp at piskorski.com> wrote:
> The old 1997 paper by Dietz, Mattox, and
> Krishnamurthy, "The Aggregate Function API: It's
> Not Just For PAPERS Anymore", briefly mentions
> that their AFAPI library also supports, "fully
> coherent, polyatomic, replicated shared memory".
> It even gives a little chart showing how many
> microseconds their s_update() function takes to
> update that shared memory.
>
> That sounds interesting (even given the extremely
> low bandwidth of the PAPERS hardware, etc.), but,
> no such function exists in the last 1999-12-22
> AFAPI release! s_update() just isn't in there at
> all. Why? Tim M., I know you follow the Beowulf
> list, so could you fill us in a bit on what
> happened there?

The s_update() function went away because we changed the underlying implementation of the "asynchronous" s_ routines. The new approach had a hardware limit of 3 "fast" signals, and we deemed it best not to hard-code any of those for this rarely used shared memory functionality. We had intended to supply a routine that replaced the functionality of s_update(), which you could install as one of the 3 signal handlers if you chose to. I'm not sure why that code wasn't released.

But, over time, it became a moot point: the speed of the processors improved so much that the busy-wait/polling scheme we were using for the s_ routines made it very difficult to get any speedup using the equivalent of the s_update() routine. With the parallel port not actively causing an interrupt, all the nodes had to poll for pending s_ operations. Going from the 486 to the Pentium dramatically changed the overhead of this polling operation relative to general computation.
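As a rough illustration of the polling overhead discussed above, here is a toy cost model in Python. It is only a sketch under stated assumptions, not anything taken from AFAPI: the function name `polling_costs` and all cycle counts are hypothetical, and the model simply assumes a fixed cost per unit of local work, a fixed cost per I/O-space read, and requests arriving at a uniformly random point between polls.

```python
def polling_costs(work_cycles, poll_cycles, poll_every):
    """Toy model of a compute loop that polls an I/O port periodically.

    work_cycles: assumed cycles per unit of local work
    poll_cycles: assumed cycles per I/O-space read (the poll)
    poll_every:  number of work units executed between two polls

    Returns (slowdown, mean_wait_cycles):
      slowdown         = run time with polling / run time without polling
      mean_wait_cycles = average time a remote request waits for the next
                         poll, assuming it arrives at a uniformly random
                         point within a polling period
    """
    if poll_every < 1:
        raise ValueError("poll_every must be at least 1")
    useful = poll_every * work_cycles   # cycles of real work per period
    period = useful + poll_cycles       # real work plus one poll
    slowdown = period / useful
    mean_wait = period / 2
    return slowdown, mean_wait


if __name__ == "__main__":
    # Hypothetical numbers: the same poll cost, but local work gets 10x
    # cheaper on the "fast" CPU, so polling dominates much sooner.
    for label, work in (("slow CPU", 20), ("fast CPU", 2)):
        for every in (1, 10, 100):
            slowdown, wait = polling_costs(work, 40, every)
            print(f"{label}: poll every {every:3d} units -> "
                  f"slowdown {slowdown:.2f}x, mean wait {wait:.0f} cycles")
```

In this model, polling less often lowers the slowdown on local work but raises the time a pending request waits, and the cheaper local work becomes relative to the poll, the worse that trade-off gets.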
Basically, the Pentium and later processors were slowed down so dramatically whenever you did a single IO space read (the polling function) to see if any pending shared memory operations needed to be dealt with, that it was difficult to get any speedup, even with only two processors. On the testing codes I wrote at the time, it was hard to find the right balance for how frequently to poll. If you polled too frequently, the Pentium was slowed to a crawl on purely local operations. We speculated that the IO instruction caused a flush of the Pentium's pipeline, but we didn't explore it in great detail. If you polled too infrequently, the shared memory operations were stalled for long periods of time, causing the other processor(s) to sit idle waiting to get their shared memory writes processed.

Yes, the performance numbers in the LCPC 1997 paper were measured on a 4-node Pentium cluster, but I don't think we had yet had time to play with "real" codes that used the s_update routine on a Pentium cluster. That was a long time ago, so I might not be remembering this part very well. But I do remember that once we had more time to play with it on Pentiums, it was clear that no performance-critical codes would be using the s_update routine, much less any of the other s_ routines, as far as we could tell.

So, that is why the s_update routine was pulled from the library: to free up the signaling slot for potentially more useful things.

> http://aggregate.org/TechPub/lcpc97.html
> http://aggregate.org/AFAPI/AFAPI_19991222.tgz
>
> While I'm at it I might as well ask this too:
> That same old PAPERS paper says "UDPAPERS", using
> Ethernet and UDP, was implemented, but it doesn't
> seem to be in the AFAPI release either. What
> happened with that?
The UDPAPERS code was being worked on by a colleague of mine for his parallel file system work, and unfortunately for the rest of us, he only implemented the minimum amount of functionality that he needed for his project, not the full AFAPI. Back in 1999 I had hoped to have time to finish it off myself, but it wasn't my top priority, and if you have followed our work, the KLAT2 cluster in the spring of 2000 brought in some much more interesting new ideas with the FNN stuff.

> Did it work?

Yes, to some degree, but there were still some important corner cases (certain packet loss scenarios) that hadn't been dealt with, and as I said, the full AFAPI wasn't implemented, just a few basic routines.

> As well as the custom PAPERS hardware?

No, not as well as the custom hardware. Speaking of which:

The custom PAPERS hardware has had some additional work since we last published on it. But due to changing priorities, it has been sitting waiting for the next bright student or two to revive it for more modern IO ports (USB, Firewire, ???). You can see the last parts list and board layouts here:
http://aggregate.org/AFN/000601/

Unfortunately, the assembly documentation for that board was never written. It's a "small change" from the PAPERS 960801 board, but enough that if you don't know what each thing is intended for, you might not get it right. That's why we haven't posted a public link to the 000601 board design (until now).

We almost made a 12-port version of the PCB, but again, the student involved in that finished their project, and the design hasn't been validated, so it hasn't been sent out to a PCB fab to be built. As a group we decided it would be better to find students interested in doing a new design that used more modern IO ports than the parallel printer port. Know anyone interested in a Masters project where they have to build hardware that actually works? ;-) Academically, it's hard to make such a thing a Ph.D. project, since it's mostly just "implementation/development" at this point, with little "academic" research.

> If so, how? Dirt cheap 10/100 cards and UTP cable
> would certainly be a lot more convenient than
> custom PAPERS hardware for anyone wanting to
> experiment with the AFAPI stuff, but I'm confused
> about what part of the ethernet network could be
> magically made to act as the NAND gate for the
> aggregate operations.

Yep, no NAND gate in the ethernet...

> Did it need to use some particular programmable
> ethernet switch? Or the aggregate operations
> were actually done on each of the nodes?

Yeah, the aggregate operations were actually performed within each node on local copies of the data from all the nodes. The basic idea was to have each node send its new data, along with all the data it knew from everyone else for the current (and previous) operation, as a UDP broadcast/multicast.

Just this semester we finally have a new student working on a UDP/Multicast implementation of AFAPI... or something like it. They are just now getting up to speed on things, so don't hold your breath. Also, it's unlikely we would actually target a new AFAPI release. With the dominance of MPI, it would only make sense to build such a thing as a module for LAM-MPI or the new OpenMPI.

I hope this answers your questions, but if not, feel free to ask more. I am busy with my own FNN dissertation work now (plus Warewulf), so I won't be working on AFN/AFAPI/PAPERS stuff to any degree until my Ph.D. is finished.

--
Tim Mattox - tmattox at gmail.com
http://homepage.mac.com/tmattox/
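The UDPAPERS idea described in the message (every node broadcasts its own value plus everything it has learned from others, and each node then computes the aggregate locally) can be sketched as a small in-memory simulation. This is a hedged illustration only, not the UDPAPERS code: the function `simulate_aggregate`, the loss model, and the choice of bitwise AND as the aggregate are all assumptions, no real sockets are used, and the "previous operation" data mentioned in the message is left out.

```python
import random
from functools import reduce
from operator import and_


def simulate_aggregate(values, op, loss_rate=0.3, seed=0, max_rounds=1000):
    """Simulate one aggregate operation over a lossy broadcast medium.

    values:    one contribution per node (node i contributes values[i])
    op:        associative, commutative binary operator used as the aggregate
    loss_rate: assumed probability that a given receiver misses a broadcast
    Returns (results, rounds): each node's locally computed aggregate and
    the number of broadcast rounds needed.
    """
    rng = random.Random(seed)
    n = len(values)
    # known[i] maps node id -> value, as learned so far by node i.
    known = [{i: v} for i, v in enumerate(values)]
    rounds = 0
    while any(len(k) < n for k in known):
        if rounds >= max_rounds:
            raise RuntimeError("aggregate did not complete")
        rounds += 1
        # Snapshot, so every broadcast carries its sender's start-of-round view.
        packets = [dict(k) for k in known]
        for sender, packet in enumerate(packets):
            for receiver in range(n):
                if receiver != sender and rng.random() >= loss_rate:
                    # Receiver merges everything the sender knew.
                    known[receiver].update(packet)
    # No NAND gate in the network: each node reduces its own local copy.
    results = [reduce(op, (k[i] for i in range(n))) for k in known]
    return results, rounds


if __name__ == "__main__":
    contributions = [0b1111, 0b1011, 0b1101, 0b0111]
    results, rounds = simulate_aggregate(contributions, and_)
    print(f"rounds={rounds}, per-node results={results}")
```

Because each broadcast repeats everything its sender already knows, a value lost in one round can still reach a node indirectly in a later round. That redundancy is the point of the sketch, though the real corner cases in packet loss handling mentioned in the message are not modeled here.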
