[Beowulf] precise synchronization of system clocks
Lux, James P james.p.lux at jpl.nasa.gov
Tue Sep 30 06:54:20 PDT 2008
On 9/30/08 2:53 AM, "Vincent Diepeveen" <diep at xs4all.nl> wrote:

> Hmm,
>
> 1 uS accuracy whereas the cpu has a hardware counter for all this.
>
> To be honest i find 1 microsecond very inaccurate now that cards have
> latencies near that.
<snip>
> Doing that a couple of thousands of times, we should get a fairly
> accurate timing in B, far more accurate than 1 microsecond, as the
> deviation in one way pingpong latency isn't real big. It's quite
> constant.

Unfortunately, it doesn't work out that way. The measured times are not nicely distributed, so it actually takes more statistical processing to come up with the delay (for instance, isn't what you really want the minimum, not the mode or median?). One could use such a scheme, with sufficient processing, to measure the difference between the clock frequencies. If you can control the actual packets on the wire so they are all identical, and they're the only packets, that helps too.

> Only the deviation of that latency is a measure for the accuracy at
> which you can synchronize the clocktime.
>
> Now this is a simple 2 node example. It is of course possible for a
> cluster to use the measurements of many nodes and synchronize to that,
> just like the coordinate calculation for GPS uses several satellites.
> Using many nodes that'll get the average error down. Of course to
> synchronize many nodes each node uses its own clock as new 'source' of
> measurement; if for the synchronization accuracy we always assume the
> same clock from node A, then getting the error down is a lot tougher.

The GPS synchronization problem is actually substantially easier. The propagation delay from satellite to receiver varies in a very predictable manner (in fact, the nav solution solves for it), and the signal is specifically designed for accurate timing (i.e., a PN code generated from a Cs clock is a darn good way to transmit timing and frequency information).

The challenge in sync over Ethernet (without added hardware a la IEEE-1588) is that simple NTP-style ping-ponging rapidly gets you to where the measurement uncertainty is comparable to the uncertainty and variability of the exceedingly cheap oscillators on the NIC.

FWIW, in the lab, most people would not be satisfied with sync to 1 microsecond (after all, how many thousand instructions is that?). You could probably get to 1 microsecond by running a wire between serial ports (hook to the IRQ off the Ring Indicator, for instance). You want to think in terms of nanoseconds, and no straight Ethernet scheme without added hardware can get there.

Keeping it beowulf'y, if you want fine-grained synchronization so that you don't lose performance when doing barriers, you're probably going to need some sort of common clock. The typical microprocessor crystal just isn't good enough.

Actually, though, when talking about this sort of sync, aren't we getting close to SIMD sort of processing? Is a "cluster of commodity computers" actually a "good" way to be doing this sort of thing?

Jim Lux

> Vincent
>
> On Sep 29, 2008, at 11:21 PM, Lombard, David N wrote:
>
>> On Mon, Sep 29, 2008 at 01:10:49PM -0700, Prentice Bisbal wrote:
>>> In the previous thread I instigated about running services in
>>> cluster nodes, there was some mentioning of precisely synchronizing
>>> the system clocks and this issue is also mentioned in this paper:
>>>
>>> "The Case of Missing Supercomputer Performance: Achieving Optimal
>>> Performance on the 8,192 processor ASCI Q" (Petrini, Kerbyson and
>>> Pakin)
>>> http://hpc.pnl.gov/people/fabrizio/papers/sc03_noise.pdf
>>>
>>> I've also read a few other papers on the topic, and it seems you
>>> need to sync the system clocks to ~1 uS. On top of that, I imagine
>>> you also need to synch the activities of each system so they all
>>> stop to do the same system-level tasks at the same time.
>>
>> The IEEE-1588 "Precision Time Protocol" can provide such levels of
>> global clock synchronization.
>>
>> Shameless plug: See "Hardware Assisted Precision Time Protocol
>> (PTP, IEEE-1588) - Design and Case Study" presented at the recent
>> LCI conference;
>> <http://www.linuxclustersinstitute.org/conferences/archive/2008/technicalpapers.html>
>>
>> --
>>
>> David N. Lombard, Intel, Irvine, CA
>> I do not speak for Intel Corporation; all comments are strictly my
>> own.
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
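
As a rough illustration of the point made in the reply above (that averaging thousands of ping-pongs is not enough, and that the minimum-delay exchange is the one to trust), here is a small simulation sketch. Everything in it is an assumption chosen for illustration and not taken from the thread: the exponential queueing model, the different forward and return queueing means, the 2 us turnaround, and all function and variable names. The offset and round-trip formulas are the usual NTP-style four-timestamp ones.

```python
import random

TRUE_OFFSET_US = 37.0   # hypothetical: B's clock is 37 us ahead of A's
BASE_DELAY_US = 10.0    # hypothetical fixed one-way wire + stack delay


def simulate_exchange(rng, fwd_queue_mean_us=5.0, ret_queue_mean_us=20.0):
    """Simulate one A -> B -> A ping-pong and return timestamps t1..t4 (us).

    Queueing delay is modelled as exponential (long tail) and is allowed to
    differ per direction; both choices are assumptions of this sketch.
    """
    t1 = rng.uniform(0.0, 1e6)                      # A's clock, request sent
    fwd = BASE_DELAY_US + rng.expovariate(1.0 / fwd_queue_mean_us)
    ret = BASE_DELAY_US + rng.expovariate(1.0 / ret_queue_mean_us)
    t2 = t1 + fwd + TRUE_OFFSET_US                  # B's clock, request received
    t3 = t2 + 2.0                                   # B's clock, reply sent
    t4 = (t3 - TRUE_OFFSET_US) + ret                # A's clock, reply received
    return t1, t2, t3, t4


def offset_and_rtt(t1, t2, t3, t4):
    """NTP-style estimate of B's offset from A, and the round-trip delay."""
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    rtt = (t4 - t1) - (t3 - t2)
    return offset, rtt


def estimate(samples):
    """Return (mean of all offsets, offset of the minimum-RTT exchange)."""
    pairs = [offset_and_rtt(*s) for s in samples]
    mean_offset = sum(o for o, _ in pairs) / len(pairs)
    min_rtt_offset = min(pairs, key=lambda p: p[1])[0]
    return mean_offset, min_rtt_offset


rng = random.Random(2008)
samples = [simulate_exchange(rng) for _ in range(2000)]
mean_est, min_est = estimate(samples)

print(f"error of plain average : {mean_est - TRUE_OFFSET_US:+.3f} us")
print(f"error of min-RTT sample: {min_est - TRUE_OFFSET_US:+.3f} us")
```

In this toy model the plain average converges to the wrong answer, because any asymmetry in the queueing delay shows up as a bias that more samples do not remove, while the minimum-RTT exchange is the one that saw the least queueing in both directions. It says nothing about oscillator drift on the NIC or host, which is the limit described above for plain Ethernet without IEEE-1588 hardware.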
