[asterisk-users] hardware clock drift and CDR
Vieri
rentorbuy at yahoo.com
Mon Apr 26 13:13:09 CDT 2010
--- On Mon, 4/26/10, Gordon Henderson <gordon+asterisk at drogon.net> wrote:
> > --- On Sun, 4/25/10, Gordon Henderson <gordon+asterisk at drogon.net> wrote:
> >
> >>> Hi,
> >>>
> >>> I've noticed that one of my new servers (new mobo) is drifting slowly
> >>> backwards in time (in approx. 24 hours, system time drifts back 5
> >>> minutes).
> >>>
> >>> I have an ntpd process which is supposed to sync with a lan time server
> >>> but it's not quite working. So I'm launching a manual ntpdate or
> >>> ntp-client once an hour and that seems to work.
> >>
> >> If you can run ntpdate and it sets the time, then you are not running
> >> ntpd. The 2 can not run at the same time.
> >
> > Hi Gordon,
> >
> > Are you sure about this?
>
> Yes.
>
> > ntpd is a daemon and adjusts the time in a continuous manner. ntp-client
> > or ntpdate or whatever are one-time clients that reset the system clock.
> > I don't see why an ntp-client can't be run while ntpd is working (it
> > shouldn't be necessary but may come in handy when the time difference is
> > big and ntpd refuses to sync).
>
> ntpd binds to the ntp port (123) and prevents anything else binding to it,
> or listening on it - which ntpdate needs to do.
>
> Example here:
>
> Desktop is running ntpd:
>
> yakko:/home/gordon# ps ax | fgrep ntp
> 22064 ?        Ss     0:14 /usr/sbin/ntpd -p /var/run/ntpd.pid -u 106:107 -g
> 30340 pts/29   R+     0:00 fgrep ntp
>
> I try to run ntpdate:
>
> yakko:/home/gordon# ntpdate essen.drogon.net
> 26 Apr 14:20:47 ntpdate[30341]: the NTP socket is in use, exiting
>
> > Anyway, I've noticed that my ntpd log messages don't say "anything" when
> > trying to sync to my "Windows PDC LAN time server". Curiously,
> > ntp-client DOES sync to this Windows server.
>
> > So I decided to sync to pool.ntp.org and now I see syslog messages that
> > actually show that the system time gets adjusted by ntpd.
> >
> > I'd rather sync to my LAN time server but this is off-topic on this ML.
>
> Using pool and your LAN server would be the best way forward - there are
> pool servers available for most countries too, so us.pool.ntp.org,
> uk.pool.ntp.org, and so on.
>
> Your /etc/ntp.conf file can be very simple indeed - my workstation one is
> nothing more than:
>
> server essen.drogon.net
> server uk.pool.ntp.org
>
> You can check your server's ntp daemon with:
>
> ntpq -c peers
>
> and
>
> ntpq -c rl
>
> The key thing to look for in the 'rl' command is 'stratum'. If it's 16
> then it's not synchronised and anything less than 16 is good.
>
> yakko:/home/gordon# ntpq -c rl | fgrep stratum
> processor="i686", system="Linux/2.6.29.2", leap=00, stratum=4,
>
> Don't get too hung-up on how close to zero the stratum is.
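
For anyone who wants to script that stratum check, here is a minimal Python sketch. It assumes the output of "ntpq -c rl" has already been captured as text; parse_stratum and is_synchronised are names I made up for illustration, not part of any NTP tool.

```python
import re

# Sample line taken from the "ntpq -c rl | fgrep stratum" output quoted above.
SAMPLE = 'processor="i686", system="Linux/2.6.29.2", leap=00, stratum=4,'


def parse_stratum(rl_output):
    """Return the stratum found in 'ntpq -c rl' style text, or None."""
    match = re.search(r"\bstratum=(\d+)", rl_output)
    return int(match.group(1)) if match else None


def is_synchronised(rl_output):
    """Stratum 16 means unsynchronised; anything lower counts as good."""
    stratum = parse_stratum(rl_output)
    return stratum is not None and stratum < 16


if __name__ == "__main__":
    print(parse_stratum(SAMPLE), is_synchronised(SAMPLE))
```

The function only looks at the stratum field, so it follows the rule of thumb above and ignores how close to zero the value is.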
>
> >>> How does Asterisk CDR count the duration/billsec values? Does it rely on
> >>> system time ONLY for "call start" or also for "call end"?
> >>>
> >>> What Asterisk-related side-effects should I expect from a drifting
> >>> clock?
> >>
> >> Who cares. Just fix ntpd then your worries are gone.
> >
> > Well, I still have doubts about that. I could look at * source code but
> > I'd rather hear from someone here.
>
> Might be easier to read the code ;-)
>
> > My ntp log shows this:
> >
> > 26 Apr 13:06:30 ntpd[534]: synchronized to xxx.xxx.xxx.xxx, stratum 2
> > 26 Apr 13:21:24 ntpd[534]: time reset +2.318647 s
> > 26 Apr 13:21:44 ntpd[534]: synchronized to xxx.xxx.xxx.xxx, stratum 2
> > 26 Apr 13:37:46 ntpd[534]: time reset +2.325417 s
> > 26 Apr 13:38:06 ntpd[534]: synchronized to xxx.xxx.xxx.xxx, stratum 2
> > 26 Apr 13:54:11 ntpd[534]: time reset +2.327974 s
> > 26 Apr 13:55:19 ntpd[534]: synchronized to xxx.xxx.xxx.xxx, stratum 2
> > 26 Apr 14:09:16 ntpd[534]: time reset +2.177572 s
> > 26 Apr 14:10:08 ntpd[534]: synchronized to xxx.xxx.xxx.xxx, stratum 2
> > 26 Apr 14:26:07 ntpd[534]: time reset +2.357017 s
> >
> > That kind of scares me because if I'm not mistaken it means that about
> > every 20 seconds, my ntpd adjusts the system time by about 2 seconds
> > forward. So my clock is going back 2 seconds every 20... That's a
> > significant drift. And it would definitely make a difference in my CDR
> > records IF Asterisk were to compare the "start and end" system times.
> >
> > Should I worry about this?
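
To make the worry concrete, here is a rough Python sketch of what a naive "end minus start" subtraction of system times would record. It says nothing about how Asterisk actually computes duration/billsec. recorded_duration is a hypothetical helper, and the -2370 ppm figure is only my estimate from the log above (about 2.3 s lost per roughly 980 s between resets).

```python
def recorded_duration(true_seconds, drift_ppm=0.0, steps=()):
    """Duration a wall-clock 'end - start' subtraction would report.

    true_seconds: real elapsed time of the call.
    drift_ppm:    clock rate error in parts per million (negative = slow).
    steps:        clock steps (seconds) applied by ntpd during the call.
    """
    drifted = true_seconds * (1.0 + drift_ppm / 1e6)
    return drifted + sum(steps)


if __name__ == "__main__":
    call = 600.0  # a ten-minute call, chosen only as an example
    # Call that falls entirely between two ntpd resets: recorded too short.
    print(recorded_duration(call, drift_ppm=-2370.0))
    # Same call, but one of the logged resets lands in the middle of it.
    print(recorded_duration(call, drift_ppm=-2370.0, steps=[2.318647]))
```

Either way the error is on the order of a second or two per call, and whether it matters depends on whether the CDR code uses system time for both ends of the call.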
>
> If ntpd can't keep the kernel time in-sync then it will step about every
> 900 seconds - which is what appears to be happening here. (The intervals
> are typically much longer than 20 seconds - e.g. 13:06:30 to 13:21:24 is
> ~15 minutes - 900 seconds.)
>
> I don't think I've ever had a server as bad as that before, so have never
> looked further... Still, it's 2 seconds in 900 seconds, not 2 in 20 as you
> thought.
>
> Which I think is odd - the Linux clock is software derived based on a
> hardware interrupt - it only consults the hardware battery-backed clock at
> boot time (and is supposed to write the current time to it at shutdown
> time) so I wonder if your server is missing interrupts, or otherwise
> mis-behaving.
>
> Is there anything else odd in the log-files?
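
To double-check the "2 seconds in 900" arithmetic, here is a small Python sketch that works out the drift rate from the "time reset" lines of my log. parse_resets and drift_ppm are names I made up, and the year is an assumption since syslog-style timestamps don't carry one.

```python
from datetime import datetime

# "time reset" lines copied from the ntpd log quoted above.
LOG = """\
26 Apr 13:21:24 ntpd[534]: time reset +2.318647 s
26 Apr 13:37:46 ntpd[534]: time reset +2.325417 s
26 Apr 13:54:11 ntpd[534]: time reset +2.327974 s
26 Apr 14:09:16 ntpd[534]: time reset +2.177572 s
26 Apr 14:26:07 ntpd[534]: time reset +2.357017 s
"""


def parse_resets(text, year=2010):
    """Return a list of (timestamp, step_seconds) for each 'time reset' line."""
    resets = []
    for line in text.splitlines():
        if "time reset" not in line:
            continue
        fields = line.split()
        stamp = datetime.strptime(
            f"{fields[0]} {fields[1]} {year} {fields[2]}", "%d %b %Y %H:%M:%S"
        )
        resets.append((stamp, float(fields[6])))
    return resets


def drift_ppm(resets):
    """Drift rate (ppm) accumulated between each pair of consecutive resets."""
    rates = []
    for (t0, _), (t1, step) in zip(resets, resets[1:]):
        interval = (t1 - t0).total_seconds()
        rates.append(step / interval * 1e6)
    return rates


if __name__ == "__main__":
    for rate in drift_ppm(parse_resets(LOG)):
        print(f"{rate:.0f} ppm")
```

On this log the rate comes out somewhere above 2000 ppm for every interval, which as far as I know is well beyond the roughly 500 ppm that ntpd can correct by slewing, so it would have to keep stepping the clock.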
I ran the following and it supposedly updated my system time while ntpd was running (note the -u flag, which makes ntpdate use an unprivileged source port instead of port 123):
# ps ax | fgrep ntp
1256 ? Ss 0:00 /usr/sbin/ntpd -p /var/run/ntpd.pid -u ntp:ntp
1623 pts/14 S+ 0:00 fgrep ntp
# ntpdate -b -u pool.ntp.org
26 Apr 19:41:18 ntpdate[2791]: step time server 163.117.131.239 offset 0.142263 sec
By the way, as a side question, on another server I see this:
# ntpq -c peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 inf-srv1.hospit .LOCL.           1 u   56   64  377    0.314  21755.8   7.634
Not sure what LOCL means but I'll refer to the NTP docs (inf-srv1 is my LAN Windoze time server).
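
While I look that up, here is a small Python sketch for pulling the numbers out of a peers line. As far as I understand the ntpq column layout, delay, offset and jitter are in milliseconds and reach is an octal bitmask of the last eight polls. parse_peer_line is a made-up helper name.

```python
# Column names as printed in the "ntpq -c peers" header above.
COLUMNS = ["remote", "refid", "st", "t", "when", "poll",
           "reach", "delay", "offset", "jitter"]

# The peer line from the output above.
SAMPLE = "inf-srv1.hospit .LOCL. 1 u 56 64 377 0.314 21755.8 7.634"


def parse_peer_line(line):
    """Split one whitespace-separated peers line into a dict of typed fields."""
    peer = dict(zip(COLUMNS, line.split()))
    peer["st"] = int(peer["st"])
    peer["reach"] = int(peer["reach"], 8)      # shown in octal by ntpq
    for key in ("delay", "offset", "jitter"):  # assumed to be milliseconds
        peer[key] = float(peer[key])
    return peer


if __name__ == "__main__":
    peer = parse_peer_line(SAMPLE)
    print(f"offset is {peer['offset'] / 1000.0:.1f} s, reach mask {peer['reach']:08b}")
```

If the millisecond assumption is right, that server is almost 22 seconds away from inf-srv1 even though every recent poll was answered.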
Anyway, back to the faulty new server (which reports a stratum of 3 after ntpd has been running for a while and sync'ing to pool.ntp.org):
it's supposed to be a good motherboard (Asus) but I'm running a relatively "old" kernel (2.6.23).
Googling around suggests booting with "noapic" if the clock keeps drifting this much.
# more /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:        103          0          0          1   IO-APIC-edge      timer
  1:       2151          0          0          9   IO-APIC-edge      i8042
  4:   12772543   13217932    9603064    7661766   IO-APIC-edge      serial
  8:          1          0          1          0   IO-APIC-edge      rtc
  9:          0          0          0          1   IO-APIC-fasteoi   acpi
 12:          0          0          0          4   IO-APIC-edge      i8042
 14:       2234      73664          0       2470   IO-APIC-edge      ide0
 16:   28322780   51914617   40744985   39615361   IO-APIC-fasteoi   eth0
 17:   63242610   42157366   43790794   48255583   IO-APIC-fasteoi   eth1
 18:    1348544          0          0          1   IO-APIC-fasteoi   eth2
 20:    9006839    8244295    6076595    4923525   IO-APIC-fasteoi   ahci
 21:  162750903  140985080  176469550  166839225   IO-APIC-fasteoi   wcte12xp0
 22:   16662710   18210608   12053147   12739782   IO-APIC-fasteoi   HFC-multi
NMI:          0          0          0          0
LOC:   64546905   64546897   64546897   64546897
ERR:          0
MIS:          0
I have 3 PCI cards: 1 PRI, 1 quad BRI, 1 dual ethernet.
Could booting with "noapic" help?
What about my PCI devices? Will they be stable even with "noapic"?
The reason I got this new mobo is that the previous hardware froze the system with a kernel crash.
In fact, I rsync'ed to this new hardware (so identical system software) and it has been running flawlessly for more than a week now, while it used to crash/freeze once a day (another Asus board, by the way).
My only problem now is with the d@!mned clock...
As for syslog messages, I don't see anything wrong. No errors whatsoever.
Thanks for your time. I'll try to boot with noapic and cross my fingers.
Vieri