[asterisk-users] hardware clock drift and CDR

Mon Apr 26 13:13:09 CDT 2010

--- On Mon, 4/26/10, Gordon Henderson <gordon+asterisk at drogon.net> wrote:

> > --- On Sun, 4/25/10, Gordon Henderson <gordon+asterisk at drogon.net> wrote:
> >
> >>> Hi,
> >>>
> >>> I've noticed that one of my new servers (new mobo) is drifting slowly
> >>> backwards in time (in approx. 24 hours, system time drifts back 5
> >>> minutes).
> >>>
> >>> I have an ntpd process which is supposed to sync with a lan time server
> >>> but it's not quite working. So I'm launching a manual ntpdate or
> >>> ntp-client once an hour and that seems to work.
> >>
> >> If you can run ntpdate and it sets the time, then you are not running
> >> ntpd. The 2 can not run at the same time.
> >
> > Hi Gordon,
> >
> > Are you sure about this?
> 
> Yes.
> 
> > ntpd is a daemon and adjusts the time in a continuous manner. ntp-client
> > or ntpdate or whatever are one-time clients that reset the system clock.
> > I don't see why an ntp-client can't be run while ntpd is working (it
> > shouldn't be necessary but may come in handy when the time difference is
> > big and ntpd refuses to sync).
> 
> ntpd binds to the ntp port (123) and prevents anything else binding to it,
> or listening on it - which ntpdate needs to do.
> 
> Example here:
> 
> Desktop is running ntpd:
> 
>    yakko:/home/gordon# ps ax | fgrep ntp
>    22064 ?        Ss     0:14 /usr/sbin/ntpd -p /var/run/ntpd.pid -u 106:107 -g
>    30340 pts/29   R+     0:00 fgrep ntp
> 
> I try to run ntpdate:
> 
>    yakko:/home/gordon# ntpdate essen.drogon.net
>    26 Apr 14:20:47 ntpdate[30341]: the NTP socket is in use, exiting
> 
> > Anyway, I've noticed that my ntpd log messages don't say "anything" when
> > trying to sync to my "Windows PDC LAN time server". Curiously,
> > ntp-client DOES sync to this Windows server.
> 
> > So I decided to sync to pool.ntp.org and now I see syslog messages that
> > actually show that the system time gets adjusted by ntpd.
> >
> > I'd rather sync to my LAN time server but this is off-topic on this ML.
> 
> Using pool and your LAN server would be the best way forward - there are
> pool servers available for most countries too, so us.pool.ntp.org,
> uk.pool.ntp.org, and so on.
> 
> Your /etc/ntp.conf file can be very simple indeed - my workstation one is
> nothing more than:
> 
>    server essen.drogon.net
>    server  uk.pool.ntp.org
> 
> You can check your server's ntp daemon with:
> 
>    ntpq -c peers
> 
> and
> 
>    ntpq -c rl
> 
> The key thing to look for in the 'rl' command is 'stratum'. If it's 16
> then it's not synchronised and anything less than 16 is good.
> 
>    yakko:/home/gordon# ntpq -c rl | fgrep stratum
>    processor="i686", system="Linux/2.6.29.2", leap=00, stratum=4,
> 
> Don't get too hung-up on how close to zero the stratum is.
> 
> >>> How does Asterisk CDR count the duration/billsec values? Does it rely on
> >>> system time ONLY for "call start" or also for "call end"?
> >>>
> >>> What Asterisk-related side-effects should I expect from a drifting
> >>> clock?
> >>
> >> Who cares. Just fix ntpd then your worries are gone.
> >
> > Well, I still have doubts about that. I could look at * source code but
> > I'd rather hear from someone here.
> 
> Might be easier to read the code ;-)
> 
> > My ntp log shows this:
> >
> > 26 Apr 13:06:30 ntpd[534]: synchronized to xxx.xxx.xxx.xxx, stratum 2
> > 26 Apr 13:21:24 ntpd[534]: time reset +2.318647 s
> > 26 Apr 13:21:44 ntpd[534]: synchronized to xxx.xxx.xxx.xxx, stratum 2
> > 26 Apr 13:37:46 ntpd[534]: time reset +2.325417 s
> > 26 Apr 13:38:06 ntpd[534]: synchronized to xxx.xxx.xxx.xxx, stratum 2
> > 26 Apr 13:54:11 ntpd[534]: time reset +2.327974 s
> > 26 Apr 13:55:19 ntpd[534]: synchronized to xxx.xxx.xxx.xxx, stratum 2
> > 26 Apr 14:09:16 ntpd[534]: time reset +2.177572 s
> > 26 Apr 14:10:08 ntpd[534]: synchronized to xxx.xxx.xxx.xxx, stratum 2
> > 26 Apr 14:26:07 ntpd[534]: time reset +2.357017 s
> >
> > That kind of scares me because if I'm not mistaken it means that about
> > every 20 seconds, my ntpd adjusts the system time by about 2 seconds
> > forward. So my clock is going back 2 seconds every 20... That's a
> > significant drift. And it would definitely make a difference in my CDR
> > records IF Asterisk were to compare the "start and end" system times.
> >
> > Should I worry about this?
> 
> If ntpd can't keep the kernel time in-sync then it will step about every
> 900 seconds - which is what appears to be happening here (the intervals
> are typically much longer than 20 seconds - e.g. 13:06:30 to 13:21:24 is
> ~15 minutes - 900 seconds).
> 
> I don't think I've ever had a server as bad as that before, so have never
> looked further... Still, it's 2 seconds in 900 seconds, not 2 in 20 as you
> thought.
> 
> Which I think is odd - the Linux clock is software derived based on a
> hardware interrupt - it only consults the hardware battery-backed clock at
> boot time (and is supposed to write the current time to it at shutdown
> time) so I wonder if your server is missing interrupts, or otherwise
> mis-behaving.
> 
> Is there anything else odd in the log-files?
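
As a rough check of the numbers in the log quoted above, here is a minimal Python sketch that turns each interval between two "time reset" lines into an apparent drift rate. The function name and the table layout are made up for illustration; the timestamps and offsets are copied from the log. For comparison, ntpd can only slew the clock by up to 500 ppm, so anything well above that ends in repeated steps.

```python
from datetime import datetime

# (time of each "time reset" line, offset in seconds), copied from the ntpd log
RESETS = [
    ("13:21:24", 2.318647),
    ("13:37:46", 2.325417),
    ("13:54:11", 2.327974),
    ("14:09:16", 2.177572),
    ("14:26:07", 2.357017),
]

def drift_ppm(resets):
    """Return the apparent drift in ppm for each interval between two resets."""
    rates = []
    for (t0, _), (t1, offset) in zip(resets, resets[1:]):
        start = datetime.strptime(t0, "%H:%M:%S")
        end = datetime.strptime(t1, "%H:%M:%S")
        interval = (end - start).total_seconds()
        # offset is what ntpd had to add after 'interval' seconds of running
        rates.append(offset / interval * 1e6)
    return rates

if __name__ == "__main__":
    for rate in drift_ppm(RESETS):
        print(f"{rate:7.0f} ppm  ~{rate * 86400 / 1e6:4.0f} s/day")
```

Each interval works out to a bit over 2000 ppm, i.e. a few minutes per day, which is the same order of magnitude as the 5 minutes per 24 hours reported at the start of the thread. If a CDR duration is computed from two wall-clock readings (an assumption, not checked against the Asterisk source), a step landing in the middle of a call would shift that call's duration by about the step size.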

I ran the following and it supposedly updated my system time while ntpd was running:

# ps ax | fgrep ntp
 1256 ?        Ss     0:00 /usr/sbin/ntpd -p /var/run/ntpd.pid -u ntp:ntp
 1623 pts/14   S+     0:00 fgrep ntp

# ntpdate -b -u pool.ntp.org
26 Apr 19:41:18 ntpdate[2791]: step time server 163.117.131.239 offset 0.142263 sec
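
A likely reason this worked even with ntpd running: the -u option tells ntpdate to send from an unprivileged port instead of binding port 123, so it never runs into the "NTP socket is in use" check. Without -u, the conflict is the ordinary one of two sockets binding the same UDP port. The following minimal Python sketch reproduces that on an arbitrary free port rather than on 123 (the function name is hypothetical, and this is only an illustration of the bind conflict, not of what ntpdate does internally):

```python
import errno
import socket

def second_bind_fails(host="127.0.0.1"):
    """Bind one UDP socket, then try to bind another to the same address and port."""
    first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        first.bind((host, 0))              # port 0: let the kernel pick a free port
        port = first.getsockname()[1]
        try:
            second.bind((host, port))      # same port, like ntpdate vs ntpd on 123
        except OSError as exc:
            return exc.errno == errno.EADDRINUSE
        return False
    finally:
        first.close()
        second.close()

if __name__ == "__main__":
    print("second bind refused:", second_bind_fails())
```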

By the way, as a side question, on another server I see this:

# ntpq -c peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 inf-srv1.hospit .LOCL.           1 u   56   64  377    0.314  21755.8   7.634

Not sure what LOCL means but I'll refer to the NTP docs (inf-srv1 is my LAN Windoze time server).

Anyway, back to the faulty new server (which reports a stratum of 3 after ntpd has been running for a while and sync'ing to pool.ntp.org):
it's supposed to be a good motherboard (Asus) but I'm running a relatively "old" kernel (2.6.23).
Googling around suggests that I try booting with "noapic" if I keep seeing my clock drift so much.

# more /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:        103          0          0          1   IO-APIC-edge      timer
  1:       2151          0          0          9   IO-APIC-edge      i8042
  4:   12772543   13217932    9603064    7661766   IO-APIC-edge      serial
  8:          1          0          1          0   IO-APIC-edge      rtc
  9:          0          0          0          1   IO-APIC-fasteoi   acpi
 12:          0          0          0          4   IO-APIC-edge      i8042
 14:       2234      73664          0       2470   IO-APIC-edge      ide0
 16:   28322780   51914617   40744985   39615361   IO-APIC-fasteoi   eth0
 17:   63242610   42157366   43790794   48255583   IO-APIC-fasteoi   eth1
 18:    1348544          0          0          1   IO-APIC-fasteoi   eth2
 20:    9006839    8244295    6076595    4923525   IO-APIC-fasteoi   ahci
 21:  162750903  140985080  176469550  166839225   IO-APIC-fasteoi   wcte12xp0
 22:   16662710   18210608   12053147   12739782   IO-APIC-fasteoi   HFC-multi
NMI:          0          0          0          0
LOC:   64546905   64546897   64546897   64546897
ERR:          0
MIS:          0
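
In the table above the "timer" line (IRQ 0) has barely moved while LOC (local APIC timer) is at about 64.5 million per CPU, so the periodic ticks appear to arrive through the local APIC timer. One way to look for missed ticks is to take two snapshots and compare the per-CPU rate. Below is a minimal Python sketch under stated assumptions: the function names are made up, and the idea that LOC should rise by roughly HZ per second per CPU only holds on a kernel without dynamic ticks.

```python
import time

def parse_interrupts(text):
    """Map each IRQ label to its per-CPU counts from /proc/interrupts text."""
    counts = {}
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())               # header line: CPU0 CPU1 ...
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        # keep only the leading numeric columns, drop controller/device names
        counts[label.strip()] = [int(f) for f in fields[:ncpu] if f.isdigit()]
    return counts

def tick_rate(label="LOC", interval=10.0, path="/proc/interrupts"):
    """Return interrupts per second per CPU for one label over 'interval' seconds."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return [(b - a) / interval for a, b in zip(before[label], after[label])]

if __name__ == "__main__":
    print(tick_rate())
```

A rate that sits clearly below the kernel's configured HZ on a periodic-tick kernel would point towards lost timer interrupts; with dynamic ticks enabled the rate varies with load and this check says little.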

I have 3 PCI cards: 1 PRI, 1 quad BRI, 1 dual ethernet.

Could booting with "noapic" help?
What about my PCI devices? Will they be stable even with "noapic"?
The reason I got this new mobo is that the previous hardware froze the system with a kernel crash.
In fact, I rsync'ed to this new hardware (so identical system software) and it has been running flawlessly for more than a week now, while it used to crash/freeze once a day (another Asus board, by the way).
My only problem now is with the d@!mned clock...

As far as syslog messages go, I don't see anything wrong. No errors whatsoever.

Thanks for your time. I'll try to boot with noapic and cross my fingers.

Vieri