[asterisk-bugs] [JIRA] (ASTERISK-27463) Asterisk fails randomly

Thu Dec 7 03:54:07 CST 2017

Donat Zenichev created ASTERISK-27463:
-----------------------------------------

             Summary: Asterisk fails randomly
                 Key: ASTERISK-27463
                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-27463
             Project: Asterisk
          Issue Type: Bug
      Security Level: None
    Affects Versions: 15.1.3
         Environment: PBX Core settings:
Version:                     15.1.3
Build Options:               DONT_OPTIMIZE, COMPILE_DOUBLE, BUILD_NATIVE, OPTIONAL_API
System:                      Linux/4.9.0-3-amd64 built by root on x86_64
Maximum open file handles:   65535
User name and group:         asterisk/asterisk

Machine status: virtual machine (proxmox VM, not container).
Debian 9.1

            Reporter: Donat Zenichev

Hi.

I have an issue with periodic Asterisk crashes.
There are about 500-600 registered user agents on it.
Peak load is about 30-45 calls per second.

Asterisk is managed by the Pacemaker daemon.
Schema:
Master (active) <-corosync-> Slave (stand-by)
Pacemaker monitors the Asterisk process and tries to restart it when it crashes.

There were no problems when I was running Asterisk 11.23.0.
I then moved from 11.23.0 to 13.8.3 LTS.
After that I noticed that Asterisk crashes from time to time; Pacemaker notices that Asterisk is not running and restarts it.
This repeats throughout the day (about 1-2 crashes per hour).

But there is nothing unusual in Asterisk's full log with verbosity set to 4.
It looks like Asterisk works fine and then crashes abruptly, without any hint of the problem.

I found nothing unusual in the Debian system logs or in the Pacemaker log, only entries saying, in effect, "asterisk is not running, ok, let's restart it".
Logs are attached via URL, so you can take a look.
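
For anyone repeating the system-log check, a minimal sketch of scanning kernel messages for crash traces follows. The function name `crash_traces` and the pattern list are assumptions for illustration, not the commands that were actually run, and the list is not exhaustive.

```shell
#!/bin/sh
# Hypothetical helper: filter log lines on stdin for common crash signatures
# (segfaults, OOM-killer activity, general protection faults).
crash_traces() {
    grep -i -E 'segfault|out of memory|oom-killer|killed process|general protection'
}

# Example usage on a systemd host (kernel messages only):
#   journalctl -k --no-pager | crash_traces
```

No output means none of these patterns appeared in the input; it does not rule out other causes.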

DTMF signals are received before Asterisk crashes.
The release note from 2017-09-13 21:31 says:
"The telephony DTMF events are not exchanged with a codec. As a result when RFC2833/RFC4733 sent digits you would crash if "core set debug 1" is enabled, the DTMF digits would always get passed to the core even though the local native RTP bridge is active, and the DTMF digits would go out using the wrong SSRC id."
So I decided to move from 13.8.3 LTS to 15.1.3.

But it changed nothing; it still crashes occasionally.

Interestingly, I have one more cluster running 13.8.3 LTS with the same Asterisk and host configuration and the same Pacemaker version, and it works fine, without crashes.
The difference between them is load: the problematic Asterisk is more heavily loaded.

I added more resources to the problematic Asterisk host; it now has 8 CPUs at 2.5 GHz and 8 GB of RAM. Resource usage (CPU and RAM) does not exceed 10%.

The Asterisk resource is defined as a systemd service:
[Unit]
Description=Asterisk PBX and telephony daemon.
After=network.target
[Service]
Type=simple
PIDFile=/var/run/asterisk/asterisk.pid
ExecStart=/usr/sbin/asterisk -g -C /etc/asterisk/asterisk.conf
ExecStop=/usr/sbin/asterisk -rx 'core stop now'
ExecReload=/usr/sbin/asterisk -rx 'core reload'
WorkingDirectory=/var/lib/asterisk
Environment=HOME=/var/lib/asterisk
Restart=always
RestartSec=10s
TimeoutStartSec=30
TimeoutStopSec=15
LimitNOFILE=65535
LimitNPROC=65535
[Install]
WantedBy=multi-user.target

Non-default settings in asterisk.conf:
maxfiles = 65535
runuser = asterisk
rungroup = asterisk

/etc/security/limits.conf
asterisk hard nofile 65535
asterisk soft nofile 65535
asterisk hard nproc  65535
asterisk soft nproc  65535
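
Since limits.conf is applied through PAM sessions while a systemd service takes its limits from the unit's Limit* directives, the values the running process actually received can be read from /proc. A minimal sketch, assuming Linux; the helper name `effective_limit` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the soft and hard value of one limit for a PID,
# as reported by the kernel in /proc/<pid>/limits (Linux only).
effective_limit() {
    pid=$1
    name=$2   # e.g. "Max open files" or "Max processes"
    # Each row is: limit name, soft limit, hard limit, units.
    # Match the row by its name prefix, then split the remainder on whitespace.
    awk -v n="$name" 'index($0, n) == 1 {
        rest = substr($0, length(n) + 1)
        split(rest, f, " ")
        print f[1], f[2]
    }' "/proc/$pid/limits"
}

# Example usage with the PID file from the unit above:
#   effective_limit "$(cat /var/run/asterisk/asterisk.pid)" "Max open files"
```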

Asterisk runs with the permissions of the "asterisk" user/group.
The asterisk user owns all working directories:
/var/lib/asterisk
/usr/lib/asterisk
/var/spool/asterisk
/var/log/asterisk
/etc/asterisk
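
The ownership claim above can be verified recursively with a sketch like the following; the helper name `not_owned_by` is an assumption for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list every path under the given directories that is
# NOT owned by the expected user. No output means ownership is consistent.
not_owned_by() {
    user=$1
    shift
    find "$@" ! -user "$user" -print
}

# Example usage with the directories listed above:
#   not_owned_by asterisk /var/lib/asterisk /usr/lib/asterisk \
#       /var/spool/asterisk /var/log/asterisk /etc/asterisk
```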

So the main question is: why doesn't it crash with the older version 11.23.0?

I have core dumps; they are attached via URL.
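
For reading the core dumps, a minimal sketch of extracting a backtrace of all threads with gdb follows. It assumes gdb is installed and that the binary is the same build that produced the core; the helper name `core_backtrace` and the core file path in the usage line are placeholders. The DONT_OPTIMIZE build option listed above should make the resulting backtrace more readable.

```shell
#!/bin/sh
# Hypothetical helper: dump a full backtrace of every thread from a core file.
# Assumes gdb is installed and the binary matches the one that crashed.
core_backtrace() {
    binary=$1
    core=$2
    if [ ! -r "$binary" ] || [ ! -r "$core" ]; then
        echo "usage: core_backtrace <binary> <core-file> (both must be readable)" >&2
        return 1
    fi
    gdb -batch -ex 'thread apply all bt full' "$binary" "$core"
}

# Example usage (the core path is a placeholder):
#   core_backtrace /usr/sbin/asterisk /path/to/core > backtrace.txt
```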
