[asterisk-bugs] [JIRA] (ASTERISK-26601) res_pjsip: task_processors in queue pjsip stop working

Carl Fortin (JIRA) noreply at issues.asterisk.org
Thu Nov 17 11:57:10 CST 2016


    [ https://issues.asterisk.org/jira/browse/ASTERISK-26601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=233815#comment-233815 ] 

Carl Fortin edited comment on ASTERISK-26601 at 11/17/16 11:57 AM:
-------------------------------------------------------------------

Hi Joshua,

Obviously, we will not be able to reproduce this bug in our test environment, since it needs a certain number of active calls
combined with a hard-to-catch event. Our production system is compiled with DONT_OPTIMIZE and BETTER_BACKTRACES but not with DEBUG_THREADS.
I will have to recompile and re-enable it.
 
It could take 3 or 4 weeks for the deadlock to occur, maybe more.
I wrote a script that extracts the backtrace before restarting Asterisk when taskprocessors start to pile up and everything falls apart:

Take a look at the script I wrote.


Could something like this work?
Any recommendations will be appreciated.


was (Author: phonefxg):
Hi Joshua,

Obviously, we will not be able to reproduce this bug in our test environment, since it needs a certain number of active calls
combined with a hard-to-catch event. Our production system is compiled with DONT_OPTIMIZE and BETTER_BACKTRACES but not with DEBUG_THREADS.
I will have to recompile and re-enable it.
 
It could take 3 or 4 weeks for the deadlock to occur, maybe more.
I wrote a script that extracts the backtrace before restarting Asterisk when taskprocessors start to pile up and everything falls apart:

This command is executed every 15 seconds to detect whether tasks are queuing up in the taskprocessors.
The highest queue depth is sent to Zabbix (our monitoring solution), and if it exceeds 10, Zabbix
fires the bash script below to collect the traces without any intervention.

sudo -u root asterisk -rx 'core show taskprocessors' | awk 'NR>1 && $3 ~ /^[0-9]+$/ {print $3}' | sort -rn | head -n 1 | tr -d '\n'
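
The polling step can be sketched as a small shell function, assuming (as the one-liner above already does) that the "In Queue" count is the third whitespace-separated column of the `core show taskprocessors` output. The names `max_in_queue` and `over_threshold`, and the sample rows, are hypothetical and only serve as an illustration.

```shell
#!/bin/bash
# Sketch only: max_in_queue and over_threshold are illustrative names,
# not part of Asterisk or Zabbix.

# Read "core show taskprocessors" output on stdin and print the largest
# numeric value found in the third column (assumed to be "In Queue").
max_in_queue() {
    awk '$3 ~ /^[0-9]+$/ && $3 + 0 > max { max = $3 + 0 } END { print max + 0 }'
}

# Succeed when the given depth exceeds the threshold (10 unless overridden).
over_threshold() {
    [ "$1" -gt "${2:-10}" ]
}

# Made-up rows in the assumed column layout, for demonstration only.
sample='Processor                 Processed   In Queue  Max Depth
app_voicemail                   120          0          3
pjsip/distributor-0000        98231         14         20
2 taskprocessors'

depth=$(printf '%s\n' "$sample" | max_in_queue)
echo "$depth"
```

On a live system the function would presumably be fed with `asterisk -rx 'core show taskprocessors' | max_in_queue`, and `over_threshold "$depth"` would decide whether to launch the capture script.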



#!/bin/bash
# Find Asterisk's PID: the oldest process named exactly "asterisk", i.e. the
# daemon rather than a remote console ("ps -w" alone lists only the current terminal)
Asterisk_PID=$(pgrep -o -x asterisk)
# Extract active locks from Asterisk before executing gdb (requires DEBUG_THREADS)
asterisk -rx "core show locks" > /var/log/asterisk/core-show-locks.txt
# Execute gdb on the Asterisk PID to extract backtraces of all threads
gdb -ex "thread apply all bt" --batch /usr/sbin/asterisk "$Asterisk_PID" > /var/log/asterisk/backtrace-threads.txt
# Save the taskprocessor queue list
asterisk -rx "core show taskprocessors" > /var/log/asterisk/Task_processors_list.txt
sleep 5
# Restart the asterisk service because we have a deadlock on a production system
service asterisk restart
echo 'script executed!'
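
Since the script writes to fixed file names, a second incident would overwrite the first capture. One possible tweak, shown here only as a sketch, is to write each capture into a timestamped directory. The helper name `capture_dir` and the `deadlock-` prefix are hypothetical, and the demonstration uses a temporary directory rather than /var/log/asterisk.

```shell
#!/bin/bash
# Sketch only: capture_dir and the "deadlock-" prefix are illustrative assumptions.

# Create a per-incident directory such as <base>/deadlock-20161117-115710
# and print its path.
capture_dir() {
    local base="$1"
    local dir
    dir="$base/deadlock-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$dir" && printf '%s\n' "$dir"
}

# Demonstration against a temporary directory.
base=$(mktemp -d)
out=$(capture_dir "$base")
echo "backtrace would go to: $out/backtrace-threads.txt"
```

In the script above, the three output files could then be redirected into the directory returned by `capture_dir /var/log/asterisk`.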


Could something like this work?
Any recommendations will be appreciated.

> res_pjsip: task_processors in queue pjsip stop working
> ------------------------------------------------------
>
>                 Key: ASTERISK-26601
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-26601
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>    Affects Versions: 14.1.1
>         Environment: Asterisk Realtime 14.1.0 rc1
>  PJSIP Driver
> mysql Ver 14.14
> pjproject 2.5.5
> spandsp 0.0.6
> jansson 2.7
> CentOS 6.6 64 bits on Vmware
> Number of phones : 700
> Average concurrent calls: 16
>            Reporter: Carl Fortin
>            Assignee: Unassigned
>         Attachments: full, task_procesor.txt
>
>
> I had Asterisk 14.1.0-rc1 running for 3 weeks, and all of a sudden PJSIP stopped working. Nothing appeared in the console.
> I had time to save the task_processor output before restarting Asterisk.
> After restarting to bring the system back up, I can see this in the log files:
> taskprocessor.c: The 'app_voicemail' task processor queue reached 500 scheduled tasks.
> I did not find any taskprocessor messages logged before the system stopped functioning.
> I'm aware that I am running an RC release, but the release notes mentioned nothing about a deadlock, so I was thinking of updating later.
>  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


