[asterisk-bugs] [JIRA] (ASTERISK-23719) Asterisk locks, UDP buffer overflow, 1000+ spawns of 'chan_iax2.c find_idle_thread()'

Mon May 12 14:42:44 CDT 2014

    [ https://issues.asterisk.org/jira/browse/ASTERISK-23719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=218056#comment-218056 ] 

Matt Jordan commented on ASTERISK-23719:
----------------------------------------

The backtrace of threads from the faulty system doesn't appear to show any odd state, nor does the taskprocessor dump. I'm curious what an IAX debug would show when this occurs. Right now, this doesn't look so much like a deadlock as it does some odd behaviour in {{chan_iax2}}.

> Asterisk locks, UDP buffer overflow, 1000+ spawns of 'chan_iax2.c find_idle_thread()'
> -------------------------------------------------------------------------------------
>
>                 Key: ASTERISK-23719
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-23719
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Channels/chan_iax2
>    Affects Versions: 11.6.1
>         Environment: CentOS 6.4min
>            Reporter: SteelPivot
>            Assignee: SteelPivot
>            Severity: Critical
>         Attachments: 1399319401-core-show-locks.txt, 1399324201-backtrace-threads.txt, 1399324201-core-show-threads.txt, 1399324201-netstat.txt, 1399750801-backtrace-threads.txt, 1399750801-core-show-taskprocessors.txt, 1399750801-core-show-threads.txt, 1399750801-netstat.txt
>
>
> We've been experiencing an issue with IAX2 peers for a few months, and it has recently become more severe after upgrading from 11.2 to 11.6cert2.
> The initial symptom was all (100+) IAX2 peers going UNREACHABLE. On closer inspection, however, the UDP queues sharply increase (as seen with netstat -antup), the number of Asterisk threads increases (to over 1000 in some cases), and Asterisk stops responding to inbound/outbound calls on any channel (SIP or IAX2).
> After recompiling with DEBUG_THREADS and BETTER_BACKTRACES, I discovered that running gdb -ex "thread apply all bt" ...(etc) to grab a backtrace frees up the UDP queues, after which Asterisk becomes responsive again. Currently I have a script running every 5 minutes that checks the UDP queues for asterisk processes and, upon seeing a queue above 300,000 packets, runs "netstat -antup", "core show locks", "core show threads", and gdb -ex "thread apply all bt" --batch asterisk `pidof asterisk` > $debugdir/$date-backtrace-threads.txt.
> I have previously increased the kernel UDP maximums in sysctl.conf, and added options for iaxthreadcount/iaxmaxthreadcount in iax.conf.
> I cannot repeat this issue at will, but it happens every hour or so (sometimes every few minutes). I have debug logs and backtraces for each occurrence.
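
The queue check described in the report could be sketched roughly as follows. This is not the reporter's script, only a minimal illustration: the function names, the parsing rules, and the assumption that the owning process appears as the last column ("PID/Program name") of the netstat output are all hypothetical. The 300,000 threshold is taken from the report and compared against the raw Recv-Q number, which netstat(8) documents as a byte count.

```python
import subprocess

# Threshold from the report; compared against the raw Recv-Q value.
THRESHOLD = 300_000


def udp_recv_queues(netstat_output, program="asterisk"):
    """Return {local_address: recv_q} for UDP sockets owned by `program`.

    Assumes lines shaped like:
      proto recv-q send-q local foreign [state] pid/program
    """
    queues = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("udp"):
            continue  # header lines, TCP sockets, malformed rows
        owner = fields[-1]
        if "/" not in owner or owner.split("/", 1)[1] != program:
            continue  # socket belongs to another process
        queues[fields[3]] = int(fields[1])
    return queues


def overloaded(queues, threshold=THRESHOLD):
    """Return only the sockets whose receive queue exceeds the threshold."""
    return {addr: q for addr, q in queues.items() if q > threshold}


def check_once():
    """Run netstat once and report overloaded asterisk UDP sockets."""
    out = subprocess.run(
        ["netstat", "-antup"], capture_output=True, text=True, check=True
    ).stdout
    return overloaded(udp_recv_queues(out))
```

A cron job could call check_once() every five minutes and, when the result is non-empty, collect the same netstat, "core show locks", "core show threads", and gdb backtrace output listed in the description.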
