[asterisk-bugs] [JIRA] (ASTERISK-23719) Asterisk locks, UDP buffer overflow, 1000+ spawns of 'chan_iax2.c find_idle_thread()'

SteelPivot (JIRA) noreply at issues.asterisk.org
Mon May 12 15:11:44 CDT 2014


    [ https://issues.asterisk.org/jira/browse/ASTERISK-23719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=218057#comment-218057 ] 

SteelPivot commented on ASTERISK-23719:
---------------------------------------

Typically, an IAX2 debug on the faulty system will show it sending out POKE messages, but not receiving PONGs. However, on the client machine, I'll see the POKEs being received and PONGs being sent. A tcpdump on the faulty system shows the PONGs coming in, though Asterisk doesn't appear to acknowledge them. This would hold true if the UDP buffer were indeed overflowed, since the incoming packets would never be processed by Asterisk. I do not believe the opposite is true, since I've always seen the buffer overflow BEFORE the IAX2 trunks go unreachable, and not the other way round.

Additionally curious is that when I have DEBUG_THREADS enabled, running the gdm command will clear the locks and allow everything to go back to normal. Without DEBUG_THREADS, running gdm operates as expected (and only grabs the backtrace while freeing up nothing).

I had hoped this was caused by the issue that was fixed recently, "prevent network thread starting before all helper threads are ready On startup, it's possible for a frame to arrive before the processing threads were ready. In iax2_process_thread() the first pass through falls into ast_cond_wait, should a frame arrive before we are at ast_cond_wait, the signal will be ignored. The result iax2_process_thread stays at ast_cond_wait forever, with deferred frames being queued. Fix: When creating initial idle iax2_process_threads, wait for init_cond to be signalled after each thread is started."  But it doesn't appear that this fix fixed my issue.

I'll grab a debug log should this occur again.

> Asterisk locks, UDP buffer overflow, 1000+ spawns of 'chan_iax2.c find_idle_thread()'
> -------------------------------------------------------------------------------------
>
>                 Key: ASTERISK-23719
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-23719
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Channels/chan_iax2
>    Affects Versions: 11.6.1
>         Environment: CentOS 6.4min
>            Reporter: SteelPivot
>            Assignee: SteelPivot
>            Severity: Critical
>         Attachments: 1399319401-core-show-locks.txt, 1399324201-backtrace-threads.txt, 1399324201-core-show-threads.txt, 1399324201-netstat.txt, 1399750801-backtrace-threads.txt, 1399750801-core-show-taskprocessors.txt, 1399750801-core-show-threads.txt, 1399750801-netstat.txt
>
>
> We've been experience an issue for a few months concerning IAX2 peers which has recently gotten more severe after upgrading from 11.2 to 11.6cert2.
> The initial symptom was all (100+) IAX2 peers going UNREACHABLE. However, after inspecting further it seems that what will happen is the UDP queues will sharply increase (seen by netstat -antup), the number of asterisk threads increases (to over 1000 threads in some cases), and Asterisk, of course, stops responding to inbound/outbound calls from any channel (SIP or IAX2).
> After recompiling with DEBUG_THREADS and BETTER_BACKTRACES, I discovered that issuing a "gdb -ex "thread apply all bt"...(etc) " to grab a backtrace will free up the UDP queues, and Asterisk will then become responsive again. Currently I have a script running each 5 minutes that pulls the UDP queues for asterisk processes, and upon seeing a queue above 300,000packets, I issue a "netstat -antup", "core show locks", "core show threads", and "gdb -ex "thread apply all bt" --batch asterisk `pidof asterisk` > $debugdir/$date-backtrace-threads.txt".
> I have previously increased the kernel UDP maximums in sysctl.conf, and added options for iaxthreadcount/iaxmaxthreadcount in iax.conf.
> I cannot repeat this issue at will, but it happens every hour or so (sometimes every few minutes). I have debug logs and backtraces for each occurrence.



--
This message was sent by Atlassian JIRA
(v6.2#6252)



More information about the asterisk-bugs mailing list