[asterisk-bugs] [JIRA] (ASTERISK-23719) Asterisk locks, UDP buffer overflow, 1000+ spawns of 'chan_iax2.c find_idle_thread()'

SteelPivot (JIRA) noreply at issues.asterisk.org
Tue May 20 12:48:43 CDT 2014


    [ https://issues.asterisk.org/jira/browse/ASTERISK-23719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=218316#comment-218316 ] 

SteelPivot commented on ASTERISK-23719:
---------------------------------------

It happened again this morning, with the same symptoms: all IAX2 trunks drop within one minute, the UDP buffer on port 4569 fills up, and Asterisk is neither listening for nor responding to incoming IAX2 messages.

I'll attach pcap files showing traffic on the problem host to a specific client, and the pcap traffic from that client. Here, you'll see that the problem host is sending out POKE messages AND getting PONG messages back, and likewise that the specified client is receiving POKEs and responding with PONGs (and sending its own POKEs but getting no responses). Of course, there are no ACKs anywhere.

An IAX2 debug inside Asterisk on the problem host shows only POKE messages being sent, while a debug on the client shows those POKEs received and PONG responses (while also sending POKEs of its own and receiving no response).

The debug logger on the problem host shows only recurring messages of "chan_iax2.c: ip callno count decremented to X for X.X.X.X", "chan_iax2.c: schedule decrement of callno used for X.X.X.X in 60 seconds", and "I was supposed to send a LAGRQ with callno XXXX, but no such call exists." These, I believe, are expected with a downed IAX2 trunk.

> Asterisk locks, UDP buffer overflow, 1000+ spawns of 'chan_iax2.c find_idle_thread()'
> -------------------------------------------------------------------------------------
>
>                 Key: ASTERISK-23719
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-23719
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Channels/chan_iax2
>    Affects Versions: 11.6.1
>         Environment: CentOS 6.4min
>            Reporter: SteelPivot
>            Assignee: SteelPivot
>            Severity: Critical
>         Attachments: 1399319401-core-show-locks.txt, 1399324201-backtrace-threads.txt, 1399324201-core-show-threads.txt, 1399324201-netstat.txt, 1399750801-backtrace-threads.txt, 1399750801-core-show-taskprocessors.txt, 1399750801-core-show-threads.txt, 1399750801-netstat.txt, 1400606323-backtrace-threads.txt, 1400606323-client-iax2-debug.txt, 1400606323-client-iax2.pcap, 1400606323-core-show-taskprocessors.txt, 1400606323-core-show-threads.txt, 1400606323-host-iax2-debug.txt, 1400606323-host-iax2.pcap, 1400606323-netstat.txt
>
>
> We've been experiencing an issue with IAX2 peers for a few months, and it has recently become more severe after upgrading from 11.2 to 11.6cert2.
> The initial symptom was all (100+) IAX2 peers going UNREACHABLE. On further inspection, however, what seems to happen is that the UDP queues sharply increase (as seen with netstat -antup), the number of Asterisk threads increases (to over 1000 threads in some cases), and Asterisk, of course, stops responding to inbound/outbound calls on any channel (SIP or IAX2).
> After recompiling with DEBUG_THREADS and BETTER_BACKTRACES, I discovered that issuing a "gdb -ex "thread apply all bt"...(etc) " to grab a backtrace frees up the UDP queues, after which Asterisk becomes responsive again. Currently I have a script running every 5 minutes that pulls the UDP queues for asterisk processes and, upon seeing a queue above 300,000 packets, issues a "netstat -antup", "core show locks", "core show threads", and "gdb -ex "thread apply all bt" --batch asterisk `pidof asterisk` > $debugdir/$date-backtrace-threads.txt".
> I have previously increased the kernel UDP maximums in sysctl.conf, and added options for iaxthreadcount/iaxmaxthreadcount in iax.conf.
> I cannot repeat this issue at will, but it happens every hour or so (sometimes every few minutes). I have debug logs and backtraces for each occurrence.
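
The queue check described in the report could be sketched roughly as follows. This is not the reporter's actual script, only a minimal illustration assuming the usual "netstat -anup" column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, PID/Program name). The function names and the SAMPLE text are hypothetical. Note that netstat reports Recv-Q for UDP sockets in bytes, so the 300,000 figure is treated here as a raw Recv-Q value.

```python
# Threshold taken from the report; compared against netstat's raw Recv-Q value.
RECVQ_THRESHOLD = 300_000

def udp_recv_queues(netstat_output, process_name="asterisk"):
    """Return {local_address: recv_q} for UDP sockets owned by process_name."""
    queues = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # UDP rows have an empty State column, so they split into six fields:
        # Proto, Recv-Q, Send-Q, Local Address, Foreign Address, PID/Program.
        if len(fields) < 6 or not fields[0].startswith("udp"):
            continue
        if not fields[-1].endswith("/" + process_name):
            continue
        queues[fields[3]] = int(fields[1])
    return queues

def overloaded(queues, threshold=RECVQ_THRESHOLD):
    """Return the local addresses whose receive queue exceeds the threshold."""
    return sorted(addr for addr, depth in queues.items() if depth > threshold)

# Made-up sample output for illustration only; not taken from the attachments.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address   Foreign Address  State  PID/Program name
udp   412344      0 0.0.0.0:4569    0.0.0.0:*               1234/asterisk
udp        0      0 0.0.0.0:5060    0.0.0.0:*               1234/asterisk
udp        0      0 0.0.0.0:123     0.0.0.0:*               987/ntpd
"""

if __name__ == "__main__":
    print(overloaded(udp_recv_queues(SAMPLE)))
```

When the returned list is non-empty, a wrapper would then collect the diagnostics named in the report (netstat, "core show locks", "core show threads", and the gdb backtrace).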


