[asterisk-bugs] [JIRA] (ASTERISK-23719) Asterisk locks, UDP buffer overflow, 1000+ spawns of 'chan_iax2.c find_idle_thread()'
SteelPivot (JIRA)
noreply at issues.asterisk.org
Mon May 5 16:52:43 CDT 2014
[ https://issues.asterisk.org/jira/browse/ASTERISK-23719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
SteelPivot updated ASTERISK-23719:
----------------------------------
Description:
We've been experience an issue for a few months concerning IAX2 peers which has recently gotten more severe after upgrading from 11.2 to 11.6cert2.
The initial symptom was all (100+) IAX2 peers going UNREACHABLE. However, after inspecting further it seems that what will happen is the UDP queues will sharply increase (seen by netstat -antup), the number of asterisk threads increases (to over 1000 threads in some cases), and Asterisk, of course, stops responding to inbound/outbound calls from any channel (SIP or IAX2).
After recompiling with DEBUG_THREADS and BETTER_BACKTRACES, I discovered that issuing a "gdb -ex "thread apply all bt"...(etc) " to grab a backtrace will free up the UDP queues, and Asterisk will then become responsive again. Currently I have a script running each 5 minutes that pulls the UDP queues for asterisk processes, and upon seeing a queue above 300,000packets, I issue a "netstat -antup", "core show locks", "core show threads", and "gdb -ex "thread apply all bt" --batch asterisk `pidof asterisk` > $debugdir/$date-backtrace-threads.txt".
I have previously increased the kernel UDP maximums in sysctl.conf, and added options for iaxthreadcount/iaxmaxthreadcount in iax.conf.
I cannot repeat this issue at will, but it happens every hour or so (sometimes every few minutes). I have debug logs and backtraces for each occurrence.
was:
We've been experience an issue for a few months concerning IAX2 peers which has recently gotten more severe after upgrading from 11.2 to 11.6cert2.
The initial symptom was all (100+) IAX2 peers going UNREACHABLE. However, after inspecting further it seems that what will happen is the UDP queues will sharply increase (seen by netstat -antup), the number of asterisk threads increases (to over 1000 threads in some cases), and Asterisk, of course, stops responding to inbound/outbound calls from any channel (SIP or IAX2).
After recompiling with DEBUG_THREADS and BETTER_BACKTRACES, I discovered that issuing a "gdb -ex "thread apply all bt"...(etc) " to grab a backtrace will free up the UDP queues, and Asterisk will then become responsive again. Currently I have a script running each 5 minutes that pulls the UDP queues for asterisk processes, and upon seeing a queue above 300,000packets, I issue a "netstat -antup", "core show locks", "core show threads", and "gdb -ex "thread apply all bt" --batch asterisk `pidof asterisk` > $debugdir/$date-backtrace-threads.txt".
> Asterisk locks, UDP buffer overflow, 1000+ spawns of 'chan_iax2.c find_idle_thread()'
> -------------------------------------------------------------------------------------
>
> Key: ASTERISK-23719
> URL: https://issues.asterisk.org/jira/browse/ASTERISK-23719
> Project: Asterisk
> Issue Type: Bug
> Security Level: None
> Components: Channels/chan_iax2
> Affects Versions: 11.6.1
> Environment: CentOS 6.4min
> Reporter: SteelPivot
> Severity: Critical
> Attachments: 1399319401-core-show-locks.txt, 1399324201-backtrace-threads.txt, 1399324201-core-show-threads.txt, 1399324201-netstat.txt
>
>
> We've been experience an issue for a few months concerning IAX2 peers which has recently gotten more severe after upgrading from 11.2 to 11.6cert2.
> The initial symptom was all (100+) IAX2 peers going UNREACHABLE. However, after inspecting further it seems that what will happen is the UDP queues will sharply increase (seen by netstat -antup), the number of asterisk threads increases (to over 1000 threads in some cases), and Asterisk, of course, stops responding to inbound/outbound calls from any channel (SIP or IAX2).
> After recompiling with DEBUG_THREADS and BETTER_BACKTRACES, I discovered that issuing a "gdb -ex "thread apply all bt"...(etc) " to grab a backtrace will free up the UDP queues, and Asterisk will then become responsive again. Currently I have a script running each 5 minutes that pulls the UDP queues for asterisk processes, and upon seeing a queue above 300,000packets, I issue a "netstat -antup", "core show locks", "core show threads", and "gdb -ex "thread apply all bt" --batch asterisk `pidof asterisk` > $debugdir/$date-backtrace-threads.txt".
> I have previously increased the kernel UDP maximums in sysctl.conf, and added options for iaxthreadcount/iaxmaxthreadcount in iax.conf.
> I cannot repeat this issue at will, but it happens every hour or so (sometimes every few minutes). I have debug logs and backtraces for each occurrence.
--
This message was sent by Atlassian JIRA
(v6.2#6252)
More information about the asterisk-bugs
mailing list