[asterisk-bugs] [JIRA] (ASTERISK-25911) chan_iax2: IAX Max Retries occasionally
Jeppe Ryskov Larsen (JIRA)
noreply at issues.asterisk.org
Tue May 31 04:34:56 CDT 2016
[ https://issues.asterisk.org/jira/browse/ASTERISK-25911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=230845#comment-230845 ]
Jeppe Ryskov Larsen edited comment on ASTERISK-25911 at 5/31/16 4:34 AM:
-------------------------------------------------------------------------
After inspecting the logs from the previous times it happened, I can see that the last dialplan application executed before it hangs and prints 'max retries...' is {{MSet(LOCAL(...))}}.
Here are a few examples from my logs:
23/05/2016
{code}
[2016-05-23 13:23:04] VERBOSE[44062][C-0000006a] pbx.c: Executing [s@enterQueue:1] MSet("IAX2/odn1-voip-cluster02-upstream01-7669", "LOCAL(queueid)=80") in new stack
{code}
25/05/2016
{code}
[2016-05-25 10:15:07] VERBOSE[15591][C-00000010] pbx.c: Executing [s@parkCall:1] MSet("IAX2/odn1-voip-cluster02-upstream01-12266", "LOCAL(parkingspace)=1000") in new stack
{code}
31/05/2016
{code}
[2016-05-31 11:12:57] VERBOSE[21572][C-00000304] pbx.c: Executing [s@parkCall:1] MSet("IAX2/odn1-voip-cluster02-upstream01-7119", "LOCAL(parkingspace)=1000") in new stack
{code}
That seems pretty interesting.
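For anyone who wants to repeat this check on their own logs, here is a minimal sketch of how the inspection could be automated. It assumes the verbose line layout shown in the examples above and treats any line containing the text 'max retries' (case-insensitive) as the start of the failure; the function name and the marker default are hypothetical, not anything Asterisk provides.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Matches verbose dialplan lines of the form shown in the log excerpts:
# [timestamp] VERBOSE[thread][C-callid] pbx.c: Executing [exten@context:prio] App(...)
EXEC_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+VERBOSE\[\d+\]\[(?P<call>C-[0-9a-fA-F]+)\]\s+pbx\.c:\s+"
    r"Executing\s+\[(?P<loc>[^\]]+)\]\s+(?P<app>\w+)\("
)


def last_app_before_marker(
    lines: Iterable[str], marker: str = "max retries"
) -> List[Tuple[str, str, str]]:
    """Return (timestamp, location, application) for the last 'Executing'
    line seen before each burst of lines containing `marker`.

    The marker text is an assumption taken from the console message
    quoted in this comment; adjust it to match the real log line.
    """
    results: List[Tuple[str, str, str]] = []
    pending: Optional[Tuple[str, str, str]] = None
    for line in lines:
        match = EXEC_RE.match(line)
        if match:
            pending = (match.group("ts"), match.group("loc"), match.group("app"))
        elif marker.lower() in line.lower() and pending is not None:
            # Report once per burst; repeated marker lines are ignored
            # until another dialplan application is executed.
            results.append(pending)
            pending = None
    return results
```

Called with the lines of a log file, it returns one entry per hang, so a recurring application such as MSet stands out quickly.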
> chan_iax2: IAX Max Retries occasionally
> ---------------------------------------
>
> Key: ASTERISK-25911
> URL: https://issues.asterisk.org/jira/browse/ASTERISK-25911
> Project: Asterisk
> Issue Type: Bug
> Security Level: None
> Components: Channels/chan_iax2
> Affects Versions: 13.7.2, 13.9.0, 13.9.1
> Environment: Ubuntu server
> Reporter: Andreas Krüger
> Assignee: Unassigned
> Attachments: backtrace-threads.txt, core-show-channels-infos.txt, core-show-channels.txt, debug_log_25911_odn1-voip-cluster02-asterisk01, debug_log_25911_odn1-voip-cluster02-upstream01, iax2-show-channels.txt, iax2-show-netstats.txt, iax.conf, upload (1).png
>
>
> Hi there,
> We have run into a problem: when there is some, but not high, load on some of our Asterisk servers, we suddenly see an IAX max retries error in the console.
> When this happens, everything stops working and we cannot get Asterisk to work again unless we restart the service (not the server).
> I tried to start Asterisk through GDB, but since Asterisk never crashes, GDB shows nothing about the problem.
> I've also set up a monitoring tool to check for network glitches, and none have occurred.
> I've also tried increasing the max retries in chan_iax2.c and recompiling Asterisk, as I've read on some forums that this should resolve the issue, but it did not help either.
> {code}
> sed -i "s/static int max_retries = 4;/static int max_retries = 12;/" channels/chan_iax2.c
> {code}
> I've attached the console output we see. These messages just keep popping up and never seem to end. To me this looks like some cleanup in chan_iax2.c is not working when the max retries limit is hit. The error we're facing happens on this line:
> https://github.com/asterisk/asterisk/blob/13.7/channels/chan_iax2.c#L3572
> I could use some advice on how to debug this problem further and resolve it, because when this error happens, Asterisk does not work at all until it is restarted.
> The problem is intermittent and I have a hard time reproducing it, but we see it when the load increases. Making 10k calls within 7 hours seems to make it happen.
> I looked into the code and see that it uses a reference to a callno, which to me looks like a counter that increases. Could we be seeing some sort of race condition, or could the callno be running out of scope?
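A note on the max_retries patch in the quoted description: sed -i exits successfully even when the pattern matches nothing, so a build can silently keep the old value if the declaration differs between versions. Below is a minimal sketch of a checked substitution, assuming the declaration has the form given in the sed command above (static int max_retries = 4;). The function name is hypothetical.

```python
import re

# Assumed form of the declaration, taken from the sed command in the report.
DECL_RE = re.compile(
    r"^(\s*static\s+int\s+max_retries\s*=\s*)\d+(\s*;)", re.MULTILINE
)


def bump_max_retries(source: str, new_value: int) -> str:
    """Return `source` with the max_retries initializer set to `new_value`.

    Raises ValueError unless exactly one declaration was rewritten, so a
    non-matching source file cannot go unnoticed.
    """
    patched, count = DECL_RE.subn(
        lambda m: f"{m.group(1)}{new_value}{m.group(2)}", source
    )
    if count != 1:
        raise ValueError(f"expected 1 max_retries declaration, found {count}")
    return patched
```

It would be used by reading channels/chan_iax2.c, passing the text through the function, and writing the result back before recompiling.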