[asterisk-bugs] [JIRA] (ASTERISK-23745) Call file problem, DelayedRetry/retrying spite MaxRetries: 0

Mon May 19 10:52:43 CDT 2014

    [ https://issues.asterisk.org/jira/browse/ASTERISK-23745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=218267#comment-218267 ] 

Mikael Fredin commented on ASTERISK-23745:
------------------------------------------

I think it's safe to conclude that this is not a call file problem but a problem with channels not hanging up. Some output during error:

[root at s tests]# asterisk -rx "core show channels"
Channel              Location             State   Application(Data)
SIP/peer-00002e24 s at macro-call-transfe Up      Dial(SIP/provider/00123456,
SIP/peer-00002e28  (None)               Up      AppDial((Outgoing Line))
2 active channels
158 active calls

While "sip show channels" shows:
<output with lots of ACK messages, and two BYE>
44 active SIP dialogs

I can see in the log that there's two calls who are stuck in a long running Hangup-channel, as there's many of these messages:
[May 14 15:12:25] WARNING[20406] chan_sip.c: Autodestruct on dialog '3d0d0e305260dc1240bf974510ed50b7 at 123.123.123.123' with owner in place (Method: BYE). Rescheduling destruction for 10000 ms
[May 14 15:12:28] WARNING[20406] chan_sip.c: Autodestruct on dialog 'dEZoUSApp_IJ0-ZBbG-ytw..' with owner in place (Method: BYE). Rescheduling destruction for 10000 ms

These correspond with the two BYE's in the "sip show channel" output (and with the only channels showing in "core show calls"). No idea if that can cause a massive problem like this.

During time of failure I would expect to have 24 calls running (24 call files), but the call files had already completed and hanged up their calls - without channels getting destroyed (meaning the call files never get removed). And with "active calls" reaching 158 for some reason, while we only had a few incoming calls active.

I'm using NAT with:

externaddr = 212.x.x.x in sip.conf and 
nat=yes, qualify=yes for the peers

In case it may be related to NAT.

I should know sooner or later if it's related, since I have made a new setup without NAT.

> Call file problem, DelayedRetry/retrying spite MaxRetries: 0
> ------------------------------------------------------------
>
>                 Key: ASTERISK-23745
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-23745
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Core/PBX, PBX/pbx_spool
>    Affects Versions: 1.8.16.0, 1.8.27.0
>         Environment: Red Hat Enterprise Linux Server release 5.9 (Tikanga)
> HP ProLiant DL360 G7, 24 GB RAM, 24 cores
>            Reporter: Mikael Fredin
>
> Following problem arises from time to time, a call will successfully terminate:
> [May 14 14:31:41] VERBOSE[3274] pbx_realtime.c:     -- Executing [t at project_init:1] Hangup("SIP/peer-2-00002f7e", "")
> [May 14 14:31:41] VERBOSE[3274] pbx.c:   == Spawn extension (project_init, t, 1) exited non-zero on 'SIP/peer-2-00002f7e'
> <bye message,  Really destroying SIP dialog, etc>
> This is the call file:
> Channel: SIP/peer-2/00numberhere
> CallerID: "" <+calleridhere>
> Extension: 123
> SetVar: someid=123
> Context: setup
> WaitTime: 30
> MaxRetries: 0
> RetryTime: 300
> Account: 123
> Priority: 1
> Some time after the call has hung up, the call file is still there and this is appended to the file:
> StartRetry: 20354 1 (1400070906) (My note: Wed May 14 14:35:06 CEST 2014)
> DelayedRetry: 20354 0 (1400070906) same time...
> DelayedRetry: 20354 0 (1400071206) five minutes...
> DelayedRetry: 20354 0 (1400071506) and so on...
> DelayedRetry: 20354 0 (1400071806) never deleting this file
> DelayedRetry: 20354 0 (1400072106) are we?
> DelayedRetry: 20354 0 (1400072406) nope....
> DelayedRetry: 20354 0 (1400072706) waiting for someone....
> DelayedRetry: 20354 0 (1400073006) to do manual work
> Asterisk log:
> [May 14 14:35:06] DEBUG[20421] pbx_spool.c: Delaying retry since we're currently running '/var/spool/asterisk/outgoing/callfile'
> [May 14 14:40:06] DEBUG[20421] pbx_spool.c: Delaying retry since we're currently running '/var/spool/asterisk/outgoing/callfile'
> [May 14 14:45:06] DEBUG[20421] pbx_spool.c: Delaying retry since we're currently running '/var/spool/asterisk/outgoing/callfile'
> [May 14 14:50:06] DEBUG[20421] pbx_spool.c: Delaying retry since we're currently running '/var/spool/asterisk/outgoing/callfile'
> [May 14 14:55:06] DEBUG[20421] pbx_spool.c: Delaying retry since we're currently running '/var/spool/asterisk/outgoing/callfile'
> [May 14 15:00:06] DEBUG[20421] pbx_spool.c: Delaying retry since we're currently running '/var/spool/asterisk/outgoing/callfile'
> [May 14 15:05:06] DEBUG[20421] pbx_spool.c: Delaying retry since we're currently running '/var/spool/asterisk/outgoing/callfile'
> [May 14 15:10:06] DEBUG[20421] pbx_spool.c: Delaying retry since we're currently running '/var/spool/asterisk/outgoing/callfile'
> It happens only once in a while, my guess it happens when many call files are generated at once (I do use unique names for them, based on an auto_increment in a database). This will happen to a lot of call files, also, so it's not just one call file (seen 40+ get stuck like this).
> core show channels will report far too many "active calls", but very few channels compared to "active calls" (and "core show calls" will show the same, too high, number).
> So somehow, sometimes, something stops registering that calls are hanging up and that a call file should be removed.
> Been thinking of skipping the check:
> if (o->retries <= o->maxretries) {
> On row 446 in pbx/pbx.c just to get the calls flowing, because I never want a retry on a call file (it would just skip to delete the file and report "expired without completion" instead). In fact, I have specified "MaxRetries: 0", so IMO I shouldn't get to the "Delaying retry" part anyways.
> But not sure if that would be a good idea - I don't know the inner workings of asterisk and maybe things would start burning anyways. Especially considering something goes wrong with the active call count.

--
This message was sent by Atlassian JIRA
(v6.2#6252)