[asterisk-bugs] [JIRA] (ASTERISK-20462) [patch] Trunk not hungup if SLA Station hangs up before answer

dkerr (JIRA) noreply at issues.asterisk.org
Mon Nov 19 20:07:45 CST 2012


    [ https://issues.asterisk.org/jira/browse/ASTERISK-20462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=199907#comment-199907 ] 

dkerr commented on ASTERISK-20462:
----------------------------------

Okay, so I'm a bit out of my depth here, not being thoroughly familiar with the Asterisk architecture, but I'm trying to think through the possibilities.  The dial_trunk() function runs in a thread kicked off by the SLAStation() function, which is used on outbound calls only. SLAStation() waits on a semaphore until dial_trunk() signals that it is done, so there is no risk from the thread that SLAStation() executes on.  In the case of outbound calls the station is not "ringing", so I don't think the count of ringing stations comes into play. Also, only one trunk can be rung: ast_dial() in this case calls only a single destination.
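
The handoff described above, where one thread starts a worker and then blocks until the worker signals that it is done, can be sketched roughly as follows. This is only an illustration built on POSIX threads, with a mutex and condition variable standing in for the semaphore mentioned in the comment. struct dial_args, dial_worker() and run_dial_and_wait() are hypothetical names and do not reproduce the app_meetme code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the arguments handed to the dialing thread. */
struct dial_args {
    pthread_mutex_t lock;
    pthread_cond_t done_cond;
    bool done;   /* set by the worker once dialing has finished */
    int result;  /* illustrative outcome code, 1 means "finished" */
};

/* Worker: pretends to dial, then signals the waiting parent. */
static void *dial_worker(void *data)
{
    struct dial_args *args = data;

    assert(args != NULL);
    pthread_mutex_lock(&args->lock);
    args->result = 1;
    args->done = true;
    pthread_cond_signal(&args->done_cond);
    pthread_mutex_unlock(&args->lock);
    return NULL;
}

/* Parent: starts the worker and blocks until it reports completion. */
int run_dial_and_wait(void)
{
    struct dial_args args;
    pthread_t tid;
    int result;

    pthread_mutex_init(&args.lock, NULL);
    pthread_cond_init(&args.done_cond, NULL);
    args.done = false;
    args.result = 0;

    if (pthread_create(&tid, NULL, dial_worker, &args) != 0) {
        pthread_cond_destroy(&args.done_cond);
        pthread_mutex_destroy(&args.lock);
        return -1;
    }

    /* The loop guards against spurious wakeups. */
    pthread_mutex_lock(&args.lock);
    while (!args.done) {
        pthread_cond_wait(&args.done_cond, &args.lock);
    }
    result = args.result;
    pthread_mutex_unlock(&args.lock);

    pthread_join(tid, NULL);
    pthread_cond_destroy(&args.done_cond);
    pthread_mutex_destroy(&args.lock);
    return result;
}
```

Because the parent does not return until the worker has signalled and been joined, the args structure on the parent's stack stays valid for the worker's whole lifetime. That is the property the comment relies on when it says no other thread should know about args.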

For a reload to occur, both the trunk and the station would have to have hung up.  If that occurs, then there is a possibility that args->station points to an invalid structure.  args->station itself would not have been NULLed out, though, because SLAStation() is blocked on the semaphore and no other thread should know about it, so a test for a NULL args->station won't help.

Now here is where the 100ms sleep creates a problem.  If both the trunk and the station hang up simultaneously while dial_trunk() is sleeping, then there is a window during which a delayed reload could occur, such that the test for the originating station being not-in-use could end up reading an args->station->name that is now bogus. But I think that this should be handled okay by ast_device_state(), which should return AST_DEVICE_INVALID or AST_DEVICE_UNKNOWN.  Maybe we need to test for those return codes too? The problem may be alleviated by placing the 100ms sleep after the in-use test rather than before it, for then ast_dial_answered() will be called first and should return a hangup, which breaks out of the loop.
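
The reordering suggested above (check the trunk first, then the station, and only then sleep) might look something like the sketch below. Every name here is a hypothetical stand-in rather than an Asterisk type or function; the two callbacks merely play the roles that the dial result and ast_device_state() play in the comment. Treating an invalid or unknown station state as "gone" is the open question raised above, and the sketch assumes it purely for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* All names below are hypothetical stand-ins, not Asterisk's own types. */
enum trunk_state { TRUNK_PENDING, TRUNK_ANSWERED, TRUNK_HANGUP };
enum station_state { STATION_INUSE, STATION_NOT_INUSE, STATION_INVALID, STATION_UNKNOWN };
enum poll_action { POLL_CONTINUE, POLL_ANSWERED, POLL_TRUNK_HANGUP, POLL_STATION_GONE };

/* One iteration's decision: the trunk result wins over the station state. */
enum poll_action poll_step(enum trunk_state trunk, enum station_state station)
{
    if (trunk == TRUNK_ANSWERED) {
        return POLL_ANSWERED;
    }
    if (trunk == TRUNK_HANGUP) {
        return POLL_TRUNK_HANGUP;
    }
    /* Assumption: invalid and unknown are treated like not-in-use. */
    if (station == STATION_NOT_INUSE || station == STATION_INVALID ||
        station == STATION_UNKNOWN) {
        return POLL_STATION_GONE;
    }
    return POLL_CONTINUE;
}

/* Caller-supplied callbacks that report the current state. */
struct poll_source {
    enum trunk_state (*trunk)(void *ctx);
    enum station_state (*station)(void *ctx);
    void *ctx;
};

/* Polls up to max_iterations times, sleeping 100 ms only after the checks.
 * Returns POLL_CONTINUE if nothing was decided within the iteration budget. */
enum poll_action poll_until_done(const struct poll_source *src, int max_iterations)
{
    const struct timespec pause = { 0, 100 * 1000000L }; /* 100 ms */

    assert(src != NULL && src->trunk != NULL && src->station != NULL);
    for (int i = 0; i < max_iterations; i++) {
        enum trunk_state t = src->trunk(src->ctx);
        enum station_state s = STATION_INUSE;
        enum poll_action action;

        /* Query the station only while the trunk is still pending, so a
         * trunk hangup is noticed before any station data is touched. */
        if (t == TRUNK_PENDING) {
            s = src->station(src->ctx);
        }
        action = poll_step(t, s);
        if (action != POLL_CONTINUE) {
            return action;
        }
        nanosleep(&pause, NULL);
    }
    return POLL_CONTINUE;
}
```

With the sleep at the bottom of the loop, a trunk hangup is seen on the next pass before the station callback runs. This narrows the window described above but does not close it: a reload could still land between the two checks.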


I couldn't agree more that this should have been written using callbacks. The inbound SLATrunk() function that generates rings to multiple extension "stations" was written with callbacks. For some reason SLAStation() was not -- and I believe that this is the root cause for ringback and station hangup not being handled in the original design... it was simply overlooked.

Tight polling loops are a hack, I agree.  I added the 100ms sleep because I attempted to add debug statements inside the loop -- bad idea: thousands were written to the console, causing Asterisk to crash quickly, I'm guessing because a buffer somewhere got overrun. Limiting the loop to ten iterations a second controlled this.  I have no way of measuring CPU use, but in my environment it would not be a problem, as I'm only running a handful of extensions and rarely more than two simultaneous calls. How it would affect a larger installation, which probably uses multi-core CPUs, I have no idea.

SLA does not seem to be a widely used feature of Asterisk (else this problem would have been logged long ago), and my objective is merely to fix what is broken within the bounds of the current design.  Redesigning this "properly" is a much larger project, maybe a task to be undertaken when SLA is ported from meetme to the new conference service.

David
                
> [patch] Trunk not hungup if SLA Station hangs up before answer
> --------------------------------------------------------------
>
>                 Key: ASTERISK-20462
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-20462
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Applications/app_meetme, Applications/SLA
>    Affects Versions: 1.8.15.0
>         Environment: Astlinux
>            Reporter: dkerr
>         Attachments: asterisk-trunk-bugid20462v2.patch, extensions.conf, Hangup before answer.txt, sip.conf, sla.conf
>
>
> If an SLA station hangs up before the called party answers, then the channel to the sla station is terminated, but the channel to the called party remains open and continues to ring until timeout. Or if the party answers then they get unobtainable tone and asterisk fails all over trying to connect to a meetme conference that does not exist.
> See attached log file.



