[asterisk-bugs] [JIRA] (ASTERISK-21677) NOTIFYs for BLF start queuing up and fail to be sent out

Sun May 5 02:37:38 CDT 2013

    [ https://issues.asterisk.org/jira/browse/ASTERISK-21677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=206126#comment-206126 ] 

Alec Davis edited comment on ASTERISK-21677 at 5/5/13 2:37 AM:
---------------------------------------------------------------

I think we agree that this is the correct way to fix a network issue where the response doesn't come back, or where the request never reached the device and so a response was never going to come back.

Dan/David, you say 'since upgrading to 11':

  This network pacing control (interlock), where the device is required to respond, has been there since Asterisk 1.4.
  On a 1.8 system without the patch, simulating the fault (with the same steps to reproduce as noted earlier) should also prevent further BLF updates. Waiting until the next re-subscribe **WON'T** fix the BLF updates; only restarting the phone will obtain a new subscription, after which the BLF will be correct.

  On a 1.8 system with the patch, after the simulated fault the phone will re-subscribe (Asterisk will create a new subscription), and the BLF will be correct.
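
  To make the interlock concrete, here is a minimal standalone sketch in C. It is not the chan_sip code: the struct and every function name (sub_model, state_changed, got_200_ok, new_subscription) are hypothetical. It only assumes the behaviour described above, namely that one NOTIFY may be outstanding at a time, later state changes are queued until the device answers, and a brand-new subscription starts clean.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of one BLF subscription; NOT the real chan_sip struct. */
struct sub_model {
    bool notify_outstanding; /* a NOTIFY was sent and has not been answered */
    bool change_queued;      /* a state change arrived while waiting */
    int  last_state;         /* latest state the phone should display */
    int  notifies_sent;      /* counter, for the demonstration only */
};

static void send_notify(struct sub_model *s)
{
    s->notify_outstanding = true;
    s->notifies_sent++;
}

/* State change: send immediately, or queue behind the outstanding NOTIFY. */
static void state_changed(struct sub_model *s, int new_state)
{
    s->last_state = new_state;
    if (s->notify_outstanding) {
        s->change_queued = true;
    } else {
        send_notify(s);
    }
}

/* 200 OK received: release the interlock and flush any queued change. */
static void got_200_ok(struct sub_model *s)
{
    s->notify_outstanding = false;
    if (s->change_queued) {
        s->change_queued = false;
        send_notify(s);
    }
}

/* A brand-new subscription starts clean and sends its first NOTIFY. */
static void new_subscription(struct sub_model *s, int current_state)
{
    memset(s, 0, sizeof(*s));
    s->last_state = current_state;
    send_notify(s);
}

/* Scenario: one 200 OK is lost, so later changes pile up behind it. */
void demo_lost_response(void)
{
    struct sub_model s;

    new_subscription(&s, 0);   /* first NOTIFY sent */
    got_200_ok(&s);            /* answered, interlock released */
    state_changed(&s, 1);      /* sent, but its 200 OK never arrives */
    state_changed(&s, 2);      /* queued */
    state_changed(&s, 3);      /* still queued, nothing goes out */
    assert(s.notifies_sent == 2 && s.change_queued);

    new_subscription(&s, 3);   /* fresh subscription clears the jam */
    assert(s.notifies_sent == 1 && !s.change_queued);
}
```

  Calling demo_lost_response() from a main() runs the scenario. In this model nothing except a 200 OK or a new subscription ever clears notify_outstanding, which is the stuck condition being discussed.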

  There is a change in 11: the device state NOTIFY is now sent out immediately after each re-subscribe.
  Refer to https://reviewboard.asterisk.org/r/2048, comment in code: {code}"/* RFC 3265: A notification must be sent on every subscribe, so force it */"{code}

  Prior to 11, the device state NOTIFY was only sent on the initial subscribe; re-subscribes didn't cause a state NOTIFY.
  {code}transmit_state_notify(p, &data, 1, FALSE);	/* Send first notification */{code}
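
  The difference between the two behaviours can be written down as a small decision function. This is only an illustrative sketch in C: the enum and function names are hypothetical and do not appear in chan_sip.c, and the two policies simply paraphrase the description above.

```c
#include <stdbool.h>

/* Hypothetical names; the two policies paraphrase the behaviour described above. */
enum notify_policy {
    NOTIFY_ON_INITIAL_ONLY,    /* behaviour prior to 11 */
    NOTIFY_ON_EVERY_SUBSCRIBE  /* behaviour in 11 */
};

/* Does this SUBSCRIBE cause a state NOTIFY to be sent? */
static bool subscribe_sends_notify(enum notify_policy policy, bool is_resubscribe)
{
    if (policy == NOTIFY_ON_EVERY_SUBSCRIBE) {
        return true;
    }
    return !is_resubscribe;
}

/* Count the NOTIFYs caused by one initial SUBSCRIBE plus `refreshes` re-subscribes. */
int notifies_from_subscribes(enum notify_policy policy, int refreshes)
{
    int sent = 0;

    for (int i = 0; i <= refreshes; i++) {
        if (subscribe_sends_notify(policy, i > 0)) {
            sent++;
        }
    }
    return sent;
}
```

  For one initial SUBSCRIBE followed by three refreshes, this gives 1 NOTIFY under the pre-11 policy and 4 under the 11 policy.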

I have a few questions:
  1). What models of phones are these?
  2). On a non-patched system, are you able to provide a SIP debug trace of one failing (but not one caused by disconnecting the network)?

> NOTIFYs for BLF start queuing up and fail to be sent out
> --------------------------------------------------------
>
>                 Key: ASTERISK-21677
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-21677
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Channels/chan_sip/General
>    Affects Versions: 11.3.0
>         Environment: Centos 5.6, Quad Core Intel Xeon @ 3.0GHZ, 4GB RAM
>            Reporter: Dan Martens
>         Attachments: review2475.diff2.txt
>
>
> Hello, 
> We have noticed that since we upgraded to Asterisk 11, the BLF lamps on phones (multiple makes and models) stop working from time to time.  To get them to work again, we have to bring the device offline and back online again.
> When this happens, we start to see a lot of "queued" messages in the logs regarding the extension that is not working.  For example:
> Extension Changed 100[witgoffice-local] new state Ringing for Notify User witg_116 (queued)
> Once a device is listed as "queued", it will never be dequeued unless you make it go offline.  It gets stuck in this state.
> SIP network traces show that once the device goes into queued state, Asterisk will no longer send any NOTIFY messages.  It only sends them when it is in non-queued state.  
> A brief look at the code shows that the flag which gets reset to allow these notifications to get through is only reset in a single branch of code:
> chan_sip.c at 22939 in handle_response_notify 
> ast_clear_flag(&p->flags[1], SIP_PAGE2_STATECHANGEQUEUE);
> This will only occur when a device sends back a 200 OK response to a previous NOTIFY message.  If this response never comes back (i.e. packet loss etc.), then the flag gets stuck in this state forever.
> I would propose that a fix to this would be to reset the flag:
> ast_clear_flag(&p->flags[1], SIP_PAGE2_STATECHANGEQUEUE);
> during either a SUBSCRIBE request or REGISTER request.  That way, if the flag is stuck, it will get reset in a short amount of time when the device performs its next registration routine.  That is, unless I am completely wrong or there is a better way of doing things.
> Your help is greatly appreciated.
> Thanks,
