No subject


Fri Sep 2 03:59:05 CDT 2011


d off, and with my change the phone wasn't sent the timeout to the devi=
ce.

I'm wondering why we didn't finish up with 2 MWI subscriptions, the=
re may be a clue there, did the code kill off the previous mwi subscription=
 when another for the same device arrived, or did the phone unsubscribe.
</pre>
 </blockquote>





 <p>On March 23rd, 2012, 9:44 a.m., <b>Mark Michelson</b> wrote:</p>
 <blockquote style=3D"margin-left: 1em; border-left: 2px solid #d0d0d0; pad=
ding-left: 10px;">
  <pre style=3D"white-space: pre-wrap; white-space: -moz-pre-wrap; white-sp=
ace: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">    &quot=
;Working with the senario that a phone has restarted due to loss of power f=
or some reason.
     Trusting a phone to unsubscribe to a previous subscription is just wro=
ng, it won&#39;t know what it&#39;s previous call-id was.

     The cleaner removed the power. Then plugged it back in... so many sena=
rios :)&quot;

Yes, this is why there&#39;s a timeout in place :)

Was this output taken from your patched system or from an unpatched Asteris=
k?

Looking at your change again, I missed something. It looks like old subscri=
ptions will time out in Asterisk, it&#39;s just that Asterisk won&#39;t bot=
her sending a NOTIFY to let subscribers know that they have timed out. What=
 I still don&#39;t understand though is how this change actually fixes the =
problem you originally had. I think I&#39;d need to see a SIP debug of the =
post-phone-reboot packets both before and after your patch to fully underst=
and what&#39;s up here.

Despite not having all the information in front of me, I have a bit of an i=
dea of what might be happening.

1. The phone is subscribed to Asterisk.
2. Asterisk sends some NOTIFYs to the phone. Let&#39;s say it sends 5 of th=
em. This means that the last version number Asterisk has sent to the phone =
is 4 (since version starts at 0)
3. Phone reboots. It loses all memory of the old subscription and the old v=
ersion number.
4. Phone starts a new subscription with Asterisk. Asterisk sends an initial=
 NOTIFY for this new subscription with version 0.
5. Original subscription times out. Asterisk sends a NOTIFY to end the old =
subscription. The version number in this NOTIFY is 5.
6. The phone, for some reason, even though it should have not knowledge of =
this old subscription, stores 5 as the current version number (How does the=
 phone respond to this NOTIFY? 200? 481?)
7. Asterisk sends NOTIFYs for the new subscription, incrementing version ev=
ery time. Since the phone has 5 as its current version, it will ignore all =
NOTIFYs until version 6 arrives.

If this is what&#39;s happening, then the problem isn&#39;t that the phone =
is remembering version numbers across reboots. The problem is actually that=
 the phone will honor version numbers for NOTIFYs for subscriptions it does=
 not, or at least should not, know about (which seems pretty exploitable if=
 you ask me). The solution I proposed before may work, although it might no=
t. At step 4 above, Asterisk will send an initial NOTIFY with version 0. Th=
e phone may be programmed not to accept NOTIFYs on a subscription that do n=
ot start with version 0. If that&#39;s the case, my solution will not work.=
 And if my solution does not work, then I don&#39;t know of a good one off =
the top of my head.

I still do not agree with this patch though. Most devices, including the ne=
wer generation phones you are using, properly handle a NOTIFY that ends a t=
imed-out subscription. If anything, the solution for this will have to be i=
mplemented as a new option because there is no way we will unconditionally =
never send a subscription-ending NOTIFY for dialog-info events.</pre>
 </blockquote>





 <p>On March 23rd, 2012, 3:54 p.m., <b>Alec Davis</b> wrote:</p>
 <blockquote style=3D"margin-left: 1em; border-left: 2px solid #d0d0d0; pad=
ding-left: 10px;">
  <pre style=3D"white-space: pre-wrap; white-space: -moz-pre-wrap; white-sp=
ace: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">
Yes this is unpatched, expected for the 3 DEBUG prints.

[code}
Index: channels/chan_sip.c
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
--- channels/chan_sip.c (revision 359029)
+++ channels/chan_sip.c (working copy)
@@ -3865,6 +3865,7 @@
                p-&gt;subscribed =3D NONE;
                append_history(p, &quot;Subscribestatus&quot;, &quot;timeou=
t&quot;);
                ast_debug(3, &quot;Re-scheduled destruction of SIP subscrip=
tion %s\n&quot;, p-&gt;callid ? p-&gt;callid : &quot;&lt;unknown&gt;&quot;);
+               ast_log(LOG_NOTICE, &quot;ALEC TIMEOUT of SIP subscription =
%s\n&quot;, p-&gt;callid ? p-&gt;callid : &quot;&lt;unknown&gt;&quot;);
                return 10000;   /* Reschedule this destruction so that we k=
now that it&#39;s gone */
        }

@@ -13127,10 +13130,12 @@
                                                local_display, local_target=
, local_target, mto, mto);
                        } else {
                                ast_str_append(tmp, 0, &quot;&lt;dialog id=
=3D\&quot;%s\&quot; direction=3D\&quot;recipient\&quot;&gt;\n&quot;, exten);
+                               ast_log(LOG_NOTICE, &quot;ALEC &lt;dialog i=
d=3D\&quot;%s\&quot; call-id=3D\&quot;%s\&quot; direction=3D\&quot;recipien=
t\&quot;&gt; version=3D%u %s\n&quot;, exten, p-&gt;callid, p-&gt;dialogver,=
 statestring);
                        }

                } else {
                        ast_str_append(tmp, 0, &quot;&lt;dialog id=3D\&quot=
;%s\&quot;&gt;\n&quot;, exten);
+                       ast_log(LOG_NOTICE, &quot;ALEC &lt;dialog id=3D\&qu=
ot;%s\&quot; call-id=3D\&quot;%s\&quot;&gt; version=3D%u %s\n&quot;, exten,=
 p-&gt;callid, p-&gt;dialogver, statestring);
                }
                ast_str_append(tmp, 0, &quot;&lt;state&gt;%s&lt;/state&gt;\=
n&quot;, statestring);
                if (state =3D=3D AST_EXTENSION_ONHOLD) {
{code}

Regarding steps 1-7, you have it exactly.
answer for step 6: the phone obviously isn&#39;t matching call-id.

I agree, it&#39;s only a GXP20XX issue, Grandstream have fixed it, with the=
 GXP21XX series.
I have access to a SNOM, and it works fine.

I have seen on one of the other open PBX forums, the same reported issue, t=
here fix was to remove any previous subscriptions that match peer and entit=
y. I will try to refind it. =

</pre>
 </blockquote>








</blockquote>

<pre style=3D"white-space: pre-wrap; white-space: -moz-pre-wrap; white-spac=
e: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">Seems 1.2 h=
ad code to remove old duplicate subscriptions:

{code}
                        /* remove any old subscription from this peer for t=
he same exten/context,
                           as the peer has obviously forgotten about it and=
 it&#39;s wasteful to wait
                           for it to expire and send NOTIFY messages to the=
 peer only to have them
                           ignored (or generate errors)
                        */
                        ast_mutex_lock(&amp;iflock);
                        for (p_old =3D iflist; p_old; p_old =3D p_old-&gt;n=
ext) {
                                if (p_old =3D=3D p)
                                        continue;
                                if (p_old-&gt;initreq.method !=3D SIP_SUBSC=
RIBE)
                                        continue;
                                if (p_old-&gt;subscribed =3D=3D NONE)
                                        continue;
                                ast_mutex_lock(&amp;p_old-&gt;lock);
                                if (!strcmp(p_old-&gt;username, p-&gt;usern=
ame)) {
                                        if (!strcmp(p_old-&gt;exten, p-&gt;=
exten) &amp;&amp;
                                            !strcmp(p_old-&gt;context, p-&g=
t;context)) {
                                                ast_set_flag(p_old, SIP_NEE=
DDESTROY);
                                                ast_mutex_unlock(&amp;p_old=
-&gt;lock);
                                                break;
                                        }
                                }
                                ast_mutex_unlock(&amp;p_old-&gt;lock);
                        }
                        ast_mutex_unlock(&amp;iflock);

{code}</pre>
<br />








<p>- Alec</p>


<br />
<p>On March 13th, 2012, 5:35 p.m., Alec Davis wrote:</p>






<table bgcolor=3D"#fefadf" width=3D"100%" cellspacing=3D"0" cellpadding=3D"=
8" style=3D"background-image: url('https://reviewboard.asterisk.org/media/r=
b/images/review_request_box_top_bg.png'); background-position: left top; ba=
ckground-repeat: repeat-x; border: 1px black solid;">
 <tr>
  <td>

<div>Review request for Asterisk Developers.</div>
<div>By Alec Davis.</div>


<p style=3D"color: grey;"><i>Updated March 13, 2012, 5:35 p.m.</i></p>




<h1 style=3D"color: #575012; font-size: 10pt; margin-top: 1.5em;">Descripti=
on </h1>
<table width=3D"100%" bgcolor=3D"#ffffff" cellspacing=3D"0" cellpadding=3D"=
10" style=3D"border: 1px solid #b8b5a0">
 <tr>
  <td>
   <pre style=3D"margin: 0; padding: 0; white-space: pre-wrap; white-space:=
 -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap=
: break-word;">Some phones (Grandstream GXP20XX series 1.2.5.3 firmware) if=
 rebooted, loose the BLF lights due to the new subscription&#39;s &quot;ver=
sion&quot; being less than the previous subscription&#39;s &quot;version&qu=
ot;.
This happens the instant that asterisk timeouts the previous subscription -=
 which has the much larger &quot;version&quot; sequence number.

The only way for the lights to start working again, is for the current subs=
cription&#39;s &quot;version&quot; number to increment past the previous ol=
d &quot;version&quot; number.
That period of time could be huge, if the phone isn&#39;t rebooted for days.

The workaround has been to reboot the phone twice, within a few minutes of =
each reboot.
  1st time wait for the BLF&#39;s to fail (then reboot again) - which is th=
e timeout period - a few minutes.
  2nd time the lights will work, then fail, then after a few minutes start =
working again.

This fix prevents the old subscription timeout from updating the phones &qu=
ot;version&quot; number. =


This issue doesn&#39;t affter the GXP21XX series.

Attempt to show issue byway of example below with some debug output showing=
 call-id and version number.

astrid-test*CLI&gt; core show hints

    -=3D Registered Asterisk Dial Plan Hints =3D-
            8601GXP0001 at trusted             : Custom:q8601_a8612    State:I=
nUse           Watchers  1
Debug output:

=3D=3D=3DSet the BLF state to InUse
- Executing [s at macro-custom-agent-inout:13] Set(&quot;SIP/GXP0001-00000005&=
quot;, &quot;DEVICE_STATE(Custom:q8601_a8612)=3DINUSE&quot;) in new stack
...
[Mar 14 11:01:39] NOTICE[10819]: chan_sip.c:13136 state_notify_build_xml: A=
LEC &lt;dialog id=3D&quot;8601GXP0001&quot; call-id=3D&quot;3c65aed02ae1d5d=
f at 192.168.5.141&quot;&gt; version=3D5

=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
Reboot the phone, it gets a new BLF subscription.
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
[Mar 14 11:02:30] NOTICE[10826]: chan_sip.c:13136 state_notify_build_xml: A=
LEC &lt;dialog id=3D&quot;8601GXP0001&quot; call-id=3D&quot;79956c016812540=
0 at 192.168.5.141&quot;&gt; version=3D0
=3D=3D=3DThe BLF has correct state of InUse.

...
[Mar 14 11:05:00] NOTICE[10826]: chan_sip.c:13136 state_notify_build_xml: A=
LEC &lt;dialog id=3D&quot;8601GXP0001&quot; call-id=3D&quot;79956c016812540=
0 at 192.168.5.141&quot;&gt; version=3D1
=3D=3D=3DThe BLF still has correct state of InUse.

...
The timeout &#39;last notification&#39;
[Mar 14 11:05:09] NOTICE[10826]: chan_sip.c:13131 state_notify_build_xml: A=
LEC &lt;dialog id=3D&quot;8601GXP0001&quot; call-id=3D&quot;3c65aed02ae1d5d=
f at 192.168.5.141&quot; direction=3D&quot;recipient&quot;&gt; version=3D6
[Mar 14 11:05:09] NOTICE[10826]: chan_sip.c:3868 __sip_autodestruct: ALEC T=
IMEOUT of SIP subscription 3c65aed02ae1d5df at 192.168.5.141
=3D=3D=3DThe BLF has incorrectly gone Idle.

...
[Mar 14 11:15:01] NOTICE[10826]: chan_sip.c:13136 state_notify_build_xml: A=
LEC &lt;dialog id=3D&quot;8601GXP0001&quot; call-id=3D&quot;79956c016812540=
0 at 192.168.5.141&quot;&gt; version=3D5
=3D=3D=3DThe BLF is still incorrect at Idle.

    -- Registered SIP &#39;GXP0001&#39; at 192.168.5.141:5066
[Mar 14 11:17:31] NOTICE[10826]: chan_sip.c:13136 state_notify_build_xml: A=
LEC &lt;dialog id=3D&quot;8601GXP0001&quot; call-id=3D&quot;79956c016812540=
0 at 192.168.5.141&quot;&gt; version=3D6
=3D=3D=3DThe BLF now has correct state of InUse.
</pre>
  </td>
 </tr>
</table>


<h1 style=3D"color: #575012; font-size: 10pt; margin-top: 1.5em;">Testing <=
/h1>
<table width=3D"100%" bgcolor=3D"#ffffff" cellspacing=3D"0" cellpadding=3D"=
10" style=3D"border: 1px solid #b8b5a0">
 <tr>
  <td>
   <pre style=3D"margin: 0; padding: 0; white-space: pre-wrap; white-space:=
 -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap=
: break-word;">In place on a 1.8 deployment for months, our production box =
with ~80 GXP phones, half of which are the older GXP20XX series.
</pre>
  </td>
 </tr>
</table>




<h1 style=3D"color: #575012; font-size: 10pt; margin-top: 1.5em;">Diffs</b>=
 </h1>
<ul style=3D"margin-left: 3em; padding-left: 0;">

 <li>trunk/channels/chan_sip.c <span style=3D"color: grey">(358608)</span><=
/li>

</ul>

<p><a href=3D"https://reviewboard.asterisk.org/r/1813/diff/" style=3D"margi=
n-left: 3em;">View Diff</a></p>




  </td>
 </tr>
</table>








  </div>
 </body>
</html>


--===============3074523449968740501==--



More information about the asterisk-dev mailing list