[asterisk-bugs] [JIRA] (ASTERISK-21194) chan_sip can fail to find a peer during reload

Jaco Kroon (JIRA) noreply at issues.asterisk.org
Fri Mar 1 11:01:19 CST 2013


    [ https://issues.asterisk.org/jira/browse/ASTERISK-21194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=203599#comment-203599 ] 

Jaco Kroon commented on ASTERISK-21194:
---------------------------------------

Michael,

According to that definition the particular peer in question isn't active, since the remote end doesn't register to us.  It's a static peer with host=10.0.0.14.

Anyway, the way I read the code, a simple "sip reload" command is an extremely dangerous thing to do.

Firstly, the global settings are actually reset to their defaults (whilst other threads may be using them).  So for starters there are massive potential issues for anything reading the sip_cfg structure at that moment.  The reload then iterates through the options from the config file and updates each setting as it goes.  This seems sensible until you start thinking about high-volume systems (and I don't even consider my systems, with 30-50 concurrent calls, to be that high volume)!  But they do get issued reloads from time to time to add new clients, or to change parameters for specific clients.  Even without the load the race remains, it's just much less likely to strike.  The point being that there is a brief interval during the reload where the settings may not be what I want them to be - and this CAN influence (and plainly does influence) my clients.

Offhand, unfortunately, I don't see a simple way to make the global settings change atomic, but a better way may be to load the config into an on-stack structure, and then to update only those settings that differ.  Not sure how atomic a single write really is, but to the best of my understanding (on x86 and x86_64 at least) a typical assignment to an aligned 32-bit variable (or 64-bit on x86_64) is atomic on the memory bus.  Changing things setting by setting like this would already be a heck of a lot better than the current behaviour.
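A rough sketch of the "update only what changed" idea, written against a made-up settings structure rather than the real sip_cfg.  Everything named demo_* is hypothetical, and the use of C11 <stdatomic.h> is an assumption (chan_sip does not necessarily build that way); it is only there to make each single-setting write explicitly atomic instead of relying on x86 behaviour.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical live settings, read by other threads while a reload runs. */
struct demo_settings {
    atomic_int rtptimeout;
    atomic_int max_forwards;
    atomic_bool allow_guest;
};

/* Hypothetical plain copy that the config parser fills in on the stack. */
struct demo_parsed {
    int rtptimeout;
    int max_forwards;
    bool allow_guest;
};

static struct demo_settings demo_live = { 60, 70, true };

/* Store only the settings that differ; returns how many were changed.
 * Each store is atomic on its own, but the set as a whole is not. */
static int demo_apply_changed(const struct demo_parsed *staged)
{
    int changed = 0;

    assert(staged != NULL);
    if (atomic_load(&demo_live.rtptimeout) != staged->rtptimeout) {
        atomic_store(&demo_live.rtptimeout, staged->rtptimeout);
        changed++;
    }
    if (atomic_load(&demo_live.max_forwards) != staged->max_forwards) {
        atomic_store(&demo_live.max_forwards, staged->max_forwards);
        changed++;
    }
    if (atomic_load(&demo_live.allow_guest) != staged->allow_guest) {
        atomic_store(&demo_live.allow_guest, staged->allow_guest);
        changed++;
    }
    return changed;
}
```

A reader can still observe a mix of old and new settings halfway through, but it never sees a setting fall back to its default, and settings that did not change are never touched at all.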

Secondly, it seems a lot of error checking happens *after* the updates to the global config structures.  The above strategy would deal with that too - invalid config?  Ok, no harm done, just refuse to copy it into the global config (or, as the code already does in some cases, just restrict the values appropriately).
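To illustrate the ordering, here is a minimal sketch in which all checking happens on a staged copy and the global structure is only written once the copy has passed.  The sketch_* names, the fields and the limits are all invented for the example and are not taken from chan_sip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of global settings; field names are illustrative only. */
struct sketch_cfg {
    int rtp_timeout_s;
    int qualify_freq_ms;
    int max_expiry_s;
    int min_expiry_s;
};

static struct sketch_cfg sketch_global = { 0, 60000, 3600, 60 };

/* Check a staged copy: restrict values where that is safe, refuse otherwise. */
static bool sketch_check(struct sketch_cfg *staged)
{
    assert(staged != NULL);

    /* Restrict: out-of-range values are pulled to the nearest legal one. */
    if (staged->qualify_freq_ms < 1000) {
        staged->qualify_freq_ms = 1000;
    }
    if (staged->rtp_timeout_s < 0) {
        staged->rtp_timeout_s = 0;
    }

    /* Refuse: contradictory settings cannot be repaired by guessing. */
    if (staged->min_expiry_s <= 0 || staged->min_expiry_s > staged->max_expiry_s) {
        return false;
    }
    return true;
}

/* Reload: the global config is written only after the staged copy passes. */
static bool sketch_reload(const struct sketch_cfg *parsed)
{
    struct sketch_cfg staged;

    assert(parsed != NULL);
    staged = *parsed;               /* work on an on-stack copy */
    if (!sketch_check(&staged)) {
        return false;               /* invalid config: global left untouched */
    }
    sketch_global = staged;         /* commit; real code would need a lock or
                                       per-field atomic stores at this point */
    return true;
}
```

The struct assignment at the end is not atomic by itself, so this only addresses the "check first, commit afterwards" half of the problem.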

If the above suggestion is acceptable I'm willing to attempt a patch.


Regarding the peers themselves - that code looks like black magic to me.  From what little I do understand, the build_peer function is used to construct a peer.  It would seem that you're correct in that any non-realtime peers, and peers that aren't rt-cached (I don't quite follow the logic of that first if, but realtime is false for the peers coming from the config file, so the condition ends up being true), are *removed* from the peer list at this time.  After this, if the peer is NOT marked, we set firstpass to false (ie, peers are marked before we scan the config file, and if we bump into them multiple times in the config file it's dealt with differently).

At this point, if this is a first pass over the peer, some variables are stored, the peer is reset to defaults, and then we start updating it again.  Point being that the update to the config happens whilst the peer is unlinked (this has nothing to do with active vs non-active - according to the code it's about realtime vs non-realtime).  Only after the config is rebuilt does the peer get re-added.  A better mechanism might be to *clone* the peer, work on the cloned copy, and then atomically replace the original with the clone.  Never had to deal with the ao2 voodoo but this might be something that the ao2 containers are perfectly capable of dealing with.  Or perhaps, again, build up a whole new ao2 container and swap the containers?
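The clone-and-replace idea could look roughly like the sketch below.  It deliberately does not use the ao2 API: demo_peer, demo_slot and the functions are invented for illustration, a single C11 atomic pointer stands in for the peer container, and it assumes only one reload runs at a time.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily reduced peer; not the real struct sip_peer. */
struct demo_peer {
    char name[32];
    char host[64];
    int qualify_ms;
};

/* Stand-in for the peer container: one published pointer. */
static _Atomic(struct demo_peer *) demo_slot;

static struct demo_peer *demo_peer_new(const char *name, const char *host, int qualify_ms)
{
    struct demo_peer *p = calloc(1, sizeof(*p));

    if (!p) {
        return NULL;
    }
    /* calloc zeroed the buffers, so copying size - 1 bytes keeps them terminated. */
    strncpy(p->name, name, sizeof(p->name) - 1);
    strncpy(p->host, host, sizeof(p->host) - 1);
    p->qualify_ms = qualify_ms;
    return p;
}

/* Lookup: sees either the old peer or the new one, never a half-built one. */
static struct demo_peer *demo_peer_find(void)
{
    return atomic_load(&demo_slot);
}

/* Reload: copy the published peer, edit the copy, then publish it in one step.
 * The previous peer is handed back through old_out and never unlinked early. */
static bool demo_peer_reload(const char *new_host, struct demo_peer **old_out)
{
    struct demo_peer *old = atomic_load(&demo_slot);
    struct demo_peer *clone;

    assert(old != NULL && old_out != NULL);
    clone = malloc(sizeof(*clone));
    if (!clone) {
        return false;               /* the old peer stays published */
    }
    *clone = *old;                  /* start from the current state */
    memset(clone->host, 0, sizeof(clone->host));
    strncpy(clone->host, new_host, sizeof(clone->host) - 1);
    *old_out = atomic_exchange(&demo_slot, clone);
    return true;
}
```

What the sketch leaves out is when the old copy may be freed: a thread that looked the peer up just before the swap may still be using it, so real code needs reference counting or something equivalent before releasing it.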

Disclaimer:  Whilst I've written my fair share of code I honestly have no idea of the scope of my suggestions.
                
> chan_sip can fail to find a peer during reload
> ----------------------------------------------
>
>                 Key: ASTERISK-21194
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-21194
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Channels/chan_sip/General
>    Affects Versions: 11.2.1
>            Reporter: Jaco Kroon
>
> During a global system reload I saw this:
> {noformat}
> [Feb 28 16:50:26] VERBOSE[2712][C-0000317a] pbx.c:     -- Executing [number at prov:5] Dial("Local/number at foo-0000377b;2", "SIP/bar/number,,") in new stack
> [Feb 28 16:50:26] VERBOSE[2712][C-0000317a] netsock2.c:   == Using SIP RTP CoS mark 5
> [Feb 28 16:50:26] ERROR[2712][C-0000317a] netsock2.c: getaddrinfo("bar", "(null)", ...): Name or service not known
> [Feb 28 16:50:26] WARNING[2712][C-0000317a] chan_sip.c: No such host: bar
> [Feb 28 16:50:26] WARNING[2712][C-0000317a] app_dial.c: Unable to create channel of type 'SIP' (cause 20 - Subscriber absent)
> {noformat}
> sip show peer (after reload):
> {noformat}
>   * Name       : bar
>   Description  : 
>   Secret       : <Not set>
>   MD5Secret    : <Not set>
>   Remote Secret: <Not set>
>   Context      : uls-makecall
>   Record On feature : automon
>   Record Off feature : automon
>   Subscr.Cont. : <Not set>
>   Language     : 
>   Tonezone     : <Not set>
>   Accountcode  : bar
>   AMA flags    : Unknown
>   Transfer mode: open
>   CallingPres  : Presentation Allowed, Not Screened
>   Callgroup    : 
>   Pickupgroup  : 
>   Named Callgr : 
>   Nam. Pickupgr: 
>   MOH Suggest  : 
>   Mailbox      : 
>   VM Extension : 8579
>   LastMsgsSent : 0/0
>   Call limit   : 2147483647
>   Max forwards : 0
>   Dynamic      : No
>   Callerid     : "" <>
>   MaxCallBR    : 384 kbps
>   Expire       : -1
>   Insecure     : no
>   Force rport  : Auto (No)
>   Symmetric RTP: No
>   ACL          : No
>   DirectMedACL : No
>   T.38 support : Yes
>   T.38 EC mode : Redundancy
>   T.38 MaxDtgrm: -1
>   DirectMedia  : No
>   PromiscRedir : No
>   User=Phone   : No
>   Video Support: No
>   Text Support : No
>   Ign SDP ver  : No
>   Trust RPID   : No
>   Send RPID    : No
>   Subscriptions: Yes
>   Overlap dial : No
>   DTMFmode     : rfc2833
>   Timer T1     : 500
>   Timer B      : 32000
>   ToHost       : 10.0.0.14
>   Addr->IP     : 10.0.0.14:5060
>   Defaddr->IP  : (null)
>   Prim.Transp. : UDP
>   Allowed.Trsp : UDP
>   Reg. exten   : 
>   Def. Username: 
>   SIP Options  : (none)
>   Codecs       : (g729)
>   Codec Order  : (g729:20)
>   Auto-Framing :  No 
>   Status       : OK (1 ms)
>   Useragent    : 
>   Reg. Contact : 
>   Qualify Freq : 60000 ms
>   Keepalive    : 0 ms
>   Variables    :
>                  __noivr = yes
>   Sess-Timers  : Accept
>   Sess-Refresh : uas
>   Sess-Expires : 1800 secs
>   Min-Sess     : 90 secs
>   RTP Engine   : asterisk
>   Parkinglot   : 
>   Use Reason   : No
>   Encryption   : No
> {noformat}
> And it would have looked exactly the same just before reload.  The section in sip.conf:
> {noformat}
> [bar]
> type=friend
> host=10.0.0.14
> qualify=yes
> disallow=all
> allow=g729
> context=uls-makecall
> directmedia=no
> dtmfmode=rfc2833
> accountcode=IS
> jbforce=no
> setvar=__noivr=yes
> transport=udp
> {noformat}
