[asterisk-users] Help debugging a possible SIP channel leak in 11.17.0, possible race condition

Alex Villací­s Lasso a_villacis at palosanto.com
Fri Apr 10 14:16:37 CDT 2015

El 08/04/15 a las 08:22, Vinicius Fontes escribió:
> Have you tried Asterisk 13? The bridging mechanism has been completely rewritten on Asterisk 12, so there's no longer channel masquerading and zombie channels. Might be worth a try.
Sorry, this client is very hard to talk into stopping its operations long enough to install changes, let alone a major Asterisk version change. I already had trouble convincing him of the need to install a debugging version with DEBUG_THREADS enabled.

The issue appeared twice again today after two days of normal operation. I managed a capture of the output of "core show channel" on one of the leaked channels. Is there anything I can deduce from it? I see an entry of "Blocking in: (Not Blocking)". Is 
this supposed to be where in the dialplan the call is?

[root at pbx ~]# asterisk -rnx 'core show channel SIP/406-000010f7'
  -- General --
            Name: SIP/406-000010f7
            Type: SIP
        UniqueID: 1428688493.9200
        LinkedID: 1428688493.9200
       Caller ID: 47740435
  Caller ID Name: (N/A)
Connected Line ID: (N/A)
Connected Line ID Name: (N/A)
Eff. Connected Line ID: (N/A)
Eff. Connected Line ID Name: (N/A)
     DNID Digits: 9018111000547
        Language: es
           State: Up (6)
           Rings: 0
   NativeFormats: (ulaw)
     WriteFormat: slin
      ReadFormat: slin
  WriteTranscode: Yes (slin)->(ulaw)
   ReadTranscode: Yes (ulaw)->(slin)
1st File Descriptor: 31
       Frames in: 1671
      Frames out: 1623
  Time to Hangup: 0
    Elapsed Time: 0h7m20s
   Direct Bridge: <none>
Indirect Bridge: <none>
  --   PBX   --
         Context: macro-dialout-trunk
       Extension: s
        Priority: 19
      Call Group: 0
    Pickup Group: 0
     Application: Dial
            Data: SIP/5547740435/018111000547,300,
     Blocking in: (Not Blocking)
  Call Identifer: [C-00000bba]
BRIDGEPVTCALLID=5b2411027d0fb4fd46bdbd2076c9523f at
SIPURI=sip:406 at

   CDR Variables:
level 1: dnid=9018111000547
level 1: clid=406
level 1: src=406
level 1: dst=9018111000547
level 1: dcontext=from-internal
level 1: channel=SIP/406-000010f7
level 1: dstchannel=SIP/5547740435-000010f8
level 1: lastapp=Dial
level 1: lastdata=SIP/5547740435/018111000547,300,
level 1: start=2015-04-10 12:54:53
level 1: answer=2015-04-10 12:55:22
level 1: duration=440
level 1: billsec=411
level 1: disposition=ANSWERED
level 1: amaflags=DOCUMENTATION
level 1: uniqueid=1428688493.9200
level 1: linkedid=1428688493.9200
level 1: userfield=audio:OUT406-20150410-125453-1428688493.9200.wav
level 1: sequence=11006

> 2015-04-07 20:33 GMT-03:00 Alex Villací­s Lasso <a_villacis at palosanto.com <mailto:a_villacis at palosanto.com>>:
>     El 07/04/15 a las 17:38, Alex Villací­s Lasso escribió:
>         I am trying to collect enough information about an problem a client is having with its asterisk 11.17.0 x86_64. This issue was observed before in 1.8.20, and we upgraded to 11.15.0 and then to 11.17.0 with no solution.
>         Background: this client is a telemarketing call-center that generates outgoing calls with close to a hundred agents operating simultaneously in peak hours. The system uses asterisk with FreePBX 2.8. In order to generate the calls, I wrote a
>         program that connects to Asterisk using the AMI protocol. This program expects the SIP agent extensions to be assigned as members of queues, of which there are about 20, as shown below:
>         9007 has 0 calls (max unlimited) in 'random' strategy (5s holdtime, 68s talktime), W:0, C:581, A:260, SL:82.6% within 60s
>            Members:
>               SIP/147 (ringinuse disabled) (dynamic) (On Hold) has taken 21 calls (last was 800 secs ago)
>               SIP/417 (ringinuse disabled) (dynamic) (In use) has taken 77 calls (last was 708 secs ago)
>               SIP/416 (ringinuse disabled) (dynamic) (In use) has taken 41 calls (last was 656 secs ago)
>               SIP/408 (ringinuse disabled) (dynamic) (In use) has taken 50 calls (last was 789 secs ago)
>            No Callers
>         The program runs "queue show" through AMI every few seconds. For each queue to be used in telemarketing, the program counts the number of members that are "Not In Use". If at least one is found, it reads that many phone numbers from the database
>         and uses the AMI Originate command on each one, as follows:
>         Action: Originate
>         Channel: Local/NNNNNNNNNN at from-internal
>         Exten: CCCC
>         Context: from-internal
>         Priority: 1
>         Async: true
>         ActionID: xxx
>         Here, NNNNNNNNNN is the number read from the database and CCCC is the queue extension in the FreePBX-created context that eventually runs the Queue() dialplan application for the corresponding queue. This causes the call to be connected between
>         the outgoing number and the queue, and is then assigned to a queue member by Asterisk. The dialplan is configured to route NNNNNNNNNN through one of a series of SIP trunks using the outbound routes as configured by FreePBX.
>         The issue is that although this strategy works correctly on the user's machine for a few days, we have been observing that eventually the application stops placing calls. The agents are all idle (all 90 to 100 of them), but the "queue show"
>         command shows them to be "In Use" on all queues. Furthermore, in normal operation, the "core show channels" command shows at most one channel for each configured SIP client in the "Up" state, but when calls stop being placed, the same command
>         reports multiple channels in the "Up" state, as follows (after sorting):
>         Local/9757007441 at from-internal-0000a447;2!macro-dialout-trunk!s!19!Up!Dial!SIP/5547740412/5557007441,300,!47740412!!!3!572!(None)!1428426012.169192
>         Local/9759315789 at from-internal-0000a456;1<ZOMBIE>!from-trunk-sip-5547740412!!1!Up!AppDial!(Outgoing Line)!9759315789!!!3!500!(None)!1428426084.169326
>         Local/9759315789 at from-internal-0000a456;2!macro-dialout-trunk!s!19!Up!Dial!SIP/5547740412/5559315789,300,!47740412!!!3!500!(None)!1428426084.169323
>         SIP/104-00014c61!macro-dialout-trunk!s!19!Up!Dial!SIP/5547740413/0453511309468,300,!47740413!!!3!562!SIP/5547740413-00014c62!1428426022.169224
>         SIP/110-00014c2b!EjecutivoROLLRATE!9014!1!Up!AppQueue!(Outgoing Line)!110!!!3!590!(None)!1428425994.169124
>         SIP/110-00014e4e!macro-dialout-trunk!s!19!Up!Dial!SIP/5547740413/5552114757,300,!47740413!!!3!92!(None)!1428426491.169760
>         SIP/113-00014c8c!macro-dialout-trunk!s!19!Up!Dial!SIP/5547740420/0016499465293,300,!47740420!!!3!532!(None)!1428426052.169273
>         SIP/114-00014ce6!macro-dialout-trunk!s!19!Up!Dial!SIP/5547740410/040,300,!47740410!!!3!430!(None)!1428426154.169384
>         SIP/115-00014ea4!macro-dialout-trunk!s!19!Ring!Dial!SIP/5547740400/0059144681666,300,!47740400!!!3!10!(None)!1428426574.169850
>         SIP/119-00014c26!macro-dialout-trunk!s!19!Up!Dial!SIP/5547740413/016255863252,300,!47740413!!!3!593!(None)!1428425991.169113
>         SIP/119-00014d1a!macro-dialout-trunk!s!19!Up!Dial!SIP/5547740413/5552716011,300,!47740413!!!3!383!(None)!1428426201.169436
>         SIP/119-00014d4d!macro-dialout-trunk!s!19!Up!Dial!SIP/5547740413/5556802622,300,!47740413!!!3!327!(None)!1428426257.169493
>         SIP/120-00014d5e!macro-dial-one!s!37!Up!Dial!SIP/230,"",trT!120!!!3!314!SIP/230-00014d5f!1428426270.169510
>         SIP/121-00014c24!EjecutivoROLLRATE!9014!1!Up!AppQueue!(Outgoing Line)!121!!!3!596!(None)!1428425988.169111
>         SIP/122-00014b56!EjecutivoQUADS!93000!1!Up!AppQueue!(Outgoing Line)!122!!!3!677!(None)!1428425906.168693
>         SIP/123-00014d53!macro-dialout-trunk!s!19!Up!Dial!SIP/5547740410/017222497260,300,!47740410!!!3!320!(None)!1428426264.169499
>         SIP/123-00014e35!macro-dialout-trunk!s!19!Up!Dial!SIP/5547740410/017222497260,300,!47740410!!!3!121!(None)!1428426463.169735
>         SIP/123-00014e9e!macro-dialout-trunk!s!19!Ring!Dial!SIP/5547740410/5556350254,300,!47740410!!!3!13!(None)!1428426570.169844
>         SIP/125-00014bc7!macro-dialout-trunk!s!19!Up!Dial!SIP/5547740435/0445549261961,300,!47740435!!!3!626!(None)!1428425958.168938
>         SIP/125-00014cba!macro-dialout-trunk!s!19!Up!Dial!SIP/5547740400/5521634648,300,!47740400!!!3!493!(None)!1428426091.169328
>         SIP/129-00014d8f!macro-dialout-trunk!s!19!Up!Dial!SIP/5547740400/0050687069324,300,!47740400!!!3!277!(None)!1428426307.169565
>         SIP/130-00014a57!macro-dialout-trunk!s!19!Up!Dial!SIP/5547740414/014777167359,300,!47740414!!!3!777!(None)!1428425807.168130
>         SIP/130-00014d08!macro-dialout-trunk!s!19!Up!Dial!SIP/5547740414/5559502494,300,!47740414!!!3!402!(None)!1428426181.169418
>         SIP/130-00014d56!macro-dialout-trunk!s!19!Up!Dial!SIP/5547740414/5559502494,300,!47740414!!!3!317!(None)!1428426267.169502
>         SIP/130-00014db0!macro-dialout-trunk!s!19!Up!Dial!SIP/5547740414/5559502470,300,!47740414!!!3!253!(None)!1428426331.169598
>         SIP/130-00014e16!macro-dialout-trunk!s!19!Up!Dial!SIP/5547740414/5559502494,300,!47740414!!!3!156!(None)!1428426428.169702
>         SIP/131-00014b8c!EjecutivoSEGUNDAS!99000!1!Up!AppQueue!(Outgoing Line)!131!!!3!651!(None)!1428425932.168799
>         SIP/131-00014d7e!macro-dialout-trunk!s!19!Up!Dial!SIP/5547740410/5553625501,300,!47740410!!!3!294!(None)!1428426290.169548
>         SIP/131-00014e74!macro-dialout-trunk!s!19!Up!Dial!SIP/5547740410/5553526428,300,!47740410!!!3!45!SIP/5547740410-00014e75!1428426539.169798
>         In the example shown above, SIP/119, SIP/123, SIP/131 have three channels, and SIP/130 has five. Even assuming a bug in my program, a single SIP extension cannot be handling five simultaneous calls.
>         Now, here comes my speculation. I believe what is happening is that somehow, channels are not being cleaned on hangup, but somehow leave the extension capable of accepting a new call. My program sees queues with a mixture of non-affected idle
>         members and affected members (which appear in "queue show" as "In Use" but are actually idle and will accept a new call). As long as there is one non-affected member that can appear as "Not In Use", my program keeps placing calls on that queue,
>         which are handled by both affected an unaffected members. This would explain the multiple channels for a single SIP phone. As soon as all the members become affected in one queue, the dialing stalls.
>         Is this speculation sound, or even possible, given the internal architecture of Asterisk?
>         What tools can be used to debug this? Is this a channel leak? I know of DEBUG_THREADS for debugging deadlocks, but this does not seem like one.
>         What extra information can be provided to make a bug report on this?
>     Additional information: all affected channels I have checked appear in the log in lines like this, dozens of times:
>     chan_sip.c: Autodestruct on dialog XXXXXXXXXXXXX with owner AFFECTEDCHANNEL in place (Method: BYE). Rescheduling destruction for 10000 ms

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-users/attachments/20150410/95e3e874/attachment.html>

More information about the asterisk-users mailing list