[asterisk-users] Help debugging a possible SIP channel leak in 11.17.0, possible race condition

Vinicius Fontes vinicius at aittelecom.com.br
Wed Apr 8 08:22:28 CDT 2015


Have you tried Asterisk 13? The bridging mechanism has been completely
rewritten on Asterisk 12, so there's no longer channel masquerading and
zombie channels. Might be worth a try.

2015-04-07 20:33 GMT-03:00 Alex Villací­s Lasso <a_villacis at palosanto.com>:

> El 07/04/15 a las 17:38, Alex Villací­s Lasso escribió:
>
>  I am trying to collect enough information about an problem a client is
>> having with its asterisk 11.17.0  x86_64. This issue was observed before in
>> 1.8.20, and we upgraded to 11.15.0 and then to 11.17.0 with no solution.
>>
>> Background: this client is a telemarketing call-center that generates
>> outgoing calls with close to a hundred agents operating simultaneously in
>> peak hours. The system uses asterisk with FreePBX 2.8. In order to generate
>> the calls, I wrote a program that connects to Asterisk using the AMI
>> protocol. This program expects the SIP agent extensions to be assigned as
>> members of queues, of which there are about 20, as shown below:
>>
>> 9007 has 0 calls (max unlimited) in 'random' strategy (5s holdtime, 68s
>> talktime), W:0, C:581, A:260, SL:82.6% within 60s
>>    Members:
>>       SIP/147 (ringinuse disabled) (dynamic) (On Hold) has taken 21 calls
>> (last was 800 secs ago)
>>       SIP/417 (ringinuse disabled) (dynamic) (In use) has taken 77 calls
>> (last was 708 secs ago)
>>       SIP/416 (ringinuse disabled) (dynamic) (In use) has taken 41 calls
>> (last was 656 secs ago)
>>       SIP/408 (ringinuse disabled) (dynamic) (In use) has taken 50 calls
>> (last was 789 secs ago)
>>    No Callers
>>
>> The program runs "queue show" through AMI every few seconds. For each
>> queue to be used in telemarketing, the program counts the number of members
>> that are "Not In Use". If at least one is found, it reads that many phone
>> numbers from the database and uses the AMI Originate command on each one,
>> as follows:
>>
>> Action: Originate
>> Channel: Local/NNNNNNNNNN at from-internal
>> Exten: CCCC
>> Context: from-internal
>> Priority: 1
>> Async: true
>> ActionID: xxx
>>
>> Here, NNNNNNNNNN is the number read from the database and CCCC is the
>> queue extension in the FreePBX-created context that eventually runs the
>> Queue() dialplan application for the corresponding queue. This causes the
>> call to be connected between the outgoing number and the queue, and is then
>> assigned to a queue member by Asterisk. The dialplan is configured to route
>> NNNNNNNNNN through one of a series of SIP trunks using the outbound routes
>> as configured by FreePBX.
>>
>> The issue is that although this strategy works correctly on the user's
>> machine for a few days, we have been observing that eventually the
>> application stops placing calls. The agents are all idle (all 90 to 100 of
>> them), but the "queue show" command shows them to be "In Use" on all
>> queues. Furthermore, in normal operation, the "core show channels" command
>> shows at most one channel for each configured SIP client in the "Up" state,
>> but when calls stop being placed, the same command reports multiple
>> channels in the "Up" state, as follows (after sorting):
>>
>> Local/9757007441 at from-internal-0000a447;2!macro-
>> dialout-trunk!s!19!Up!Dial!SIP/5547740412/5557007441,300,
>> !47740412!!!3!572!(None)!1428426012.169192
>> Local/9759315789 at from-internal-0000a456;1<ZOMBIE>!
>> from-trunk-sip-5547740412!!1!Up!AppDial!(Outgoing
>> Line)!9759315789!!!3!500!(None)!1428426084.169326
>> Local/9759315789 at from-internal-0000a456;2!macro-
>> dialout-trunk!s!19!Up!Dial!SIP/5547740412/5559315789,300,
>> !47740412!!!3!500!(None)!1428426084.169323
>> SIP/104-00014c61!macro-dialout-trunk!s!19!Up!Dial!
>> SIP/5547740413/0453511309468,300,!47740413!!!3!562!SIP/
>> 5547740413-00014c62!1428426022.169224
>> SIP/110-00014c2b!EjecutivoROLLRATE!9014!1!Up!AppQueue!(Outgoing
>> Line)!110!!!3!590!(None)!1428425994.169124
>> SIP/110-00014e4e!macro-dialout-trunk!s!19!Up!Dial!
>> SIP/5547740413/5552114757,300,!47740413!!!3!92!(None)!1428426491.169760
>> SIP/113-00014c8c!macro-dialout-trunk!s!19!Up!Dial!
>> SIP/5547740420/0016499465293,300,!47740420!!!3!532!(None)!
>> 1428426052.169273
>> SIP/114-00014ce6!macro-dialout-trunk!s!19!Up!Dial!
>> SIP/5547740410/040,300,!47740410!!!3!430!(None)!1428426154.169384
>> SIP/115-00014ea4!macro-dialout-trunk!s!19!Ring!Dial!
>> SIP/5547740400/0059144681666,300,!47740400!!!3!10!(None)!
>> 1428426574.169850
>> SIP/119-00014c26!macro-dialout-trunk!s!19!Up!Dial!
>> SIP/5547740413/016255863252,300,!47740413!!!3!593!(None)!
>> 1428425991.169113
>> SIP/119-00014d1a!macro-dialout-trunk!s!19!Up!Dial!
>> SIP/5547740413/5552716011,300,!47740413!!!3!383!(None)!1428426201.169436
>> SIP/119-00014d4d!macro-dialout-trunk!s!19!Up!Dial!
>> SIP/5547740413/5556802622,300,!47740413!!!3!327!(None)!1428426257.169493
>> SIP/120-00014d5e!macro-dial-one!s!37!Up!Dial!SIP/230,"",
>> trT!120!!!3!314!SIP/230-00014d5f!1428426270.169510
>> SIP/121-00014c24!EjecutivoROLLRATE!9014!1!Up!AppQueue!(Outgoing
>> Line)!121!!!3!596!(None)!1428425988.169111
>> SIP/122-00014b56!EjecutivoQUADS!93000!1!Up!AppQueue!(Outgoing
>> Line)!122!!!3!677!(None)!1428425906.168693
>> SIP/123-00014d53!macro-dialout-trunk!s!19!Up!Dial!
>> SIP/5547740410/017222497260,300,!47740410!!!3!320!(None)!
>> 1428426264.169499
>> SIP/123-00014e35!macro-dialout-trunk!s!19!Up!Dial!
>> SIP/5547740410/017222497260,300,!47740410!!!3!121!(None)!
>> 1428426463.169735
>> SIP/123-00014e9e!macro-dialout-trunk!s!19!Ring!Dial!
>> SIP/5547740410/5556350254,300,!47740410!!!3!13!(None)!1428426570.169844
>> SIP/125-00014bc7!macro-dialout-trunk!s!19!Up!Dial!
>> SIP/5547740435/0445549261961,300,!47740435!!!3!626!(None)!
>> 1428425958.168938
>> SIP/125-00014cba!macro-dialout-trunk!s!19!Up!Dial!
>> SIP/5547740400/5521634648,300,!47740400!!!3!493!(None)!1428426091.169328
>> SIP/129-00014d8f!macro-dialout-trunk!s!19!Up!Dial!
>> SIP/5547740400/0050687069324,300,!47740400!!!3!277!(None)!
>> 1428426307.169565
>> SIP/130-00014a57!macro-dialout-trunk!s!19!Up!Dial!
>> SIP/5547740414/014777167359,300,!47740414!!!3!777!(None)!
>> 1428425807.168130
>> SIP/130-00014d08!macro-dialout-trunk!s!19!Up!Dial!
>> SIP/5547740414/5559502494,300,!47740414!!!3!402!(None)!1428426181.169418
>> SIP/130-00014d56!macro-dialout-trunk!s!19!Up!Dial!
>> SIP/5547740414/5559502494,300,!47740414!!!3!317!(None)!1428426267.169502
>> SIP/130-00014db0!macro-dialout-trunk!s!19!Up!Dial!
>> SIP/5547740414/5559502470,300,!47740414!!!3!253!(None)!1428426331.169598
>> SIP/130-00014e16!macro-dialout-trunk!s!19!Up!Dial!
>> SIP/5547740414/5559502494,300,!47740414!!!3!156!(None)!1428426428.169702
>> SIP/131-00014b8c!EjecutivoSEGUNDAS!99000!1!Up!AppQueue!(Outgoing
>> Line)!131!!!3!651!(None)!1428425932.168799
>> SIP/131-00014d7e!macro-dialout-trunk!s!19!Up!Dial!
>> SIP/5547740410/5553625501,300,!47740410!!!3!294!(None)!1428426290.169548
>> SIP/131-00014e74!macro-dialout-trunk!s!19!Up!Dial!
>> SIP/5547740410/5553526428,300,!47740410!!!3!45!SIP/5547740410-00014e75!
>> 1428426539.169798
>>
>> In the example shown above, SIP/119, SIP/123, SIP/131 have three
>> channels, and SIP/130 has five. Even assuming a bug in my program, a single
>> SIP extension cannot be handling five simultaneous calls.
>>
>> Now, here comes my speculation. I believe what is happening is that
>> somehow, channels are not being cleaned on hangup, but somehow leave the
>> extension capable of accepting a new call. My program sees queues with a
>> mixture of non-affected idle members and affected members (which appear in
>> "queue show" as "In Use" but are actually idle and will accept a new call).
>> As long as there is one non-affected member that can appear as "Not In
>> Use", my program keeps placing calls on that queue, which are handled by
>> both affected an unaffected members. This would explain the multiple
>> channels for a single SIP phone. As soon as all the members become affected
>> in one queue, the dialing stalls.
>>
>> Is this speculation sound, or even possible, given the internal
>> architecture of Asterisk?
>> What tools can be used to debug this? Is this a channel leak? I know of
>> DEBUG_THREADS for debugging deadlocks, but this does not seem like one.
>> What extra information can be provided to make a bug report on this?
>>
> Additional information: all affected channels I have checked appear in the
> log in lines like this, dozens of times:
>
> chan_sip.c: Autodestruct on dialog XXXXXXXXXXXXX with owner
> AFFECTEDCHANNEL in place (Method: BYE). Rescheduling destruction for 10000
> ms
>
>
> --
> _____________________________________________________________________
> -- Bandwidth and Colocation Provided by http://www.api-digital.com --
> New to Asterisk? Join us for a live introductory webinar every Thurs:
>               http://www.asterisk.org/hello
>
> asterisk-users mailing list
> To UNSUBSCRIBE or update options visit:
>   http://lists.digium.com/mailman/listinfo/asterisk-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-users/attachments/20150408/d3d41904/attachment.html>


More information about the asterisk-users mailing list