[asterisk-dev] AMI Disconnect/Sudden "Asterisk Call Manager/1.3" received

Wed May 7 07:26:26 CDT 2014

On 06 May 2014, at 19:13, Daniel McFarlane <daniel at szeto.ca> wrote:

> On 05/06/2014 10:56 AM, Olle E. Johansson wrote:
>> On 06 May 2014, at 16:11, Daniel McFarlane <daniel at szeto.ca> wrote:
>> 
>>> Hi All,
>>> 
>>> I've been working with Asterisk for 2.5 years now but I am new to the mailing list.  Hoping to get some answers here..
>>> 
>>> I've written a piece of software to control Asterisk via AMI.  I'm able to login, making outbound calls, receive/record calls, handle digits, put people into conferences, etc.  Everything seems to work well, at a low to medium call volume.
>>> 
>>> I've been stress testing the system and found what seems to be a memory issue within Asterisk.  However, after over a day of searching I can't even find any bug reports or anything to bring me closer to a solution..but still hoping to find a solution or that someone can help me find one..
>>> 
>>> I connect to the AMI successfully.  In my test I am making an outbound call through a sip "device" I configured within sip.conf. This sip "device" basically connects back into the same Asterisk I am using to generate the calls.
>>> 
>>> If I generate outbound calls (i.e.: Via Action: Originate) using a low call volume everything seems to be fine.  e.g.: I can generate 10 calls and they all go through and complete successfully (with a total of 20 calls bring processed by the same Asterisk, due to both outbound and inbound calls).
>>> 
>>> The problem arises If I bombard (i.e.: Write all the Originate requests within about 7 seconds) the AMI to generate 97 outbound calls (for a total of 194 channels).  The first Originate commands seem to get processed,  but then all of a sudden (after Asterisk seems to have started processing a good amount of my Originate requests already) Asterisk seems to reset the AMI interface! Without even having a dropped connection I receive a new "Asterisk Call Manager/1.3" string and then previous commands that I issued start coming back with Response: Error and Message: Permission denied.
>>> 
>>> I've enabled debugging and here's the security log output when I connect:
>>> 
>>> [Apr  1 12:43:25] SECURITY[9877] res_security_log.c: SecurityEvent="SuccessfulAuth",EventTV="1396370605-551714",Severity="Informational",Service="AMI",EventVersion="1",AccountID="username",SessionID="0x7f0c540d96b8",LocalAddress="IPV4/TCP/0.0.0.0/5038",RemoteAddress="IPV4/TCP/192.168.1.62/52139",UsingPassword="0",SessionTV="1396370605-551710"
>>> 
>>> (This is the time when I am issuing commands, it is accepting and processing them)
>>> 
>>> Then all of a sudden (when I receive a new "Asterisk Call Manager/1.3" string and commands are rejected with "Permission denied") here is what the log shows:
>>> 
>>> [Apr  1 12:44:36] SECURITY[9877] res_security_log.c: SecurityEvent="RequestNotAllowed",EventTV="1396370676-876795",Severity="Error",Service="AMI",EventVersion="1",AccountID="",SessionID="0x7f0c54072968",LocalAddress="IPV4/TCP/0.0.0.0/5038",RemoteAddress="IPV4/TCP/192.168.1.62/52144",RequestType="Action: Originate",SessionTV="0-0"
>>> 
>>> (Note: How the AccountID is now empty)
>>> 
>>> At this same time the CLI shows:
>>> 
>>>  == Manager '*****' logged off from 192.168.1.**
>>> 
>>> ..yet it wasn't my application that disconnected and/or issued an Action: logoff to Asterisk (my application doesn't even detect a connection drop, so it seems like Asterisk just logs the manager user out!).
>>> 
>>> Why did Asterisk all of a sudden reset itself and/or lose reference to the user I am authenticating with?
>>> 
>>> Note: My maxcalls setting within asterisk.conf was set to default. I even tried upping it to 200 and that didn't help.  My maxload setting was untouched/commented out and therefore set to default (no threshold).  I still tried upping it to 6.0, 8.0 and 10.0 to no avail.  I even uncommented minmemfree and set it to 256 (although calling free -m from a terminal showed their to always be plenty of memory on my system).
>>> 
>>> Does anyone have an idea how to fix/correct this?
>> One thing you have to be aware of is that Action:s are atomic. You have to wait for the response of one action before you send the next one. Otherwise they will all end up in the TCP buffer. That's propably what you see - a lot of queued up messages being red after the tcp connection is reset.
>> 
>> That doesn't explain why something resets and send you a new connect message and possibly requires a new login. That seems like a bug that needs to be reported. Can you easily repeat this?
>> 
>> Just my 5 öre. Anyone else with good comments on this?
>> /O
> 
> Thank you for the quick responses.
> 
> First off, as Tony pointed out I did forget to include my Asterisk version.  I did realize it right after sending the first message, but figured I'd send it in my first response.  Sorry about that. I'm running Asterisk 11.2.1.
> 
> Olle: When associating with 1 audio channel I do wait for the response before issuing the next command.  In this case, however, my software internally treats each as a separate call (i.e.: As if I had issued Originate requests to separate DAHDI trunks, instead of a IAX/SIP `pipe` that handles all the SIP/IAX calls) so I do actually send multiple back-to-back Action: Originate commands without waiting for a response (but still wait for the OriginateResponse before issuing more commands for that same call, as necessary given that I need the specific callno from Asterisk before I can issue more commands).  I assumed that Asterisk would create a separate IAX/SIP entity/call for each Originate response (which it seems to do well under low volume, e.g.: 10 calls) & that this would therefore be ok.
> 
> Also, after for instance sending 6 Originate requests Asterisk does come back with:
> 
> e.g.:
> 
> Response: Success
> ActionID: 819367539
> Message: Originate successfully queued
> 
> ..for each command I issued.  Is this not ok?  I can see how waiting to issue a response prior to to issuing another Action: Originate command would prevent a lot of TCP commands from being queued in the TCP buffer, but still seems like Asterisk should be handling it ok. Well, unless I overload it's queue or something..
> 
> You are right about the commands all ending up in the TCP buffer.  I also concluded that the "Permission denied" messages were the result of Asterisk trying to process from the TCP buffer, after it performed it's reset.
I wonder how an operating system treats an overflowing TCP buffer... Just guessing here.

> 
> Yes, I can fairly easily reproduce this.  Here are some details to understand on how to reproduce:
> 
> 1) I have noticed that if I say issue 100 calls in blocks of 10 calls every 2 seconds that Asterisk seems to be able to handle it..just not 97 calls at one time.
> 2) Strangely, here's one I can't explain.  If I've been connected via the AMI port for some time and then start my tests it seems to be ok.  If I connect, perform a test in blocks (as in #1 above) (kind of like warming it up) and then bombard with 97 calls at a time, it seems to hold up.
> 3) The easiest way for me to reproduce it is by disconnecting my software and reconnecting just prior to bombarding with 97 calls. Now I'm pulling at strings to explain this, but is it possible that Asterisk needs some set up time after login or treats this as a kind of spam attack?  Perhaps that could explain why it resets the connection?  I'd have to do more testing to see if I can get it to reset after being connected for a while..

I don't think we have anything after login that limits any rate on action:s. I don't see anything in Asterisk being "setup" after login.

Very interesting issue.

/O