[asterisk-dev] Asterisk Load Performance

Matthew Jordan mjordan at digium.com
Tue Jun 21 13:03:16 CDT 2016


On Tue, Jun 21, 2016 at 12:16 PM, Richard Mudgett <rmudgett at digium.com> wrote:
>
>
> On Tue, Jun 21, 2016 at 11:12 AM, Michael Petruzzello
> <michael.petruzzello at civi.com> wrote:
>>
>> >On Fri, Jun 17, 2016 at 1:37 PM, Richard Mudgett <rmudgett at digium.com>
>> > wrote:
>> >>
>> >>
>> >> On Fri, Jun 17, 2016 at 12:36 PM, Michael Petruzzello
>> >> <michael.petruzzello at civi.com> wrote:
>> >>>
>> >>> Hello,
>> >>>
>> >>> I am working on identifying bottlenecks in Asterisk and a Stasis
>> >>> app. I am currently trying to handle 83.3 calls/second. For the
>> >>> most part, Asterisk and the Stasis app handle that well, but there
>> >>> is a 60+ second delay in response time.
>> >>>
>> >>> On the Asterisk side, I am seeing the following warnings:
>> >>>
>> >>> [Jun 17 12:00:16] WARNING[23561]: taskprocessor.c:803 taskprocessor_push: The 'subm:cdr_engine-00000003' task processor queue reached 500 scheduled tasks.
>> >>> [Jun 17 12:00:18] WARNING[25477][C-00000068]: taskprocessor.c:803 taskprocessor_push: The 'subm:devService-test-00000038' task processor queue reached 500 scheduled tasks.
>> >>> [Jun 17 12:00:21] WARNING[26298][C-000000a3]: taskprocessor.c:803 taskprocessor_push: The 'subp:PJSIP/sippeer-00000022' task processor queue reached 500 scheduled tasks.
>> >>> [Jun 17 12:00:23] WARNING[27339][C-0000010d]: taskprocessor.c:803 taskprocessor_push: The 'subm:ast_channel_topic_all-cached-00000032' task processor queue reached 500 scheduled tasks.
>> >>> [Jun 17 12:01:32] WARNING[31697][C-000003b2]: taskprocessor.c:803 taskprocessor_push: The 'subm:ast_channel_topic_all-00000036' task processor queue reached 500 scheduled tasks.
>> >>> [Jun 17 12:05:55] WARNING[23280]: taskprocessor.c:803 taskprocessor_push: The 'SIP' task processor queue reached 500 scheduled tasks.
>> >>>
>> >>> I have not seen a configuration setting in Asterisk to prevent these
>> >>> warnings from occurring (I'm trying to avoid modifying the Asterisk
>> >>> source code if possible). Looking at the task processors, I see that
>> >>> the queue to the Stasis app bottlenecks:
>> >>>
>> >>> subm:devService-test-00000038    4560990    0    1041689
>> >>>
>> >>> It does clear up relatively quickly. The CDR engine also bottlenecks
>> >>> (extremely badly), but I don't use that. Nothing else comes close to
>> >>> having a large queue.
>> >>>
>> >>> The Stasis app itself is extremely streamlined and is quite capable
>> >>> of handling a large number of messages at a time. The app runs on
>> >>> the JVM, so I am also investigating that, as well as the Netty
>> >>> library I am using for the WebSocket connections.
>> >>>
>> >>> Any insight into Asterisk's side of the equation, and how it scales
>> >>> on 40 vCPUs, would be greatly appreciated.
>> >>
>> >>
>> >> There are no options to disable those taskprocessor queue size
>> >> warnings. They are a symptom of the system being severely stressed.
>> >> If the stress continues, it is possible that the system could consume
>> >> all memory in those taskprocessor queues.
>> >>
>> >> Recent changes to the Asterisk v13 branch were made to help throttle
>> >> back incoming SIP requests on PJSIP when the taskprocessors become
>> >> backlogged like you are seeing. These changes will be in the
>> >> forthcoming v13.10.0 release. If you want, you can test with the
>> >> current v13 branch to see how these changes affect your stress
>> >> testing.
>> >>
>> >> If you don't need CDRs then you really should disable them, as they
>> >> consume a lot of processing time and the CDR taskprocessor queue
>> >> backlog can take minutes to clear.
>> >>
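For reference, disabling CDRs does not require touching the source; it is
a one-line change in cdr.conf, a minimal sketch:

    ; cdr.conf
    [general]
    enable = no    ; turn off CDR processing entirely

Restart Asterisk afterwards so the change takes effect.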
>> >
>> >To echo what Richard said, because Asterisk is now sharing state
>> >across the Stasis message bus, turning off subscribers to that bus
>> >will help performance. Some easy ones to disable, if you aren't using
>> >them, are CDRs, CEL, and AMI. Those all do a reasonable amount of
>> >processing, and you can get some noticeable improvement by disabling
>> >them.
>> >
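Each of those has a simple off switch in its own config file; minimal
sketches (note the option is 'enable' in cel.conf but 'enabled' in
manager.conf):

    ; cel.conf
    [general]
    enable = no    ; disable Channel Event Logging

    ; manager.conf
    [general]
    enabled = no   ; disable the AMI interface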
>> >Once you get past that, you can start fiddling with some of the lower
>> >level options. To start, you can throttle things back further by
>> >disabling certain internal messages in stasis.conf. As stasis.conf
>> >notes, functionality within Asterisk can break (or just not happen) if
>> >some messages are removed. For example, disabling
>> >'ast_channel_snapshot_type' would break ... most things. You may,
>> >however, be able to streamline your application by looking at which
>> >ARI messages it cares about and which it doesn't, inspecting the code,
>> >and disabling the ones it never uses. Lots of testing should occur
>> >before doing this, of course.
>> >
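As an illustration, declining message types happens in the
[declined_message_types] section of stasis.conf; the type names below are
taken from the sample config, so check stasis.conf.sample in your tree
before relying on them:

    ; stasis.conf
    [declined_message_types]
    ; Each 'decline' removes one internal message type from the bus.
    ; Only decline types you have verified nothing in your setup needs.
    decline = ast_channel_agent_login_type
    decline = ast_channel_agent_logoff_type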
>> >You may also be able to get some different performance characteristics
>> >by changing the threadpool options for the message bus in stasis.conf.
>> >This may make a difference, depending on the underlying machine.
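The threadpool knobs live in the same file. A sketch with arbitrary values
(tune against your own load; as noted below, more threads is not always
faster):

    ; stasis.conf
    [threadpool]
    initial_size = 10        ; threads created at startup
    idle_timeout_sec = 20    ; seconds before an idle thread exits
    max_size = 50            ; upper bound on pool growth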
>>
>> Thank you for the suggestions.
>>
>> I'm running Asterisk on 40 vCPUs with 120 GB of RAM. Changing the thread
>> pool options to use many more threads does not increase performance, and
>> past a certain point it actually decreases it.
>>
>> With further testing and having implemented your suggestions, I am
>> realizing the subm:devService-test-00000038 task processor is a major
>> bottleneck. I have always read that Asterisk can handle as many calls as the
>> hardware allows, but I'm not seeing that.
>>
>> I can go with a different architecture of multiple Asterisk servers
>> working together, but I would prefer one server to reduce complexity.
>>
>> What I am doing is not really a stress test, because I genuinely need
>> Asterisk to handle thousands of callers calling in within a few minutes.
>
>
> The subm:devService-test-00000038 taskprocessor is servicing the stasis
> message bus communication with your devService-test ARI application.
> Since each taskprocessor is executed by one thread, that is going to be
> a bottleneck. One thing you can try is to register multiple copies of
> your ARI application and randomly spread the calls to the different
> copies of the application (devService-test1, devService-test2, ...).
>
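In dialplan terms, that could look something like the sketch below,
assuming four registered copies of the application; RAND() picks an
instance per call:

    exten => _X.,1,NoOp(Spread calls across ARI app instances)
     same => n,Set(INSTANCE=${RAND(1,4)})
     same => n,Stasis(devService-test${INSTANCE})
     same => n,Hangup()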

To follow up on Richard's suggestion:

Events being written out (either over a WebSocket in ARI or over a
direct TCP socket in AMI) have to be fully written before the next
event can be written. That means the client application processing the
events can directly slow down the rate at which events are sent, if the
process reading the events does not keep reading from the socket as
quickly as possible. You may already be doing this - in which case,
disregard the suggestion - but you may want to have one thread/process
read from the ARI WebSocket and farm out the processing of the events
to some other thread/process.
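
A rough Java sketch of that split; the class and method names are made up
for illustration, and the pool size is arbitrary:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class AriEventPump {
        // Worker pool that does the actual event processing.
        private final ExecutorService workers = Executors.newFixedThreadPool(8);

        // Called from the single WebSocket reader thread for each event.
        // It hands the event off and returns immediately, so the reader
        // can go straight back to draining the socket.
        public void onEvent(String json) {
            workers.submit(() -> handle(json));
        }

        // Application-specific processing, run on a worker thread.
        private void handle(String json) {
            // ... parse the ARI event and act on it ...
        }
    }

The key point is that onEvent() never does real work on the reader
thread, so the socket stays drained no matter how slow the handlers are.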

-- 
Matthew Jordan
Digium, Inc. | CTO
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
Check us out at: http://digium.com & http://asterisk.org


