[asterisk-users] asterisk 16.5 / pjsip outage because of task processor queue >= 500 tasks and too many open files later on
Joshua C. Colp
jcolp at digium.com
Mon Aug 19 05:21:04 CDT 2019
On Sat, Aug 17, 2019, at 3:07 AM, Michael Maier wrote:
> Hello!
>
<snip>
>
> Few words about the usage of asterisk:
> - 2 registered endpoints
> - 4 SIPS / SRTP trunks
> - 46 calls at 2019-08-15
> - the sip:isp.de trunk hadn't been used
>
>
> Some findings:
>
> - The problem seems to be triggered by the "task processor queue
> reached 500 scheduled tasks" condition. I saw this message in June, too.
> Each time, the context was the same ISP (-> not Deutsche Telekom!) as
> described above, again in conjunction with registration retries.
>
> - Registration configuration of this provider:
>
>  <Registration/ServerURI..............................>  <Auth..........>  <Status.......>
> ==========================================================================================
>
>  ispPJSIP/sip:isp.de                                     ispPJSIP          Registered
>
> ParameterName : ParameterValue
> ==============================================================
> auth_rejection_permanent : true
> client_uri : sip:0049... at isp.de
> contact_user : +49...
> endpoint : ispPJSIP
> expiration : 3600
> fatal_retry_interval : 0
> forbidden_retry_interval : 10
> line : true
> max_retries : 10000
> outbound_auth : ispPJSIP
> outbound_proxy :
> retry_interval : 60
> server_uri : sip:isp.de
> support_path : false
> transport : 0.0.0.0-tls
>
>
> The expiration value shown above is not the one actually in effect.
> isp.de forces re-registration every 90s (asterisk does it every 60s,
> if I remember correctly)!
> Contact: <sip:+49... at isp-ip:5061;transport=TLS;line=....>;expires=90
>
> - After restarting asterisk, registration to isp.de worked without
> any further problems. I therefore think the re-registration problem
> was not caused by the provider failing to answer, but by asterisk
> being unable to perform the re-registration correctly.
>
>
>
>
> The final question:
> ===================
> Is there a problem with taskprocessors possibly not being canceled
> under some conditions (maybe unanswered or hanging registrations?) and
> therefore piling up steadily and using more and more open files (and
> memory) until asterisk can't do anything anymore because some limits
> are exceeded?
> Could there also be a problem with the retry interval of 60s and the
> actual re-registration being done every 60s, too (producing a kind of
> "fork bomb")?
Taskprocessors aren't recurring things individually; they are a work queue item that is always executed. It may be a problem with the fact that it is TLS: perhaps the act of trying to establish the TLS connection is taking a long time to fail, causing things to build up. I'd suggest collecting a backtrace[1] and providing complete information on an issue[2].
[1] https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace
[2] https://issues.asterisk.org/jira
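
Before filing the issue, it may help to confirm that the number of open file descriptors really climbs over time while registrations are being retried. A minimal Python sketch, assuming a Linux host with /proc mounted; the helper names (count_open_fds, soft_nofile_limit, is_growing) are hypothetical and nothing here is part of Asterisk:

```python
import os
from typing import List


def count_open_fds(pid: int, proc_root: str = "/proc") -> int:
    """Return how many file descriptors process `pid` currently has open."""
    # Each entry in /proc/<pid>/fd is one open descriptor (Linux-specific).
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def soft_nofile_limit(pid: int, proc_root: str = "/proc") -> int:
    """Return the soft 'Max open files' limit of `pid`, or -1 if unlimited or not found."""
    with open(os.path.join(proc_root, str(pid), "limits")) as fh:
        for line in fh:
            if line.startswith("Max open files"):
                # Columns: "Max open files", soft limit, hard limit, units.
                soft = line.split()[3]
                return -1 if soft == "unlimited" else int(soft)
    return -1


def is_growing(samples: List[int], min_samples: int = 3) -> bool:
    """True if the samples never decrease and the last one exceeds the first."""
    if len(samples) < min_samples:
        return False
    non_decreasing = all(b >= a for a, b in zip(samples, samples[1:]))
    return non_decreasing and samples[-1] > samples[0]
```

Sampling count_open_fds for the asterisk process (its PID has to be looked up separately) once a minute and passing the collected values to is_growing would indicate whether descriptors accumulate; comparing the latest sample against soft_nofile_limit shows how close the process is to the "too many open files" condition.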
--
Joshua C. Colp
Digium - A Sangoma Company | Senior Software Developer
445 Jan Davis Drive NW - Huntsville, AL 35806 - US
Check us out at: www.digium.com & www.asterisk.org