[asterisk-users] asterisk 16.5 / pjsip outage because of task processor queue >= 500 tasks and too many open files later on

Joshua C. Colp jcolp at digium.com
Mon Aug 19 05:21:04 CDT 2019


On Sat, Aug 17, 2019, at 3:07 AM, Michael Maier wrote:
> Hello!
> 

<snip>

> 
> Few words about the usage of asterisk:
> - 2 registered endpoints
> - 4 SIPS / SRTP trunks
> - 46 calls at 2019-08-15
> - the sip:isp.de trunk hadn't been used
> 
> 
> Some findings:
> 
> - The problem seems to be triggered by the "task processor queue 
> reached 500 scheduled tasks" warning. I saw this message in June, too.
>   The context each time was the same ISP (-> not Deutsche Telekom!) as 
> described above, again in conjunction with registration retries.
> 
> - Registration configuration of this provider:
> 
>  <Registration/ServerURI..............................>  <Auth..........>  <Status.......>
>  ==========================================================================================
> 
>  ispPJSIP/sip:isp.de                                     ispPJSIP          Registered
> 
>  ParameterName            : ParameterValue
>  ==============================================================
>  auth_rejection_permanent : true
>  client_uri               : sip:0049... at isp.de
>  contact_user             : +49...
>  endpoint                 : ispPJSIP
>  expiration               : 3600
>  fatal_retry_interval     : 0
>  forbidden_retry_interval : 10
>  line                     : true
>  max_retries              : 10000
>  outbound_auth            : ispPJSIP
>  outbound_proxy           :
>  retry_interval           : 60
>  server_uri               : sip:isp.de
>  support_path             : false
>  transport                : 0.0.0.0-tls
> 
> 
> The expiration value shown above is not the one actually in effect. isp.de 
> forces re-registration every 90s (asterisk does it every 60s, if I 
> remember correctly)!
>     Contact: <sip:+49... at isp-ip:5061;transport=TLS;line=....>;expires=90
> 
> - After restarting asterisk, registration to isp.de no longer had any 
> problems. Therefore I think
>   the re-registration problem was not caused by the provider failing to 
> answer, but by asterisk being unable to perform the re-registration 
> correctly.
> 
> 
> 
> 
> The final question:
> ===================
> Is there a problem with taskprocessors possibly not being canceled under 
> some conditions (maybe unanswered or hanging registrations?), so that 
> they steadily accumulate and use more and more open files (and 
> memory) until asterisk can't do
> anything anymore because some limits are exceeded?
> Could there also be a problem with the retry interval of 60s coinciding 
> with the actual re-registration, also done every 60s (acting like a "fork" bomb)?

Taskprocessor tasks aren't individually recurring; each one is a work queue item that is always executed. The problem may be related to the fact that the transport is TLS: if the attempt to establish the TLS connection takes a long time to fail, tasks could build up behind it. I'd suggest collecting a backtrace[1] and providing complete information on an issue[2].

[1] https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace
[2] https://issues.asterisk.org/jira
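
To illustrate the kind of build-up suggested above, here is a minimal sketch of a single serial work queue. It is not Asterisk's taskprocessor code; the function name and the 1s/90s task durations are hypothetical, and only the 60s interval comes from the figures quoted in this thread. It just shows that a queue stays empty while each item finishes before the next one arrives, and grows without bound once each item (for example a slow-to-fail connection attempt) takes longer than the arrival interval.

```python
def backlog_after(arrival_interval, service_time, duration):
    """Count items still unfinished at `duration` seconds for a serial FIFO queue.

    One item arrives every `arrival_interval` seconds starting at t=0, and a
    single worker needs `service_time` seconds per item. All values are
    illustrative assumptions, not measurements from a real system.
    """
    finish_times = []
    worker_free_at = 0
    t = 0
    while t < duration:
        # An item starts when it has arrived AND the worker is free.
        start = max(t, worker_free_at)
        worker_free_at = start + service_time
        finish_times.append(worker_free_at)
        t += arrival_interval
    # Anything that finishes after the observation window is still queued
    # or in progress at that point.
    return sum(1 for f in finish_times if f > duration)


if __name__ == "__main__":
    # Fast items: the queue drains between arrivals.
    print(backlog_after(arrival_interval=60, service_time=1, duration=3600))
    # Slow items (hypothetical 90s to fail): the backlog keeps growing.
    print(backlog_after(arrival_interval=60, service_time=90, duration=3600))
```

Whether anything like this is actually happening here can only be confirmed from the backtrace.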

-- 
Joshua C. Colp
Digium - A Sangoma Company | Senior Software Developer
445 Jan Davis Drive NW - Huntsville, AL 35806 - US
Check us out at: www.digium.com & www.asterisk.org


