[asterisk-dev] Asterisk Segfault After PJSIP Commit 5241

Michael Ulitskiy mulitskiy at acedsl.com
Wed Mar 2 13:51:13 CST 2016


Hello,

Since you started to look at it again, let me repeat myself.
The problem is described in detail here: http://lists.digium.com/pipermail/asterisk-dev/2015-October/075128.html
It comes down to the fact that at initial load, PJSIP realtime issues a separate DB query for each endpoint, AOR, etc. in the system.
In my case, with ~10K endpoints, it took Asterisk ~1.5 minutes to load.
Later in that discussion I suggested that making the following API call to populate the sorcery cache would go a long way toward
reducing the scale of this problem:

ast_sorcery_retrieve_by_fields(sip_sorcery, "endpoint", AST_RETRIEVE_FLAG_MULTIPLE | AST_RETRIEVE_FLAG_ALL, NULL);

I haven't looked at pjsip since that discussion, as this is clearly a show-stopper for me, but I doubt anything has changed.
Also, I never received any feedback on whether that suggestion is viable, so I'd love to hear your (and/or other developers') opinion on it.
Any other ideas on how to deal with it are more than welcome as well.

Thanks,
Michael

On Wednesday, March 02, 2016 06:04:15 PM Ross Beer wrote:
> Hi George,
>  
> I have commented out those lines and it hasn't improved the load times much; it's still taking 15 mins. It has improved it a little.
>  
> Regards,
>  
> Ross
>  
> From: george.joseph at fairview5.com
> Date: Wed, 2 Mar 2016 08:19:01 -0700
> To: asterisk-dev at lists.digium.com
> Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241
> 
> 
> 
> On Wed, Mar 2, 2016 at 2:56 AM, Ross Beer <ross.beer at outlook.com> wrote:
> 
> 
> 
> Hi George,
>  
> I have rebuilt the 'c1bf014ea08cf66835a6f000e2bd6c7da588da6b' commit and PJSIP, and Asterisk hasn't crashed after a reload. However, it did take 25 mins to load.
>  
> As requested I have opened a ticket for the realtime issue:
>  
> https://issues.asterisk.org/jira/browse/ASTERISK-25826
> Got it, thanks.
>  
> Basically, I think this could be resolved by a configuration option that stops sorcery/pjsip loading all peers at start-up, as this is not needed for the current setup. This has been discussed before on the mailing list; however, it doesn't look like it progressed any further.
> 
> If you're up for trying something, you can comment out the qualify_and_schedule_all function in lines 1135-1147 of res/res_pjsip/pjsip_options.c, then comment out its 2 references on lines 1245 and 1281.  If that drops your startup times, then we know we're on the right track.
>  
>  
> I would like to thank you for all of your help trying to identify the issue and hope that we can resolve it soon.
> 
> No worries!
>  
> Kind regards,
>  
> Ross
>  
> From: george.joseph at fairview5.com
> Date: Tue, 1 Mar 2016 16:27:06 -0700
> To: asterisk-dev at lists.digium.com
> Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241
> 
> 
> 
> On Tue, Mar 1, 2016 at 3:07 PM, Ross Beer <ross.beer at outlook.com> wrote:
> 
> 
> 
> ok,
>  
> That took 15 mins to load and then crashed. This will be due to the pjsip_dlg_create_uas_and_inc_lock commit.
> It should not have crashed.  That commit had the fix for it.  If it did crash with that commit, open a Jira issue and attach a full backtrace.
>  
> However 15 mins to start is a long time and would cause issues in a production environment.
> Would you open a Jira issue on the realtime problem (if one isn't already open)? I'm starting to look at alternatives.
> 
> 
>  
> Thank you for your help here,
>  
> Ross
> 
>  
> From: george.joseph at fairview5.com
> Date: Tue, 1 Mar 2016 14:02:38 -0700
> To: asterisk-dev at lists.digium.com
> Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241
> 
> 
> 
> On Tue, Mar 1, 2016 at 1:04 PM, Ross Beer <ross.beer at outlook.com> wrote:
> 
> 
> 
> Hi George,
>  
> Using a development test box for testing!!
>  
> Asterisk 13.7.2 with no cache takes 4:12 to load; that's with PJSIP Commit 5240.
> 
> Ok, try this combination:
> "git checkout c1bf014ea08cf66835a6f000e2bd6c7da588da6b"
> pjproject from trunk.
> With caching.
> The commit I referenced is the one that handles the pjsip_dlg_create_uas_and_inc_lock.
> 
> 
>   
> Qualify time on the aor is set to zero; I guess a query could be made to check for a value greater than zero instead of loading all endpoints.
>  
> Ross
>  
> From: george.joseph at fairview5.com
> Date: Tue, 1 Mar 2016 12:45:28 -0700
> To: asterisk-dev at lists.digium.com
> Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241
> 
> 
> 
> On Tue, Mar 1, 2016 at 12:21 PM, Ross Beer <ross.beer at outlook.com> wrote:
> 
> 
> 
> Hi George,
>  
> No endpoints are qualified, there are 20,000 endpoints with only 75 static contacts defined in the aors. The database is a MySQL cluster.
>  
> With the current Asterisk 13 branch with cache disabled and the latest PJSIP it takes 5 mins and then before finishing it crashes.
>  
> With Asterisk 13.7.2 with cache it takes around 1 1/2 min to load, however due to the bug with PJSIP Commit 5241 asterisk crashes when using TLS devices.
> 
> Try 13.7.2 without the cache.  I'm trying to understand where the time is being spent.  I know it will crash because of that bug.  You're not doing this on a production system are you??
> The main issue here is that the endpoints are loaded as soon as PJSIP loads; ideally, endpoints would only be loaded once a device registers or attempts to make a call, much in the same way as Asterisk 1.8 chan_sip manages realtime.
>  
> There is no need to load the endpoints as they are not qualified.
> 
> How do you know they're not qualified if you don't load them? :)
> Time to load up a database with 20,000 endpoints I guess.
> Ross
>  
> From: george.joseph at fairview5.com
> Date: Tue, 1 Mar 2016 11:58:15 -0700
> To: asterisk-dev at lists.digium.com
> Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241
> 
> 
> 
> On Tue, Mar 1, 2016 at 11:38 AM, Michael Ulitskiy <mulitskiy at acedsl.com> wrote:
> 
> 
> Hello,
>  
> Please see this discussion http://lists.digium.com/pipermail/asterisk-dev/2015-October/075122.html
> I guess you're talking about the same problem.
> It's possible.
>  
> 
>  
> Michael
>  
> On Tuesday, March 01, 2016 06:26:27 PM Ross Beer wrote:
> > Hi George,
> >  
> > We need to store contacts in realtime for our system. However, not all endpoints are registered (only about 200 are), yet Asterisk loops through every endpoint that has been defined. It does this whether contacts are in realtime or not.
> >  
> > It's almost like pjsip is loading them to check if they need to be qualified etc.
> >  
> > Asterisk 1.8 only put things into cache once they were accessed; is this an option for sorcery?
> Well, in order to initiate qualify of contacts, Asterisk does have to "access" them all, so I'm not quite sure what the problem is.
> Can we reset to a known config and see what happens?
> 
> pjproject from the published 2.4.5 tarball.
> Asterisk from the published 13.7.2 tarball.
> Disable memory_cache altogether in sorcery.conf.
> 
> See what happens.
> Give me an estimate of how many endpoints and aors there are in the database, how many of those aors have static contacts defined, and what's your qualify interval.
> An idea of your database setup would help as well.  Same server, local, remote, etc.
> Let's solve 1 problem at a time.
>  