[asterisk-dev] Asterisk Segfault After PJSIP Commit 5241

Ross Beer ross.beer at outlook.com
Wed Mar 2 12:42:18 CST 2016


Hi George,
 
I've just rolled back to 13.7.2. The following modules are in the latest git repository but not in 13.7.2:
 
    res_pjproject.so
    res_odbc_transaction.so
    res_pjsip_history.so
 
Not sure if any of these would make a difference to the load time?
 
Regards,
 
Ross
 
From: ross.beer at outlook.com
To: asterisk-dev at lists.digium.com
Date: Wed, 2 Mar 2016 18:04:15 +0000
Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241




Hi George,
 
I have commented out those lines and it hasn't substantially improved the load times; it's still taking 15 mins. It has improved a little.
 
Regards,
 
Ross
 
From: george.joseph at fairview5.com
Date: Wed, 2 Mar 2016 08:19:01 -0700
To: asterisk-dev at lists.digium.com
Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241



On Wed, Mar 2, 2016 at 2:56 AM, Ross Beer <ross.beer at outlook.com> wrote:



Hi George,
 
I have rebuilt the 'c1bf014ea08cf66835a6f000e2bd6c7da588da6b' commit and PJSIP, and Asterisk hasn't crashed after reload. However, it did take 25 mins to load.
 
As requested I have opened a ticket for the realtime issue:
 
https://issues.asterisk.org/jira/browse/ASTERISK-25826
Got it, thanks.
 
Basically, I think this could be resolved by a configuration option that stops sorcery/pjsip loading all peers at start-up, as this is not needed for the current setup. This has been discussed before on the mailing list; however, it doesn't look like it progressed any further.

If you're up for trying something, you can comment out the qualify_and_schedule_all function in lines 1135-1147 of res/res_pjsip/pjsip_options.c, then comment out its 2 references on lines 1245 and 1281.  If that drops your startup times, then we know we're on the right track.
 
 
I would like to thank you for all of your help trying to identify the issue and hope that we can resolve it soon.

No worries!
 
Kind regards,
 
Ross
 
From: george.joseph at fairview5.com
Date: Tue, 1 Mar 2016 16:27:06 -0700
To: asterisk-dev at lists.digium.com
Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241



On Tue, Mar 1, 2016 at 3:07 PM, Ross Beer <ross.beer at outlook.com> wrote:



ok,
 
That took 15 mins to load and then crashed. This will be due to the pjsip_dlg_create_uas_and_inc_lock commit.
It should not have crashed.  That commit had the fix for it.  If it did crash with that commit, open a Jira issue and attach a full backtrace.
 
However, 15 mins to start is a long time and would cause issues in a production environment.
Would you open a Jira issue on the realtime problem (if one isn't already open)? I'm starting to look at alternatives.


 
Thank you for your help here,
 
Ross

 
From: george.joseph at fairview5.com
Date: Tue, 1 Mar 2016 14:02:38 -0700
To: asterisk-dev at lists.digium.com
Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241



On Tue, Mar 1, 2016 at 1:04 PM, Ross Beer <ross.beer at outlook.com> wrote:



Hi George,
 
Using a development test box for testing!!
 
Asterisk 13.7.2 with no cache takes 4:12 to load; that is with PJSIP commit 5240.

Ok, try this combination...
"git checkout c1bf014ea08cf66835a6f000e2bd6c7da588da6b"
pjproject from trunk.
With caching.
The commit I referenced is the one that handles the pjsip_dlg_create_uas_and_inc_lock


  
Qualify time on the aor is set to zero. I guess a query could be made to check for a value greater than zero instead of loading all endpoints.
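
As a rough illustration of that idea, the sketch below selects only the AORs whose qualify interval is greater than zero rather than reading every row. It is a sketch under assumptions, not how res_pjsip actually works: the table name ps_aors and the column qualify_frequency are assumed to match the realtime schema in use, sqlite3 stands in for the MySQL cluster, and build_demo_db and aors_needing_qualify are hypothetical helper names.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in for the realtime database (assumption:
    a ps_aors table with a qualify_frequency column)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ps_aors (id TEXT PRIMARY KEY, qualify_frequency INTEGER)"
    )
    # 1000 AORs with qualify disabled (0), then enable it on two of them.
    rows = [("aor%05d" % i, 0) for i in range(1000)]
    rows[10] = ("aor00010", 60)
    rows[20] = ("aor00020", 30)
    conn.executemany("INSERT INTO ps_aors VALUES (?, ?)", rows)
    return conn


def aors_needing_qualify(conn):
    """Return only the AORs that actually ask to be qualified, letting the
    database do the filtering instead of loading every row."""
    cur = conn.execute(
        "SELECT id, qualify_frequency FROM ps_aors "
        "WHERE qualify_frequency > 0 ORDER BY id"
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for aor_id, frequency in aors_needing_qualify(conn):
        print(aor_id, frequency)
```

With every qualify value at zero, as in the setup described here, such a query would return no rows at all.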
 
Ross
 
From: george.joseph at fairview5.com
Date: Tue, 1 Mar 2016 12:45:28 -0700
To: asterisk-dev at lists.digium.com
Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241



On Tue, Mar 1, 2016 at 12:21 PM, Ross Beer <ross.beer at outlook.com> wrote:



Hi George,
 
No endpoints are qualified. There are 20,000 endpoints with only 75 static contacts defined in the aors. The database is a MySQL cluster.
 
With the current Asterisk 13 branch, cache disabled, and the latest PJSIP, loading runs for 5 mins and then crashes before finishing.
 
With Asterisk 13.7.2 with cache it takes around 1 1/2 mins to load; however, due to the bug with PJSIP commit 5241, Asterisk crashes when using TLS devices.

Try 13.7.2 without the cache.  I'm trying to understand where the time is being spent.  I know it will crash because of that bug.  You're not doing this on a production system, are you??
The main issue here is that the endpoints are loaded as soon as PJSIP loads. Ideally, endpoints would only be loaded once a device registers or attempts to make a call, much in the same way as Asterisk 1.8 chan_sip manages realtime.
 
There is no need to load the endpoints as they are not qualified.
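
The load-on-first-use behaviour described above can be sketched roughly as follows. This is a minimal illustration, not sorcery's API: LazyEndpointCache, fetch_one and fake_backend are hypothetical names, and the dictionary stands in for whatever cache the real system would use.

```python
class LazyEndpointCache:
    """Fetch an endpoint from the backend only the first time it is asked for."""

    def __init__(self, fetch_one):
        self._fetch_one = fetch_one  # callable: endpoint id -> endpoint record
        self._cache = {}
        self.backend_hits = 0        # counts real backend lookups

    def get(self, endpoint_id):
        if endpoint_id not in self._cache:
            # First access (e.g. a REGISTER or an inbound call): go to the backend.
            self.backend_hits += 1
            self._cache[endpoint_id] = self._fetch_one(endpoint_id)
        return self._cache[endpoint_id]


def fake_backend(endpoint_id):
    """Hypothetical stand-in for a single-row realtime lookup."""
    return {"id": endpoint_id, "transport": "tls"}


if __name__ == "__main__":
    cache = LazyEndpointCache(fake_backend)
    cache.get("alice")   # first access hits the backend
    cache.get("alice")   # second access is served from the cache
    print(cache.backend_hits)
```

In this shape nothing is read at start-up; the lookup cost moves to the first request from each device.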

How do you know they're not qualified if you don't load them? :)
Time to load up a database with 20,000 endpoints I guess.
Ross
 
From: george.joseph at fairview5.com
Date: Tue, 1 Mar 2016 11:58:15 -0700
To: asterisk-dev at lists.digium.com
Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241



On Tue, Mar 1, 2016 at 11:38 AM, Michael Ulitskiy <mulitskiy at acedsl.com> wrote:


Hello,
 
Please see this discussion http://lists.digium.com/pipermail/asterisk-dev/2015-October/075122.html
I guess you're talking about the same problem.
It's possible.
 

 
Michael
 
On Tuesday, March 01, 2016 06:26:27 PM Ross Beer wrote:
> Hi George,
>  
> We need to store contacts in realtime for our system. However, not all endpoints are registered (only about 200), yet Asterisk loops through every endpoint that has been defined. It does this whether contacts are in realtime or not.
>  
> It's almost like pjsip is loading them to check if they need to be qualified etc.
>  
> Asterisk 1.8 only put things into cache once they were accessed. Is this an option for sorcery?
Well, in order to initiate qualify of contacts, Asterisk does have to "access" them all, so I'm not quite sure what the problem is.
Can we reset to a known config and see what happens?

pjproject from the published 2.4.5 tarball.
Asterisk from the published 13.7.2 tarball.
Disable memory_cache altogether in sorcery.conf.

See what happens.
Give me an estimate of how many endpoints and aors there are in the database, how many of those aors have static contacts defined, and what's your qualify interval.
An idea of your database setup would help as well.  Same server, local, remote, etc.
Let's solve 1 problem at a time.
 




-- 
_____________________________________________________________________
-- Bandwidth and Colocation Provided by http://www.api-digital.com --

asterisk-dev mailing list
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-dev 		 	   		  
