[asterisk-dev] PJSIP realtime scalability problem

Michael Ulitskiy mulitskiy at acedsl.com
Sat Oct 17 19:12:26 CDT 2015


Actually I have to apologize.
The initial load of endpoints does populate the cache. I'm not sure what I was looking at before, but I have
just checked it multiple times and it works. I'm sorry for the misinformation.
The rest of the problem stands, though.
Thanks,

Michael

On Saturday, October 17, 2015 01:54:44 PM Michael Ulitskiy wrote:
> Matthew,
>  
> First of all, I apologize if my tone sounded too harsh. I didn't mean to offend anyone.
> I didn't mean to just say "it sucks". I do wish to point out, though, that unless I'm missing something,
> the current behaviour of pjsip realtime is not scalable, and I believe it's a departure from
> what has been known as "dynamic realtime" for a long time.
> Please see inline for the answers to particular questions.
> Thanks,
> Michael
> 
> On Saturday, October 17, 2015 10:49:32 AM Matthew Jordan wrote:
> > On Sat, Oct 17, 2015 at 10:03 AM, Michael Ulitskiy <mulitskiy at acedsl.com> wrote:
> > > Matthew,
> > >
> > > Thanks for the reply.
> > >
> > > Yes I do have caching enabled. While caching does somewhat help (there are
> > > different problems there)
> > 
> > Which problems?
> 
> The problem here isn't actually related to the caching implementation, but to the way pjsip matches endpoints.
> Whenever a sip request arrives, pjsip first performs a lookup for 'username@domain', and if that fails it falls
> back to a lookup by username only.
>  
> This results in 2 queries:
> SELECT * FROM pjsip_endpoints_v WHERE id = 'ep1@domain';
> SELECT * FROM pjsip_endpoints_v WHERE id = 'ep1';
>  
> In my environment only the 2nd one succeeds and gets cached. So for every sip request my asterisk
> issues
>  
> SELECT * FROM pjsip_endpoints_v WHERE id = 'ep1@domain';
>  
> which never succeeds, followed by retrieving 'ep1' from the cache.
> Basically I'd like to have a way to suppress the lookup for 'username@domain', or at least to cache the negative results.
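
One way to picture the negative caching requested above is the minimal C sketch below. It is not Asterisk or sorcery code: `neg_cache`, `neg_cache_init`, `neg_cache_add`, `neg_cache_contains`, the fixed table size and the TTL handling are all hypothetical names and choices made for illustration. It only shows the idea of remembering ids that recently missed, so that a repeated lookup for 'ep1@domain' could be skipped until the entry expires.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define NEG_CACHE_SLOTS 64    /* hypothetical fixed capacity */
#define NEG_CACHE_ID_MAX 128  /* hypothetical max id length, including NUL */

/* One remembered miss. expires == 0 marks an unused slot. */
struct neg_entry {
    char id[NEG_CACHE_ID_MAX];
    time_t expires;
};

/* Direct-mapped table: a colliding id simply overwrites the older entry. */
struct neg_cache {
    struct neg_entry slots[NEG_CACHE_SLOTS];
    time_t ttl; /* lifetime of a negative entry, in seconds */
};

/* djb2-style string hash reduced to a slot index. */
static size_t neg_hash(const char *id)
{
    size_t h = 5381;

    while (*id) {
        h = h * 33 + (unsigned char) *id++;
    }
    return h % NEG_CACHE_SLOTS;
}

void neg_cache_init(struct neg_cache *c, time_t ttl)
{
    assert(c != NULL);
    memset(c, 0, sizeof(*c));
    c->ttl = ttl;
}

/* Record that a backend lookup for 'id' found nothing at time 'now'. */
void neg_cache_add(struct neg_cache *c, const char *id, time_t now)
{
    struct neg_entry *e;

    assert(c != NULL && id != NULL);
    if (strlen(id) >= NEG_CACHE_ID_MAX) {
        return; /* too long to store; just don't cache it */
    }
    e = &c->slots[neg_hash(id)];
    strcpy(e->id, id);
    e->expires = now + c->ttl;
}

/* Return 1 if 'id' is a remembered miss that has not expired yet, else 0. */
int neg_cache_contains(const struct neg_cache *c, const char *id, time_t now)
{
    const struct neg_entry *e;

    assert(c != NULL && id != NULL);
    e = &c->slots[neg_hash(id)];
    return e->expires > now && strcmp(e->id, id) == 0;
}
```

In this sketch a caller would check `neg_cache_contains()` before querying the database and call `neg_cache_add()` after a query returns no rows. The expiry bounds how long a newly provisioned 'ep1@domain' could stay invisible.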
> 
> > 
> > > with ongoing load it has nothing to do with initial load that is still done
> > > in the extremely inefficient way I described in my original email.
> > 
> > I'm not sure why that would be the case. You'll need to be more
> > specific, and provide your sorcery.conf configuration as well as the
> > specific operations/times when there are issues.
> 
> sorcery.conf:
> [res_pjsip]
> endpoint=config,pjsip.conf,criteria=type=endpoint
> endpoint/cache=memory_cache,expire_on_reload=yes,object_lifetime_maximum=600,object_lifetime_stale=300
> endpoint=realtime,ps_endpoints
> aor=config,pjsip.conf,criteria=type=aor
> aor/cache=memory_cache,expire_on_reload=yes,object_lifetime_maximum=600,object_lifetime_stale=300
> aor=realtime,ps_aors
>  
> extconfig.conf:
> ps_endpoints => pgsql,users,pjsip_endpoints_v
> ps_aors => pgsql,users,pjsip_aors_v
>  
> When asterisk starts up and loads pjsip it does the following:
> SELECT * FROM pjsip_aors_v WHERE id LIKE '%' ORDER BY id
> SELECT * FROM pjsip_endpoints_v WHERE id LIKE '%' ORDER BY id
> thus loading all endpoints and AORs into memory. Then comes the worst part: it goes on to load all
> endpoints and AORs individually with queries like these:
> SELECT * FROM pjsip_aors_v WHERE id = 'ep1'
> SELECT * FROM pjsip_aors_v WHERE id = 'ep2'
> ...
> SELECT * FROM pjsip_aors_v WHERE id = 'epN'
>  
> then
>  
> SELECT * FROM pjsip_endpoints_v WHERE id = 'ep1'
> SELECT * FROM pjsip_endpoints_v WHERE id = 'ep2'
> ...
> SELECT * FROM pjsip_endpoints_v WHERE id = 'epN'
>  
> With 10K endpoints this results in 20K queries to the db at asterisk startup. Now imagine multiple asterisk
> servers. This is the biggest problem.
>  
> Also, to my surprise, this initial loading doesn't populate the cache.
> Right after asterisk startup I run "sorcery memory cache dump res_pjsip/endpoint" and it's empty, which causes
> additional db lookups as asterisk starts to serve sip requests.
> 
> > > Caching also doesn't help at all with CLI commands like "pjsip show
> > > endpoints" in which case asterisk
> > > reloads the whole list from db instead of showing what it has in-memory.
> > 
> > That actually is by design.
> > 
> > Say we are caching endpoints. The cache only contains the n most
> > recently requested endpoints, *not* every endpoint that you may have
> > in your system. Hence, if you ask for all endpoints, we have to bypass
> > the cache and get all endpoints in order to accurately fulfill the
> > request.
> > 
> > Given that this is a human interaction and not a run-time machine
> > interaction, the fact that requesting all endpoints results in
> > going out to the database is not unreasonable.
> 
> Well, I see your point. The thing is that in a system where endpoints are
> dynamically spread over multiple asterisk systems I never want to see
> all the endpoints, only those that have been served by this asterisk and cached.
> Maybe it's worth having a command that shows only cached endpoints?
> Basically I was happy with how chan_sip worked in that regard: only loading
> endpoints on demand and only showing those endpoints that are loaded in memory.
> 
> > > Also I've noticed another very awkward problem. If I type "pjsip show
> > > endpoint" in the console and then
> > > press "Tab" then asterisk hangs for over a minute and I register over 300
> > > queries like this in the db log:
> > 
> > So, first, you are asking for name completion against 10k endpoints.
> > Regardless of the number of database queries, that's a large set to
> > complete against. Granted, there's no reason to go get the dataset on
> > every single entry...
> > 
> > >
> > >
> > > SELECT * FROM pjsip_endpoints_v WHERE id LIKE '%' ORDER BY id
> > >
> > 
> > ... which does appear as if that is what we are doing. In pjsip_cli:
> > 
> >     while ((object = ao2_t_iterator_next(&i, "iterate thru endpoints table"))) {
> >         const char *id = formatter_entry->get_id(object);
> >         if (!strncasecmp(word, id, wordlen)
> >             && ++which > state) {
> >             result = ast_strdup(id);
> >         }
> >         ao2_t_ref(object, -1, "toss iterator endpoint ptr before break");
> >         if (result) {
> >             break;
> >         }
> >     }
> > 
> > Since the endpoint formatter_entry only has a 'get by id' callback:
> > 
> > static void *cli_endpoint_retrieve_by_id(const char *id)
> > {
> >     return ast_sorcery_retrieve_by_id(ast_sip_get_sorcery(), "endpoint", id);
> > }
> > 
> > That means that for every partial match that you have on an endpoint,
> > we do a separate lookup.
> > 
> > Alternatively, we could go pull a partial match in a single query,
> > then iterate over the returned set of matches. Clearly that would be a
> > lot better in this case.
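
The alternative described above, fetching the candidate ids once and then iterating over that set, can be sketched roughly as follows. This is a standalone illustration, not the pjsip_cli code: `complete_from_ids`, `has_prefix_nocase` and their parameters are hypothetical, and the `ids` array stands in for the result of a single bulk or prefix query. It keeps the convention visible in the loop quoted above of returning the match selected by `state`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if 's' starts with 'prefix', ignoring ASCII case, else 0. */
static int has_prefix_nocase(const char *s, const char *prefix)
{
    while (*prefix) {
        /* If 's' ends first, its NUL never equals a prefix character. */
        if (tolower((unsigned char) *s) != tolower((unsigned char) *prefix)) {
            return 0;
        }
        s++;
        prefix++;
    }
    return 1;
}

/*
 * Scan an already-fetched list of ids and return a malloc'd copy of the
 * state-th one (0-based) that starts with 'word', or NULL if there are
 * not that many matches. No further backend lookups are made per match.
 * The caller frees the result.
 */
char *complete_from_ids(const char *const *ids, size_t count,
                        const char *word, int state)
{
    size_t i;
    int which = 0;

    assert(ids != NULL && word != NULL);

    for (i = 0; i < count; i++) {
        if (has_prefix_nocase(ids[i], word) && ++which > state) {
            char *copy = malloc(strlen(ids[i]) + 1);

            if (copy) {
                strcpy(copy, ids[i]);
            }
            return copy;
        }
    }
    return NULL;
}
```

With the list held in memory for the duration of one completion request, the cost becomes one query plus a linear scan, rather than one query per candidate.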
> > 
> > >
> > > Why asterisk would need to load the whole list of endpoints more than 300
> > > times is just completely beyond me.
> > >
> > 
> > Hyperbole aside, it's because PJSIP chose a sane, maintainable method
> > to interact with its storage backends and uses a data abstraction
> > layer above its SQL statements - unlike chan_sip, which just embeds
> > the statements willy-nilly in the codebase. The downside of this is
> > that sometimes - in some specific cases - we aren't as efficient as we
> > should be.
> > 
> > That's fixable however. Please do file a specific issue for the tab
> > completion case, as that should be improved.
> 
> First of all, again, I'd prefer completion to be performed not against all the endpoints
> in the db, but only against those loaded and cached.
> Second, my test environment doesn't have 10K endpoints, only 173 at the moment.
> I imagine that if I did it against all 10K endpoints it would never finish.
> Sure, I'll open an issue for that.
> 
> > >
> > > For a long time it was my understanding that "dynamic realtime" means
> > > loading data from db on demand.
> > >
> > > What pjsip does now is not dynamic realtime. What it does seems like a
> > > mix of both worlds: static realtime in the beginning (loading everything
> > > from db) and dynamic afterwards (issuing queries whenever it needs
> > > endpoint data; caching helps here).
> > >
> > > Unless I'm missing something and there's another/better way to use it, I
> > > think pjsip realtime is not usable now at any scale other than a very
> > > trivial one.
> > >
> > 
> > Please leave hyperbole at the door. If you'd like help narrowing down
> > the specific cases that are causing issues, that'd be great. We'd love
> > to help. "I think this sucks" isn't helpful.
> > 
> > Right now, you've pointed out one specific case that clearly needs
> > improvement. Please provide specific evidence for each case that
> > you're running into, when caching is enabled, where a run-time
> > operation is substantially less efficient than it should be.
> > 
> > And remember: this is an open source project. If you'd like to help
> > fix things, that's always appreciated.
> > 
> > Matt
> > 
> > 