[asterisk-dev] PJSIP realtime scalability problem

Matthew Jordan mjordan at digium.com
Sat Oct 17 10:49:32 CDT 2015


On Sat, Oct 17, 2015 at 10:03 AM, Michael Ulitskiy <mulitskiy at acedsl.com> wrote:
> Matthew,
>
>
>
> Thanks for the reply.
>
> Yes I do have caching enabled. While caching does somewhat help (there are
> different problems there)

Which problems?

> with ongoing load it has nothing to do with initial load that is still done
> in the extremely inefficient way
>
> I described in my original email.

I'm not sure why that would be the case. You'll need to be more
specific, and provide your sorcery.conf configuration as well as the
specific operations/times when there are issues.

> Caching also doesn't help at all with CLI commands like "pjsip show
> endpoints" in which case asterisk
>
> reloads the whole list from db instead of showing what it has in-memory.

That actually is by design.

Say we are caching endpoints. The cache only contains the n most
recently requested endpoints, *not* every endpoint that you may have
in your system. Hence, if you ask for all endpoints, we have to bypass
the cache and get all endpoints in order to accurately fulfill the
request.
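
To illustrate the point, here is a minimal sketch of a bounded cache in
front of a backend. This is not Asterisk code: backend_ids,
endpoint_cache, retrieve_by_id and retrieve_all are all hypothetical
names, and the "backend" is just a static array with a read counter. It
only shows why a by-id lookup can be served from the cache while a
"get everything" request cannot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names come from Asterisk. */
#define BACKEND_SIZE 5
#define CACHE_SIZE 2

static const char *const backend_ids[BACKEND_SIZE] = {
    "100", "101", "102", "103", "104"
};
static int backend_reads; /* counts simulated database round trips */

struct endpoint_cache {
    const char *slots[CACHE_SIZE]; /* most recently requested first */
    size_t used;
};

/* Look up one id: serve it from the cache if present, otherwise read
 * the backend and remember the result, evicting the oldest entry. */
static const char *retrieve_by_id(struct endpoint_cache *cache, const char *id)
{
    assert(cache != NULL && id != NULL);

    for (size_t i = 0; i < cache->used; i++) {
        if (strcmp(cache->slots[i], id) == 0) {
            /* Cache hit: move the entry to the front. */
            const char *hit = cache->slots[i];
            memmove(&cache->slots[1], &cache->slots[0],
                    i * sizeof(cache->slots[0]));
            cache->slots[0] = hit;
            return hit;
        }
    }

    /* Cache miss: one backend read. */
    backend_reads++;
    for (size_t i = 0; i < BACKEND_SIZE; i++) {
        if (strcmp(backend_ids[i], id) == 0) {
            if (cache->used < CACHE_SIZE) {
                cache->used++;
            }
            /* Shift existing entries down; the last one falls off if full. */
            memmove(&cache->slots[1], &cache->slots[0],
                    (cache->used - 1) * sizeof(cache->slots[0]));
            cache->slots[0] = backend_ids[i];
            return backend_ids[i];
        }
    }
    return NULL;
}

/* "Show all" cannot be answered from the cache, which may hold only a
 * subset of the objects, so it always reads the backend. */
static size_t retrieve_all(const char **out, size_t max)
{
    size_t n = 0;

    backend_reads++;
    for (size_t i = 0; i < BACKEND_SIZE && n < max; i++) {
        out[n++] = backend_ids[i];
    }
    return n;
}
```

With CACHE_SIZE smaller than the number of stored objects, the cache
can never be trusted to hold the full set, which is why the "all"
request goes around it.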

Given that this is a human interaction and not a run-time machine
interaction, it is not unreasonable that requesting all endpoints
results in going out to the database.

> Also I've noticed another very awkward problem. If I type "pjsip show
> endpoint" in the console and then
>
> press "Tab" then asterisk hangs for over a minute and I register over 300
> queries like this in the db log:

So, first, you are asking for name completion against 10k endpoints.
Regardless of the number of database queries, that's a large set to
complete against. Granted, there's no reason to go get the dataset on
every single entry...

>
>
> SELECT * FROM pjsip_endpoints_v WHERE id LIKE '%' ORDER BY id
>

... which does appear to be what we are doing. In pjsip_cli:

    while ((object = ao2_t_iterator_next(&i, "iterate thru endpoints table"))) {
        const char *id = formatter_entry->get_id(object);
        if (!strncasecmp(word, id, wordlen)
            && ++which > state) {
            result = ast_strdup(id);
        }
        ao2_t_ref(object, -1, "toss iterator endpoint ptr before break");
        if (result) {
            break;
        }
    }

Since the endpoint formatter_entry only has a 'get by id' callback:

    static void *cli_endpoint_retrieve_by_id(const char *id)
    {
        return ast_sorcery_retrieve_by_id(ast_sip_get_sorcery(), "endpoint", id);
    }

That means that for every partial match that you have on an endpoint,
we do a separate lookup.

Alternatively, we could pull the partial matches in a single query,
then iterate over the returned set of matches. Clearly that would be a
lot better in this case.
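
Roughly, the single-query approach could look like the sketch below.
It is a standalone mock, not a patch: endpoint_ids,
fetch_ids_by_prefix, completion_cache and complete_endpoint are
hypothetical names, the "database" is a static array with a query
counter, and it assumes the completion callback is invoked once per
candidate with an increasing state index.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical stand-in for the endpoint table; not Asterisk code. */
static const char *const endpoint_ids[] = { "alice", "alfred", "bob", "carol" };
enum { ENDPOINT_COUNT = sizeof(endpoint_ids) / sizeof(endpoint_ids[0]) };

static int backend_queries; /* counts simulated database round trips */

/* Simulated single prefix query: stores pointers to the matching ids
 * in out[] and returns how many were found. */
static size_t fetch_ids_by_prefix(const char *prefix, const char **out, size_t max)
{
    size_t n = 0;
    size_t len = strlen(prefix);

    backend_queries++;
    for (size_t i = 0; i < ENDPOINT_COUNT && n < max; i++) {
        if (strncasecmp(prefix, endpoint_ids[i], len) == 0) {
            out[n++] = endpoint_ids[i];
        }
    }
    return n;
}

/* Matches kept between callback invocations for one completion. */
struct completion_cache {
    const char *matches[ENDPOINT_COUNT];
    size_t count;
};

/* Completion callback: state 0 starts a new completion and runs the
 * one query; later states only index into the saved result. Returns a
 * malloc'd copy of the state-th match, or NULL when there are no more. */
static char *complete_endpoint(struct completion_cache *cache,
                               const char *word, int state)
{
    assert(cache != NULL && word != NULL && state >= 0);

    if (state == 0) {
        cache->count = fetch_ids_by_prefix(word, cache->matches, ENDPOINT_COUNT);
    }
    if ((size_t)state >= cache->count) {
        return NULL;
    }

    const char *id = cache->matches[state];
    char *copy = malloc(strlen(id) + 1);
    if (copy) {
        strcpy(copy, id);
    }
    return copy;
}
```

The number of backend queries then stays at one per Tab press no
matter how many endpoints match, instead of growing with the number of
candidates.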

>
> Why would asterisk need to load the whole list of endpoints more than 300
> times is just completely beyond me.
>

Hyperbole aside, it's because PJSIP chose a sane, maintainable method
to interact with its storage backends and uses a data abstraction
layer above its SQL statements - unlike chan_sip, which just embeds
the statements willy-nilly in the codebase. The downside of this is
that sometimes - in some specific cases - we aren't as efficient as we
should be.

That's fixable, however. Please do file a specific issue for the tab
completion case, as that should be improved.

>
> For a long time it was my understanding that "dynamic realtime" means
> loading data from db on demand.
>
> What pjsip does now is not a dynamic realtime. What it does seems like the
> mix of both worlds: static realtime in the beginning -
>
> loading everything from db and dynamic afterwards - issuing queries whenever
> it needs endpoint data (caching helps here).
>
>
>
> Unless I'm missing something and there's another/better way to use it, I
> think pjsip realtime is not usable now
>
> at any scale other than a very trivial one.
>

Please leave hyperbole at the door. If you'd like help narrowing down
the specific cases that are causing issues, that'd be great. We'd love
to help. "I think this sucks" isn't helpful.

Right now, you've pointed out one specific case that clearly needs
improvement. Please provide specific evidence for each case that
you're running into, when caching is enabled, where a run-time
operation is substantially less efficient than it should be.

And remember: this is an open source project. If you'd like to help
fix things, that's always appreciated.

Matt


