[asterisk-bugs] [JIRA] (ASTERISK-27983) pjsip_options: rework may have left concurrency issue

Gregory Massel (JIRA) noreply at issues.asterisk.org
Tue Jul 24 05:41:54 CDT 2018


    [ https://issues.asterisk.org/jira/browse/ASTERISK-27983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=244213#comment-244213 ] 

Gregory Massel commented on ASTERISK-27983:
-------------------------------------------

I am using configurations similar to those at https://github.com/flaviogoncalves/astricon2017 (but with my own usernames/passwords) and then running:

/usr/local/bin/sipp x.x.x.x -sf register_options.xml -i x.x.x.x  -d 1000 -inf passwords.csv -oocsf ooc.xml -r 100

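I have ~6800 users in a pjsip.conf file and default sorcery settings (i.e. objects loaded from the static configuration file rather than from realtime).
The sipp scenario registers all the users and answers the OPTIONS requests that Asterisk sends when qualifying them (qualify=yes).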

I then increase and decrease the rate in real time using + and - in sipp. Once the registration rate gets too high, pjsip starts logging "No matching endpoint found after X tries in 0.000 ms". Moving astdb.sqlite3 to tmpfs lets pjsip process registrations much faster, because AstDB is then written to RAM rather than to the hard drive. However, the same messages still appear, just at a higher threshold: ~100 registrations per second with AstDB on a hard drive versus ~300 registrations per second with AstDB on a memory file system. The further the rate is pushed above the achievable maximum, the faster these messages are logged.

I'm running htop at the same time to confirm that none of the CPU cores is saturated.

SQLite is thread-safe but does not perform writes concurrently (only one writer can hold the database at a time), and its write speed depends on storage I/O performance. Is it possible that some part of the endpoint identification process (running in other threads) is being held up by a lock while a different thread writes the registration information to AstDB?
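If the registration path really is serialised behind a single lock that is held for the duration of each AstDB write (an assumption that has not been verified against the code), the maximum registration rate is roughly the inverse of the lock hold time. On that assumption, the observed ceilings imply per-registration hold times of about:

$$
R_{\max} \approx \frac{1}{t_{\text{hold}}}
\quad\Rightarrow\quad
t_{\text{hold,disk}} \approx \frac{1}{100\ \text{s}^{-1}} = 10\ \text{ms},
\qquad
t_{\text{hold,tmpfs}} \approx \frac{1}{300\ \text{s}^{-1}} \approx 3.3\ \text{ms}
$$

Comparing these figures with the measured duration of individual AstDB writes would be one way to confirm or rule out the hypothesis.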

> pjsip_options: rework may have left concurrency issue
> -----------------------------------------------------
>
>                 Key: ASTERISK-27983
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-27983
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Resources/res_pjsip
>    Affects Versions: 13.22.0
>         Environment: Ubuntu, Asterisk 13.22.0, pjsip, thousands of endpoints
>            Reporter: Gregory Massel
>            Assignee: Gregory Massel
>            Severity: Minor
>              Labels: pjsip
>
> Following the excellent work done in terms of ASTERISK-26806, I have been doing some performance testing on Asterisk 13.22.0 with res_pjsip.
> If I get sipp to run more than 200 registrations per second, Asterisk starts logging:
> res_pjsip/pjsip_distributor.c: Request 'REGISTER' from '"[redacted]" <sip:[redacted]>' failed for '[redacted]:5060' (callid: [redacted]) - No matching endpoint found after X tries in 0.000 ms
> where X is between 5 and 9, and more than 99% of the log entries show 0.000 ms.
> At first I thought the time was being logged incorrectly, but a few entries show values of up to 2.731 ms.
> It is odd, however, that the endpoint cannot be found after numerous tries in such a short period of time (almost always less than 0.001 ms). All the endpoints are valid and, if I reduce the registration rate to 100 registrations per second, no such errors are logged.
> Throughout this, CPU usage remains extremely modest across all cores.
> This leads me to believe that there may be some sort of locking or contention across the endpoints/aors/contacts tables that is causing the registration performance to be restricted in spite of the hardware.
> It seems that, although performance has improved significantly since ASTERISK-26806 was resolved, res_pjsip still handles registrations more slowly than chan_sip, even though chan_sip uses only a single core while res_pjsip spreads its work across 8 cores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


