[asterisk-bugs] [JIRA] (ASTERISK-27983) pjsip_options: rework may have left concurrency issue

Gregory Massel (JIRA) noreply at issues.asterisk.org
Tue Jul 24 01:56:54 CDT 2018


    [ https://issues.asterisk.org/jira/browse/ASTERISK-27983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=244210#comment-244210 ] 

Gregory Massel commented on ASTERISK-27983:
-------------------------------------------

My apologies!

Since my previous post, I have moved astdb.sqlite3 onto a tmpfs file system, and registrations per second have more than tripled.
This points to an I/O bottleneck: the contention was in either writing contacts to astdb.sqlite3 or reading them from it.
If that is the case, could the log entry be reworded so that it actually references AstDB? That would make debugging a lot simpler!

I've re-run the benchmarks and agree that res_pjsip now performs registrations faster than chan_sip, but only *slightly*. Interestingly, in both cases the bottleneck is sqlite3, so the difference between the two drivers in registration performance isn't huge.

Given that registrations are, by their very nature, transient, enormous performance gains could be achieved if there were an option to keep registrations in memory (i.e. not rely on a sqlite3 database). The performance gain would justify losing registrations on startup, shutdown or a crash. To mitigate that, it may be possible to dump the entire in-memory cache to AstDB on shutdown and reload it from AstDB on startup. Similarly, the cache could be dumped to AstDB every few minutes by a separate thread that runs in parallel with registrations rather than holding them up.
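To make the proposal concrete, here is a minimal write-behind sketch, written in Python for brevity rather than as Asterisk C code. Everything in it is hypothetical: the RegistrationCache class, the "contacts" table schema and the default flush interval are assumptions for illustration only and do not reflect the real AstDB API or how res_pjsip stores contacts. Registrations and lookups touch only an in-memory dict, while a background thread periodically writes the changed entries to SQLite in a single transaction. The cache is loaded from disk on startup and flushed once more on clean shutdown.

```python
import sqlite3
import threading


class RegistrationCache:
    """Hypothetical write-behind cache: contacts live in memory and are
    persisted to SQLite periodically, off the registration path."""

    def __init__(self, db_path=":memory:", flush_interval=180.0):
        self._lock = threading.Lock()        # guards the in-memory dicts
        self._flush_lock = threading.Lock()  # serialises flushes so writes stay ordered
        self._contacts = {}  # aor -> contact URI (authoritative copy)
        self._dirty = {}     # aor -> contact URI, or None for a removal
        self._db = sqlite3.connect(db_path, check_same_thread=False)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS contacts (aor TEXT PRIMARY KEY, uri TEXT)")
        self._db.commit()
        # Reload whatever was persisted before the last shutdown.
        for aor, uri in self._db.execute("SELECT aor, uri FROM contacts"):
            self._contacts[aor] = uri
        self._interval = flush_interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def register(self, aor, uri):
        # Fast path: memory only, no disk I/O while a REGISTER is handled.
        with self._lock:
            self._contacts[aor] = uri
            self._dirty[aor] = uri

    def unregister(self, aor):
        with self._lock:
            self._contacts.pop(aor, None)
            self._dirty[aor] = None

    def lookup(self, aor):
        with self._lock:
            return self._contacts.get(aor)

    def flush(self):
        """Write all changes since the last flush in one transaction;
        returns the number of entries written."""
        with self._flush_lock:
            # Swap out the dirty set quickly so registrations are not blocked
            # while the (slow) database write happens.
            with self._lock:
                pending, self._dirty = self._dirty, {}
            if not pending:
                return 0
            with self._db:  # commits on success, rolls back on error
                for aor, uri in pending.items():
                    if uri is None:
                        self._db.execute("DELETE FROM contacts WHERE aor = ?", (aor,))
                    else:
                        self._db.execute(
                            "INSERT OR REPLACE INTO contacts (aor, uri) VALUES (?, ?)",
                            (aor, uri))
            return len(pending)

    def _run(self):
        # Periodic dump in a separate thread; wait() returns True once stopped.
        while not self._stop.wait(self._interval):
            self.flush()

    def close(self):
        self._stop.set()
        self._thread.join()
        self.flush()  # final dump on clean shutdown
        self._db.close()
```

The trade-off matches the one described above: any change made after the most recent flush is lost on a crash, but the cost of disk writes no longer scales with the registration rate, because many registrations are coalesced into one transaction per interval.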

> pjsip_options: rework may have left concurrency issue
> -----------------------------------------------------
>
>                 Key: ASTERISK-27983
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-27983
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Resources/res_pjsip
>    Affects Versions: 13.22.0
>         Environment: Ubuntu, Asterisk 13.22.0, pjsip, thousands of endpoints
>            Reporter: Gregory Massel
>            Assignee: Gregory Massel
>            Severity: Minor
>              Labels: pjsip
>
> Following the excellent work done in terms of ASTERISK-26806, I have been doing some performance testing on Asterisk 13.22.0 with res_pjsip.
> If I get sipp to run more than 200 registrations per second, it starts logging:
> res_pjsip/pjsip_distributor.c: Request 'REGISTER' from '"[redacted]" <sip:[redacted]>' failed for '[redacted]:5060' (callid: [redacted]) - No matching endpoint found after X tries in 0.000 ms
> Where X is between 5 and 9 and >99% of the log entries show 0.000 ms.
> At first I thought the time was logging incorrectly, but there are a few up to 2.731ms.
> It is odd, however, that the endpoint cannot be found after numerous tries in such a short period of time (almost always less than 0.001ms). All the endpoints are valid and, if I reduce the registration rate to 100 reg per second, no such errors log.
> Throughout this, CPU usage remains extremely modest across all cores.
> This leads me to believe that there may be some sort of locking or contention across the endpoints/aors/contacts tables that is causing the registration performance to be restricted in spite of the hardware.
> It seems that, despite significant gains in performance since ASTERISK-26806 was resolved, pjsip is still performing slower than chan_sip in terms of registrations (despite chan_sip using only a single core and res_pjsip threading across 8 cores).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


