[asterisk-bugs] [JIRA] (ASTERISK-27285) AstDB Locks taking a while to unlock

Zach R (JIRA) noreply at issues.asterisk.org
Tue Dec 12 16:45:07 CST 2017


    [ https://issues.asterisk.org/jira/browse/ASTERISK-27285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=240544#comment-240544 ] 

Zach R commented on ASTERISK-27285:
-----------------------------------

I'm attaching the output of core show locks when better stack trace and debug threads are enabled. Though I found when these are enabled I am not seeing the same times from the timing code I previously used to output the lock times. Without the two flags I did see some earlier along with the task processor for contacts slowly reaching its high water limits.

For these tests teh astdb is written to a ramdisk rather than the SSDs this server is running on. The server itself is not a VM, but a physical dedicated machine, and I don't think running a different OS to test is an option at the moment.

The only change in configuration I have at the moment is adding caching for aors, endpoints and auths.

> AstDB Locks taking a while to unlock
> ------------------------------------
>
>                 Key: ASTERISK-27285
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-27285
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Core/AstDB
>    Affects Versions: 13.10.0
>         Environment: Centos 6.9 Final 64 bit, using chan_pjsip with realtime for AORs, Endpoints, Auths, Global. Contacts are in the ASTDB
>            Reporter: Zach R
>            Assignee: Unassigned
>            Severity: Minor
>         Attachments: astdb_stuff.tar.gz
>
>
> I've noticed when I use the AstDB for chan_pjsip's contacts for a large amount of registrations per second the sorcery/contacts task processor will eventually overflow and lock up. I first traced what that task processor does in the code and saw it handles callbacks for deleting and creating the contacts.
> I then tested to see if it was teh AstDB write time that was causing it to back up, as I read that asterisk and/or sqlite3 lock the file/table to avoid corruption or any race conditions for the sqlite3 database. While using realtime on res_odbc I did not have any issues even when sending 40 registrations/calls per second using sipp. This doesn't count the response to the 401 with the nonce.
> After that I added some code using the clock function from C to time how long it was taking in main/db.c for db_puts. I put one set of timestamps/clock objects being set surrounding (on the outside) of the lock done and one on the inside.
> I noticed that it seemed it waited quite a while for the lock to be released sometimes, while the inside of the lock took at most 0.01 seconds.
> I've attached the scenario file I used for sipp, the logs I printed out in console with the timing, as well as the code I changed to do the time logs.
> Some of these took as long as 0.5 seconds around the lock but 0.0 inside it. 
> Ex: 
> {noformat}
> [Sep 21 17:06:39] NOTICE[11520]: db.c:345 ast_db_put: ZACH - (Outside Lock) Total ASTDB Time: 0.690000 (Inside Lock) Total ASTDB Time: 0.000000
> {noformat}
> An ex of the cmd I used for sipp and the scenario to send the registrations to the server:
> {noformat}
> sipp $ASTERISK_SERVER_IP -sf REGISTER_client.xml -inf REGISTER_client.csv -trace_msg -trace_err -r 40 
> {noformat}
> You can modify -r to be a new rate of calls/sec if and add -m to exit after N successful calls.



--
This message was sent by Atlassian JIRA
(v6.2#6252)



More information about the asterisk-bugs mailing list