[asterisk-bugs] [JIRA] (ASTERISK-27285) AstDB Locks taking a while to unlock

Tue Dec 12 16:47:07 CST 2017

     [ https://issues.asterisk.org/jira/browse/ASTERISK-27285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zach R updated ASTERISK-27285:
------------------------------

    Attachment: locks_with_better_stack_trace
                locks_no_better_stack

I've attached both outputs of core show locks while I ran the sipp profile/test at a rate of 60 per second instead of 40 due to configuration changes mentionedi n my last comment.

The file no_better_stack has BETTER_STACKTRACE flag disabled, while the other file has it enabled.

> AstDB Locks taking a while to unlock
> ------------------------------------
>
>                 Key: ASTERISK-27285
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-27285
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Core/AstDB
>    Affects Versions: 13.10.0
>         Environment: Centos 6.9 Final 64 bit, using chan_pjsip with realtime for AORs, Endpoints, Auths, Global. Contacts are in the ASTDB
>            Reporter: Zach R
>            Assignee: Unassigned
>            Severity: Minor
>         Attachments: astdb_stuff.tar.gz, locks_no_better_stack, locks_with_better_stack_trace
>
>
> I've noticed when I use the AstDB for chan_pjsip's contacts for a large amount of registrations per second the sorcery/contacts task processor will eventually overflow and lock up. I first traced what that task processor does in the code and saw it handles callbacks for deleting and creating the contacts.
> I then tested to see if it was teh AstDB write time that was causing it to back up, as I read that asterisk and/or sqlite3 lock the file/table to avoid corruption or any race conditions for the sqlite3 database. While using realtime on res_odbc I did not have any issues even when sending 40 registrations/calls per second using sipp. This doesn't count the response to the 401 with the nonce.
> After that I added some code using the clock function from C to time how long it was taking in main/db.c for db_puts. I put one set of timestamps/clock objects being set surrounding (on the outside) of the lock done and one on the inside.
> I noticed that it seemed it waited quite a while for the lock to be released sometimes, while the inside of the lock took at most 0.01 seconds.
> I've attached the scenario file I used for sipp, the logs I printed out in console with the timing, as well as the code I changed to do the time logs.
> Some of these took as long as 0.5 seconds around the lock but 0.0 inside it. 
> Ex: 
> {noformat}
> [Sep 21 17:06:39] NOTICE[11520]: db.c:345 ast_db_put: ZACH - (Outside Lock) Total ASTDB Time: 0.690000 (Inside Lock) Total ASTDB Time: 0.000000
> {noformat}
> An ex of the cmd I used for sipp and the scenario to send the registrations to the server:
> {noformat}
> sipp $ASTERISK_SERVER_IP -sf REGISTER_client.xml -inf REGISTER_client.csv -trace_msg -trace_err -r 40 
> {noformat}
> You can modify -r to be a new rate of calls/sec if and add -m to exit after N successful calls.

--
This message was sent by Atlassian JIRA
(v6.2#6252)