[asterisk-bugs] [JIRA] (ASTERISK-25455) Deadlock of PJSIP realtime over res_config_pgsql

mdu113 (JIRA) noreply at issues.asterisk.org
Tue Oct 13 11:15:33 CDT 2015


    [ https://issues.asterisk.org/jira/browse/ASTERISK-25455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=227852#comment-227852 ] 

mdu113 commented on ASTERISK-25455:
-----------------------------------

Unfortunately, I was too quick to dismiss this issue. It still happens, just less often.
I took some time to analyze the problem myself, and I think I have found the cause.
I believe a code path exists that uses the pgsql connection without holding pgsql_lock. What I think happens during the deadlock is that two concurrent threads both attempt to send a query to pgsql, and one of them uses the code path that does not take pgsql_lock. If they manage to send their queries at the same time, postgres appears to ignore one of the queries and reply to only one of them. If the thread holding the lock is the one that never receives a reply, it waits for it (while still holding the lock) forever, or at least for a very long time, which completely blocks all access to the db.

I found one such code path: in res_config_pgsql.c, the find_table() function issues a query without acquiring the lock. I've attached a simple patch that fixes it (res_config_pgsql.c-connlock.diff).
I'm not sure whether more code paths like this exist, but since applying the patch I haven't seen the problem again.
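
To illustrate the idea behind the fix, here is a minimal sketch, not the actual patch and not code from res_config_pgsql.c. It assumes a hypothetical fake_conn that stands in for the shared pgsql connection and a conn_lock that plays the role of pgsql_lock; fake_exec(), locked_exec() and run_workers() are invented names for this example only. The point is that every query path, including a table lookup like the one in find_table(), goes through the same lock, so two queries never overlap on the one connection.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>

#define MAX_THREADS 16
#define QUERIES_PER_THREAD 1000

/* Hypothetical stand-in for the shared connection. It only tracks how many
 * queries are in flight so that overlapping use can be detected. */
struct fake_conn {
    int in_flight;   /* queries currently using the connection */
    int collisions;  /* times a query started while another was in flight */
    int completed;   /* queries that finished */
};

/* Plays the role of pgsql_lock in this sketch (assumption, not Asterisk code). */
static pthread_mutex_t conn_lock = PTHREAD_MUTEX_INITIALIZER;
struct fake_conn shared_conn;

/* Simulates sending a query and waiting for the reply.
 * Must be called with conn_lock held. */
static void fake_exec(struct fake_conn *c)
{
    if (++c->in_flight > 1) {
        c->collisions++;      /* two queries on one connection at once */
    }
    sched_yield();            /* give other threads a chance to interleave */
    c->in_flight--;
    c->completed++;
}

/* Every caller takes the lock before touching the connection. */
static void locked_exec(struct fake_conn *c)
{
    pthread_mutex_lock(&conn_lock);
    fake_exec(c);
    pthread_mutex_unlock(&conn_lock);
}

static void *worker(void *arg)
{
    int i;

    for (i = 0; i < QUERIES_PER_THREAD; i++) {
        locked_exec(arg);
    }
    return NULL;
}

/* Resets the counters, runs nthreads workers against the shared connection,
 * and returns the number of overlapping queries observed. */
int run_workers(int nthreads)
{
    pthread_t tids[MAX_THREADS];
    int i;

    if (nthreads > MAX_THREADS) {
        nthreads = MAX_THREADS;
    }
    memset(&shared_conn, 0, sizeof(shared_conn));

    for (i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, &shared_conn);
        assert(rc == 0);
        (void)rc;
    }
    for (i = 0; i < nthreads; i++) {
        pthread_join(tids[i], NULL);
    }
    return shared_conn.collisions;
}
```

Built with -pthread, a call such as run_workers(8) should report zero collisions, because the lock serializes all access. If one caller were changed to invoke fake_exec() directly, which models the unlocked path described above, overlaps become possible, although a given run is not guaranteed to hit one.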

> Deadlock of PJSIP realtime over res_config_pgsql 
> -------------------------------------------------
>
>                 Key: ASTERISK-25455
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-25455
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Channels/chan_pjsip, Resources/res_config_pgsql, Resources/res_pjsip
>    Affects Versions: 13.5.0, 13.6.0
>         Environment: linux 64bit
> kernel 3.10.17
> distro slackware64 14.1 
>            Reporter: mdu113
>            Assignee: Unassigned
>         Attachments: backtrace-threads2.txt, core-show-locks2.txt, core-show-locks.txt, lock-bt-full.txt, res_config_pgsql.c-connlock.diff
>
>
> Asterisk intermittently deadlocks during initial loading of pjsip endpoints. It seems to depend on the number of endpoints loaded. With a small number of endpoints (up to 100) it loads fine most of the time. With several hundred endpoints it sometimes deadlocks. With several thousand endpoints it deadlocks most of the time.
> Attached are a backtrace and the output of "core show locks". The testing was done on Asterisk 13.6.0-rc2.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


