[asterisk-bugs] [JIRA] (ASTERISK-21385) SIP Channel Lock

Matt Jordan (JIRA) noreply at issues.asterisk.org
Thu Apr 11 09:58:02 CDT 2013


    [ https://issues.asterisk.org/jira/browse/ASTERISK-21385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=205212#comment-205212 ] 

Matt Jordan commented on ASTERISK-21385:
----------------------------------------

Ross:

I've taken a look at the various backtraces you've provided. While the symptoms may feel like a deadlock, none of the backtraces actually indicate a deadlock situation.

When a deadlock occurs, you will have some form of circular waiting. Take two threads - 0x01 and 0x02. A 'core show locks' output that indicates a deadlock will look something like this:

{noformat}
=== Thread ID: 0x01 (pbx_thread           started at [ 5597] pbx.c ast_pbx_start())
=== ---> Lock #0 (channel.c): MUTEX 4880 ast_write chan 0x7f883c4eb700 (1)
/usr/sbin/asterisk(ast_bt_get_addresses+0xe) [0x4d6eae]
/usr/sbin/asterisk(__ast_pthread_mutex_trylock+0xb6) [0x4d2636]
/lib64/libpthread.so.0(+0x7851) [0x7f88c5b73851]
/lib64/libc.so.6(clone+0x6d) [0x7f88c686190d]
=== ---> Waiting for Lock #1 (chan_sip.c): MUTEX 12978 cb_extensionstate p 0xac477790 (1)
/usr/sbin/asterisk(ast_bt_get_addresses+0x19) [0x8105607]
/usr/sbin/asterisk() [0x8080295]
/usr/sbin/asterisk(_ao2_lock+0x48) [0x8080e3a]
/lib/libpthread.so.0(+0x5afe) [0xb7477afe]
/lib/libc.so.6(clone+0x5e) [0xb76b564e]
=== — ---> Locked Here: chan_sip.c line 7514 (find_call)

=== Thread ID: 0x02 (pbx_thread           started at [ 5597] pbx.c ast_pbx_start())
=== ---> Lock #0 (chan_sip.c): MUTEX 4880 sip_dosomething p 0xac477790 (1)
/usr/sbin/asterisk(ast_bt_get_addresses+0xe) [0x4d6eae]
/usr/sbin/asterisk(__ast_pthread_mutex_trylock+0xb6) [0x4d2636]
/lib64/libpthread.so.0(+0x7851) [0x7f88c5b73851]
/lib64/libc.so.6(clone+0x6d) [0x7f88c686190d]
=== ---> Waiting for Lock #1 (channel.c): MUTEX 12978 chan_do_stuffs c 0x7f883c4eb700 (1)
/usr/sbin/asterisk(ast_bt_get_addresses+0x19) [0x8105607]
/usr/sbin/asterisk() [0x8080295]
/usr/sbin/asterisk(_ao2_lock+0x48) [0x8080e3a]
/lib/libpthread.so.0(+0x5afe) [0xb7477afe]
/lib/libc.so.6(clone+0x5e) [0xb76b564e]
=== — ---> Locked Here: channel.c line 12397 (ast_write)
{noformat}

In this contrived example, thread 0x01 has locked a channel (mutex 0x7f883c4eb700) and is attempting to lock a SIP private data structure (mutex 0xac477790). At the same time, thread 0x02 has locked the SIP private data structure (mutex 0xac477790) and is attempting to lock the channel owned by thread 0x01 (mutex 0x7f883c4eb700). Neither can win, and so both threads are stuck. Deadlock!

In your backtraces, there is no indication of this condition. Rather, the 'core show locks' output merely shows that some locks are held. In many of them, the locks that are held are related to the ODBC connection, which isn't surprising - a database connection can take some time to complete, and there can be lots of contention for that lock. All the output verifies is that there are threads that hold locks - not that any of them are blocked indefinitely.

So why is the system apparently 'stalling'?

{{DEBUG_THREADS}} is a performance killer. I can't overstate that. The {{DEBUG_THREADS}} compile time option essentially turns every lock into a single global mutex. This means that every time any one thread grabs a lock, all mutexes are blocked for a period of time. This practically single threads the entire system.

In the past, {{DEBUG_THREADS}} was not as aggressive as it is in 1.8+. It was made more aggressive because in prior versions of Asterisk, the data it gave back was inaccurate - which was less than useful when debugging deadlocks. As a result, the setting *will* help provide information regarding a deadlock, but it will kill a production system (and in some situations, is simply too aggressive for production systems).

If you compile Asterisk without {{DEBUG_THREADS}} and re-run some of your tests, do you encounter the same situation at the same level of load?
                
> SIP Channel Lock
> ----------------
>
>                 Key: ASTERISK-21385
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-21385
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Channels/chan_sip/General
>    Affects Versions: 1.8.21.0, 11.4.0
>         Environment: Cent OS
>            Reporter: Ross Beer
>         Attachments: Asterisk 10.12.2 Lock at 10CPS 560 Concurrent Calls.txt, Asterisk 11.2.0 Lock at 10CPS 50 Concurrent Calls.txt, Asterisk 11.3.0 Lock at 10CPS 50 Concurrent Calls.txt, Asterisk 1.8.19.0 Lock at 10CPS 460 Concurrent Calls.txt, Asterisk 1.8.21.0 Lock at 10CPS 360 Concurrnet Calls.txt, ThreadLock2.rtf, ThreadLocks.txt
>
>
> When above 10 calls per second are placed there is a thread lock which means that the SIP channel stops processing calls. There is a fix for the general SIP channel in 11.4 however this does not appear to resolve Realtime ODBC issues.
> RealTime peers only seam to use a single connection to the ODBC database even when configured to use more:
> ODBC DSN Settings
> -----------------
>   Name:   mysql
>   DSN:    MySql
>     Last connection attempt: 1970-01-01 01:00:00
>   Pooled: Yes
>   Limit:  100
>   Connections in use: 3
>     - Connection 1: in use (res_config_odbc.c:180 realtime_odbc)
>     - Connection 2: connected (:0 )
>     - Connection 3: connected (:0 )
> The RealTime ODBC process should allow as many connections as possible else the backlog causes the SIP channel to stop responding.
> I believe the SIP channel is threaded and therefore should be able to use more connections as needed.
> This is a major issue as SIP registrations stop when there are 100's of devices registered and calls are being made via the SIP channel.
> On each outbound call, the database is access to lookup the peer making the call. The peer is not stored in cache for any period of time and therefore this also increases the need for accessing the database.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.asterisk.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



More information about the asterisk-bugs mailing list