[asterisk-dev] Lockups in Asterisk 11

Mark Murawski markm-lists at intellasoft.net
Sat Nov 14 09:11:09 CST 2015


Hi Jaco,

I do know what to look for with deadlocks, and the odd thing is that 
what I pasted in the earlier post is not your run of the mill mutex 
based deadlock.  Four locks are currently held and something inside each 
thread is just not finishing.  None of the threads are waiting for 
cross-thread locks.

We have netlock, a single sip private lock, an ao2 lock, and a dblock. 
None of these threads are waiting for additional locks.  It seems 
whatever they are doing is simply not finishing.



On 11/13/15 05:59, Jaco Kroon wrote:
> Hi Mark,
>
> You're right.  That looks like something else.  do_monitor() relates to
> MWI.  This could be a completely different issue, if I need to guess I'd
> say we're looking at a lock-ordering deadlock.  Eg:
>
> Thread 1 grabs lock A.
> Thread 2 grabs lock B.
> Thread 1 blocks trying to lock B.
> Thread 2 blocks trying to lock A.
>
> I'm not seeing any direct changes in chan_sip from 11.19 => 11.20 that
> would do anything with respect to netlock.  Also no changes with respect
> to dblock (and ast_db_put is so trivial that there is almost no way that
> it can be wrong unless the unlock call fails - which I think is impossible).
>
> Out of my depth.  Sorry.
>
> Kind Regards,
> Jaco
>
> On 12/11/2015 07:07, markm-lists at intellasoft.net wrote:
>> Hi Jaco,
>>
>> I've tried this tmpfs workaround on asterisk 11 and I'm running into
>> the following (very very long duration) locks that block sip activity
>> completely.  My core issue is definitely not disk-io.
>>
>>
>> === <pending> <lock#> (<file>): <lock type> <line num> <function>
>> <lock name> <lock addr> (times locked)
>> ===
>> === Thread ID: 0xb524fb70 (do_monitor           started at [29517]
>> chan_sip.c restart_monitor())
>> === ---> Lock #0 (chan_sip.c): MUTEX 28923 handle_request_do &netlock
>> 0xb630b440 (1)
>>          main/logger.c:1702 ast_bt_get_addresses() (0x813f3ee+19)
>>          main/lock.c:258 __ast_pthread_mutex_lock() (0x813864f+85)
>>          channels/chan_sip.c:28926 handle_request_do()
>>          channels/chan_sip.c:28885 sipsock_read()
>>          main/io.c:292 ast_io_wait() (0x8132d50+175)
>>          channels/chan_sip.c:29484 do_monitor()
>>          main/utils.c:1223 dummy_start()
>>          :0 start_thread()
>>          libc.so.6 clone() (0xb77270d0+5E)
>> === ---> Lock #1 (chan_sip.c): MUTEX 8959 sip_pvt_lock_full pvt
>> 0x84b5660 (1)
>>          main/logger.c:1702 ast_bt_get_addresses() (0x813f3ee+19)
>>          main/lock.c:258 __ast_pthread_mutex_lock() (0x813864f+85)
>>          main/astobj2.c:198 __ao2_lock() (0x80906ad+7C)
>>          channels/chan_sip.c:8960 sip_pvt_lock_full()
>>          channels/chan_sip.c:28939 handle_request_do()
>>          channels/chan_sip.c:28885 sipsock_read()
>>          main/io.c:292 ast_io_wait() (0x8132d50+175)
>>          channels/chan_sip.c:29484 do_monitor()
>>          main/utils.c:1223 dummy_start()
>>          :0 start_thread()
>>          libc.so.6 clone() (0xb77270d0+5E)
>> === ---> Lock #2 (chan_sip.c): MUTEX 17047 register_verify peer
>> 0x86df2b8 (1)
>>          main/logger.c:1702 ast_bt_get_addresses() (0x813f3ee+19)
>>          main/lock.c:258 __ast_pthread_mutex_lock() (0x813864f+85)
>>          main/astobj2.c:198 __ao2_lock() (0x80906ad+7C)
>>          channels/chan_sip.c:17048 register_verify()
>>          channels/chan_sip.c:28477 handle_request_register()
>>          channels/chan_sip.c:28785 handle_incoming()
>>          channels/chan_sip.c:28953 handle_request_do()
>>          channels/chan_sip.c:28885 sipsock_read()
>>          main/io.c:292 ast_io_wait() (0x8132d50+175)
>>          channels/chan_sip.c:29484 do_monitor()
>>          main/utils.c:1223 dummy_start()
>>          :0 start_thread()
>>          libc.so.6 clone() (0xb77270d0+5E)
>> === ---> Lock #3 (db.c): MUTEX 329 ast_db_put &dblock 0x8214f60 (1)
>>          main/logger.c:1702 ast_bt_get_addresses() (0x813f3ee+19)
>>          main/lock.c:380 __ast_pthread_mutex_trylock() (0x81389b7+85)
>>          main/db.c:329 ast_db_put() (0x80eeec4+124)
>>          channels/chan_sip.c:16242 parse_register_contact()
>>          channels/chan_sip.c:17069 register_verify()
>>          channels/chan_sip.c:28477 handle_request_register()
>>          channels/chan_sip.c:28785 handle_incoming()
>>          channels/chan_sip.c:28953 handle_request_do()
>>          channels/chan_sip.c:28885 sipsock_read()
>>          main/io.c:292 ast_io_wait() (0x8132d50+175)
>>          channels/chan_sip.c:29484 do_monitor()
>>          main/utils.c:1223 dummy_start()
>>          :0 start_thread()
>>          libc.so.6 clone() (0xb77270d0+5E)
>> === -------------------------------------------------------------------
>> ===
>> =======================================================================
>>
>>
>> On 11.11.2015 10:03, Jaco Kroon wrote:
>>> Hi Mark,
>>>
>>> I suspect the following relates:
>>>
>>>
>>> http://jkroon.blogs.uls.co.za/it/voip/asterisk-massively-speeding-up-those-register-requests
>>>
>>>
>>> That should explain the underlying problem for you, and potentially
>>> provide you with a fix depending on your situation.  We've been using
>>> the proposed "workaround" there since the time of writing with no ill
>>> effect, however, you need to take heed of the warnings posted there.
>>>
>>> We had to modify code on at least some instances to avoid DB() dialplan
>>> use, in our case that wasn't difficult, but still, you need to be aware
>>> of the risks.
>>>
>>> Kind Regards,
>>> Jaco
>>>
>>
>
>
>




More information about the asterisk-dev mailing list