<div dir="ltr"><div>Hi,</div><div><br></div>I believe there is a known issue in Postgres where, if multiple threads call pgsql_exec() concurrently, threads can get stuck inside Postgres. This is why I suggested that Postgres patch, which puts Asterisk locks around the call to pgsql_exec().<div><br></div><div>OTOH, I don't think you've mentioned which DB you are using, so that might not apply...</div><div><br></div><div>Regards,</div><div>Steve</div></div><br><div class="gmail_quote"><div dir="ltr">On Sat, 14 Nov 2015 at 15:11 Mark Murawski <<a href="mailto:markm-lists@intellasoft.net">markm-lists@intellasoft.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Jaco,<br>
<br>
I do know what to look for with deadlocks, and the odd thing is that<br>
what I pasted in the earlier post is not your run-of-the-mill<br>
mutex-based deadlock. Four locks are currently held, and something<br>
inside each thread is simply not finishing. None of the threads is<br>
waiting on a lock held by another thread.<br>
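For contrast, the run-of-the-mill case is the ABBA lock-ordering deadlock: two threads take the same pair of mutexes in opposite orders and each ends up waiting for the other. The usual cure is a single global acquisition order. Below is a minimal sketch in plain pthreads, assuming nothing about Asterisk internals; `lock_pair()`, `unlock_pair()`, `worker()` and `run_demo()` are hypothetical helpers, not Asterisk APIs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical helper (not an Asterisk API): acquire two mutexes in one
 * global order (by address), so no thread can ever hold the "later" lock
 * while waiting for the "earlier" one. */
static void lock_pair(pthread_mutex_t *x, pthread_mutex_t *y)
{
	assert(x != NULL && y != NULL);
	if ((uintptr_t)x > (uintptr_t)y) {
		pthread_mutex_t *tmp = x;
		x = y;
		y = tmp;
	}
	pthread_mutex_lock(x);
	if (y != x)
		pthread_mutex_lock(y);
}

/* Release order does not matter for deadlock avoidance. */
static void unlock_pair(pthread_mutex_t *x, pthread_mutex_t *y)
{
	pthread_mutex_unlock(x);
	if (y != x)
		pthread_mutex_unlock(y);
}

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
static long counter; /* protected by holding lock_a and lock_b together */

struct job {
	pthread_mutex_t *first;  /* order the caller asks for... */
	pthread_mutex_t *second; /* ...which lock_pair() may swap */
	int iterations;
};

static void *worker(void *arg)
{
	struct job *j = arg;
	for (int i = 0; i < j->iterations; i++) {
		lock_pair(j->first, j->second);
		counter++;
		unlock_pair(j->first, j->second);
	}
	return NULL;
}

/* Thread 1 asks for A then B, thread 2 for B then A: the ABBA pattern.
 * With naive lock(first); lock(second); this can hang forever; with
 * lock_pair() both threads always finish. Returns -1 if a thread
 * could not be created, otherwise the final counter value. */
static long run_demo(int iterations)
{
	struct job ab = { &lock_a, &lock_b, iterations };
	struct job ba = { &lock_b, &lock_a, iterations };
	pthread_t t1, t2;

	counter = 0;
	if (pthread_create(&t1, NULL, worker, &ab) != 0)
		return -1;
	if (pthread_create(&t2, NULL, worker, &ba) != 0) {
		pthread_join(t1, NULL);
		return -1;
	}
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return counter;
}
```

Sorting by address works for any pair of mutexes without a hand-maintained hierarchy; when the second lock is only discovered after taking the first, a trylock-and-back-off loop is the usual alternative.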
<br>
We have netlock, a single SIP private (pvt) lock, an ao2 lock on the<br>
peer, and dblock. None of these threads is waiting for additional<br>
locks; it seems that whatever they are doing is simply not finishing.<br>
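The lock dump from the earlier post shows which locks a thread holds and where it took them, but not for how long. One way to confirm that a holder is stuck doing work (rather than waiting) is to record when each lock was acquired and let a watchdog thread read the hold time. This is a minimal sketch using plain pthreads and C11 atomics; `struct timed_mutex`, `TIMED_MUTEX_INIT`, `timed_lock()`, `timed_unlock()` and `held_for_ms()` are hypothetical names, not part of Asterisk's lock debugging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical wrapper (not an Asterisk API): a mutex that records when it
 * was acquired, so another thread can see how long it has been held. */
struct timed_mutex {
	pthread_mutex_t lock;
	atomic_llong acquired_ns; /* CLOCK_MONOTONIC time of acquisition; 0 = not held */
};

#define TIMED_MUTEX_INIT { PTHREAD_MUTEX_INITIALIZER, 0 }

static long long now_ns(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

static void timed_lock(struct timed_mutex *m)
{
	pthread_mutex_lock(&m->lock);
	atomic_store(&m->acquired_ns, now_ns());
}

static void timed_unlock(struct timed_mutex *m)
{
	assert(atomic_load(&m->acquired_ns) != 0); /* must currently be held */
	atomic_store(&m->acquired_ns, 0);
	pthread_mutex_unlock(&m->lock);
}

/* Watchdog side: never takes the mutex, so it cannot block behind a stuck
 * holder. Returns milliseconds held, or -1 if the mutex is not held. */
static long long held_for_ms(struct timed_mutex *m)
{
	long long t = atomic_load(&m->acquired_ns);
	return t ? (now_ns() - t) / 1000000LL : -1;
}
```

A stuck holder shows up as a steadily growing value from held_for_ms(), whereas a thread blocked behind another lock would instead still be sitting inside timed_lock().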
<br>
<br>
<br>
On 11/13/15 05:59, Jaco Kroon wrote:<br>
> Hi Mark,<br>
><br>
> You're right. That looks like something else. do_monitor() relates to<br>
> MWI. This could be a completely different issue; if I had to guess, I'd<br>
> say we're looking at a lock-ordering deadlock, e.g.:<br>
><br>
> Thread 1 grabs lock A.<br>
> Thread 2 grabs lock B.<br>
> Thread 1 blocks trying to lock B.<br>
> Thread 2 blocks trying to lock A.<br>
><br>
> I'm not seeing any direct changes in chan_sip from 11.19 => 11.20 that<br>
> would affect netlock. There are also no changes affecting dblock (and<br>
> ast_db_put is so trivial that there is almost no way it can be wrong<br>
> unless the unlock call fails, which I think is impossible).<br>
><br>
> Out of my depth. Sorry.<br>
><br>
> Kind Regards,<br>
> Jaco<br>
><br>
> On 12/11/2015 07:07, <a href="mailto:markm-lists@intellasoft.net" target="_blank">markm-lists@intellasoft.net</a> wrote:<br>
>> Hi Jaco,<br>
>><br>
>> I've tried this tmpfs workaround on Asterisk 11, and I'm running into<br>
>> the following very long-held locks, which block SIP activity<br>
>> completely. My core issue is definitely not disk I/O.<br>
>><br>
>><br>
>> === <pending> <lock#> (<file>): <lock type> <line num> <function><br>
>> <lock name> <lock addr> (times locked)<br>
>> ===<br>
>> === Thread ID: 0xb524fb70 (do_monitor started at [29517]<br>
>> chan_sip.c restart_monitor())<br>
>> === ---> Lock #0 (chan_sip.c): MUTEX 28923 handle_request_do &netlock<br>
>> 0xb630b440 (1)<br>
>> main/logger.c:1702 ast_bt_get_addresses() (0x813f3ee+19)<br>
>> main/lock.c:258 __ast_pthread_mutex_lock() (0x813864f+85)<br>
>> channels/chan_sip.c:28926 handle_request_do()<br>
>> channels/chan_sip.c:28885 sipsock_read()<br>
>> main/io.c:292 ast_io_wait() (0x8132d50+175)<br>
>> channels/chan_sip.c:29484 do_monitor()<br>
>> main/utils.c:1223 dummy_start()<br>
>> :0 start_thread()<br>
>> libc.so.6 clone() (0xb77270d0+5E)<br>
>> === ---> Lock #1 (chan_sip.c): MUTEX 8959 sip_pvt_lock_full pvt<br>
>> 0x84b5660 (1)<br>
>> main/logger.c:1702 ast_bt_get_addresses() (0x813f3ee+19)<br>
>> main/lock.c:258 __ast_pthread_mutex_lock() (0x813864f+85)<br>
>> main/astobj2.c:198 __ao2_lock() (0x80906ad+7C)<br>
>> channels/chan_sip.c:8960 sip_pvt_lock_full()<br>
>> channels/chan_sip.c:28939 handle_request_do()<br>
>> channels/chan_sip.c:28885 sipsock_read()<br>
>> main/io.c:292 ast_io_wait() (0x8132d50+175)<br>
>> channels/chan_sip.c:29484 do_monitor()<br>
>> main/utils.c:1223 dummy_start()<br>
>> :0 start_thread()<br>
>> libc.so.6 clone() (0xb77270d0+5E)<br>
>> === ---> Lock #2 (chan_sip.c): MUTEX 17047 register_verify peer<br>
>> 0x86df2b8 (1)<br>
>> main/logger.c:1702 ast_bt_get_addresses() (0x813f3ee+19)<br>
>> main/lock.c:258 __ast_pthread_mutex_lock() (0x813864f+85)<br>
>> main/astobj2.c:198 __ao2_lock() (0x80906ad+7C)<br>
>> channels/chan_sip.c:17048 register_verify()<br>
>> channels/chan_sip.c:28477 handle_request_register()<br>
>> channels/chan_sip.c:28785 handle_incoming()<br>
>> channels/chan_sip.c:28953 handle_request_do()<br>
>> channels/chan_sip.c:28885 sipsock_read()<br>
>> main/io.c:292 ast_io_wait() (0x8132d50+175)<br>
>> channels/chan_sip.c:29484 do_monitor()<br>
>> main/utils.c:1223 dummy_start()<br>
>> :0 start_thread()<br>
>> libc.so.6 clone() (0xb77270d0+5E)<br>
>> === ---> Lock #3 (db.c): MUTEX 329 ast_db_put &dblock 0x8214f60 (1)<br>
>> main/logger.c:1702 ast_bt_get_addresses() (0x813f3ee+19)<br>
>> main/lock.c:380 __ast_pthread_mutex_trylock() (0x81389b7+85)<br>
>> main/db.c:329 ast_db_put() (0x80eeec4+124)<br>
>> channels/chan_sip.c:16242 parse_register_contact()<br>
>> channels/chan_sip.c:17069 register_verify()<br>
>> channels/chan_sip.c:28477 handle_request_register()<br>
>> channels/chan_sip.c:28785 handle_incoming()<br>
>> channels/chan_sip.c:28953 handle_request_do()<br>
>> channels/chan_sip.c:28885 sipsock_read()<br>
>> main/io.c:292 ast_io_wait() (0x8132d50+175)<br>
>> channels/chan_sip.c:29484 do_monitor()<br>
>> main/utils.c:1223 dummy_start()<br>
>> :0 start_thread()<br>
>> libc.so.6 clone() (0xb77270d0+5E)<br>
>> === -------------------------------------------------------------------<br>
>> ===<br>
>> =======================================================================<br>
>><br>
>><br>
>> On 11.11.2015 10:03, Jaco Kroon wrote:<br>
>>> Hi Mark,<br>
>>><br>
>>> I suspect the following relates:<br>
>>><br>
>>><br>
>>> <a href="http://jkroon.blogs.uls.co.za/it/voip/asterisk-massively-speeding-up-those-register-requests" rel="noreferrer" target="_blank">http://jkroon.blogs.uls.co.za/it/voip/asterisk-massively-speeding-up-those-register-requests</a><br>
>>><br>
>>><br>
>>> That should explain the underlying problem and, depending on your<br>
>>> situation, potentially provide a fix. We've been using the proposed<br>
>>> "workaround" there since the time of writing with no ill effect;<br>
>>> however, you need to heed the warnings posted there.<br>
>>><br>
>>> We had to modify code in at least some instances to avoid using DB()<br>
>>> in the dialplan. In our case that wasn't difficult, but you still need<br>
>>> to be aware of the risks.<br>
>>><br>
>>> Kind Regards,<br>
>>> Jaco<br>
>>><br>
>><br>
><br>
><br>
><br>
<br>
<br>
asterisk-dev mailing list<br>
</blockquote></div>