[asterisk-bugs] [JIRA] (ASTERISK-28469) AMI frozen on high load

Stefan Repke (JIRA) noreply at issues.asterisk.org
Mon Jul 22 02:30:48 CDT 2019


     [ https://issues.asterisk.org/jira/browse/ASTERISK-28469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Repke updated ASTERISK-28469:
------------------------------------

    Status: Waiting for Feedback  (was: Waiting for Feedback)

I finally managed to solve the mystery. It is an inter-procedural deadlock!

The server (Asterisk) cannot write because the client does not read.
The client stopped reading because the server is too slow in reading.
The server cannot read because it is blocked in writing.

----

The only client that is connected is a self-written Java implementation based on Netty (event-based, non-blocking).
But in order to relax high load situations, I've implemented the following:
Whenever the TCP send buffer (from client to server) is full, we stop receiving TCP packets until the opposite is receiving again.

Asterisk seams to have a similar behavior: The AMI server stops reading when it cannot write!

----

Although this behavior seams reasonable for some protocols (request-response), it is a bad idea whenever both sides can send data spontaneously -- like AMI:
- the server can send an arbitrary number of events and
- the client can send an arbitrary number of actions.

Therefore, I consider this behavior as a bug in both my client and the AMI server.

I've fixed my client, which solves the problem for me.

Maybe you want to do reading an writing on different threads?
Or you switch to an event-based non-blocking socket framework?
BTW: when blocked, the AMI server doesn't accept any new AMI connections.


> AMI frozen on high load
> -----------------------
>
>                 Key: ASTERISK-28469
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-28469
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Core/ManagerInterface
>    Affects Versions: 16.4.0
>         Environment: Linux deacsrtsk12 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08) x86_64 GNU/Linux
>            Reporter: Stefan Repke
>            Assignee: Stefan Repke
>         Attachments: case2.zip, core-asterisk-running-2019-07-01T16-15-22+0200-brief.txt, core-asterisk-running-2019-07-01T16-15-22+0200-full.txt, core-asterisk-running-2019-07-01T16-15-22+0200-locks.txt, core-asterisk-running-2019-07-01T16-15-22+0200-thread1.txt, core show locks, core show threads, debug.deacsrtsk12
>
>
> The AMI queue gets stuck when running Asterisk 16.0.4 with ~30 calls per minute: 
> After a few hundred calls have been processed, no further manager events are distrubuted and no new manager connections are accepted.
> The problem is reproducable and we've created a dump. I think the reason is that thread 0x7f6e6e56c700 is blocked in {{send_string}} of {{manager.c:3033}} (while holding/blocking lock 0x7f6ee0003260).
> Our version of libc6 is 2.19-18+deb8u10.
> The CLI command {{manager show eventq}} produces dozens of megabytes of output consisting of the queued events.
> {noformat:title=core show locks}
> =======================================================================
> === 16.4.0
> === Currently Held Locks
> =======================================================================
> ===
> === <pending> <lock#> (<file>): <lock type> <line num> <function> <lock name> <lock addr> (times locked)
> ===
> === Thread ID: 0x7f6e6ffff700 LWP:11439 (desc->accept_fn      started at [  770] tcptls.c ast_tcptls_server_start())
> === ---> Waiting for Lock #0 (manager.c): MUTEX 6896 purge_sessions session 0x7f6ee0003260 (1)
> 	[0x50277a] asterisk lock.h:446 ast_reentrancy_lock()
> 	[inlined] asterisk lock.c:272 __ast_pthread_mutex_lock()
> 	[0x45bd6d] asterisk astobj2.c:243 __ao2_lock()
> 	[0x5faaca] asterisk manager.c:2678 handle_showmanconn()
> 	[0x4bf2e5] asterisk cli.c:2834 ast_cli_command_full()
> 	[0x4bf454] asterisk cli.c:2861 ast_cli_command_multiple_full()
> 	[0x45865f] asterisk asterisk.c:1420 netconsole()
> 	[0x5a40c5] asterisk utils.c:1251 dummy_start()
> 	[0x7f6efd2c5064] libpthread.so.0 pthread_create.c:309 start_thread()
> 	[0x7f6efc26862d] libc.so.6 clone.S:113 clone()
> === --- ---> Locked Here: manager.c line 6104 (process_events)
> === -------------------------------------------------------------------
> ===
> === Thread ID: 0x7f6e6e56c700 LWP:11491 (handle_tcptls_connection started at [  309] tcptls.c ast_tcptls_server_root())
> === ---> Lock #0 (manager.c): MUTEX 6104 process_events s->session 0x7f6ee0003260 (1)
> 	[0x50277a] asterisk lock.h:446 ast_reentrancy_lock()
> 	[inlined] asterisk lock.c:272 __ast_pthread_mutex_lock()
> 	[0x45bd6d] asterisk astobj2.c:243 __ao2_lock()
> 	[0x5fff11] asterisk manager.c:6105 process_events()
> 	[0x604be8] asterisk manager.c:6714 do_message()
> 	[inlined] asterisk manager.c:6853 session_do()
> 	[0x5a40c5] asterisk utils.c:1251 dummy_start()
> 	[0x7f6efd2c5064] libpthread.so.0 pthread_create.c:309 start_thread()
> 	[0x7f6efc26862d] libc.so.6 clone.S:113 clone()
> === -------------------------------------------------------------------
> ===
> === Thread ID: 0x7f6e6dc38700 LWP:23563 (netconsole           started at [ 1529] asterisk.c listener())
> === ---> Waiting for Lock #0 (manager.c): MUTEX 2673 handle_showmanconn session 0x7f6ee0003260 (1)
> 	[0x50277a] asterisk lock.h:446 ast_reentrancy_lock()
> 	[inlined] asterisk lock.c:272 __ast_pthread_mutex_lock()
> 	[0x45bd6d] asterisk astobj2.c:243 __ao2_lock()
> 	[0x5faaca] asterisk manager.c:2678 handle_showmanconn()
> 	[0x4bf2e5] asterisk cli.c:2834 ast_cli_command_full()
> 	[0x4bf454] asterisk cli.c:2861 ast_cli_command_multiple_full()
> 	[0x45865f] asterisk asterisk.c:1420 netconsole()
> 	[0x5a40c5] asterisk utils.c:1251 dummy_start()
> 	[0x7f6efd2c5064] libpthread.so.0 pthread_create.c:309 start_thread()
> 	[0x7f6efc26862d] libc.so.6 clone.S:113 clone()
> === --- ---> Locked Here: manager.c line 6104 (process_events)
> === -------------------------------------------------------------------
> ===
> === Thread ID: 0x7f6e6d380700 LWP:23932 (netconsole           started at [ 1529] asterisk.c listener())
> === ---> Waiting for Lock #0 (manager.c): MUTEX 2673 handle_showmanconn session 0x7f6ee0003260 (1)
> 	[0x50277a] asterisk lock.h:446 ast_reentrancy_lock()
> 	[inlined] asterisk lock.c:272 __ast_pthread_mutex_lock()
> 	[0x45bd6d] asterisk astobj2.c:243 __ao2_lock()
> 	[0x5faaca] asterisk manager.c:2678 handle_showmanconn()
> 	[0x4bf2e5] asterisk cli.c:2834 ast_cli_command_full()
> 	[0x4bf454] asterisk cli.c:2861 ast_cli_command_multiple_full()
> 	[0x45865f] asterisk asterisk.c:1420 netconsole()
> 	[0x5a40c5] asterisk utils.c:1251 dummy_start()
> 	[0x7f6efd2c5064] libpthread.so.0 pthread_create.c:309 start_thread()
> 	[0x7f6efc26862d] libc.so.6 clone.S:113 clone()
> === --- ---> Locked Here: manager.c line 6104 (process_events)
> === -------------------------------------------------------------------
> ===
> === Thread ID: 0x7f6e6d3fc700 LWP:24956 (netconsole           started at [ 1529] asterisk.c listener())
> === ---> Waiting for Lock #0 (manager.c): MUTEX 2673 handle_showmanconn session 0x7f6ee0003260 (1)
> 	[0x50277a] asterisk lock.h:446 ast_reentrancy_lock()
> 	[inlined] asterisk lock.c:272 __ast_pthread_mutex_lock()
> 	[0x45bd6d] asterisk astobj2.c:243 __ao2_lock()
> 	[0x5faaca] asterisk manager.c:2678 handle_showmanconn()
> 	[0x4bf2e5] asterisk cli.c:2834 ast_cli_command_full()
> 	[0x4bf454] asterisk cli.c:2861 ast_cli_command_multiple_full()
> 	[0x45865f] asterisk asterisk.c:1420 netconsole()
> 	[0x5a40c5] asterisk utils.c:1251 dummy_start()
> 	[0x7f6efd2c5064] libpthread.so.0 pthread_create.c:309 start_thread()
> 	[0x7f6efc26862d] libc.so.6 clone.S:113 clone()
> === --- ---> Locked Here: manager.c line 6104 (process_events)
> === -------------------------------------------------------------------
> ===
> === Thread ID: 0x7f6efb470700 LWP:25271 (netconsole           started at [ 1529] asterisk.c listener())
> === ---> Waiting for Lock #0 (manager.c): MUTEX 2673 handle_showmanconn session 0x7f6ee0003260 (1)
> 	[0x50277a] asterisk lock.h:446 ast_reentrancy_lock()
> 	[inlined] asterisk lock.c:272 __ast_pthread_mutex_lock()
> 	[0x45bd6d] asterisk astobj2.c:243 __ao2_lock()
> 	[0x5faaca] asterisk manager.c:2678 handle_showmanconn()
> 	[0x4bf2e5] asterisk cli.c:2834 ast_cli_command_full()
> 	[0x4bf454] asterisk cli.c:2861 ast_cli_command_multiple_full()
> 	[0x45865f] asterisk asterisk.c:1420 netconsole()
> 	[0x5a40c5] asterisk utils.c:1251 dummy_start()
> 	[0x7f6efd2c5064] libpthread.so.0 pthread_create.c:309 start_thread()
> 	[0x7f6efc26862d] libc.so.6 clone.S:113 clone()
> === --- ---> Locked Here: manager.c line 6104 (process_events)
> === -------------------------------------------------------------------
> ===
> =======================================================================
> {noformat}
> {noformat:title=brief.txt}
> ...
> Thread 52 (Thread 0x7f6e6e56c700 (LWP 11491)):
> #0  0x00007f6efc25faed in poll () at ../sysdeps/unix/syscall-template.S:81
> #1  0x00000000005a6b78 in poll (__timeout=<optimized out>, __nfds=1, __fds=0x7f6e6e56b2d0) at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
> #2  ast_wait_for_output (fd=<optimized out>, ms=<optimized out>) at utils.c:1357
> #3  0x00000000004f6f89 in ast_iostream_write (stream=0x7f6e680041e0, buffer=0x7f6e7018e4f8, size=5000, size at entry=487) at iostream.c:473
> #4  0x00000000004f747d in ast_iostream_write (stream=stream at entry=0x7f6e680041e0, buffer=buffer at entry=0x7f6e7018e4f8, size=size at entry=487) at iostream.c:475
> #5  0x0000000000600121 in send_string (string=0x7f6e7018e4f8 "Event: NewextenrnPrivilege: dialplan,allrnTimestamp: 1561990424.176441rnSystemName: deacsrtsk12rnChannel: SIP/999253-000002c8rnChannelState: 4rnChannelStateDesc: RingrnCallerIDNum: 999253rnCallerIDNam"..., s=0x7f6e6e56b4d0) at manager.c:3033
> #6  process_events (s=0x7f6e6e56b4d0) at manager.c:6117
> #7  0x0000000000604be8 in do_message (s=<optimized out>) at manager.c:6714
> #8  session_do (data=0x7f6e6e56b2d0) at manager.c:6853
> #9  0x00000000005a40c5 in dummy_start (data=0x7f6e68003dc0) at utils.c:1249
> #10 0x00007f6efd2c5064 in start_thread (arg=0x7f6e6e56c700) at pthread_create.c:309
> #11 0x00007f6efc26862d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> ...
> {noformat}
> {noformat:title=full.txt}
> ...
> Thread 52 (Thread 0x7f6e6e56c700 (LWP 11491)):
> #0  0x00007f6efc25faed in poll () at ../sysdeps/unix/syscall-template.S:81
> No locals.
> #1  0x00000000005a6b78 in poll (__timeout=<optimized out>, __nfds=1, __fds=0x7f6e6e56b2d0) at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
> No locals.
> #2  ast_wait_for_output (fd=<optimized out>, ms=<optimized out>) at utils.c:1357
>         pfd = {{fd = 22, events = 4, revents = 0}}
> #3  0x00000000004f6f89 in ast_iostream_write (stream=0x7f6e680041e0, buffer=0x7f6e7018e4f8, size=5000, size at entry=487) at iostream.c:473
> No locals.
> #4  0x00000000004f747d in ast_iostream_write (stream=stream at entry=0x7f6e680041e0, buffer=buffer at entry=0x7f6e7018e4f8, size=size at entry=487) at iostream.c:475
> No locals.
> #5  0x0000000000600121 in send_string (string=0x7f6e7018e4f8 "Event: NewextenrnPrivilege: dialplan,allrnTimestamp: 1561990424.176441rnSystemName: deacsrtsk12rnChannel: SIP/999253-000002c8rnChannelState: 4rnChannelStateDesc: RingrnCallerIDNum: 999253rnCallerIDNam"..., s=0x7f6e6e56b4d0) at manager.c:3033
>         stream = 0x7f6e680041e0
>         len = 487
>         res = <optimized out>
>         string = 0x7f6e7018e4f8 "Event: NewextenrnPrivilege: dialplan,allrnTimestamp: 1561990424.176441rnSystemName: deacsrtsk12rnChannel: SIP/999253-000002c8rnChannelState: 4rnChannelStateDesc: RingrnCallerIDNum: 999253rnCallerIDNam"...
>         s = 0x7f6e6e56b4d0
> #6  process_events (s=0x7f6e6e56b4d0) at manager.c:6117
>         eqe = <optimized out>
>         ret = <optimized out>
>         __PRETTY_FUNCTION__ = "process_events"
> #7  0x0000000000604be8 in do_message (s=<optimized out>) at manager.c:6714
>         m = {hdrcount = 6, headers = {0x7f6ee0004620 "action: Monitor", 0x7f6ee04085d0 "actionid: 537#355425276_673410#", 0x7f6ee0008fb0 "file: /tmp/U23TJQ5V23X4MZQB", 0x7f6ee0009b70 "format: wav", 0x7f6ee0009b20 "channel: IAX2/999005-3957", 0x7f6ee0008fe0 "mix: true", 0x0 <repeats 122 times>}}
>         header_buf = "mix: true000AX2/999005-3957000B000061060#", '000' <repeats 993 times>
>         now = 1561988218
> #8  session_do (data=0x7f6e6e56b2d0) at manager.c:6853
>         s = {session = 0x7f6ee00032b8, stream = 0x7f6e680041e0, tcptls_session = 0x7f6e68004068, parsing = MESSAGE_OKAY, write_error = -1, hook = 0x0, lock = {mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 1, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '000' <repeats 16 times>, "001", '000' <repeats 22 times>, __align = 0}, track = 0x7f6ee0005ee0, flags = {tracking = 1, setup = 1}}}
>         res = 5
>         arg = 1
>         ser_remote_address_tmp = {ss = {ss_family = 10, __ss_align = 0, __ss_padding = "000000377377177000000001", '000' <repeats 48 times>, "|254367373n177000000000000000000000000000000370377377377377377377377220324367373n177000000340266Vnn177000000220I036377n177000000000000000000000000000"}, len = 28}
>         __PRETTY_FUNCTION__ = "session_do"
> #9  0x00000000005a40c5 in dummy_start (data=0x7f6e68003dc0) at utils.c:1249
>         __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {140114181235520, 57934947619282876, 140112167979552, 140114047020320, 21, 140112274310912, 57947742297497532, -139326592165109828}, __mask_was_saved = 0}}, __pad = {0x7f6e6e56bef0, 0x0, 0x7f6efc521878 <__elf_set___libc_thread_subfreeres_element___rpc_thread_destroy__>, 0x7f6efc521880 <internal_trans_names>}}
>         __cancel_arg = 0x7f6e6e56c700
>         __not_first_call = <optimized out>
>         ret = <optimized out>
>         a = {start_routine = 0x5955b0 <handle_tcptls_connection>, data = 0x7f6e68004068, name = 0x7f6e68004a20 "handle_tcptls_connection started at [  309] tcptls.c ast_tcptls_server_root()"}
>         mutex_attr = {__size = "001000000", __align = 1}
>         __PRETTY_FUNCTION__ = "dummy_start"
> #10 0x00007f6efd2c5064 in start_thread (arg=0x7f6e6e56c700) at pthread_create.c:309
>         __res = <optimized out>
>         pd = 0x7f6e6e56c700
>         now = <optimized out>
>         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140112274310912, -139326233515286596, 0, 140114047020320, 21, 140112274310912, 57947742289108924, 57695635703832508}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
>         not_first_call = <optimized out>
>         pagesize_m1 = <optimized out>
>         sp = <optimized out>
>         freesize = <optimized out>
>         __PRETTY_FUNCTION__ = "start_thread"
> #11 0x00007f6efc26862d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> No locals.
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)



More information about the asterisk-bugs mailing list