[asterisk-bugs] [JIRA] (ASTERISK-28469) AMI frozen on high load
Steve Sether (JIRA)
noreply at issues.asterisk.org
Mon Dec 2 14:24:32 CST 2019
[ https://issues.asterisk.org/jira/browse/ASTERISK-28469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=248929#comment-248929 ]
Steve Sether edited comment on ASTERISK-28469 at 12/2/19 2:23 PM:
------------------------------------------------------------------
I'm having a very similar problem to this, but I don't think it's an inter-process deadlock. I simplified my AMI client down to a shell script in an attempt to isolate the problem. AMI stops sending data to any processes connected to it, and stops accepting incoming connections. I can telnet to the port, but I don't get the normal greeting from Asterisk of:
Asterisk Call Manager/5.0.1
Like the above problem, "manager show eventq" shows a long backlog of events being queued. It seems to happen at load, around maybe 30 simultaneous calls.
This happens on the newest release of Asterisk 16. 16.6.1
The shell script is simply:
(I switched to port 5039 so my normal AMI clients couldn't connect so as to isolate)
#!/bin/sh
exec 3<>/dev/tcp/127.0.0.1/5039
cat /root/login.script >&3
timeout 1 cat <&3
sleep 1
while [ 1 ]; do
cat /root/status.script >&3
timeout 1 cat <&3 > /dev/null
echo loop
sleep .1
done;
login.script looks like this:
Action: Login
Username: usernameHere
Secret: passwordHere
ActionId: 1234
status.script looks like this:
Action: Status
Variables: CDR(companyID),companyID,from_user,to_user,CALLERID(num),CALLERID(name),CDR(ds_type),CDR(ds_dnis),CDR(ds_ani),CDR(ds_from),CDR(ds_to),CDR(start),CDR(answer)
ActionId: 4321
was (Author: stevesether):
I'm having a very similar problem to this, but I don't think it's an inter-process deadlock. I simplified my AMI client down to a shell script in an attempt to isolate the problem. AMI stops sending data to any processes connected to it, and stops accepting incoming connections. I can telnet to the port, but I don't get the normal greeting from Asterisk of:
Asterisk Call Manager/5.0.1
Like the above problem, "manager show eventq" shows a long backlog of events being queued. It seems to happen at load, around maybe 30 simultaneous calls.
The shell script is simply:
(I switched to port 5039 so my normal AMI clients couldn't connect so as to isolate)
#!/bin/sh
exec 3<>/dev/tcp/127.0.0.1/5039
cat /root/login.script >&3
timeout 1 cat <&3
sleep 1
while [ 1 ]; do
cat /root/status.script >&3
timeout 1 cat <&3 > /dev/null
echo loop
sleep .1
done;
login.script looks like this:
Action: Login
Username: usernameHere
Secret: passwordHere
ActionId: 1234
status.script looks like this:
Action: Status
Variables: CDR(companyID),companyID,from_user,to_user,CALLERID(num),CALLERID(name),CDR(ds_type),CDR(ds_dnis),CDR(ds_ani),CDR(ds_from),CDR(ds_to),CDR(start),CDR(answer)
ActionId: 4321
> AMI frozen on high load
> -----------------------
>
> Key: ASTERISK-28469
> URL: https://issues.asterisk.org/jira/browse/ASTERISK-28469
> Project: Asterisk
> Issue Type: Bug
> Security Level: None
> Components: Core/ManagerInterface
> Affects Versions: 16.4.0
> Environment: Linux deacsrtsk12 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08) x86_64 GNU/Linux
> Reporter: Stefan Repke
> Assignee: Stefan Repke
> Attachments: case2.zip, core-asterisk-running-2019-07-01T16-15-22+0200-brief.txt, core-asterisk-running-2019-07-01T16-15-22+0200-full.txt, core-asterisk-running-2019-07-01T16-15-22+0200-locks.txt, core-asterisk-running-2019-07-01T16-15-22+0200-thread1.txt, core show locks, core show threads, debug.deacsrtsk12
>
>
> The AMI queue gets stuck when running Asterisk 16.0.4 with ~30 calls per minute:
> After a few hundred calls have been processed, no further manager events are distrubuted and no new manager connections are accepted.
> The problem is reproducable and we've created a dump. I think the reason is that thread 0x7f6e6e56c700 is blocked in {{send_string}} of {{manager.c:3033}} (while holding/blocking lock 0x7f6ee0003260).
> Our version of libc6 is 2.19-18+deb8u10.
> The CLI command {{manager show eventq}} produces dozens of megabytes of output consisting of the queued events.
> {noformat:title=core show locks}
> =======================================================================
> === 16.4.0
> === Currently Held Locks
> =======================================================================
> ===
> === <pending> <lock#> (<file>): <lock type> <line num> <function> <lock name> <lock addr> (times locked)
> ===
> === Thread ID: 0x7f6e6ffff700 LWP:11439 (desc->accept_fn started at [ 770] tcptls.c ast_tcptls_server_start())
> === ---> Waiting for Lock #0 (manager.c): MUTEX 6896 purge_sessions session 0x7f6ee0003260 (1)
> [0x50277a] asterisk lock.h:446 ast_reentrancy_lock()
> [inlined] asterisk lock.c:272 __ast_pthread_mutex_lock()
> [0x45bd6d] asterisk astobj2.c:243 __ao2_lock()
> [0x5faaca] asterisk manager.c:2678 handle_showmanconn()
> [0x4bf2e5] asterisk cli.c:2834 ast_cli_command_full()
> [0x4bf454] asterisk cli.c:2861 ast_cli_command_multiple_full()
> [0x45865f] asterisk asterisk.c:1420 netconsole()
> [0x5a40c5] asterisk utils.c:1251 dummy_start()
> [0x7f6efd2c5064] libpthread.so.0 pthread_create.c:309 start_thread()
> [0x7f6efc26862d] libc.so.6 clone.S:113 clone()
> === --- ---> Locked Here: manager.c line 6104 (process_events)
> === -------------------------------------------------------------------
> ===
> === Thread ID: 0x7f6e6e56c700 LWP:11491 (handle_tcptls_connection started at [ 309] tcptls.c ast_tcptls_server_root())
> === ---> Lock #0 (manager.c): MUTEX 6104 process_events s->session 0x7f6ee0003260 (1)
> [0x50277a] asterisk lock.h:446 ast_reentrancy_lock()
> [inlined] asterisk lock.c:272 __ast_pthread_mutex_lock()
> [0x45bd6d] asterisk astobj2.c:243 __ao2_lock()
> [0x5fff11] asterisk manager.c:6105 process_events()
> [0x604be8] asterisk manager.c:6714 do_message()
> [inlined] asterisk manager.c:6853 session_do()
> [0x5a40c5] asterisk utils.c:1251 dummy_start()
> [0x7f6efd2c5064] libpthread.so.0 pthread_create.c:309 start_thread()
> [0x7f6efc26862d] libc.so.6 clone.S:113 clone()
> === -------------------------------------------------------------------
> ===
> === Thread ID: 0x7f6e6dc38700 LWP:23563 (netconsole started at [ 1529] asterisk.c listener())
> === ---> Waiting for Lock #0 (manager.c): MUTEX 2673 handle_showmanconn session 0x7f6ee0003260 (1)
> [0x50277a] asterisk lock.h:446 ast_reentrancy_lock()
> [inlined] asterisk lock.c:272 __ast_pthread_mutex_lock()
> [0x45bd6d] asterisk astobj2.c:243 __ao2_lock()
> [0x5faaca] asterisk manager.c:2678 handle_showmanconn()
> [0x4bf2e5] asterisk cli.c:2834 ast_cli_command_full()
> [0x4bf454] asterisk cli.c:2861 ast_cli_command_multiple_full()
> [0x45865f] asterisk asterisk.c:1420 netconsole()
> [0x5a40c5] asterisk utils.c:1251 dummy_start()
> [0x7f6efd2c5064] libpthread.so.0 pthread_create.c:309 start_thread()
> [0x7f6efc26862d] libc.so.6 clone.S:113 clone()
> === --- ---> Locked Here: manager.c line 6104 (process_events)
> === -------------------------------------------------------------------
> ===
> === Thread ID: 0x7f6e6d380700 LWP:23932 (netconsole started at [ 1529] asterisk.c listener())
> === ---> Waiting for Lock #0 (manager.c): MUTEX 2673 handle_showmanconn session 0x7f6ee0003260 (1)
> [0x50277a] asterisk lock.h:446 ast_reentrancy_lock()
> [inlined] asterisk lock.c:272 __ast_pthread_mutex_lock()
> [0x45bd6d] asterisk astobj2.c:243 __ao2_lock()
> [0x5faaca] asterisk manager.c:2678 handle_showmanconn()
> [0x4bf2e5] asterisk cli.c:2834 ast_cli_command_full()
> [0x4bf454] asterisk cli.c:2861 ast_cli_command_multiple_full()
> [0x45865f] asterisk asterisk.c:1420 netconsole()
> [0x5a40c5] asterisk utils.c:1251 dummy_start()
> [0x7f6efd2c5064] libpthread.so.0 pthread_create.c:309 start_thread()
> [0x7f6efc26862d] libc.so.6 clone.S:113 clone()
> === --- ---> Locked Here: manager.c line 6104 (process_events)
> === -------------------------------------------------------------------
> ===
> === Thread ID: 0x7f6e6d3fc700 LWP:24956 (netconsole started at [ 1529] asterisk.c listener())
> === ---> Waiting for Lock #0 (manager.c): MUTEX 2673 handle_showmanconn session 0x7f6ee0003260 (1)
> [0x50277a] asterisk lock.h:446 ast_reentrancy_lock()
> [inlined] asterisk lock.c:272 __ast_pthread_mutex_lock()
> [0x45bd6d] asterisk astobj2.c:243 __ao2_lock()
> [0x5faaca] asterisk manager.c:2678 handle_showmanconn()
> [0x4bf2e5] asterisk cli.c:2834 ast_cli_command_full()
> [0x4bf454] asterisk cli.c:2861 ast_cli_command_multiple_full()
> [0x45865f] asterisk asterisk.c:1420 netconsole()
> [0x5a40c5] asterisk utils.c:1251 dummy_start()
> [0x7f6efd2c5064] libpthread.so.0 pthread_create.c:309 start_thread()
> [0x7f6efc26862d] libc.so.6 clone.S:113 clone()
> === --- ---> Locked Here: manager.c line 6104 (process_events)
> === -------------------------------------------------------------------
> ===
> === Thread ID: 0x7f6efb470700 LWP:25271 (netconsole started at [ 1529] asterisk.c listener())
> === ---> Waiting for Lock #0 (manager.c): MUTEX 2673 handle_showmanconn session 0x7f6ee0003260 (1)
> [0x50277a] asterisk lock.h:446 ast_reentrancy_lock()
> [inlined] asterisk lock.c:272 __ast_pthread_mutex_lock()
> [0x45bd6d] asterisk astobj2.c:243 __ao2_lock()
> [0x5faaca] asterisk manager.c:2678 handle_showmanconn()
> [0x4bf2e5] asterisk cli.c:2834 ast_cli_command_full()
> [0x4bf454] asterisk cli.c:2861 ast_cli_command_multiple_full()
> [0x45865f] asterisk asterisk.c:1420 netconsole()
> [0x5a40c5] asterisk utils.c:1251 dummy_start()
> [0x7f6efd2c5064] libpthread.so.0 pthread_create.c:309 start_thread()
> [0x7f6efc26862d] libc.so.6 clone.S:113 clone()
> === --- ---> Locked Here: manager.c line 6104 (process_events)
> === -------------------------------------------------------------------
> ===
> =======================================================================
> {noformat}
> {noformat:title=brief.txt}
> ...
> Thread 52 (Thread 0x7f6e6e56c700 (LWP 11491)):
> #0 0x00007f6efc25faed in poll () at ../sysdeps/unix/syscall-template.S:81
> #1 0x00000000005a6b78 in poll (__timeout=<optimized out>, __nfds=1, __fds=0x7f6e6e56b2d0) at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
> #2 ast_wait_for_output (fd=<optimized out>, ms=<optimized out>) at utils.c:1357
> #3 0x00000000004f6f89 in ast_iostream_write (stream=0x7f6e680041e0, buffer=0x7f6e7018e4f8, size=5000, size at entry=487) at iostream.c:473
> #4 0x00000000004f747d in ast_iostream_write (stream=stream at entry=0x7f6e680041e0, buffer=buffer at entry=0x7f6e7018e4f8, size=size at entry=487) at iostream.c:475
> #5 0x0000000000600121 in send_string (string=0x7f6e7018e4f8 "Event: NewextenrnPrivilege: dialplan,allrnTimestamp: 1561990424.176441rnSystemName: deacsrtsk12rnChannel: SIP/999253-000002c8rnChannelState: 4rnChannelStateDesc: RingrnCallerIDNum: 999253rnCallerIDNam"..., s=0x7f6e6e56b4d0) at manager.c:3033
> #6 process_events (s=0x7f6e6e56b4d0) at manager.c:6117
> #7 0x0000000000604be8 in do_message (s=<optimized out>) at manager.c:6714
> #8 session_do (data=0x7f6e6e56b2d0) at manager.c:6853
> #9 0x00000000005a40c5 in dummy_start (data=0x7f6e68003dc0) at utils.c:1249
> #10 0x00007f6efd2c5064 in start_thread (arg=0x7f6e6e56c700) at pthread_create.c:309
> #11 0x00007f6efc26862d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> ...
> {noformat}
> {noformat:title=full.txt}
> ...
> Thread 52 (Thread 0x7f6e6e56c700 (LWP 11491)):
> #0 0x00007f6efc25faed in poll () at ../sysdeps/unix/syscall-template.S:81
> No locals.
> #1 0x00000000005a6b78 in poll (__timeout=<optimized out>, __nfds=1, __fds=0x7f6e6e56b2d0) at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
> No locals.
> #2 ast_wait_for_output (fd=<optimized out>, ms=<optimized out>) at utils.c:1357
> pfd = {{fd = 22, events = 4, revents = 0}}
> #3 0x00000000004f6f89 in ast_iostream_write (stream=0x7f6e680041e0, buffer=0x7f6e7018e4f8, size=5000, size at entry=487) at iostream.c:473
> No locals.
> #4 0x00000000004f747d in ast_iostream_write (stream=stream at entry=0x7f6e680041e0, buffer=buffer at entry=0x7f6e7018e4f8, size=size at entry=487) at iostream.c:475
> No locals.
> #5 0x0000000000600121 in send_string (string=0x7f6e7018e4f8 "Event: NewextenrnPrivilege: dialplan,allrnTimestamp: 1561990424.176441rnSystemName: deacsrtsk12rnChannel: SIP/999253-000002c8rnChannelState: 4rnChannelStateDesc: RingrnCallerIDNum: 999253rnCallerIDNam"..., s=0x7f6e6e56b4d0) at manager.c:3033
> stream = 0x7f6e680041e0
> len = 487
> res = <optimized out>
> string = 0x7f6e7018e4f8 "Event: NewextenrnPrivilege: dialplan,allrnTimestamp: 1561990424.176441rnSystemName: deacsrtsk12rnChannel: SIP/999253-000002c8rnChannelState: 4rnChannelStateDesc: RingrnCallerIDNum: 999253rnCallerIDNam"...
> s = 0x7f6e6e56b4d0
> #6 process_events (s=0x7f6e6e56b4d0) at manager.c:6117
> eqe = <optimized out>
> ret = <optimized out>
> __PRETTY_FUNCTION__ = "process_events"
> #7 0x0000000000604be8 in do_message (s=<optimized out>) at manager.c:6714
> m = {hdrcount = 6, headers = {0x7f6ee0004620 "action: Monitor", 0x7f6ee04085d0 "actionid: 537#355425276_673410#", 0x7f6ee0008fb0 "file: /tmp/U23TJQ5V23X4MZQB", 0x7f6ee0009b70 "format: wav", 0x7f6ee0009b20 "channel: IAX2/999005-3957", 0x7f6ee0008fe0 "mix: true", 0x0 <repeats 122 times>}}
> header_buf = "mix: true000AX2/999005-3957000B000061060#", '000' <repeats 993 times>
> now = 1561988218
> #8 session_do (data=0x7f6e6e56b2d0) at manager.c:6853
> s = {session = 0x7f6ee00032b8, stream = 0x7f6e680041e0, tcptls_session = 0x7f6e68004068, parsing = MESSAGE_OKAY, write_error = -1, hook = 0x0, lock = {mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 1, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '000' <repeats 16 times>, "001", '000' <repeats 22 times>, __align = 0}, track = 0x7f6ee0005ee0, flags = {tracking = 1, setup = 1}}}
> res = 5
> arg = 1
> ser_remote_address_tmp = {ss = {ss_family = 10, __ss_align = 0, __ss_padding = "000000377377177000000001", '000' <repeats 48 times>, "|254367373n177000000000000000000000000000000370377377377377377377377220324367373n177000000340266Vnn177000000220I036377n177000000000000000000000000000"}, len = 28}
> __PRETTY_FUNCTION__ = "session_do"
> #9 0x00000000005a40c5 in dummy_start (data=0x7f6e68003dc0) at utils.c:1249
> __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {140114181235520, 57934947619282876, 140112167979552, 140114047020320, 21, 140112274310912, 57947742297497532, -139326592165109828}, __mask_was_saved = 0}}, __pad = {0x7f6e6e56bef0, 0x0, 0x7f6efc521878 <__elf_set___libc_thread_subfreeres_element___rpc_thread_destroy__>, 0x7f6efc521880 <internal_trans_names>}}
> __cancel_arg = 0x7f6e6e56c700
> __not_first_call = <optimized out>
> ret = <optimized out>
> a = {start_routine = 0x5955b0 <handle_tcptls_connection>, data = 0x7f6e68004068, name = 0x7f6e68004a20 "handle_tcptls_connection started at [ 309] tcptls.c ast_tcptls_server_root()"}
> mutex_attr = {__size = "001000000", __align = 1}
> __PRETTY_FUNCTION__ = "dummy_start"
> #10 0x00007f6efd2c5064 in start_thread (arg=0x7f6e6e56c700) at pthread_create.c:309
> __res = <optimized out>
> pd = 0x7f6e6e56c700
> now = <optimized out>
> unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140112274310912, -139326233515286596, 0, 140114047020320, 21, 140112274310912, 57947742289108924, 57695635703832508}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
> not_first_call = <optimized out>
> pagesize_m1 = <optimized out>
> sp = <optimized out>
> freesize = <optimized out>
> __PRETTY_FUNCTION__ = "start_thread"
> #11 0x00007f6efc26862d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> No locals.
> ...
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.2#6252)
More information about the asterisk-bugs
mailing list