[asterisk-bugs] [JIRA] (ASTERISK-29474) core: Exceptionally long queue length with Local channels and ConfBridge

Kevin Harwell (JIRA) noreply at issues.asterisk.org
Thu Jun 10 16:32:08 CDT 2021


    [ https://issues.asterisk.org/jira/browse/ASTERISK-29474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=255253#comment-255253 ] 

Kevin Harwell commented on ASTERISK-29474:
------------------------------------------

>From the backtraces at least I am not seeing a deadlock per se. In both currently attached backtraces I see channels being blocked by activity down in the speex resampler. For example (from [^core-asterisk-running-2021-06-10T12-30-28-0400-brief.txt]) the lock in thread 109:
{noformat}
Thread 109 (Thread 0x7f77c6f89700 (LWP 31527)):
#0  0x00007f77f539229c in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#1  0x00007f77f538b7d1 in __GI___pthread_mutex_lock (mutex=0x556c6c3ee520) at ../nptl/pthread_mutex_lock.c:115
#2  0x0000556c67be95c0 in __ast_pthread_mutex_lock (filename=0x556c67d47151 "channel.c", lineno=1021, func=0x556c67d4b0d0 <__PRETTY_FUNCTION__.23269> "__ast_queue_frame", mutex_name=0x556c67d474cb "chan", t=0x556c6c3ee520) at lock.c:326
#3  0x0000556c67b2a0c7 in __ao2_lock (user_data=0x556c6c3ee570, lock_how=AO2_LOCK_REQ_MUTEX, file=0x556c67d47151 "channel.c", func=0x556c67d4b0d0 <__PRETTY_FUNCTION__.23269> "__ast_queue_frame", line=1021, var=0x556c67d474cb "chan") at astobj2.c:241
#4  0x0000556c67b676ff in __ast_queue_frame (chan=0x556c6c3ee570, fin=0x7f77d4085840, head=0, after=0x0) at channel.c:1021
#5  0x0000556c67b67f81 in ast_queue_frame (chan=0x556c6c3ee570, fin=0x7f77d4085840) at channel.c:1137
#6  0x0000556c67ba5a6e in unreal_queue_frame (p=0x556c6c452920, isoutbound=1, f=0x7f77d4085840, us=0x556c6c25e600, us_locked=1) at core_unreal.c:243
#7  0x0000556c67ba5ef1 in ast_unreal_write_stream (ast=0x556c6c25e600, stream_num=0, f=0x7f77d4085840) at core_unreal.c:360
#8  0x0000556c67b73dcb in tech_write (chan=0x556c6c25e600, stream=0x556c6c452b00, default_stream=0x556c6c452b00, frame=0x7f77d4085840) at channel.c:5088
#9  0x0000556c67b75739 in ast_write_stream (chan=0x556c6c25e600, stream_num=0, fr=0x7f77d4085840) at channel.c:5434
#10 0x0000556c67b5911e in bridge_channel_handle_write (bridge_channel=0x556c6c260270) at bridge_channel.c:2541
#11 0x0000556c67b599db in bridge_channel_wait (bridge_channel=0x556c6c260270) at bridge_channel.c:2853
#12 0x0000556c67b5a1cf in bridge_channel_internal_join (bridge_channel=0x556c6c260270) at bridge_channel.c:2993
#13 0x0000556c67b3cb19 in ast_bridge_join (bridge=0x7f77e465eff0, chan=0x556c6c25e600, swap=0x0, features=0x7f77c6f85840, tech_args=0x7f77c6f85868, flags=(unknown: 0)) at bridge.c:1725
#14 0x00007f77d1c1ed1e in confbridge_exec (chan=0x556c6c25e600, data=0x7f77c6f85a30 "partyline270") at app_confbridge.c:2795
#15 0x0000556c67c1f4be in pbx_exec (c=0x556c6c25e600, app=0x556c6c13a1c0, data=0x7f77c6f85a30 "partyline270") at pbx_app.c:492
#16 0x0000556c67c0a344 in pbx_extension_helper (c=0x556c6c25e600, con=0x0, context=0x556c6c25efc0 "confcrosstalk", exten=0x556c6c25f010 "1270", priority=5, label=0x0, callerid=0x0, action=E_SPAWN, found=0x7f77c6f88ccc, combined_find_spawn=1) at pbx.c:2947
#17 0x0000556c67c0e597 in ast_spawn_extension (c=0x556c6c25e600, context=0x556c6c25efc0 "confcrosstalk", exten=0x556c6c25f010 "1270", priority=5, callerid=0x0, found=0x7f77c6f88ccc, combined_find_spawn=1) at pbx.c:4206
#18 0x0000556c67c0f2b7 in __ast_pbx_run (c=0x556c6c25e600, args=0x0) at pbx.c:4380
#19 0x0000556c67c10b2d in pbx_thread (data=0x556c6c25e600) at pbx.c:4704
#20 0x0000556c67caf897 in dummy_start (data=0x7f77ec090150) at utils.c:1299
#21 0x00007f77f5388fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#22 0x00007f77f4f5f4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
{noformat}
is being blocked by thread 108:
{noformat}
Thread 108 (Thread 0x7f77c8870700 (LWP 31525)):
#0  0x00007f77f1b180da in resampler_basic_interpolate_single (st=0x7f77ec18e150, channel_index=0, in=0x7f77ec080fb0, in_len=0x7f77c886f3d8, out=0x7f77ec0919d8, out_len=0x7f77c886f3d4) at speex/resample.c:473
#1  0x00007f77f1b19239 in speex_resampler_process_native (st=0x7f77ec18e150, channel_index=0, in_len=0x7f77c886f3d8, out=0x7f77ec0919d8, out_len=0x7f77c886f3d4) at speex/resample.c:870
#2  0x00007f77f1b195c6 in speex_resampler_process_int (st=0x7f77ec18e150, channel_index=0, in=0x556c6c2ba328, in_len=0x7f77c886f420, out=0x7f77ec0919d8, out_len=0x7f77c886f424) at speex/resample.c:933
#3  0x00007f77f1b1736d in resamp_framein (pvt=0x7f77ec0918b0, f=0x556c6c2ba260) at codec_resample.c:123
#4  0x0000556c67ca6c5e in framein (pvt=0x7f77ec0918b0, f=0x556c6c2ba260) at translate.c:430
#5  0x0000556c67ca74c5 in ast_translate (path=0x7f77ec0883b0, f=0x7f77d40bd000, consume=0) at translate.c:618
#6  0x0000556c67b75099 in ast_write_stream (chan=0x556c6c3ee570, stream_num=-1, fr=0x7f77d40bd000) at channel.c:5299
#7  0x0000556c67b73e4c in ast_write (chan=0x556c6c3ee570, fr=0x7f77d40bd000) at channel.c:5098
#8  0x0000556c67bc6781 in ast_readaudio_callback (s=0x7f77ec141880) at file.c:927
#9  0x0000556c67bc69da in ast_fsread_audio (data=0x7f77ec141880) at file.c:967
#10 0x0000556c67b6f797 in __ast_read (chan=0x556c6c3ee570, dropaudio=0, dropnondefault=1) at channel.c:3595
#11 0x0000556c67b71efe in ast_read (chan=0x556c6c3ee570) at channel.c:4281
#12 0x0000556c67bc87ab in waitstream_core (c=0x556c6c3ee570, breakon=0x7f77d3ccc110 "", forward=0x556c67d58360 "", reverse=0x556c67d58360 "", skip_ms=0, audiofd=-1, cmdfd=-1, context=0x0, cb=0x0) at file.c:1627
#13 0x0000556c67bc8d26 in ast_waitstream (c=0x556c6c3ee570, breakon=0x7f77d3ccc110 "") at file.c:1780
#14 0x00007f77d3ccba64 in playback_exec (chan=0x556c6c3ee570, data=0x556c6c391d10 "custom/switch/intercepts/custom/interlinked") at app_playback.c:492
#15 0x0000556c67c1f4be in pbx_exec (c=0x556c6c3ee570, app=0x556c6c8768e0, data=0x556c6c391d10 "custom/switch/intercepts/custom/interlinked") at pbx_app.c:492
#16 0x0000556c67c19489 in pbx_outgoing_exec (data=0x556c6bf1e170) at pbx.c:7661
#17 0x0000556c67caf897 in dummy_start (data=0x556c6c1d8fb0) at utils.c:1299
#18 0x00007f77f5388fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#19 0x00007f77f4f5f4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
{noformat}
I see several instances of similar in both attached backtraces. It could be something is up with the speex resampler (infinite loop?), or could be the system is just running slow or the timing was right, and the backtrace snapshots just happened to capture similar thread areas.

What's the output from the following when this is going on (curious if memory is also rising):
{noformat}
$ top -p $(pidof asterisk)
{noformat}

> core: Exceptionally long queue length with Local channels and ConfBridge
> ------------------------------------------------------------------------
>
>                 Key: ASTERISK-29474
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-29474
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Applications/app_confbridge
>    Affects Versions: 18.4.0
>            Reporter: N A
>         Attachments: core-asterisk-running-2021-06-10T12-30-28-0400-brief.txt, core-asterisk-running-2021-06-10T12-30-28-0400-full.txt, core-asterisk-running-2021-06-10T12-30-28-0400-info.txt, core-asterisk-running-2021-06-10T12-30-28-0400-locks.txt, core-asterisk-running-2021-06-10T12-30-28-0400-thread1.txt, core-asterisk-running-2021-06-10T12-38-12-0400-brief.txt, core-asterisk-running-2021-06-10T12-38-12-0400-full.txt, core-asterisk-running-2021-06-10T12-38-12-0400-info.txt, core-asterisk-running-2021-06-10T12-38-12-0400-locks.txt, core-asterisk-running-2021-06-10T12-38-12-0400-thread1.txt, debug_threads.txt
>
>
> It's our friend "__ast_queue_frame: Exceptionally long queue length queuing to..." again.
> I'm mark this as app_confbridge related because 99% of the time, deadlocks on my systems are caused by ConfBridge and using other applications if possible does not cause this issue at all.
> This issue happens with ConfBridges in multiple different scenarios - below is one that replicates every single time:
> - bring another channel playing audio into a confbridge
> - bring another channel that records locally into the confbridge, but muted
> - have that original channel join that ConfBridge
> All is good. Now, as soon as, from another phone, I join that same confbridge, even if muted, this whole thing happens immediately until at least after all channels and the bridge are torn down. Usually Asterisk crashes fairly quickly when this kind of thing happens; in this specific scenario, it doesn't crash quickly but it does basically kill both channels effectively and these warnings continue until everything involved is torn down, though sometimes this persists 10-20 seconds after that. In certain other similar scenarios, Asterisk usually crashes within a minute as CPU usage climbs to 100%.
> This is just one scenario that leads to this deadlock, but it can happen in several different ways. Typically it happens when multiple parties are in this aforementioned bridge using ConfBridge. Replacing ConfBridge with ChanSpy in places where possible is enough to make the issue disappear, hence why this seems to be a bug with app_confbridge.
> Attached are output from astcoredumper and core show locks during the deadlock.
> In this particular deadlock, there seems to be some kind of infinite loop caused by infinite progress updates, although I only see the infinite progress stuff when Asterisk is compiled with DONT_OPTIMIZE and DEBUG_THREADS. Regardless, this deadlocks occurs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)



More information about the asterisk-bugs mailing list