[asterisk-bugs] [JIRA] (ASTERISK-29786) cdr.c: FRACK! causing crashes

Jon Sparks (JIRA) noreply at issues.asterisk.org
Fri Dec 3 11:26:34 CST 2021


    [ https://issues.asterisk.org/jira/browse/ASTERISK-29786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=257248#comment-257248 ] 

Jon Sparks commented on ASTERISK-29786:
---------------------------------------

I attached the logs from the most recent hard crash and will do my best to describe what is happening. We have multiple Asterisk servers that do many different things. Because of this issue, we are currently trying to stay under 15 calls per server and are cycling the servers constantly. Neither call load nor uptime seems to matter: a server can be up anywhere from 5 minutes to 12 hours before it crashes. In the case of the graceful restarts, the frack can happen hours before the server restarts.

The most common element in these failures is that we have calls in a conference room and then place a new call into the conference room that runs a stasis app. That app could be doing a mixture of things. In the case of the logs provided, the app is listening to the call to collect data about it.

Our belief is that when the hangup occurs there is a race condition that Asterisk does not handle, or that we are crossing some kind of limit we do not know about.

Looking at these logs before I sent them, I noticed that stasis/m:cdr:aggregator-00000005 looks pretty interesting in terms of its queue and max queue values. I can look back to see whether that is similar in other crashes.

> cdr.c: FRACK! causing crashes
> -----------------------------
>
>                 Key: ASTERISK-29786
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-29786
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: CDR/General, Core/Stasis
>    Affects Versions: 18.6.0, 18.7.0, 18.7.1, 18.8.0
>            Reporter: Jon Sparks
>            Assignee: Unassigned
>         Attachments: core.ip-10-10-12-153.us-east-2.compute.internal-2021-12-03T14-42-43+0000-brief.txt, core.ip-10-10-12-153.us-east-2.compute.internal-2021-12-03T14-42-43+0000-full.txt, core.ip-10-10-12-153.us-east-2.compute.internal-2021-12-03T14-42-43+0000-info.txt, core.ip-10-10-12-153.us-east-2.compute.internal-2021-12-03T14-42-43+0000-locks.txt, core.ip-10-10-12-153.us-east-2.compute.internal-2021-12-03T14-42-43+0000-thread1.txt
>
>
> We are running 18.8.0 and experiencing a couple of strange crashes with Asterisk. Here are the fracks we are seeing.
> {noformat}
> FRACK!, Failed assertion bad magic number 0x0 for object 0xffff80059bd8 (0) - channel.c - ast_waitfor_nandfds
> FRACK!, Failed assertion user_data is NULL (0) - cdr.c - handle_channel_snapshot_update_message - 2327/2334
> FRACK!, Failed assertion 0 (0) - cdr.c - handle_channel_snapshot_update_message - 2302
> FRACK!, Failed assertion strcasecmp(snapshot->base->name, cdr->party_a.snapshot->base->name) == 0 (0) - cdr.c - base_process_party_a - 1542
> {noformat}
> A few of the latest backtraces:
> {noformat}
> Got 12 backtrace records
> # 0: [0xaaaab64a1270] asterisk utils.c:2583 __ast_assert_failed()
> # 1: [0xaaaab64b7594] asterisk utils.h:708 _ast_assert()
> # 2: [0xaaaab64bb418] asterisk cdr.c:1545 base_process_party_a()
> # 3: [0xaaaab64be61c] asterisk cdr.c:2312 handle_channel_snapshot_update_message()
> # 4: [0xaaaab6468ad8] asterisk stasis_message_router.c:202 router_dispatch()
> # 5: [0xaaaab644caec] asterisk stasis.c:787 subscription_invoke()
> # 6: [0xaaaab644e514] asterisk stasis.c:1267 dispatch_exec_async()
> # 7: [0xaaaab6481b20] asterisk taskprocessor.c:1235 ast_taskprocessor_execute()
> # 8: [0xaaaab647e260] asterisk taskprocessor.c:201 default_tps_processing_function()
> # 9: [0xaaaab649db68] asterisk utils.c:1428 dummy_start()
> #10: [0xffff836614fc] libpthread.so.0 pthread_create.c:477 start_thread()
> #11: [0xffff8336d67c] libc.so.6 :0 clone()
> {noformat}
> {noformat}
> Got 12 backtrace records
> # 0: [0xaaaad2dc7270] asterisk utils.c:2583 __ast_assert_failed()
> # 1: [0xaaaad2c08178] asterisk astobj2.c:212 log_bad_ao2()
> # 2: [0xaaaad2c08270] asterisk astobj2.c:224 __ao2_lock()
> # 3: [0xaaaad2de4738] asterisk cdr.c:2328 handle_channel_snapshot_update_message()
> # 4: [0xaaaad2d8ead8] asterisk stasis_message_router.c:202 router_dispatch()
> # 5: [0xaaaad2d72aec] asterisk stasis.c:787 subscription_invoke()
> # 6: [0xaaaad2d74514] asterisk stasis.c:1267 dispatch_exec_async()
> # 7: [0xaaaad2da7b20] asterisk taskprocessor.c:1235 ast_taskprocessor_execute()
> # 8: [0xaaaad2da4260] asterisk taskprocessor.c:201 default_tps_processing_function()
> # 9: [0xaaaad2dc3b68] asterisk utils.c:1428 dummy_start()
> #10: [0xffff978f44fc] libpthread.so.0 pthread_create.c:477 start_thread()
> #11: [0xffff9760067c] libc.so.6 :0 clone()
> {noformat}
> {noformat}
> Got 11 backtrace records
> # 0: [0xaaaad2dc7270] asterisk utils.c:2583 __ast_assert_failed()
> # 1: [0xaaaad2ddd594] asterisk utils.h:708 _ast_assert()
> # 2: [0xaaaad2de458c] asterisk cdr.c:2302 handle_channel_snapshot_update_message()
> # 3: [0xaaaad2d8ead8] asterisk stasis_message_router.c:202 router_dispatch()
> # 4: [0xaaaad2d72aec] asterisk stasis.c:787 subscription_invoke()
> # 5: [0xaaaad2d74514] asterisk stasis.c:1267 dispatch_exec_async()
> # 6: [0xaaaad2da7b20] asterisk taskprocessor.c:1235 ast_taskprocessor_execute()
> # 7: [0xaaaad2da4260] asterisk taskprocessor.c:201 default_tps_processing_function()
> # 8: [0xaaaad2dc3b68] asterisk utils.c:1428 dummy_start()
> # 9: [0xffff978f44fc] libpthread.so.0 pthread_create.c:477 start_thread()
> #10: [0xffff9760067c] libc.so.6 :0 clone()
> {noformat}
> I’m seeing different behavior with each frack. Two things I have noticed:
> When “Failed assertion strcasecmp” occurs, Asterisk seems to wait until the last call ends, then does a core dump and acts like a graceful restart.
> When “Failed assertion user_data is NULL” or “Failed assertion 0” occurs, it is more likely that the server will crash ungracefully with calls still active.
> I have not yet been able to reproduce it in our dev environment or find a root cause. In our prod environment the fracks were logged 27,000 times in the last 24 hours. We believe it’s related to hangups with the stasis app.
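
To compare one crash against another, the frack lines can be tallied per assertion. The following is only a minimal sketch, assuming the log lines have the same shape as the four FRACK lines quoted in the issue description (a raw Asterisk log may lay out the file, function and line number differently). The names `tally_fracks`, `FRACK_RE` and `HEX_RE` are hypothetical helpers for illustration and are not part of Asterisk.

```python
import re
from collections import Counter

# Assumed line shape, taken from the FRACK lines quoted in this issue:
#   FRACK!, Failed assertion <expression> (0) - <file> - <function> [- <line>]
# The optional spaces around the second dash tolerate "cdr.c -base_process_party_a".
FRACK_RE = re.compile(
    r"FRACK!, Failed assertion (?P<expr>.+?) \(0\) - "
    r"(?P<file>\S+) ?- ?(?P<func>\S+)(?: - (?P<line>\S+))?\s*$"
)

# Pointer values differ on every occurrence, so mask them before grouping.
HEX_RE = re.compile(r"0x[0-9a-fA-F]+")


def tally_fracks(lines):
    """Count FRACK lines, grouped by (file, function, assertion text)."""
    counts = Counter()
    for raw in lines:
        match = FRACK_RE.search(raw)
        if match is None:
            continue  # not a FRACK line in the assumed shape
        expr = HEX_RE.sub("0x?", match.group("expr"))
        counts[(match.group("file"), match.group("func"), expr)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python tally_fracks.py /path/to/logfile
    with open(sys.argv[1], errors="replace") as log:
        for key, count in tally_fracks(log).most_common():
            print(count, *key)
```

Running this over the logs from each crash would show whether the same assertions dominate every time, or whether the mix changes between the graceful restarts and the hard crashes.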


