[asterisk-bugs] [JIRA] (ASTERISK-28237) "FRACK!, Failed assertion bad magic number" happens when unsubscribe an application from an event source
Evgenios Muratidis (JIRA)
noreply at issues.asterisk.org
Tue Mar 30 05:36:15 CDT 2021
[ https://issues.asterisk.org/jira/browse/ASTERISK-28237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=254386#comment-254386 ]
Evgenios Muratidis edited comment on ASTERISK-28237 at 3/30/21 5:35 AM:
------------------------------------------------------------------------
Hello everybody. Most likely I found how to fix the current problem in this version. The main problem causing the exception to be thrown is related to several situations:
1. In file res/stasis/messaging.c, function "messaging_app_subscribe_endpoint" the macro "RAII_VAR" is used for local variable "sub" of type "message_subscription", therefore, the decrement of the reference counter to the created object is triggered every time this function is exited. This point is not obvious and must be taken into account.
2. Considering the above, in the same function "messaging_app_subscribe_endpoint", it is necessary that the subscription reference counter must always be equal to 2 or more (in case the subscription has more than one application subscribed to), otherwise the object (subscription) will self-destruct, when function is exited, and any subsequent access to it will throw an exception (by checking the value of the "magic" field, which must be equal to a certain static value)
3. When subscribing more than one application to a subscription, it is necessary that the reference counter to the subscription increases, otherwise, when unsubscribing from the second application, it will be found that the subscription no longer exists (it self-destructed when the first application unsubscribed from it). To solve this problem, it is proposed to add the following lines of code of the function "messaging_app_subscribe_endpoint":
...
if (AST_VECTOR_APPEND(&sub->applications, tuple)) {
ast_debug(3, "TST: app name: %s failed to add to subscription \n", app_name);
ao2_ref(tuple, -1);
ao2_unlock(sub);
return -1;
}
// <= added lines begin
/* In case, subscription has more than 1 application subscribed to, we need up the ref_counter of the subscription */
if (AST_VECTOR_SIZE(&sub->applications) > 1) {
ao2_bump(sub);
}
// <= added lines end
ao2_unlock(sub);
...
4. This approach will work for application subscriptions to technology subscriptions. However, when you subscribe to a specific resource, the logic of the code changes (the subscription is searched not in the list of technology subscriptions, but in a special container). When checking it, it turned out that with each subscription of a new application to a previously created subscription to a specific resource, a new subscription was created (with the same parameters), and the previous one remained in memory. And when unsubscribing any application from a previously created subscription (and provided that only one application was signed in the subscription), the subscription was not destroyed. This leads to a memory leak.The problem turned out to be an alleged typo when passing a parameter to a macro "ao2_find". To solve this problem, it is suggested to replace the following line of code in file res/stasis/messaging.c, function "get_subscription":
...
if (endpoint && !ast_strlen_zero(ast_endpoint_get_resource(endpoint))) {
// <= replace line start
// sub = ao2_find(endpoint_subscriptions, endpoint, OBJ_SEARCH_KEY); // we need to pass a key, not object
sub = ao2_find(endpoint_subscriptions, ast_endpoint_get_id(endpoint), OBJ_SEARCH_KEY);
// <= replace line end
} else {
int i;
...
5. Also in the same function ("get_subscription") it was found that when trying to subscribe an application to a subscription with a different technology, the first created one will return, which in the future also leads to an exception. To solve this problem, it is proposed to zero the pointer to the subscription object at the end of the loop:
...
if (endpoint && !ast_strlen_zero(ast_endpoint_get_resource(endpoint))) {
// <= replace line start
// sub = ao2_find(endpoint_subscriptions, endpoint, OBJ_SEARCH_KEY); // we need to pass a key, not object
sub = ao2_find(endpoint_subscriptions, ast_endpoint_get_id(endpoint), OBJ_SEARCH_KEY);
// <= replace line end
} else {
int i;
ast_rwlock_rdlock(&tech_subscriptions_lock);
for (i = 0; i < AST_VECTOR_SIZE(&tech_subscriptions); i++) {
sub = AST_VECTOR_GET(&tech_subscriptions, i);
if (sub && !strcmp(sub->token, endpoint ? ast_endpoint_get_tech(endpoint) : TECH_WILDCARD)) {
ao2_bump(sub);
break;
}
// <= add line start
sub = NULL; /* We need to reset pointer at this line */
// <= add line end
}
ast_rwlock_unlock(&tech_subscriptions_lock);
}
...
6. After fixing the previous problems, it became possible to subscribe several applications to the same subscription of a specific resource. However, when trying to unsubscribe from the subscription of a specific resource of the last application, an exception was thrown. The problem was that when the last application unsubscribed from a subscription to a specific resource, when the macro "ao2_unlink" was called, the subscription's reference counter decreased by 1 (2-1 = 1). Then the subscription reference counter was decremented by 1 (1-1 = 0 = self-destruction of the object) again. And then, when exiting the function, the macro "RAII_VAR" tries to decrease the reference counter of the (no longer existing) subscription by 1 again, which leads to an exception.To solve this problem, it is proposed to add the following line of code to the function "messaging_app_unsubscribe_endpoint":
...
AST_VECTOR_REMOVE_CMP_UNORDERED(&sub->applications, app_name, application_tuple_cmp, ao2_cleanup);
if (AST_VECTOR_SIZE(&sub->applications) == 0) {
if (endpoint && !ast_strlen_zero(ast_endpoint_get_resource(endpoint))) {
// <= added comment start
/* NOTE: The follow call will decrease ref_counter to the subscription */
// <= added comment end
ao2_unlink(endpoint_subscriptions, sub);
// <= added lines start
/* NOTE: Because of above call will decrease ref_counter to the subscription - we'll up it again! */
ao2_bump(sub);
// <= added lines end
} else {
ast_rwlock_wrlock(&tech_subscriptions_lock);
AST_VECTOR_REMOVE_CMP_UNORDERED(&tech_subscriptions, endpoint ? ast_endpoint_get_id(endpoint) : TECH_WILDCARD,
messaging_subscription_cmp, AST_VECTOR_ELEM_CLEANUP_NOOP);
ast_rwlock_unlock(&tech_subscriptions_lock);
}
}
ao2_unlock(sub);
// <= added comment start
/* NOTE: Now it is normaly to decrease ref_counter to the subscription */
ao2_ref(sub, -1);
// <= added comment end
ast_debug(3, "App '%s' unsubscribed to messages from endpoint '%s'\n", app_name, endpoint ? ast_endpoint_get_id(endpoint) : "-- ALL --");
ast_test_suite_event_notify("StasisMessagingSubscription", "SubState: Unsubscribed\r\nAppName: %s\r\nToken: %s\r\n",
app_name, endpoint ? ast_endpoint_get_id(endpoint) : "ALL");
...
After all the previous fixes, there were no problems with subscriptions and applications. Good day
was (Author: evgenios):
Hello everybody. Most likely I found how to fix the current problem in this version. The main problem causing the exception to be thrown is related to several situations:
1. In file res/stasis/messaging.c, function "messaging_app_subscribe_endpoint" the macro "RAII_VAR" is used for local variable "sub" of type "message_subscription", therefore, the decrement of the reference counter to the created object is triggered every time this function is exited. This point is not obvious and must be taken into account.
2. Considering the above, in the same function "messaging_app_subscribe_endpoint", it is necessary that the subscription reference counter must always be equal to 2 or more (in case the subscription has more than one application subscribed to), otherwise the object (subscription) will self-destruct, when function is exited, and any subsequent access to it will throw an exception (by checking the value of the "magic" field, which must be equal to a certain static value)
3. When subscribing more than one application to a subscription, it is necessary that the reference counter to the subscription increases, otherwise, when unsubscribing from the second application, it will be found that the subscription no longer exists (it self-destructed when the first application unsubscribed from it). To solve this problem, it is proposed to add the following lines of code of the function "messaging_app_subscribe_endpoint":
...
if (AST_VECTOR_APPEND(&sub->applications, tuple)) {
ast_debug(3, "TST: app name: %s failed to add to subscription \n", app_name);
ao2_ref(tuple, -1);
ao2_unlock(sub);
return -1;
}
// <= added lines begin
/* In case, subscription has more than 1 application subscribed to, we need up the ref_counter of the subscription */
if (AST_VECTOR_SIZE(&sub->applications) > 1) {
ao2_bump(sub);
}
// <= added lines end
ao2_unlock(sub);
...
4. This approach will work for application subscriptions to technology subscriptions. However, when you subscribe to a specific resource, the logic of the code changes (the subscription is searched not in the list of technology subscriptions, but in a special container). When checking it, it turned out that with each subscription of a new application to a previously created subscription to a specific resource, a new subscription was created (with the same parameters), and the previous one remained in memory. And when unsubscribing any application from a previously created subscription (and provided that only one application was signed in the subscription), the subscription was not destroyed. This leads to a memory leak.The problem turned out to be an alleged typo when passing a parameter to a macro "ao2_find". To solve this problem, it is suggested to replace the following line of code in file res/stasis/messaging.c, function "get_subscription":
...
if (endpoint && !ast_strlen_zero(ast_endpoint_get_resource(endpoint))) {
// <= replace line start
// sub = ao2_find(endpoint_subscriptions, endpoint, OBJ_SEARCH_KEY); // we need to pass a key, not object
sub = ao2_find(endpoint_subscriptions, ast_endpoint_get_id(endpoint), OBJ_SEARCH_KEY);
// <= replace line end
} else {
int i;
...
5. Also in the same function ("get_subscription") it was found that when trying to subscribe an application to a subscription with a different technology, the first created one will return, which in the future also leads to an exception. To solve this problem, it is proposed to zero the pointer to the subscription object at the end of the loop:
...
if (endpoint && !ast_strlen_zero(ast_endpoint_get_resource(endpoint))) {
// <= replace line start
// sub = ao2_find(endpoint_subscriptions, endpoint, OBJ_SEARCH_KEY); // we need to pass a key, not object
sub = ao2_find(endpoint_subscriptions, ast_endpoint_get_id(endpoint), OBJ_SEARCH_KEY);
// <= replace line end
} else {
int i;
ast_rwlock_rdlock(&tech_subscriptions_lock);
for (i = 0; i < AST_VECTOR_SIZE(&tech_subscriptions); i++) {
sub = AST_VECTOR_GET(&tech_subscriptions, i);
if (sub && !strcmp(sub->token, endpoint ? ast_endpoint_get_tech(endpoint) : TECH_WILDCARD)) {
ao2_bump(sub);
break;
}
// <= add line start
sub = NULL; /* We need to reset pointer at this line */
// <= add line end
}
ast_rwlock_unlock(&tech_subscriptions_lock);
}
...
6. After fixing the previous problems, it became possible to subscribe several applications to the same subscription of a specific resource. However, when trying to unsubscribe from the subscription of a specific resource of the last application, an exception was thrown. The problem was that when the last application unsubscribed from a subscription to a specific resource, when the macro "ao2_unlink" was called, the subscription's reference counter decreased by 1 (2-1 = 1). Then the subscription reference counter was decremented by 1 (1-1 = 0 = self-destruction of the object) again. And then, when exiting the function, the macro "RAII_VAR" tries to decrease the reference counter of the (no longer existing) subscription by 1 again, which leads to an exception.To solve this problem, it is proposed to add the following line of code to the function "messaging_app_unsubscribe_endpoint":
...
AST_VECTOR_REMOVE_CMP_UNORDERED(&sub->applications, app_name, application_tuple_cmp, ao2_cleanup);
if (AST_VECTOR_SIZE(&sub->applications) == 0) {
if (endpoint && !ast_strlen_zero(ast_endpoint_get_resource(endpoint))) {
// <= added comment start
/* NOTE: The follow call will decrease ref_counter to the subscription */
// <= added comment end
ao2_unlink(endpoint_subscriptions, sub);
// <= added lines start
/* NOTE: Because of above call will decrease ref_counter to the subscription - we'll up it again! */
ao2_bump(sub);
// <= added lines end
} else {
ast_rwlock_wrlock(&tech_subscriptions_lock);
AST_VECTOR_REMOVE_CMP_UNORDERED(&tech_subscriptions, endpoint ? ast_endpoint_get_id(endpoint) : TECH_WILDCARD,
messaging_subscription_cmp, AST_VECTOR_ELEM_CLEANUP_NOOP);
ast_rwlock_unlock(&tech_subscriptions_lock);
}
}
ao2_unlock(sub);
// <= added comment start
/* NOTE: Now it is normaly to decrease ref_counter to the subscription */
ao2_ref(sub, -1);
// <= added comment end
ast_debug(3, "App '%s' unsubscribed to messages from endpoint '%s'\n", app_name, endpoint ? ast_endpoint_get_id(endpoint) : "-- ALL --");
ast_test_suite_event_notify("StasisMessagingSubscription", "SubState: Unsubscribed\r\nAppName: %s\r\nToken: %s\r\n",
app_name, endpoint ? ast_endpoint_get_id(endpoint) : "ALL");
...
After all the previous fixes, there were no problems with subscriptions and applications. Good day
> "FRACK!, Failed assertion bad magic number" happens when unsubscribe an application from an event source
> --------------------------------------------------------------------------------------------------------
>
> Key: ASTERISK-28237
> URL: https://issues.asterisk.org/jira/browse/ASTERISK-28237
> Project: Asterisk
> Issue Type: Bug
> Security Level: None
> Components: Core/Stasis
> Affects Versions: 13.18.0, 16.1.1
> Reporter: Lucas Tardioli Silveira
> Labels: pjsip
> Attachments: ari.conf, ari-test2, core-brief.txt, core-full.txt, debug_frack1
>
>
> I'm facing this issue when I try to unsubscribe an application.
> Scenario:
> I have multiple applications connecting to my Asterisk server.
> All the time that my application connects to Asterisk I send a subscribe message like:
> POST /ari/applications/my-app1/subscription?eventSource=endpoint:PJSIP
> (for the other applications I can have my-app2, my-app3 and so on...)
> And when I disconnect to Asterisk I send:
> DELETE /ari/applications/my-app1/subscription?eventSource=endpoint:PJSIP
> The problem happens when I have at least two applications running and one of them needs to disconnect. So, at the moment that I send the message to unsubscribe the "FRACK" happens.
> That was happening on asterisk 13.18-cert2 then I updated it to the lastest version, 16.1.1 but it is still happening.
--
This message was sent by Atlassian JIRA
(v6.2#6252)
More information about the asterisk-bugs
mailing list