[Asterisk-code-review] res pjsip pubsub: Address SEGV when attempting to terminate ... (asterisk[certified/13.8])

George Joseph asteriskteam at digium.com
Tue Jun 21 14:32:51 CDT 2016


George Joseph has uploaded a new change for review.

  https://gerrit.asterisk.org/3061

Change subject: res_pjsip_pubsub: Address SEGV when attempting to terminate a subscription
......................................................................

res_pjsip_pubsub: Address SEGV when attempting to terminate a subscription

Occasionally under load we'll attempt to send a final NOTIFY on a
subscription that's already been terminated and a SEGV will occur
down in pjproject's evsub_destroy function.  This is a result of a
race condition between all the paths that can generate a notify
and/or destroy the underlying pjproject evsub object:

 * The client can send a SUBSCRIBE with Expires: 0.
 * The client can send a SUBSCRIBE/refresh.
 * The subscription timer can expire.
 * An extension state can change.
 * An MWI event can be generated.
 * The pjproject transaction timer (timer_b) can expire.

Normally when our pubsub_on_evsub_state is called with a terminate,
we push a task to the serializer and return at which point the dialog
is unlocked.  This is usually not a problem because the task runs
immediately and locks the dialog again.  When the system is heavily
loaded though, there may be a delay between the unlock and relock
during which another event may occur such as the subscription timer
or timer_b expiring, an extension state change, etc.  These may also
cause a terminate to be processed and if so, we could cause pjproject
to try to destroy the evsub structure twice.  There's no way for us to
tell that the evsub was already destroyed and the evsub's group lock
can't tolerate this and SEGVs.

The remedy is twofold.

 * A patch has been submitted to Teluu and added to the bundled
   pjproject which adds add/decrement operations on evsub's group lock.

 * In res_pjsip_pubsub:
   * configure.ac and pjproject-bundled's configure.m4 were updated
     to check for the new evsub group lock APIs.
   * We now add a reference to the evsub group lock when we create
     the subscription and remove the reference when we clean up the
     subscription.  This prevents evsub from being destroyed before
     we're done with it.
   * A state has been added to the subscription tree structure so
     termination progress can be tracked through the asyncronous tasks.
   * The pubsub_on_evsub_state callback has been split so it's not doing
     double duty.  It now only handles the final cleanup of the
     subscription tree.  pubsub_on_rx_refresh now handles both client
     refreshes and client terminates.  It was always being called for
     both anyway.
   * The serialized_on_server_timeout task was removed since
     serialized_pubsub_on_rx_refresh was almost identical.
   * Missing state checks and ao2_cleanups were added.
   * Some debug levels were adjusted to make seeing only off-nominal
     things at level 1 and nominal or progress things at level 2+.

ASTERISK-26099 #close
Reported-by: Ross Beer.

Change-Id: I779d11802cf672a51392e62a74a1216596075ba1
---
M configure
M configure.ac
M include/asterisk/autoconfig.h.in
M res/res_pjsip_pubsub.c
M third-party/pjproject/configure.m4
A third-party/pjproject/patches/0001-evsub-Add-APIs-to-add-decrement-an-event-subscriptio.patch
6 files changed, 393 insertions(+), 128 deletions(-)


  git pull ssh://gerrit.asterisk.org:29418/asterisk refs/changes/61/3061/1

diff --git a/configure b/configure
index 14c5a0e..b4db561 100755
--- a/configure
+++ b/configure
@@ -911,6 +911,10 @@
 POPT_DIR
 POPT_INCLUDE
 POPT_LIB
+PBX_PJSIP_EVSUB_GRP_LOCK
+PJSIP_EVSUB_GRP_LOCK_DIR
+PJSIP_EVSUB_GRP_LOCK_INCLUDE
+PJSIP_EVSUB_GRP_LOCK_LIB
 PBX_PJSIP_TLS_TRANSPORT_PROTO
 PJSIP_TLS_TRANSPORT_PROTO_DIR
 PJSIP_TLS_TRANSPORT_PROTO_INCLUDE
@@ -10546,6 +10550,18 @@
 PJSIP_TLS_TRANSPORT_PROTO_DIR=${PJPROJECT_DIR}
 
 PBX_PJSIP_TLS_TRANSPORT_PROTO=0
+
+
+
+
+
+
+
+PJSIP_EVSUB_GRP_LOCK_DESCRIP="PJSIP EVSUB Group Lock support"
+PJSIP_EVSUB_GRP_LOCK_OPTION=pjsip
+PJSIP_EVSUB_GRP_LOCK_DIR=${PJPROJECT_DIR}
+
+PBX_PJSIP_EVSUB_GRP_LOCK=0
 
 
 
@@ -24453,6 +24469,9 @@
 $as_echo "#define HAVE_PJSIP_TLS_TRANSPORT_PROTO 1" >>confdefs.h
 
 
+$as_echo "#define HAVE_PJSIP_EVSUB_GRP_LOCK 1" >>confdefs.h
+
+
    else
 
    if test "x${PBX_PJPROJECT}" != "x1" -a "${USE_PJPROJECT}" != "no"; then
@@ -25168,6 +25187,111 @@
 
       LIBS="${saved_libs}"
       CPPFLAGS="${saved_cppflags}"
+
+
+if test "x${PBX_PJSIP_EVSUB_GRP_LOCK}" != "x1" -a "${USE_PJSIP_EVSUB_GRP_LOCK}" != "no"; then
+   pbxlibdir=""
+   # if --with-PJSIP_EVSUB_GRP_LOCK=DIR has been specified, use it.
+   if test "x${PJSIP_EVSUB_GRP_LOCK_DIR}" != "x"; then
+      if test -d ${PJSIP_EVSUB_GRP_LOCK_DIR}/lib; then
+         pbxlibdir="-L${PJSIP_EVSUB_GRP_LOCK_DIR}/lib"
+      else
+         pbxlibdir="-L${PJSIP_EVSUB_GRP_LOCK_DIR}"
+      fi
+   fi
+   pbxfuncname="pjsip_evsub_add_lock"
+   if test "x${pbxfuncname}" = "x" ; then   # empty lib, assume only headers
+      AST_PJSIP_EVSUB_GRP_LOCK_FOUND=yes
+   else
+      ast_ext_lib_check_save_CFLAGS="${CFLAGS}"
+      CFLAGS="${CFLAGS} $PJPROJECT_CFLAGS"
+      as_ac_Lib=`$as_echo "ac_cv_lib_pjsip_${pbxfuncname}" | $as_tr_sh`
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for ${pbxfuncname} in -lpjsip" >&5
+$as_echo_n "checking for ${pbxfuncname} in -lpjsip... " >&6; }
+if eval \${$as_ac_Lib+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_check_lib_save_LIBS=$LIBS
+LIBS="-lpjsip ${pbxlibdir} $PJPROJECT_LIB $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char ${pbxfuncname} ();
+int
+main ()
+{
+return ${pbxfuncname} ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  eval "$as_ac_Lib=yes"
+else
+  eval "$as_ac_Lib=no"
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+eval ac_res=\$$as_ac_Lib
+	       { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
+$as_echo "$ac_res" >&6; }
+if eval test \"x\$"$as_ac_Lib"\" = x"yes"; then :
+  AST_PJSIP_EVSUB_GRP_LOCK_FOUND=yes
+else
+  AST_PJSIP_EVSUB_GRP_LOCK_FOUND=no
+fi
+
+      CFLAGS="${ast_ext_lib_check_save_CFLAGS}"
+   fi
+
+   # now check for the header.
+   if test "${AST_PJSIP_EVSUB_GRP_LOCK_FOUND}" = "yes"; then
+      PJSIP_EVSUB_GRP_LOCK_LIB="${pbxlibdir} -lpjsip $PJPROJECT_LIB"
+      # if --with-PJSIP_EVSUB_GRP_LOCK=DIR has been specified, use it.
+      if test "x${PJSIP_EVSUB_GRP_LOCK_DIR}" != "x"; then
+         PJSIP_EVSUB_GRP_LOCK_INCLUDE="-I${PJSIP_EVSUB_GRP_LOCK_DIR}/include"
+      fi
+      PJSIP_EVSUB_GRP_LOCK_INCLUDE="${PJSIP_EVSUB_GRP_LOCK_INCLUDE} $PJPROJECT_CFLAGS"
+      if test "xpjsip.h" = "x" ; then	# no header, assume found
+         PJSIP_EVSUB_GRP_LOCK_HEADER_FOUND="1"
+      else				# check for the header
+         ast_ext_lib_check_saved_CPPFLAGS="${CPPFLAGS}"
+         CPPFLAGS="${CPPFLAGS} ${PJSIP_EVSUB_GRP_LOCK_INCLUDE}"
+         ac_fn_c_check_header_mongrel "$LINENO" "pjsip.h" "ac_cv_header_pjsip_h" "$ac_includes_default"
+if test "x$ac_cv_header_pjsip_h" = xyes; then :
+  PJSIP_EVSUB_GRP_LOCK_HEADER_FOUND=1
+else
+  PJSIP_EVSUB_GRP_LOCK_HEADER_FOUND=0
+fi
+
+
+         CPPFLAGS="${ast_ext_lib_check_saved_CPPFLAGS}"
+      fi
+      if test "x${PJSIP_EVSUB_GRP_LOCK_HEADER_FOUND}" = "x0" ; then
+         PJSIP_EVSUB_GRP_LOCK_LIB=""
+         PJSIP_EVSUB_GRP_LOCK_INCLUDE=""
+      else
+         if test "x${pbxfuncname}" = "x" ; then		# only checking headers -> no library
+            PJSIP_EVSUB_GRP_LOCK_LIB=""
+         fi
+         PBX_PJSIP_EVSUB_GRP_LOCK=1
+         cat >>confdefs.h <<_ACEOF
+#define HAVE_PJSIP_EVSUB_GRP_LOCK 1
+_ACEOF
+
+      fi
+   fi
+fi
+
+
    fi
 fi
 
diff --git a/configure.ac b/configure.ac
index 7571f7f..d67b1a0 100644
--- a/configure.ac
+++ b/configure.ac
@@ -487,6 +487,7 @@
 AST_EXT_LIB_SETUP_OPTIONAL([PJ_SSL_CERT_LOAD_FROM_FILES2], [pj_ssl_cert_load_from_files2 support], [PJPROJECT], [pjsip])
 AST_EXT_LIB_SETUP_OPTIONAL([PJSIP_EXTERNAL_RESOLVER], [PJSIP External Resolver Support], [PJPROJECT], [pjsip])
 AST_EXT_LIB_SETUP_OPTIONAL([PJSIP_TLS_TRANSPORT_PROTO], [PJSIP TLS Transport proto field support], [PJPROJECT], [pjsip])
+AST_EXT_LIB_SETUP_OPTIONAL([PJSIP_EVSUB_GRP_LOCK], [PJSIP EVSUB Group Lock support], [PJPROJECT], [pjsip])
 
 AST_EXT_LIB_SETUP([POPT], [popt], [popt])
 AST_EXT_LIB_SETUP([PORTAUDIO], [PortAudio], [portaudio])
@@ -2194,6 +2195,8 @@
       AST_C_COMPILE_CHECK([PJSIP_TLS_TRANSPORT_PROTO], [struct pjsip_tls_setting setting; int proto; proto = setting.proto;], [pjsip.h])
       LIBS="${saved_libs}"
       CPPFLAGS="${saved_cppflags}"
+
+      AST_EXT_LIB_CHECK([PJSIP_EVSUB_GRP_LOCK], [pjsip], [pjsip_evsub_add_lock], [pjsip.h], [$PJPROJECT_LIB], [$PJPROJECT_CFLAGS])
    fi
 fi
 
diff --git a/include/asterisk/autoconfig.h.in b/include/asterisk/autoconfig.h.in
index 8078098..a8651e4 100644
--- a/include/asterisk/autoconfig.h.in
+++ b/include/asterisk/autoconfig.h.in
@@ -587,6 +587,9 @@
 /* Define if your system has pjsip_dlg_create_uas_and_inc_lock declared. */
 #undef HAVE_PJSIP_DLG_CREATE_UAS_AND_INC_LOCK
 
+/* Define if your system has PJSIP_EVSUB_GRP_LOCK */
+#undef HAVE_PJSIP_EVSUB_GRP_LOCK
+
 /* Define if your system has pjsip_endpt_set_ext_resolver declared. */
 #undef HAVE_PJSIP_EXTERNAL_RESOLVER
 
diff --git a/res/res_pjsip_pubsub.c b/res/res_pjsip_pubsub.c
index 0761fe8..ae75ce7 100644
--- a/res/res_pjsip_pubsub.c
+++ b/res/res_pjsip_pubsub.c
@@ -378,6 +378,20 @@
 };
 
 /*!
+ * \brief The state of the subscription tree
+ */
+enum sip_subscription_tree_state {
+	/*! Normal operation */
+	SIP_SUB_TREE_NORMAL = 0,
+	/*! A terminate has been requested by Asterisk, the client, or pjproject */
+	SIP_SUB_TREE_TERMINATE_PENDING,
+	/*! The terminate is in progress */
+	SIP_SUB_TREE_TERMINATE_IN_PROGRESS,
+	/*! The terminate process has finished and the subscription tree is no longer valid */
+	SIP_SUB_TREE_TERMINATED,
+};
+
+/*!
  * \brief A tree of SIP subscriptions
  *
  * Because of the ability to subscribe to resource lists, a SIP
@@ -411,8 +425,8 @@
 	int is_list;
 	/*! Next item in the list */
 	AST_LIST_ENTRY(sip_subscription_tree) next;
-	/*! Indicates that a NOTIFY is currently being sent on the SIP subscription */
-	int last_notify;
+	/*! Subscription tree state */
+	enum sip_subscription_tree_state state;
 };
 
 /*!
@@ -879,15 +893,15 @@
 							"allocation error afterwards\n", resource);
 					continue;
 				}
-				ast_debug(1, "Subscription to leaf resource %s resulted in success. Adding to parent %s\n",
+				ast_debug(2, "Subscription to leaf resource %s resulted in success. Adding to parent %s\n",
 						resource, parent->resource);
 				AST_VECTOR_APPEND(&parent->children, current);
 			} else {
-				ast_debug(1, "Subscription to leaf resource %s resulted in error response %d\n",
+				ast_debug(2, "Subscription to leaf resource %s resulted in error response %d\n",
 						resource, resp);
 			}
 		} else {
-			ast_debug(1, "Resource %s (child of %s) is a list\n", resource, parent->resource);
+			ast_debug(2, "Resource %s (child of %s) is a list\n", resource, parent->resource);
 			current = tree_node_alloc(resource, visited, child_list->full_state);
 			if (!current) {
 				ast_debug(1, "Cannot build children of resource %s due to allocation failure\n", resource);
@@ -898,7 +912,7 @@
 				ast_debug(1, "List %s had no successful children.\n", resource);
 				AST_VECTOR_APPEND(&parent->children, current);
 			} else {
-				ast_debug(1, "List %s had successful children. Adding to parent %s\n",
+				ast_debug(2, "List %s had successful children. Adding to parent %s\n",
 						resource, parent->resource);
 				tree_node_destroy(current);
 			}
@@ -970,7 +984,7 @@
 	struct resources visited;
 
 	if (!has_eventlist_support || !(list = retrieve_resource_list(resource, handler->event_name))) {
-		ast_debug(1, "Subscription to resource %s is not to a list\n", resource);
+		ast_debug(2, "Subscription to resource %s is not to a list\n", resource);
 		tree->root = tree_node_alloc(resource, NULL, 0);
 		if (!tree->root) {
 			return 500;
@@ -978,7 +992,7 @@
 		return handler->notifier->new_subscribe(endpoint, resource);
 	}
 
-	ast_debug(1, "Subscription to resource %s is a list\n", resource);
+	ast_debug(2, "Subscription to resource %s is a list\n", resource);
 	if (AST_VECTOR_INIT(&visited, AST_VECTOR_SIZE(&list->items))) {
 		return 500;
 	}
@@ -1037,7 +1051,7 @@
 		if (i == obj) {
 			AST_RWLIST_REMOVE_CURRENT(next);
 			if (i->root) {
-				ast_debug(1, "Removing subscription to resource %s from list of subscriptions\n",
+				ast_debug(2, "Removing subscription to resource %s from list of subscriptions\n",
 						ast_sip_subscription_get_resource_name(i->root));
 			}
 			break;
@@ -1328,6 +1342,10 @@
 
 	pjsip_evsub_create_uas(dlg, &pubsub_cb, rdata, 0, &sub_tree->evsub);
 	subscription_setup_dialog(sub_tree, dlg);
+
+#ifdef HAVE_PJSIP_EVSUB_GRP_LOCK
+	pjsip_evsub_add_ref(sub_tree->evsub);
+#endif
 
 	ast_sip_mod_data_set(dlg->pool, dlg->mod_data, pubsub_module.id, MOD_DATA_MSG,
 			pjsip_msg_clone(dlg->pool, rdata->msg_info.msg));
@@ -2246,10 +2264,8 @@
 		pjsip_msg_add_hdr(tdata->msg, (pjsip_hdr *) require);
 	}
 
-	if (sub_tree->root->subscription_state == PJSIP_EVSUB_STATE_TERMINATED) {
-		sub_tree->last_notify = 1;
-	}
 	if (sip_subscription_send_request(sub_tree, tdata)) {
+		pjsip_tx_data_dec_ref(tdata);
 		return -1;
 	}
 
@@ -2264,21 +2280,32 @@
 	pjsip_dialog *dlg = sub_tree->dlg;
 
 	pjsip_dlg_inc_lock(dlg);
+
 	/* It's possible that between when the notification was scheduled
-	 * and now, that a new SUBSCRIBE arrived, requiring full state to be
-	 * sent out in an immediate NOTIFY. If that has happened, we need to
+	 * and now a new SUBSCRIBE arrived requiring full state to be
+	 * sent out in an immediate NOTIFY. It's also possible that we're
+	 * already processing a terminate.  If that has happened, we need to
 	 * bail out here instead of sending the batched NOTIFY.
 	 */
-	if (!sub_tree->send_scheduled_notify) {
+
+	if (sub_tree->state >= SIP_SUB_TREE_TERMINATE_IN_PROGRESS
+		|| !sub_tree->send_scheduled_notify) {
 		pjsip_dlg_dec_lock(dlg);
 		ao2_cleanup(sub_tree);
 		return 0;
 	}
 
+	if (sub_tree->root->subscription_state == PJSIP_EVSUB_STATE_TERMINATED) {
+		sub_tree->state = SIP_SUB_TREE_TERMINATE_IN_PROGRESS;
+	}
+
 	send_notify(sub_tree, 0);
-	ast_test_suite_event_notify("SUBSCRIPTION_STATE_CHANGED",
-			"Resource: %s",
-			sub_tree->root->resource);
+
+	ast_test_suite_event_notify(
+		sub_tree->state == SIP_SUB_TREE_TERMINATED
+			? "SUBSCRIPTION_TERMINATED" : "SUBSCRIPTION_STATE_CHANGED",
+		"Resource: %s", sub_tree->root->resource);
+
 	sub_tree->notify_sched_id = -1;
 	pjsip_dlg_dec_lock(dlg);
 	ao2_cleanup(sub_tree);
@@ -2290,7 +2317,10 @@
 	struct sip_subscription_tree *sub_tree = (struct sip_subscription_tree *) data;
 
 	/* We don't need to bump the refcount of sub_tree since we bumped it when scheduling this task */
-	ast_sip_push_task(sub_tree->serializer, serialized_send_notify, sub_tree);
+	if (ast_sip_push_task(sub_tree->serializer, serialized_send_notify, sub_tree)) {
+		ao2_cleanup(sub_tree);
+	}
+
 	return 0;
 }
 
@@ -2301,12 +2331,13 @@
 		return 0;
 	}
 
+	sub_tree->send_scheduled_notify = 1;
 	sub_tree->notify_sched_id = ast_sched_add(sched, sub_tree->notification_batch_interval, sched_cb, ao2_bump(sub_tree));
 	if (sub_tree->notify_sched_id < 0) {
+		ao2_cleanup(sub_tree);
 		return -1;
 	}
 
-	sub_tree->send_scheduled_notify = 1;
 	return 0;
 }
 
@@ -2318,7 +2349,7 @@
 
 	pjsip_dlg_inc_lock(dlg);
 
-	if (!sub->tree->evsub) {
+	if (sub->tree->state != SIP_SUB_TREE_NORMAL) {
 		pjsip_dlg_dec_lock(dlg);
 		return 0;
 	}
@@ -2332,6 +2363,7 @@
 	sub->body_changed = 1;
 	if (terminate) {
 		sub->subscription_state = PJSIP_EVSUB_STATE_TERMINATED;
+		sub->tree->state = SIP_SUB_TREE_TERMINATE_PENDING;
 	}
 
 	if (sub->tree->notification_batch_interval) {
@@ -2339,6 +2371,9 @@
 	} else {
 		/* See the note in pubsub_on_rx_refresh() for why sub->tree is refbumped here */
 		ao2_ref(sub->tree, +1);
+		if (terminate) {
+			sub->tree->state = SIP_SUB_TREE_TERMINATE_IN_PROGRESS;
+		}
 		res = send_notify(sub->tree, 0);
 		ast_test_suite_event_notify(terminate ? "SUBSCRIPTION_TERMINATED" : "SUBSCRIPTION_STATE_CHANGED",
 				"Resource: %s",
@@ -3308,71 +3343,72 @@
 	}
 }
 
-/* XXX This function and serialized_pubsub_on_rx_refresh are nearly identical */
-static int serialized_pubsub_on_server_timeout(void *userdata)
-{
-	struct sip_subscription_tree *sub_tree = userdata;
-	pjsip_dialog *dlg = sub_tree->dlg;
-
-	pjsip_dlg_inc_lock(dlg);
-	if (!sub_tree->evsub) {
-		pjsip_dlg_dec_lock(dlg);
-		return 0;
-	}
-	set_state_terminated(sub_tree->root);
-	send_notify(sub_tree, 1);
-	ast_test_suite_event_notify("SUBSCRIPTION_TERMINATED",
-			"Resource: %s",
-			sub_tree->root->resource);
-
-	pjsip_dlg_dec_lock(dlg);
-	ao2_cleanup(sub_tree);
-	return 0;
-}
+/*!
+ * \brief Callback sequence for subscription terminate:
+ *
+ * * Client initiated:
+ *     pjproject receives SUBSCRIBE on the subscription's serializer thread
+ *         calls pubsub_on_rx_refresh with dialog locked
+ *             pubsub_on_rx_refresh sets TERMINATE_PENDING
+ *             pushes serialized_pubsub_on_refresh_timeout
+ *             returns to pjproject
+ *         pjproject calls pubsub_on_evsub_state
+ *             pubsub_evsub_set_state checks state == TERMINATE_IN_PROGRESS (no)
+ *             ignore and return
+ *         pjproject unlocks dialog
+ *     serialized_pubsub_on_refresh_timeout starts (1)
+ *       locks dialog
+ *       checks state == TERMINATE_PENDING
+ *       sets TERMINATE_IN_PROGRESS
+ *       calls send_notify (2)
+ *           send_notify ultimately calls pjsip_evsub_send_request
+ *               pjsip_evsub_send_request calls evsub's set_state
+ *                   set_state calls pubsub_evsub_set_state
+ *                       pubsub_evsub_set_state checks state == TERMINATE_IN_PROGRESS
+ *                       removes the subscriptions
+ *                       cleans up references to evsub
+ *                       sets state = TERMINATED
+ *       serialized_pubsub_on_refresh_timeout unlocks dialog
+ *
+ * * Subscription timer expires:
+ *     pjproject timer expires
+ *         locks dialog
+ *         calls pubsub_on_server_timeout
+ *             pubsub_on_server_timeout checks state == NORMAL
+ *             sets TERMINATE_PENDING
+ *             pushes serialized_pubsub_on_refresh_timeout
+ *             returns to pjproject
+ *         pjproject unlocks dialog
+ *     serialized_pubsub_on_refresh_timeout starts
+ *         See (1) Above
+ *
+ *
+ * * ast_sip_subscription_notify is called
+ *       checks state == NORMAL
+ *       if not batched...
+ *           sets TERMINATE_IN_PROGRESS (if terminate is requested)
+ *           calls send_notify
+ *               See (2) Above
+ *       if batched...
+ *           sets TERMINATE_PENDING
+ *           schedules task
+ *       scheduler runs sched_task
+ *           sched_task pushes serialized_send_notify
+ *       serialized_send_notify starts
+ *           checks state <= TERMINATE_PENDING
+ *           if state == TERMINATE_PENDING set state = TERMINATE_IN_PROGRESS
+ *           call send_notify
+ *               See (2) Above
+ *
+ */
 
 /*!
  * \brief PJSIP callback when underlying SIP subscription changes state
  *
- * This callback is a bit of a mess, because it's not always called when
- * you might expect it to be, and it can be called multiple times for the
- * same state.
- *
- * For instance, this function is not called at all when an incoming SUBSCRIBE
- * arrives to refresh a subscription. That makes sense in a way, since the
- * subscription state has not made a change; it was active and remains active.
- *
- * However, if an incoming SUBSCRIBE arrives to end a subscription, then this
- * will be called into once upon receiving the SUBSCRIBE (after the call to
- * pubsub_on_rx_refresh) and again when sending a NOTIFY to end the subscription.
- * In both cases, the apparent state of the subscription is "terminated".
- *
- * However, the double-terminated state changes don't happen in all cases. For
- * instance, if a subscription expires, then the only time this callback is
- * called is when we send the NOTIFY to end the subscription.
- *
- * As far as state changes are concerned, we only ever care about transitions
- * to the "terminated" state. The action we take here is dependent on the
- * conditions behind why the state change to "terminated" occurred. If the
- * state change has occurred because we are sending a NOTIFY to end the
- * subscription, we consider this to be the final hurrah of the subscription
- * and take measures to start shutting things down. If the state change to
- * terminated occurs for a different reason (e.g. transaction timeout,
- * incoming SUBSCRIBE to end the subscription), then we push a task to
- * send out a NOTIFY. When that NOTIFY is sent, this callback will be
- * called again and we will actually shut down the subscription. The
- * subscription tree's last_notify field let's us know if this is being
- * called as a result of a terminating NOTIFY or not.
- *
- * There is no guarantee that this function will be called from a serializer
- * thread since it can be called due to a transaction timeout. Therefore
- * synchronization primitives are necessary to ensure that no operations
- * step on each others' toes. The dialog lock is always held when this
- * callback is called, so we ensure that relevant structures that may
- * be touched in this function are always protected by the dialog lock
- * elsewhere as well. The dialog lock in particular protects
- *
- * \li The subscription tree's last_notify field
- * \li The subscription tree's evsub pointer
+ * Although this function is called for every state change, we only care
+ * about the TERMINATED state, and only when we're actually processing the final
+ * notify (SIP_SUB_TREE_TERMINATE_IN_PROGRESS).  In this case, we do all
+ * the subscription tree cleanup tasks and decrement the evsub reference.
  */
 static void pubsub_on_evsub_state(pjsip_evsub *evsub, pjsip_event *event)
 {
@@ -3385,51 +3421,55 @@
 	}
 
 	sub_tree = pjsip_evsub_get_mod_data(evsub, pubsub_module.id);
-	if (!sub_tree) {
+	if (!sub_tree || sub_tree->state != SIP_SUB_TREE_TERMINATE_IN_PROGRESS) {
+		ast_debug(1, "Possible terminate race prevented %p\n", sub_tree);
 		return;
 	}
 
-	if (!sub_tree->last_notify) {
-		if (ast_sip_push_task(sub_tree->serializer, serialized_pubsub_on_server_timeout, ao2_bump(sub_tree))) {
-			ast_log(LOG_ERROR, "Failed to push task to send final NOTIFY.\n");
-			ao2_ref(sub_tree, -1);
-		} else {
-			return;
-		}
-	}
-
 	remove_subscription(sub_tree);
+
 	pjsip_evsub_set_mod_data(evsub, pubsub_module.id, NULL);
+
+#ifdef HAVE_PJSIP_EVSUB_GRP_LOCK
+	pjsip_evsub_dec_ref(sub_tree->evsub);
+#endif
+
 	sub_tree->evsub = NULL;
+
 	ast_sip_dialog_set_serializer(sub_tree->dlg, NULL);
 	ast_sip_dialog_set_endpoint(sub_tree->dlg, NULL);
+
 	subscription_persistence_remove(sub_tree);
 	shutdown_subscriptions(sub_tree->root);
 
+	sub_tree->state = SIP_SUB_TREE_TERMINATED;
 	/* Remove evsub's reference to the sub_tree */
 	ao2_ref(sub_tree, -1);
 }
 
-static int serialized_pubsub_on_rx_refresh(void *userdata)
+static int serialized_pubsub_on_refresh_timeout(void *userdata)
 {
 	struct sip_subscription_tree *sub_tree = userdata;
 	pjsip_dialog *dlg = sub_tree->dlg;
 
 	pjsip_dlg_inc_lock(dlg);
-	if (!sub_tree->evsub) {
+	if (sub_tree->state >= SIP_SUB_TREE_TERMINATE_IN_PROGRESS) {
+		ast_debug(1, "Possible terminate race prevented %p %d\n", sub_tree->evsub, sub_tree->state);
 		pjsip_dlg_dec_lock(dlg);
+		ao2_cleanup(sub_tree);
 		return 0;
 	}
 
-	if (pjsip_evsub_get_state(sub_tree->evsub) == PJSIP_EVSUB_STATE_TERMINATED) {
+	if (sub_tree->state == SIP_SUB_TREE_TERMINATE_PENDING) {
+		sub_tree->state = SIP_SUB_TREE_TERMINATE_IN_PROGRESS;
 		set_state_terminated(sub_tree->root);
 	}
 
 	send_notify(sub_tree, 1);
 
 	ast_test_suite_event_notify(sub_tree->root->subscription_state == PJSIP_EVSUB_STATE_TERMINATED ?
-			"SUBSCRIPTION_TERMINATED" : "SUBSCRIPTION_REFRESHED",
-			"Resource: %s", sub_tree->root->resource);
+				"SUBSCRIPTION_TERMINATED" : "SUBSCRIPTION_REFRESHED",
+				"Resource: %s", sub_tree->root->resource);
 
 	pjsip_dlg_dec_lock(dlg);
 	ao2_cleanup(sub_tree);
@@ -3442,10 +3482,8 @@
  * This includes both SUBSCRIBE requests that actually refresh the subscription
  * as well as SUBSCRIBE requests that end the subscription.
  *
- * In the case where the SUBSCRIBE is actually refreshing the subscription we
- * push a task to send an appropriate NOTIFY request. In the case where the
- * SUBSCRIBE is ending the subscription, we let the pubsub_on_evsub_state
- * callback take care of sending the terminal NOTIFY request instead.
+ * In either case we push serialized_pubsub_on_refresh_timeout to send an
+ * appropriate NOTIFY request.
  */
 static void pubsub_on_rx_refresh(pjsip_evsub *evsub, pjsip_rx_data *rdata,
 		int *p_st_code, pj_str_t **p_st_text, pjsip_hdr *res_hdr, pjsip_msg_body **p_body)
@@ -3453,18 +3491,24 @@
 	struct sip_subscription_tree *sub_tree;
 
 	sub_tree = pjsip_evsub_get_mod_data(evsub, pubsub_module.id);
-	if (!sub_tree) {
+	if (!sub_tree || sub_tree->state != SIP_SUB_TREE_NORMAL) {
+		ast_debug(1, "Possible terminate race prevented %p %d\n", sub_tree, sub_tree ? sub_tree->state : -1 );
 		return;
 	}
 
 	/* PJSIP will set the evsub's state to terminated before calling into this function
 	 * if the Expires value of the incoming SUBSCRIBE is 0.
 	 */
-	if (pjsip_evsub_get_state(sub_tree->evsub) != PJSIP_EVSUB_STATE_TERMINATED) {
-		if (ast_sip_push_task(sub_tree->serializer, serialized_pubsub_on_rx_refresh, ao2_bump(sub_tree))) {
-			/* If we can't push the NOTIFY refreshing task...we'll just go with it. */
-			ao2_ref(sub_tree, -1);
-		}
+
+	if (pjsip_evsub_get_state(sub_tree->evsub) == PJSIP_EVSUB_STATE_TERMINATED) {
+		sub_tree->state = SIP_SUB_TREE_TERMINATE_PENDING;
+	}
+
+	if (ast_sip_push_task(sub_tree->serializer, serialized_pubsub_on_refresh_timeout, ao2_bump(sub_tree))) {
+		/* If we can't push the NOTIFY refreshing task...we'll just go with it. */
+		ast_log(LOG_ERROR, "Failed to push task to send NOTIFY.\n");
+		sub_tree->state = SIP_SUB_TREE_NORMAL;
+		ao2_ref(sub_tree, -1);
 	}
 
 	if (sub_tree->is_list) {
@@ -3475,9 +3519,9 @@
 static void pubsub_on_rx_notify(pjsip_evsub *evsub, pjsip_rx_data *rdata, int *p_st_code,
 		pj_str_t **p_st_text, pjsip_hdr *res_hdr, pjsip_msg_body **p_body)
 {
-	struct ast_sip_subscription *sub = pjsip_evsub_get_mod_data(evsub, pubsub_module.id);
+	struct ast_sip_subscription *sub;
 
-	if (!sub) {
+	if (!(sub = pjsip_evsub_get_mod_data(evsub, pubsub_module.id))) {
 		return;
 	}
 
@@ -3490,45 +3534,62 @@
 	struct sip_subscription_tree *sub_tree = userdata;
 	pjsip_tx_data *tdata;
 
+	if (!sub_tree->evsub) {
+		ao2_cleanup(sub_tree);
+		return 0;
+	}
+
 	if (pjsip_evsub_initiate(sub_tree->evsub, NULL, -1, &tdata) == PJ_SUCCESS) {
 		pjsip_evsub_send_request(sub_tree->evsub, tdata);
 	} else {
 		pjsip_evsub_terminate(sub_tree->evsub, PJ_TRUE);
 	}
+
 	ao2_cleanup(sub_tree);
 	return 0;
 }
 
 static void pubsub_on_client_refresh(pjsip_evsub *evsub)
 {
-	struct sip_subscription_tree *sub_tree = pjsip_evsub_get_mod_data(evsub, pubsub_module.id);
+	struct sip_subscription_tree *sub_tree;
 
-	ao2_ref(sub_tree, +1);
-	ast_sip_push_task(sub_tree->serializer, serialized_pubsub_on_client_refresh, sub_tree);
+	if (!(sub_tree = pjsip_evsub_get_mod_data(evsub, pubsub_module.id))) {
+		return;
+	}
+
+	if (ast_sip_push_task(sub_tree->serializer, serialized_pubsub_on_client_refresh, ao2_bump(sub_tree))) {
+		ao2_cleanup(sub_tree);
+	}
 }
 
 static void pubsub_on_server_timeout(pjsip_evsub *evsub)
 {
+	struct sip_subscription_tree *sub_tree;
 
-	struct sip_subscription_tree *sub_tree = pjsip_evsub_get_mod_data(evsub, pubsub_module.id);
-	if (!sub_tree) {
-		/* PJSIP does not terminate the server timeout timer when a SUBSCRIBE
-		 * with Expires: 0 arrives to end a subscription, nor does it terminate
-		 * this timer when we send a NOTIFY request in response to receiving such
-		 * a SUBSCRIBE. PJSIP does not stop the server timeout timer until the
-		 * NOTIFY transaction has finished (either through receiving a response
-		 * or through a transaction timeout).
-		 *
-		 * Therefore, it is possible that we can be told that a server timeout
-		 * occurred after we already thought that the subscription had been
-		 * terminated. In such a case, we will have already removed the sub_tree
-		 * from the evsub's mod_data array.
-		 */
+	/* PJSIP does not terminate the server timeout timer when a SUBSCRIBE
+	 * with Expires: 0 arrives to end a subscription, nor does it terminate
+	 * this timer when we send a NOTIFY request in response to receiving such
+	 * a SUBSCRIBE. PJSIP does not stop the server timeout timer until the
+	 * NOTIFY transaction has finished (either through receiving a response
+	 * or through a transaction timeout).
+	 *
+	 * Therefore, it is possible that we can be told that a server timeout
+	 * occurred after we already thought that the subscription had been
+	 * terminated. In such a case, we will have already removed the sub_tree
+	 * from the evsub's mod_data array.
+	 */
+
+	sub_tree = pjsip_evsub_get_mod_data(evsub, pubsub_module.id);
+	if (!sub_tree || sub_tree->state != SIP_SUB_TREE_NORMAL) {
+		ast_debug(1, "Possible terminate race prevented %p %d\n", sub_tree, sub_tree ? sub_tree->state : -1 );
         return;
 	}
 
-	ao2_ref(sub_tree, +1);
-	ast_sip_push_task(sub_tree->serializer, serialized_pubsub_on_server_timeout, sub_tree);
+	sub_tree->state = SIP_SUB_TREE_TERMINATE_PENDING;
+	if (ast_sip_push_task(sub_tree->serializer, serialized_pubsub_on_refresh_timeout, ao2_bump(sub_tree))) {
+		sub_tree->state = SIP_SUB_TREE_NORMAL;
+		ao2_cleanup(sub_tree);
+	}
 }
 
 static int ami_subscription_detail(struct sip_subscription_tree *sub_tree,
diff --git a/third-party/pjproject/configure.m4 b/third-party/pjproject/configure.m4
index 7432227..d477b90 100644
--- a/third-party/pjproject/configure.m4
+++ b/third-party/pjproject/configure.m4
@@ -44,4 +44,5 @@
 	PJPROJECT_SYMBOL_CHECK([PJ_SSL_CERT_LOAD_FROM_FILES2], [pj_ssl_cert_load_from_files2], [pjlib.h])
 	PJPROJECT_SYMBOL_CHECK([PJSIP_EXTERNAL_RESOLVER], [pjsip_endpt_set_ext_resolver], [pjsip.h])
 	AC_DEFINE([HAVE_PJSIP_TLS_TRANSPORT_PROTO], 1, [Define if your system has PJSIP_TLS_TRANSPORT_PROTO])
+	AC_DEFINE([HAVE_PJSIP_EVSUB_GRP_LOCK], 1, [Define if your system has PJSIP_EVSUB_GRP_LOCK])
 ])
diff --git a/third-party/pjproject/patches/0001-evsub-Add-APIs-to-add-decrement-an-event-subscriptio.patch b/third-party/pjproject/patches/0001-evsub-Add-APIs-to-add-decrement-an-event-subscriptio.patch
new file mode 100644
index 0000000..d2a47c6
--- /dev/null
+++ b/third-party/pjproject/patches/0001-evsub-Add-APIs-to-add-decrement-an-event-subscriptio.patch
@@ -0,0 +1,73 @@
+From a5030c9b33b2c936879fbacb1d2ea5edc2979181 Mon Sep 17 00:00:00 2001
+From: George Joseph <gjoseph at digium.com>
+Date: Sat, 18 Jun 2016 10:14:34 -0600
+Subject: [PATCH] evsub:  Add APIs to add/decrement an event  subscription's
+ group lock
+
+These APIs can be used to ensure that the evsub isn't destroyed before
+an application is finished using it.
+---
+ pjsip/include/pjsip-simple/evsub.h | 20 ++++++++++++++++++++
+ pjsip/src/pjsip-simple/evsub.c     | 14 ++++++++++++++
+ 2 files changed, 34 insertions(+)
+
+diff --git a/pjsip/include/pjsip-simple/evsub.h b/pjsip/include/pjsip-simple/evsub.h
+index 2dc4d69..31f85f8 100644
+--- a/pjsip/include/pjsip-simple/evsub.h
++++ b/pjsip/include/pjsip-simple/evsub.h
+@@ -490,6 +490,26 @@ PJ_DECL(void) pjsip_evsub_set_mod_data( pjsip_evsub *sub, unsigned mod_id,
+ PJ_DECL(void*) pjsip_evsub_get_mod_data( pjsip_evsub *sub, unsigned mod_id );
+ 
+ 
++/**
++ * Increment the event subscription's group lock.
++ *
++ * @param sub		The server subscription instance.
++ *
++ * @return		PJ_SUCCESS on success.
++ */
++PJ_DEF(pj_status_t) pjsip_evsub_add_ref(pjsip_evsub *sub);
++
++
++/**
++ * Decrement the event subscription's group lock.
++ *
++ * @param sub		The server subscription instance.
++ *
++ * @return		PJ_SUCCESS on success.
++ */
++PJ_DEF(pj_status_t) pjsip_evsub_dec_ref(pjsip_evsub *sub);
++
++
+ 
+ PJ_END_DECL
+ 
+diff --git a/pjsip/src/pjsip-simple/evsub.c b/pjsip/src/pjsip-simple/evsub.c
+index 7cd8859..68a9564 100644
+--- a/pjsip/src/pjsip-simple/evsub.c
++++ b/pjsip/src/pjsip-simple/evsub.c
+@@ -831,7 +831,21 @@ static pj_status_t evsub_create( pjsip_dialog *dlg,
+     return PJ_SUCCESS;
+ }
+ 
++/*
++ * Increment the event subscription's group lock.
++ */
++PJ_DEF(pj_status_t) pjsip_evsub_add_ref(pjsip_evsub *sub)
++{
++    return pj_grp_lock_add_ref(sub->grp_lock);
++}
+ 
++/*
++ * Decrement the event subscription's group lock.
++ */
++PJ_DEF(pj_status_t) pjsip_evsub_dec_ref(pjsip_evsub *sub)
++{
++    return pj_grp_lock_dec_ref(sub->grp_lock);
++}
+ 
+ /*
+  * Create client subscription session.
+-- 
+2.5.5
+

-- 
To view, visit https://gerrit.asterisk.org/3061
To unsubscribe, visit https://gerrit.asterisk.org/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I779d11802cf672a51392e62a74a1216596075ba1
Gerrit-PatchSet: 1
Gerrit-Project: asterisk
Gerrit-Branch: certified/13.8
Gerrit-Owner: George Joseph <gjoseph at digium.com>



More information about the asterisk-code-review mailing list