[svn-commits] mmichelson: trunk r256985 - /trunk/main/ccss.c

Mon Apr 12 17:27:11 CDT 2010

Author: mmichelson
Date: Mon Apr 12 17:27:07 2010
New Revision: 256985

URL: http://svnview.digium.com/svn/asterisk?view=rev&rev=256985
Log:
Fix issue where recall would not happen when it should.

Specifically, the problem occurred when multiple
callers requested CC for a single generically-monitored
device. If the monitored device became available but the
caller did not answer the recall, nothing poked the CC
core to let it know that it should attempt to recall
someone else instead.

After careful consideration, I came to the conclusion that
the only area of Asterisk that needed to be touched was the
generic CC monitor. All other types of CC would require something
outside of Asterisk to invoke a recall for a separate device.

This was accomplished by changing the generic monitor destructor
so that, if other instances remain for the device and the device
is currently available, it pokes the first remaining instance that
is not suspended and is still monitoring.
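Roughly, the destructor's new decision can be modeled as below. This is a minimal C sketch, not the ccss.c code: enum dev_state, struct mon_instance, struct mon_list and remove_and_pick_next() are hypothetical simplifications that replace the ao2 container, the AST_LIST macros and the call to ast_cc_monitor_callee_available() with a fixed array and a returned core ID.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the device states and generic monitor
 * structures; these are NOT the real ccss.c types. */
enum dev_state { DEV_UNKNOWN, DEV_NOT_INUSE, DEV_INUSE };

#define MAX_INSTANCES 8

struct mon_instance {
	int core_id;
	int is_suspended;
	int monitoring;
};

struct mon_list {
	enum dev_state current_state;
	int fit_for_recall;
	size_t count;
	struct mon_instance items[MAX_INSTANCES];
};

/* Remove the instance at index idx (e.g. its recall failed), then decide
 * whether another caller should be recalled. Returns the core_id of the
 * instance to signal, or -1 if no recall should be attempted. */
int remove_and_pick_next(struct mon_list *list, size_t idx)
{
	assert(list != NULL && idx < list->count);

	/* Shift the remaining instances down to close the gap. */
	for (size_t i = idx; i + 1 < list->count; i++) {
		list->items[i] = list->items[i + 1];
	}
	list->count--;

	/* Only consider a recall if instances remain, the list was marked fit
	 * for recall, and the device is idle (or its state is unknown). */
	if (list->count == 0 || !list->fit_for_recall ||
	    (list->current_state != DEV_NOT_INUSE &&
	     list->current_state != DEV_UNKNOWN)) {
		return -1;
	}

	/* Signal the first instance that is not suspended and still monitoring. */
	for (size_t i = 0; i < list->count; i++) {
		if (!list->items[i].is_suspended && list->items[i].monitoring) {
			return list->items[i].core_id;
		}
	}
	return -1;
}
```

For example, if the first of three instances is removed, the second is suspended and the third is still monitoring, the sketch returns the third instance's core ID; if fit_for_recall is 0 or the device is in use, it returns -1.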

To avoid triggering recalls at bad times, a fit_for_recall flag
was also added to the generic_monitor_instance_list struct, and the
destructor only pokes another instance when it is set. The flag is
set as soon as a monitored device becomes available and one of its
instances is signaled. It is cleared if a CCNR (or CCNL) request
triggers the creation of a new generic monitor instance. This way,
we don't accidentally try to recall a device that was being monitored
for CCNR and never actually became available for recall in the first
place.
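The flag's life cycle can be sketched the same way. Again this is a hypothetical model rather than the real code: enum cc_service, struct mon_list, on_device_available() and on_new_request() are illustrative names, and a fixed-size array stands in for the real instance list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical service types; only "busy" (CCBS) versus
 * "no response" (CCNR/CCNL) matters for the flag. */
enum cc_service { SVC_CCBS, SVC_CCNR, SVC_CCNL };

#define MAX_INSTANCES 8

struct mon_instance {
	int core_id;
	int is_suspended;
	int monitoring;
};

struct mon_list {
	int fit_for_recall;
	size_t count;
	struct mon_instance items[MAX_INSTANCES];
};

/* Device became available: hand availability to the first eligible
 * instance and, only then, mark the whole list fit for recall.
 * Returns that instance's core_id, or -1 if none is eligible. */
int on_device_available(struct mon_list *list)
{
	assert(list != NULL);
	for (size_t i = 0; i < list->count; i++) {
		struct mon_instance *inst = &list->items[i];
		if (!inst->is_suspended && inst->monitoring) {
			inst->monitoring = 0;
			list->fit_for_recall = 1;
			return inst->core_id;
		}
	}
	return -1;
}

/* A new CC request adds an instance. A CCNR or CCNL request means the
 * party at the device went away, so the list is no longer fit for
 * recall. Returns 0 on success, -1 if the list is full. */
int on_new_request(struct mon_list *list, int core_id, enum cc_service service)
{
	assert(list != NULL);
	if (list->count >= MAX_INSTANCES) {
		return -1;
	}
	list->items[list->count++] = (struct mon_instance){ core_id, 0, 1 };
	if (service == SVC_CCNR || service == SVC_CCNL) {
		list->fit_for_recall = 0;
	}
	return 0;
}
```

Adding two CCBS requests, reporting the device available (which signals the first instance and sets the flag), then adding a CCNR request leaves fit_for_recall at 0, so a later failure would not trigger a recall.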

This error was discovered by Steve Pitts during in-house testing
at Digium.


Modified:
    trunk/main/ccss.c

Modified: trunk/main/ccss.c
URL: http://svnview.digium.com/svn/asterisk/trunk/main/ccss.c?view=diff&rev=256985&r1=256984&r2=256985
==============================================================================

--- trunk/main/ccss.c (original)
+++ trunk/main/ccss.c Mon Apr 12 17:27:07 2010
@@ -990,6 +990,22 @@
 struct generic_monitor_instance_list {
 	const char *device_name;
 	enum ast_device_state current_state;
+	/* If there are multiple instances monitoring the
+	 * same device and one should fail, we need to know
+	 * whether to signal that the device can be recalled.
+	 * The problem is that the device state is not enough
+	 * to check. If a caller has requested CCNR, then the
+	 * fact that the device is available does not indicate
+	 * that the device is ready to be recalled. Instead, as
+	 * soon as one instance of the monitor becomes available
+	 * for a recall, we mark the entire list as being fit
+	 * for recall. If a CCNR request comes in, then we will
+	 * have to mark the list as unfit for recall since this
+	 * is a clear indicator that the person at the monitored
+	 * device has gone away and is actually not fit to be
+	 * recalled.
+	 */
+	int fit_for_recall;
 	struct ast_event_sub *sub;
 	AST_LIST_HEAD_NOLOCK(, generic_monitor_instance) list;
 };
@@ -1112,6 +1128,7 @@
 		AST_LIST_TRAVERSE(&generic_list->list, generic_instance, next) {
 			if (!generic_instance->is_suspended && generic_instance->monitoring) {
 				generic_instance->monitoring = 0;
+				generic_list->fit_for_recall = 1;
 				ast_cc_monitor_callee_available(generic_instance->core_id, "Generic monitored party has become available");
 				break;
 			}
@@ -1208,6 +1225,12 @@
 		cc_unref(monitor, "Failed to schedule available timer. (monitor)");
 		cc_unref(generic_list, "Failed to schedule available timer. (generic_list)");
 		return -1;
+	}
+	/* If the new instance was created as CCNR, then that means this device is not currently
+	 * fit for recall even if it previously was.
+	 */
+	if (service == AST_CC_CCNR || service == AST_CC_CCNL) {
+		generic_list->fit_for_recall = 0;
 	}
 	ast_cc_monitor_request_acked(monitor->core_id, "Generic monitor for %s subscribed to device state.",
 			monitor->interface->device_name);
@@ -1343,6 +1366,27 @@
 		 * list from the container
 		 */
 		ao2_t_unlink(generic_monitors, generic_list, "Generic list is empty. Unlink it from the container");
+	} else {
+		/* There are still instances for this particular device. The situation
+		 * may be that we were attempting a CC recall and a failure occurred, perhaps
+		 * on the agent side. If a failure happens here and the device being monitored
+		 * is available, then we need to signal on the first unsuspended instance that
+		 * the device is available for recall.
+		 */
+
+		/* First things first. We don't even want to consider this action if
+		 * the device in question isn't available right now.
+		 */
+		if (generic_list->fit_for_recall && (generic_list->current_state == AST_DEVICE_NOT_INUSE ||
+				generic_list->current_state == AST_DEVICE_UNKNOWN)) {
+			AST_LIST_TRAVERSE(&generic_list->list, generic_instance, next) {
+				if (!generic_instance->is_suspended && generic_instance->monitoring) {
+					ast_cc_monitor_callee_available(generic_instance->core_id, "Signaling generic monitor "
+							"availability due to other instance's failure.");
+					break;
+				}
+			}
+		}
 	}
 	cc_unref(generic_list, "Done with generic list in generic monitor destructor");
 	ast_free((char *)gen_mon_pvt->device_name);
@@ -2848,8 +2892,6 @@
 
 static int cc_failed(struct cc_core_instance *core_instance, struct cc_state_change_args *args, enum cc_state previous_state)
 {
-	/* Something along the way failed, call agent and monitor destructor functions
-	 */
 	manager_event(EVENT_FLAG_CC, "CCFailure",
 		"CoreID: %d\r\n"
 		"Caller: %s\r\n"