[Asterisk-code-review] res_pjsip_outbound_registration: Fix SRV failover on timeout (asterisk[certified/16.8])

Friendly Automation asteriskteam at digium.com
Tue Feb 18 14:45:49 CST 2020


Friendly Automation has submitted this change. ( https://gerrit.asterisk.org/c/asterisk/+/13789 )

Change subject: res_pjsip_outbound_registration: Fix SRV failover on timeout
......................................................................

res_pjsip_outbound_registration: Fix SRV failover on timeout

In order to retry outbound registrations for some situations, we
need access to the tdata from the original request.  For instance,
for 401/407 responses we need it to properly construct the
subsequent request with the authentication.  We also need it if
we're iterating over a DNS SRV response record set so we can skip
entries we've already tried.

We've been getting the tdata from the server response rdata and
transaction but that only works for the failures where there was
actually a response (4XX, 5XX, etc).  For timeouts there's no
response and therefore no rdata or transaction from which to get
the tdata.  When processing a single A/AAAA record for a server,
this wasn't an issue as we just retried that same server after the
retry timer expired.  If we got an SRV record set for the server
though, without the state from the tdata, we just kept trying the
first entry in the set repeatedly instead of skipping to the next
one in the list.

* Added a "last_tdata" member to the client state structure to keep
  track of the sent tdata.

* Updated registration_client_send() to save the tdata it used into
  the client_state.

* Updated sip_outbound_registration_response_cb() to use the tdata
  saved in client_state when we don't get a response from the
  server. We still use the tdata from the transaction when we DO
  get a response from the server so we can properly handle 4XX
  responses where our new request depends on it.

General note on timeouts:

Although res_pjsip_outbound_registration skips to the next record
immediately when a timeout occurs during SRV set traversal, it's
pjproject that determines how long to wait before a timeout is
declared.  As with other SIP message types, pjproject will continue
trying the same server at an interval specified by "timer_t1" until
"timer_b" expires.  Both of those timers are set in the pjsip.conf
"system" section.

ASTERISK-28746

Change-Id: I199b8274392d17661dd3ce3b4d69a3968368fa06
---
M res/res_pjsip_outbound_registration.c
1 file changed, 47 insertions(+), 2 deletions(-)

Approvals:
  Joshua Colp: Looks good to me, but someone else must approve
  Benjamin Keith Ford: Looks good to me, but someone else must approve
  Kevin Harwell: Looks good to me, but someone else must approve
  George Joseph: Looks good to me, approved
  Friendly Automation: Approved for Submit



diff --git a/res/res_pjsip_outbound_registration.c b/res/res_pjsip_outbound_registration.c
index 9ea6054..4d0e38e 100644
--- a/res/res_pjsip_outbound_registration.c
+++ b/res/res_pjsip_outbound_registration.c
@@ -334,6 +334,14 @@
 	 * module unload.
 	 */
 	pjsip_regc *client;
+	/*!
+	 * \brief Last tdata sent
+	 * We need the original tdata to resend a request on auth failure
+	 * or timeout.  On an auth failure, we use the original tdata
+	 * to initialize the new tdata for the authorized response.  On a timeout
+	 * we need it to skip failed SRV entries if any.
+	 */
+	pjsip_tx_data *last_tdata;
 	/*! \brief Timer entry for retrying on temporal responses */
 	pj_timer_entry timer;
 	/*! \brief Optional line parameter placed into Contact */
@@ -544,6 +552,13 @@
 	/* Due to the message going out the callback may now be invoked, so bump the count */
 	ao2_ref(client_state, +1);
 	/*
+	 * We also bump tdata in expectation of saving it to client_state->last_tdata.
+	 * We have to do it BEFORE pjsip_regc_send because if it succeeds, it decrements
+	 * the ref count on its own.
+	 */
+	pjsip_tx_data_add_ref(tdata);
+
+	/*
 	 * Set the transport in case transports were reloaded.
 	 * When pjproject removes the extraneous error messages produced,
 	 * we can check status and only set the transport and resend if there was an error
@@ -552,13 +567,26 @@
 	pjsip_regc_set_transport(client_state->client, &selector);
 	status = pjsip_regc_send(client_state->client, tdata);
 
-	/* If the attempt to send the message failed and the callback was not invoked we need to
-	 * drop the reference we just added
+	/*
+	 * If the attempt to send the message failed and the callback was not invoked we need to
+	 * drop the references we just added
 	 */
 	if ((status != PJ_SUCCESS) && !(*callback_invoked)) {
+		pjsip_tx_data_dec_ref(tdata);
 		ao2_ref(client_state, -1);
+		return status;
 	}
 
+	/*
+	 * Decref the old last_data before replacing it.
+	 * BTW, it's quite possible that last_data == tdata
+	 * if we're trying successive servers in an SRV set.
+	 */
+	if (client_state->last_tdata) {
+		pjsip_tx_data_dec_ref(client_state->last_tdata);
+	}
+	client_state->last_tdata = tdata;
+
 	return status;
 }
 
@@ -1077,11 +1105,25 @@
 		retry_after = pjsip_msg_find_hdr(param->rdata->msg_info.msg, PJSIP_H_RETRY_AFTER,
 			NULL);
 		response->retry_after = retry_after ? retry_after->ivalue : 0;
+
+		/*
+		 * If we got a response from the server, we have to use the tdata
+		 * from the transaction, not the tdata saved when we sent the
+		 * request.  If we use the saved tdata, we won't process responses
+		 * like 423 Interval Too Brief correctly and we'll wind up sending
+		 * the bad Expires value again.
+		 */
+		pjsip_tx_data_dec_ref(client_state->last_tdata);
+
 		tsx = pjsip_rdata_get_tsx(param->rdata);
 		response->old_request = tsx->last_tx;
 		pjsip_tx_data_add_ref(response->old_request);
 		pjsip_rx_data_clone(param->rdata, 0, &response->rdata);
+	} else {
+		/* old_request steals the reference */
+		response->old_request = client_state->last_tdata;
 	}
+	client_state->last_tdata = NULL;
 
 	/*
 	 * Transfer response reference to serializer task so the
@@ -1127,6 +1169,9 @@
 	ast_taskprocessor_unreference(client_state->serializer);
 	ast_free(client_state->transport_name);
 	ast_free(client_state->registration_name);
+	if (client_state->last_tdata) {
+		pjsip_tx_data_dec_ref(client_state->last_tdata);
+	}
 }
 
 /*! \brief Allocator function for registration state */

-- 
To view, visit https://gerrit.asterisk.org/c/asterisk/+/13789
To unsubscribe, or for help writing mail filters, visit https://gerrit.asterisk.org/settings

Gerrit-Project: asterisk
Gerrit-Branch: certified/16.8
Gerrit-Change-Id: I199b8274392d17661dd3ce3b4d69a3968368fa06
Gerrit-Change-Number: 13789
Gerrit-PatchSet: 3
Gerrit-Owner: George Joseph <gjoseph at digium.com>
Gerrit-Reviewer: Benjamin Keith Ford <bford at digium.com>
Gerrit-Reviewer: Friendly Automation
Gerrit-Reviewer: George Joseph <gjoseph at digium.com>
Gerrit-Reviewer: Joshua Colp <jcolp at sangoma.com>
Gerrit-Reviewer: Kevin Harwell <kharwell at digium.com>
Gerrit-MessageType: merged
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-code-review/attachments/20200218/6c000169/attachment-0001.html>


More information about the asterisk-code-review mailing list