[asterisk-dev] more granular control of TImer T1

Tue Nov 12 13:54:43 CST 2013

From: asterisk-dev-bounces at lists.digium.com [mailto:asterisk-dev-bounces at lists.digium.com] On Behalf Of Mark Michelson
Sent: Tuesday, November 12, 2013 12:29 PM
To: asterisk-dev at lists.digium.com
Subject: Re: [asterisk-dev] more granular control of TImer T1

On 11/12/2013 10:34 AM, Damon Estep wrote:
Originally posted to https://issues.asterisk.org/jira/browse/ASTERISK-22841

Feedback was that this is more of a dev discussion than a bug.

The definition of timer t1min is "Minimum roundtrip time for messages to monitored hosts". The key word is "monitored", which means qualify= is set to yes or a numeric value.

The value of t1min is also being evaluated when the timert1 value for a non-monitored host is read from the configuration: if the configured timert1 value for the non-monitored host is less than t1min, a warning is logged and timert1 is set to global_timert1.

t1min should not apply to non-monitored hosts by definition, and the user should be able to set a timert1 value that is less than t1min for a non-monitored host.

I can agree with this. The intention of the t1min setting should be to ensure that if a monitored host has a super-short roundtrip time that we do not end up setting the t1 timer to something ridiculously low. If you want to set the default t1 timer to something lower than t1min for non-monitored hosts, while I think it's a bit on the strange side, I think it should be allowed.

That appears to be the initial intention, yet global_t1min is checked and set even when the maxms and lastms variables are not defined for a host. The resulting behavior is that you MUST lower t1min for all monitored peers if you want to lower timert1 below it for a non-monitored peer. In our case we want to lower it for a carrier peer to reduce Timer B so call routes can advance to the next peer in a reasonable amount of time (before the ISDN network times out).
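For context on why T1 controls how quickly a route can advance: RFC 3261 defines Timer B (the INVITE transaction timeout) as 64*T1. A minimal sketch of that arithmetic follows; the helper name and the T1 values other than the RFC default of 500 ms are illustrative assumptions, not Asterisk code.

```python
def timer_b_ms(t1_ms: int) -> int:
    """INVITE transaction timeout (Timer B) per RFC 3261: 64 * T1."""
    return 64 * t1_ms


# Illustrative T1 values in milliseconds; 500 ms is the RFC 3261 default.
for t1 in (500, 100, 50):
    print(f"T1={t1}ms -> Timer B={timer_b_ms(t1) / 1000:.1f}s")
```

With the RFC default, an unanswered INVITE is abandoned only after 32 seconds, which is why a lower per-peer T1 matters for carrier failover.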

If T1 is set in the config for a monitored host it should be used instead of the last qualify result.

I'm not quite so sure on this one. I imagine there are users of Asterisk out there that set timer t1 to some initial "base" value and then expect qualifies to "correct" that value if it turns out the RTT is greater or less than what is expected. Ignoring RTT for qualifies seems like something that, if it should be done at all, should be optional.

I see both sides of this argument. My thinking is that you would not define timert1 for a peer unless you needed to modify it. If not set, it would be the greater of lastms or t1min; if set, it would be the peer setting.

My thinking is:

T1min = minimum T1 for MONITORED host, not evaluated anywhere else.
Global_Timert1 = default for non-monitored peers with no peer_timert1 setting
Peer_timert1 = absolute peer_timert1 regardless of global_timert1 or global_t1min
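The precedence proposed above could be sketched roughly as follows. This is not Asterisk's chan_sip code: the function name and parameters are hypothetical, and the fallback to global_timert1 for a monitored peer that has no qualify result yet is an assumption, since the proposal does not cover that case.

```python
from typing import Optional


def resolve_t1(global_timert1: int,
               global_t1min: int,
               peer_timert1: Optional[int] = None,
               lastms: Optional[int] = None,
               monitored: bool = False) -> int:
    """Pick T1 (ms) for a peer using the precedence proposed in this thread."""
    # An explicit per-peer timert1 is absolute: no t1min check, no qualify override.
    if peer_timert1 is not None:
        return peer_timert1
    # Monitored peer with a usable qualify result: greater of lastms and t1min.
    # (Assumption: a missing or non-positive lastms means "no measurement yet".)
    if monitored and lastms is not None and lastms > 0:
        return max(lastms, global_t1min)
    # Everything else falls back to the global default.
    return global_timert1


print(resolve_t1(500, 100, peer_timert1=50))             # per-peer setting wins
print(resolve_t1(500, 100, lastms=40, monitored=True))   # t1min floor applies
print(resolve_t1(500, 100))                              # global default
```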

A case where this is needed is when there is an SBC in front of the monitored host. The SBC may answer the OPTIONS query directly and not pass it to the host which is behind the SBC. In this case the lastms is set to the RTT to the SBC, which is not indicative of the actual RTT for an INVITE which is proxied to the host (which adds additional delay).

In this particular case, wouldn't the SBC also be the first responder to the INVITE (with a 100 Trying)? The way Asterisk works, once it has received a response to a request, it will not retransmit the request any further.

No, every SBC I have ever used (including Acme and Sonus) answers OPTIONS and NOTIFY with a response generated by the SBC, not a response proxied to the host behind the SBC. INVITEs are always proxied to the host behind the SBC. The difference in RTT for OPTIONS/NOTIFY vs INVITE depends on the network latency between the SBC and host, as well as response time of the host. In production we see 10ms OPTIONS/NOTIFY responses and 50ms INVITE responses for the same SBC/host pair.

Setting T1 to the last measured OPTIONS RTT is problematic for another reason as well. Any amount of jitter in the network can result in unneeded retransmits. In practice T1 is set to the estimated RTT, which is not always the last RTT but rather the average RTT plus an allowance for network jitter. There are many reasons why a user might want to adjust timer T1 on a peer-by-peer basis. They should not be forced to accept a global minimum or system-calculated T1 regardless of whether qualify is on. Qualify is used in other ways by many users, not just as a means of setting T1.
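The retransmit concern can be made concrete with the INVITE retransmission schedule from RFC 3261 for UDP: Timer A first fires at T1 and doubles after each retransmit, until Timer B expires at 64*T1. The sketch below is illustrative only. The function names are hypothetical, it ignores any t1min floor, and it plugs in the 10 ms OPTIONS and 50 ms INVITE response times quoted earlier in this thread.

```python
def invite_retransmit_times(t1_ms: int) -> list:
    """Offsets (ms after the first send) at which an unanswered INVITE is resent."""
    times = []
    elapsed = 0
    interval = t1_ms          # Timer A starts at T1
    timer_b = 64 * t1_ms      # the transaction gives up at Timer B
    while elapsed + interval < timer_b:
        elapsed += interval
        times.append(elapsed)
        interval *= 2         # Timer A doubles after every retransmit
    return times


def retransmits_before(t1_ms: int, response_ms: int) -> int:
    """Count retransmits sent before the first response arrives."""
    return sum(1 for t in invite_retransmit_times(t1_ms) if t < response_ms)


print(invite_retransmit_times(10))
print(retransmits_before(10, 50))   # T1 taken from the 10 ms OPTIONS RTT
print(retransmits_before(60, 50))   # T1 with headroom above the INVITE RTT
```

Under these assumptions, a T1 of 10 ms causes two unnecessary retransmits of every INVITE that is answered after 50 ms, while a T1 of 60 ms causes none.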

The current timer configuration could be improved by adding a T1 jitter configuration value (in addition to or in place of T1min), which would be added to lastms to set T1. For example, if the lastms for a peer was 40ms and the T1jitter setting was 20, then T1 would be set to 60ms.
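A minimal sketch of the proposed calculation, assuming a hypothetical t1jitter option that does not exist in Asterisk today. The optional t1min argument models the "in addition to" variant, where the existing floor is still applied.

```python
from typing import Optional


def t1_from_qualify(lastms: int, t1jitter: int, t1min: Optional[int] = None) -> int:
    """T1 (ms) = last qualify RTT plus a jitter allowance, optionally floored at t1min."""
    t1 = lastms + t1jitter
    if t1min is not None:
        t1 = max(t1, t1min)   # keep the existing floor if both options are used
    return t1


print(t1_from_qualify(40, 20))             # 60, the example from the text
print(t1_from_qualify(40, 20, t1min=100))  # 100, the floor still wins
```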

This, on the other hand, sounds like something I can agree with. Setting T1 to a running average plus some jitter tolerance makes sense to me.
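One way a running average plus jitter tolerance might look is sketched below. The class name, the window size, and the choice of a simple windowed mean (rather than, say, an exponentially weighted average) are all assumptions made for illustration; nothing here reflects existing Asterisk code.

```python
from collections import deque
from typing import Optional


class T1Estimator:
    """Tracks recent qualify RTTs and derives T1 as their mean plus a jitter allowance."""

    def __init__(self, jitter_ms: float, window: int = 5):
        self.jitter_ms = jitter_ms
        self.samples = deque(maxlen=window)   # oldest sample drops out automatically

    def add_sample(self, rtt_ms: float) -> None:
        self.samples.append(rtt_ms)

    def t1_ms(self) -> Optional[float]:
        if not self.samples:
            return None                       # no qualify result yet
        return sum(self.samples) / len(self.samples) + self.jitter_ms


est = T1Estimator(jitter_ms=20)
for rtt in (30, 40, 50):
    est.add_sample(rtt)
print(est.t1_ms())   # mean of 40 ms plus 20 ms of jitter allowance
```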

And it is the spirit of the RFC and generally accepted practice.

Our application requires us to patch this to interop correctly. Our platform consists of many Asterisk clusters servicing tens of thousands of users and connected to dozens of carriers. We need the ability to control timert1 (and subsequently Timer B) on an individual host basis. Granted, the qualify/t1min strategy works for the majority of hosts, but t1min is not optimal. Lastms+50ms would be optimal, as it would set a sane T1 for each peer instead of one that is a compromise for most hosts.

Ultimately I think the enforcement of t1min on non-monitored peers is not the intended behavior and should be changed, and the rest are merely suggestions for improvement.


Comments or thoughts?
