[asterisk-dev] Refreshing rtp.conf stunaddr IP address

Thu Feb 18 03:29:29 CST 2021

On 2/17/21 8:45 PM, sduthil at wazo.io wrote:
<snip>
> I've found a related Asterisk issue with similar symptoms, but a
> different cause (WAN unavailable):
>
> https://issues.asterisk.org/jira/browse/ASTERISK-22745
>
> I'm willing to propose a patch in Asterisk to avoid the delay when the
> STUN server changes its IP address. I'm wondering what the best
> strategy is to make Asterisk resolve the stunaddr again. Here are the
> solutions that I've come up with:
>
> 1. Resolve the stunaddr hostname at every call
>
> This is the strategy used for turnaddr.
> This adds another possible timeout when placing a call, e.g. if the DNS
> resolver is unavailable.
> This also adds a delay for every call, i.e. the time for the stunaddr to
> be resolved to an IP address.
>
> 2. Keep the stunaddr cache in memory, and refresh it after the first timeout
>
> This strategy is used in res_stun_monitor.
>
> 3. Keep the stunaddr cache in memory, and refresh it periodically
>
> What would be an acceptable default refresh frequency?
>
> 4. Keep the stunaddr cache in memory, and refresh it after the DNS
> response TTL
>
> AFAIK, this requires making an explicit DNS query, instead of relying on
> the OS name-resolving facilities like getaddrinfo. Maybe
> res_resolver_unbound could be used there? Is it a good idea to add a
> dependency from res_rtp_asterisk to res_resolver_unbound? Should the
> dependency be optional, with a configuration flag, e.g.
> "stunaddr_resolve_frequency=auto" (default="once")?
>
> 5. A program (either an Asterisk module thread or some external
> process) continuously checks the IP address of the STUN server and runs
> "module reload res_rtp_asterisk.so" when the IP address changes.
>
> This is more of a crutch than a real solution.
>
> I prefer solution 4, or, if that is not possible, solution 2.
>
> My questions, then:
>
> Do you know of any discussion about this topic?
> What are your preferences regarding a solution?
> Do you have better strategies to propose?
> Does solution 4 go in the right direction?
> Would it be better to have the same strategy for stunaddr and turnaddr
> (currently solution 1)?
>
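Options 2, 3 and 4 differ mainly in when a cached stunaddr is treated as
stale. A minimal sketch in C, assuming a hypothetical struct stun_cache
(not an existing res_rtp_asterisk structure) and leaving the actual DNS
lookup to the caller:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>

/* Hypothetical refresh policies matching options 2, 3 and 4. */
enum stun_refresh_policy {
	STUN_REFRESH_ON_FAILURE, /* option 2: re-resolve only after a STUN timeout */
	STUN_REFRESH_PERIODIC,   /* option 3: re-resolve every `interval` seconds */
	STUN_REFRESH_TTL,        /* option 4: re-resolve when the DNS TTL expires */
};

/* Hypothetical cache entry for the resolved stunaddr. */
struct stun_cache {
	char addr[64];           /* last resolved address, "" if none yet */
	time_t resolved_at;      /* when addr was last resolved */
	unsigned int ttl;        /* TTL of the DNS answer (option 4 only) */
	unsigned int interval;   /* refresh period in seconds (option 3 only) */
	bool failed;             /* set after a STUN request timed out */
	enum stun_refresh_policy policy;
};

/* Store a freshly resolved address and clear the failure flag. */
static void stun_cache_store(struct stun_cache *c, const char *addr,
			     unsigned int ttl, time_t now)
{
	assert(c != NULL && addr != NULL);
	strncpy(c->addr, addr, sizeof(c->addr) - 1);
	c->addr[sizeof(c->addr) - 1] = '\0';
	c->ttl = ttl;
	c->resolved_at = now;
	c->failed = false;
}

/* Record that a STUN request to the cached address timed out. */
static void stun_cache_mark_failed(struct stun_cache *c)
{
	c->failed = true;
}

/* Return true if the hostname should be resolved again before use. */
static bool stun_cache_needs_refresh(const struct stun_cache *c, time_t now)
{
	if (c->addr[0] == '\0' || c->failed) {
		return true; /* never resolved, or the last request timed out */
	}
	switch (c->policy) {
	case STUN_REFRESH_PERIODIC:
		return now - c->resolved_at >= (time_t) c->interval;
	case STUN_REFRESH_TTL:
		return now - c->resolved_at >= (time_t) c->ttl;
	case STUN_REFRESH_ON_FAILURE:
	default:
		return false;
	}
}
```

In this sketch a STUN timeout forces a refresh under every policy, so
options 3 and 4 also include the behaviour of option 2; that is a design
choice of the sketch, not something the proposal specifies.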
Not entirely related, but a while back I made a patch for chan_sip in order
to solve a DNS issue with qualify (NOTIFY). In our case, some customers
had a flaky LAN/WAN connection, which caused DNS failures and
subsequently made Asterisk believe peers were offline. I recall DNS
resolution in chan_sip being performed on reload only, exacerbating the 
problem. With my patch, chan_sip retried DNS resolution "later" for any
unresolved address, skipping qualify until the address was resolved.
Since qualify runs regularly, any DNS problems eventually cleared up on
their own, which made for happy customers. The tricky part was getting
all the locking and ref-counting right, since the peer address and the
peer itself are used for lots of things other than qualify.
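The deferred-resolution pattern can be sketched roughly as follows. The
names (struct peer, qualify_pass, stub_resolve) are hypothetical and do
not reflect chan_sip's actual code; stub_resolve only simulates a flaky
DNS, and the locking and ref-counting described above are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical resolver hook: fills `out` and returns true on success. */
typedef bool (*resolve_fn)(const char *host, char *out, size_t outlen);

struct peer {
	const char *host;            /* configured hostname */
	char addr[64];               /* valid only when `resolved` is true */
	bool resolved;
	unsigned int qualifies_sent;
};

/* Illustrative stub for a real DNS lookup: succeeds only while
 * stub_dns_up is true, to simulate a flaky WAN link. */
static bool stub_dns_up = false;

static bool stub_resolve(const char *host, char *out, size_t outlen)
{
	if (!stub_dns_up) {
		return false;
	}
	snprintf(out, outlen, "resolved:%s", host);
	return true;
}

/* One qualify pass: retry resolution for unresolved peers and skip
 * qualify for any peer that still has no address. Returns the number
 * of peers qualified in this pass. */
static size_t qualify_pass(struct peer *peers, size_t n, resolve_fn resolve)
{
	size_t qualified = 0;

	assert(peers != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct peer *p = &peers[i];

		if (!p->resolved) {
			p->resolved = resolve(p->host, p->addr, sizeof(p->addr));
		}
		if (!p->resolved) {
			continue; /* DNS still failing: try again next pass */
		}
		p->qualifies_sent++; /* stand-in for sending the qualify request */
		qualified++;
	}
	return qualified;
}
```

Because qualify_pass runs on every qualify interval, a temporary DNS
outage only delays qualify for the affected peers instead of leaving them
unresolved until the next reload.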

Considering this pattern, I would add an option 6 (similar to 5):

Instead of dead STUN servers being discovered at call setup, one could 
periodically check to see if the server is alive (STUN qualify?) and 
refresh the address as needed. The 10-second delay you are seeing would
still be unavoidable if/when the STUN server dies just prior to call
setup, but at least the probability of such delays would be reduced
by early discovery. Respecting the TTL would be nice to have, as would
skipping checks for recently used STUN servers to reduce network spam.
In addition to the above, reducing the timeout to 2-3 seconds and
failing the call on timeout might give callers a better experience. By
the time the caller redials, the network issues may well have resolved
themselves.
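One way the periodic check in option 6 could decide what to do on each
timer tick, assuming a hypothetical struct stun_server and a hypothetical
2.5-second probe timeout (STUN_PROBE_TIMEOUT_MS). Sending the STUN
binding request and performing the DNS lookup are left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical probe timeout reflecting the 2-3 second suggestion. */
#define STUN_PROBE_TIMEOUT_MS 2500

enum stun_check_action {
	STUN_CHECK_SKIP,      /* server was used recently; send nothing */
	STUN_CHECK_PROBE,     /* send a STUN binding request ("STUN qualify") */
	STUN_CHECK_RERESOLVE, /* look the hostname up again before probing */
};

/* Hypothetical per-server state, not an existing Asterisk structure. */
struct stun_server {
	time_t resolved_at;          /* when the address was last resolved */
	unsigned int ttl;            /* TTL of that DNS answer, in seconds */
	time_t last_success;         /* last successful request (probe or call) */
	unsigned int check_interval; /* seconds between liveness checks */
	bool last_probe_failed;      /* the last probe timed out */
};

/* Called from a periodic timer to decide what, if anything, to do. */
static enum stun_check_action stun_check(const struct stun_server *s, time_t now)
{
	assert(s != NULL);
	/* A failed probe or an expired TTL both mean the address may be stale. */
	if (s->last_probe_failed || now - s->resolved_at >= (time_t) s->ttl) {
		return STUN_CHECK_RERESOLVE;
	}
	/* Recent success (including call-time use) makes a probe redundant. */
	if (now - s->last_success < (time_t) s->check_interval) {
		return STUN_CHECK_SKIP;
	}
	return STUN_CHECK_PROBE;
}
```

A successful call-time STUN request would also update last_success,
which is what lets recently used servers skip the probe and keeps the
extra network traffic low.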

-- 
Dennis Buteyn
Xorcom Ltd