[asterisk-bugs] [JIRA] (ASTERISK-30381) Using unbound, queries do not try all available nameservers, and contacts will flap
Mark Murawski (JIRA)
noreply at issues.asterisk.org
Wed Dec 28 20:57:06 CST 2022
Mark Murawski created ASTERISK-30381:
----------------------------------------
Summary: Using unbound, queries do not try all available nameservers, and contacts will flap
Key: ASTERISK-30381
URL: https://issues.asterisk.org/jira/browse/ASTERISK-30381
Project: Asterisk
Issue Type: Bug
Security Level: None
Components: Resources/res_resolver_unbound
Affects Versions: 20.0.1, 19.7.1, 18.15.1
Reporter: Mark Murawski
With a fairly typical DNS server list (a local DNS server plus some public backups), the unbound DNS resolver produces non-deterministic lookup failures.
Given resolv.conf:
{{monospaced}}
options attempts:3 timeout:1
nameserver 192.168.5.2
nameserver 4.2.2.2
nameserver 8.8.8.8
{{monospaced}}
Given resolver_unbound.conf:
{{monospaced}}
[general]
hosts = /etc/hosts
resolv = /etc/resolv.conf
{{monospaced}}
Given pjsip_wizard.conf:
{{monospaced}}
[foo]
type = wizard
remote_hosts = foo.vpn.lan
---snip--- ... other settings here
{{monospaced}}
Contacts then flap between Reachable and Unreachable because of DNS failures, not because SIP OPTIONS goes unanswered. (The foo.vpn.lan host responded to SIP OPTIONS the entire time; only the DNS lookups failed intermittently.)
{{monospaced}}
Contact wombat/sip:foo.vpn.lan is now Reachable. RTT: 37.946 msec
Contact wombat/sip:foo.vpn.lan is now Unreachable. RTT: 0.000 msec
Contact wombat/sip:foo.vpn.lan is now Reachable. RTT: 37.946 msec
Contact wombat/sip:foo.vpn.lan is now Unreachable. RTT: 0.000 msec
Contact wombat/sip:foo.vpn.lan is now Reachable. RTT: 37.946 msec
Contact wombat/sip:foo.vpn.lan is now Unreachable. RTT: 0.000 msec
{{monospaced}}
The reason for this is twofold:
1) Unbound does not query more than one DNS server to get the result for a given request.
2) Unbound does not respect the order of DNS servers in /etc/resolv.conf.
Unbound debug logging shows the DNS server order it uses, which is the reverse of the order in resolv.conf:
{{monospaced}}
[pid 10346] write(2, "[1672280502] libunbound[8890:0] info: DelegationPoint<.>: 0 names (0 missing), 3 addrs (0 result, 3 avail) parentNS\n", 116) = 116
[pid 10346] getpid() = 8890
[pid 10346] write(2, "[1672280502] libunbound[8890:0] debug: ip4 8.8.8.8 port 53 (len 16)\n", 71) = 71
[pid 10346] getpid() = 8890
[pid 10346] write(2, "[1672280502] libunbound[8890:0] debug: ip4 4.2.2.2 port 53 (len 16)\n", 71) = 71
[pid 10346] getpid() = 8890
[pid 10346] write(2, "[1672280502] libunbound[8890:0] debug: ip4 192.168.5.2 port 53 (len 16)\n", 75) = 75
[pid 10346] getpid() = 8890
[pid 10346] write(2, "[1672280502] libunbound[8890:0] debug: attempt to get extra 3 targets\n", 70) = 70
{{monospaced}}
Take this example:
{{monospaced}}
Timestamp 12:00:00: DNS Lookup foo.vpn.lan using 8.8.8.8 .. fails because vpn.lan only exists on 192.168.5.2... locally cached DNS for the endpoint contact is deleted, host marked unreachable
Timestamp 12:01:00: DNS Lookup foo.vpn.lan using 4.2.2.2 .. fails because vpn.lan only exists on 192.168.5.2... locally cached DNS for the endpoint contact is deleted, host marked unreachable
Timestamp 12:02:00: DNS Lookup foo.vpn.lan using 192.168.5.2 .. success! endpoint DNS is stored, host is marked reachable
Timestamp 12:03:00: DNS Lookup foo.vpn.lan using 4.2.2.2 .. fails because vpn.lan only exists on 192.168.5.2... locally cached DNS for the endpoint contact is deleted, host marked unreachable
Timestamp 12:04:00: DNS Lookup foo.vpn.lan using 8.8.8.8 .. fails because vpn.lan only exists on 192.168.5.2... locally cached DNS for the endpoint contact is deleted, host marked unreachable
{{monospaced}}
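The timeline above can be reproduced with a small toy model. This is only a sketch of the behavior described in this report, not of libunbound's actual server-selection code: the random choice of server, the NAMESERVERS table, and the function names are all assumptions made for illustration.

```python
import random

# Hypothetical model: the zones each nameserver can answer for.
# Only the local server knows the private vpn.lan zone.
NAMESERVERS = {
    "192.168.5.2": {"vpn.lan"},
    "4.2.2.2": set(),
    "8.8.8.8": set(),
}


def lookup_once(hostname, server):
    """Return True if `server` can resolve `hostname` in this toy model."""
    return any(hostname.endswith("." + zone) for zone in NAMESERVERS[server])


def simulate(hostname, rounds, rng):
    """Each round asks exactly one server (chosen at random here, which is
    an assumption) and never falls back to another server on failure."""
    states = []
    for _ in range(rounds):
        server = rng.choice(sorted(NAMESERVERS))
        ok = lookup_once(hostname, server)
        states.append((server, "Reachable" if ok else "Unreachable"))
    return states


if __name__ == "__main__":
    for server, state in simulate("foo.vpn.lan", 6, random.Random(0)):
        print(f"lookup via {server}: contact is now {state}")
```

In this model the contact is Reachable only on rounds that happen to land on 192.168.5.2, which matches the flapping pattern in the log above.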
If you change resolver_unbound.conf to the following:
{{monospaced}}
[general]
hosts = /etc/hosts
nameserver = 192.168.5.2
{{monospaced}}
This does not fix the issue. Unbound does not treat this as the complete nameserver list and still uses the 3 nameservers specified in /etc/resolv.conf.
The ideal behavior here would be:
1) Don't treat a contact as unreachable if DNS suddenly fails but SIP OPTIONS is still working against the last-known IP.
2) Try all DNS servers until we get a successful lookup, or until all servers have failed.
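The two points above could be sketched roughly as follows. This is not Asterisk or libunbound code: the Lookup callable, resolve_with_fallback, contact_reachable, and the 10.8.0.5 address are hypothetical names and values chosen only to illustrate the requested behavior.

```python
from typing import Callable, Iterable, Optional

# Hypothetical lookup signature: (hostname, server) -> IP string, or None on failure.
Lookup = Callable[[str, str], Optional[str]]


def resolve_with_fallback(hostname: str, servers: Iterable[str],
                          lookup: Lookup) -> Optional[str]:
    """Try each server in the listed order; return the first successful answer."""
    for server in servers:
        address = lookup(hostname, server)
        if address is not None:
            return address
    return None  # every server failed


def contact_reachable(resolved: Optional[str], last_known_ip: Optional[str],
                      options_ok: Callable[[str], bool]) -> bool:
    """If DNS failed, keep probing the last-known IP instead of giving up."""
    target = resolved or last_known_ip
    return target is not None and options_ok(target)


if __name__ == "__main__":
    # Toy answers: only the local server knows foo.vpn.lan (address is made up).
    answers = {("foo.vpn.lan", "192.168.5.2"): "10.8.0.5"}

    def fake_lookup(hostname, server):
        return answers.get((hostname, server))

    order = ["8.8.8.8", "4.2.2.2", "192.168.5.2"]
    print(resolve_with_fallback("foo.vpn.lan", order, fake_lookup))
    print(contact_reachable(None, "10.8.0.5", lambda ip: True))
```

With this approach a lookup only fails once every configured server has failed, and a transient DNS failure alone does not mark a contact unreachable while its last-known address still answers SIP OPTIONS.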
The only workaround is to noload res_resolver_unbound.so (for example, in modules.conf).
--
This message was sent by Atlassian JIRA
(v6.2#6252)