[asterisk-bugs] [JIRA] (ASTERISK-30381) res_resolver_unbound: Using unbound, queries do not try all available nameservers, and contacts will flap
Joshua C. Colp (JIRA)
noreply at issues.asterisk.org
Thu Dec 29 03:55:06 CST 2022
[ https://issues.asterisk.org/jira/browse/ASTERISK-30381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=261086#comment-261086 ]
Joshua C. Colp commented on ASTERISK-30381:
-------------------------------------------
{code}
[general]
hosts = /etc/hosts
nameserver = 192.168.5.2
{code}
This doesn't do what you think it does. The default setting[1] for "resolv" is "system", which means /etc/resolv.conf, so you would need to set "resolv" to an empty value to keep that file from being used. Does setting it to an empty value work?
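As a rough, untested sketch of that suggestion, resolver_unbound.conf would look something like the following. The empty "resolv" value is the only change and is an assumption based on the default described above; the other lines are copied from the example.
{code}
[general]
hosts = /etc/hosts
; assumption: an empty value overrides the "system" default so /etc/resolv.conf is not read
resolv =
nameserver = 192.168.5.2
{code}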
The nameserver entries in resolv.conf are not a primary followed by backups. Modern DNS clients will round robin/balance across the given list, failing over if a nameserver is unreachable. Explicitly configuring the list using "nameserver" will do primary/backup.
I also fundamentally disagree with your DNS configuration. Every server in a list of name servers should be able to resolve all addresses, not just some of them. Otherwise you need everything you're asking for: the resolver client has to solve your problem, and if the client doesn't behave exactly as you need, this happens. What you're actually doing is split DNS, which should be handled by a local caching server that is configured to send queries to the appropriate place based on the domain.
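As an illustration only, and assuming the local caching server is an unbound daemon (any caching server with per-domain forwarding would do), a minimal unbound.conf sketch for this split could look like the following. The zone name and addresses are taken from the issue description; treating vpn.lan as unsigned via "domain-insecure" is an assumption.
{code}
server:
    # assumption: vpn.lan is a private, unsigned zone
    domain-insecure: "vpn.lan."

# queries for vpn.lan go to the internal server only
forward-zone:
    name: "vpn.lan."
    forward-addr: 192.168.5.2

# everything else goes to the public servers
forward-zone:
    name: "."
    forward-addr: 4.2.2.2
    forward-addr: 8.8.8.8
{code}
With something like this in place, the nameserver list used by Asterisk would contain only the local caching server, so every listed server can resolve every name.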
I don't know what your "1" means. As for "2", that's completely dependent on the unbound DNS client library. While the ability to set further configuration isn't implemented in res_resolver_unbound, we can at least look to see whether that would be a viable option[2][3]. Looking through the options, I don't see any that would cause it to behave as you describe in "2".
[1] https://github.com/asterisk/asterisk/blob/18/configs/samples/resolver_unbound.conf.sample#L10
[2] https://unbound.docs.nlnetlabs.nl/en/latest/manpages/libunbound.html
[3] https://unbound.docs.nlnetlabs.nl/en/latest/manpages/unbound.conf.html
> res_resolver_unbound: Using unbound, queries do not try all available nameservers, and contacts will flap
> ---------------------------------------------------------------------------------------------------------
>
> Key: ASTERISK-30381
> URL: https://issues.asterisk.org/jira/browse/ASTERISK-30381
> Project: Asterisk
> Issue Type: Bug
> Security Level: None
> Components: Resources/res_resolver_unbound
> Affects Versions: 18.15.1, 19.7.1, 20.0.1
> Reporter: Mark Murawski
>
> With what's probably a fairly standard DNS server list containing a local DNS server and some backups, using the unbound DNS resolver results in non-deterministic lookup failures.
> Given resolv.conf:
> {code}
> options attempts:3 timeout:1
> nameserver 192.168.5.2
> nameserver 4.2.2.2
> nameserver 8.8.8.8
> {code}
> Given resolver_unbound.conf
> {code}
> [general]
> hosts = /etc/hosts
> resolv = /etc/resolv.conf
> {code}
> Given pjsip_wizard.conf
> {code}
> [wombat]
> type = wizard
> remote_hosts = foo.vpn.lan
> aor/qualify_frequency = 60
> aor/qualify_timeout = 2000
> {code}
> You wind up with contacts flapping in reachability due to DNS but not due to lack of SIP OPTIONS. (The foo.vpn.lan host was responding to SIP OPTIONS this entire time, but we had intermittent DNS failures):
> {code}
> Contact wombat/sip:foo.vpn.lan is now Reachable. RTT: 37.946 msec
> Contact wombat/sip:foo.vpn.lan is now Unreachable. RTT: 0.000 msec
> Contact wombat/sip:foo.vpn.lan is now Reachable. RTT: 37.946 msec
> Contact wombat/sip:foo.vpn.lan is now Unreachable. RTT: 0.000 msec
> Contact wombat/sip:foo.vpn.lan is now Reachable. RTT: 37.946 msec
> Contact wombat/sip:foo.vpn.lan is now Unreachable. RTT: 0.000 msec
> {code}
> The reason for this is twofold:
> Unbound does not query more than one DNS server to get the result for a given request.
> Unbound does not respect the order of DNS servers in /etc/resolv.conf
> Unbound debug logging shows the dns server order:
> {code}
> [pid 10346] write(2, "[1672280502] libunbound[8890:0] info: DelegationPoint<.>: 0 names (0 missing), 3 addrs (0 result, 3 avail) parentNS\n", 116) = 116
> [pid 10346] getpid() = 8890
> [pid 10346] write(2, "[1672280502] libunbound[8890:0] debug: ip4 8.8.8.8 port 53 (len 16)\n", 71) = 71
> [pid 10346] getpid() = 8890
> [pid 10346] write(2, "[1672280502] libunbound[8890:0] debug: ip4 4.2.2.2 port 53 (len 16)\n", 71) = 71
> [pid 10346] getpid() = 8890
> [pid 10346] write(2, "[1672280502] libunbound[8890:0] debug: ip4 192.168.5.2 port 53 (len 16)\n", 75) = 75
> [pid 10346] getpid() = 8890
> [pid 10346] write(2, "[1672280502] libunbound[8890:0] debug: attempt to get extra 3 targets\n", 70) = 70
> {code}
> Take this example:
> {code}
> Timestamp 12:00:00: DNS Lookup foo.vpn.lan using 8.8.8.8 .. fails due to vpn.lan only exists on 192.168.5.2... local cached dns for endpoint contact is deleted, host marked unreachable
> Timestamp 12:01:00: DNS Lookup foo.vpn.lan using 4.2.2.2 .. fails due to vpn.lan only exists on 192.168.5.2... local cached dns for endpoint contact is deleted, host marked unreachable
> Timestamp 12:02:00: DNS Lookup foo.vpn.lan using 192.168.5.2 .. success! endpoint dns is stored, host is marked reachable
> Timestamp 12:03:00: DNS Lookup foo.vpn.lan using 4.2.2.2 .. fails due to vpn.lan only exists on 192.168.5.2... local cached dns for endpoint contact is deleted, host marked unreachable
> Timestamp 12:04:00: DNS Lookup foo.vpn.lan using 8.8.8.8 .. fails due to vpn.lan only exists on 192.168.5.2... local cached dns for endpoint contact is deleted, host marked unreachable
> {code}
> If you change resolver_unbound.conf to the following:
> {code}
> [general]
> hosts = /etc/hosts
> nameserver = 192.168.5.2
> {code}
> This does not fix the issue. Unbound does not treat this as the full nameserver list and still uses the 3 nameservers specified in /etc/resolv.conf.
> The ideal behavior here would be:
> 1) Don't treat a contact as unreachable if the DNS suddenly fails, but SIP OPTIONS is still working to the last-known IP
> 2) Try all DNS servers until we get a successful lookup, or all servers have failed lookups
> The only workaround for this is to noload res_resolver_unbound.so