[asterisk-bugs] [JIRA] (ASTERISK-30381) Using unbound, queries do not try all available nameservers, and contacts will flap

Mark Murawski (JIRA) noreply at issues.asterisk.org
Wed Dec 28 21:07:06 CST 2022


     [ https://issues.asterisk.org/jira/browse/ASTERISK-30381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Murawski updated ASTERISK-30381:
-------------------------------------

    Description: 
With what is probably a fairly standard DNS server list (a local DNS server plus some public backups), the unbound DNS resolver produces non-deterministic lookup failures.

Given resolv.conf:
{code}
options attempts:3 timeout:1
nameserver 192.168.5.2
nameserver 4.2.2.2
nameserver 8.8.8.8
{code}

Given resolver_unbound.conf:
{code}
[general]
hosts = /etc/hosts
resolv = /etc/resolv.conf
{code}

Given pjsip_wizard.conf:
{code}
[wombat]
type = wizard
remote_hosts = foo.vpn.lan
aor/qualify_frequency = 60
aor/qualify_timeout = 2000
{code}

Contacts then flap between Reachable and Unreachable because of DNS failures, not because SIP OPTIONS go unanswered (the foo.vpn.lan host responded to SIP OPTIONS the entire time, but DNS lookups failed intermittently):
{code}
Contact wombat/sip:foo.vpn.lan is now Reachable.  RTT: 37.946 msec
Contact wombat/sip:foo.vpn.lan is now Unreachable.  RTT: 0.000 msec
Contact wombat/sip:foo.vpn.lan is now Reachable.  RTT: 37.946 msec
Contact wombat/sip:foo.vpn.lan is now Unreachable.  RTT: 0.000 msec
Contact wombat/sip:foo.vpn.lan is now Reachable.  RTT: 37.946 msec
Contact wombat/sip:foo.vpn.lan is now Unreachable.  RTT: 0.000 msec
{code}

The reason for this is twofold:
1) Unbound does not query more than one DNS server to get the result for a given request.
2) Unbound does not respect the order of DNS servers in /etc/resolv.conf.

Unbound debug logging (captured here via strace) shows the DNS server order:
{code}
[pid 10346] write(2, "[1672280502] libunbound[8890:0] info: DelegationPoint<.>: 0 names (0 missing), 3 addrs (0 result, 3 avail) parentNS\n", 116) = 116
[pid 10346] getpid()                    = 8890
[pid 10346] write(2, "[1672280502] libunbound[8890:0] debug:    ip4 8.8.8.8 port 53 (len 16)\n", 71) = 71
[pid 10346] getpid()                    = 8890
[pid 10346] write(2, "[1672280502] libunbound[8890:0] debug:    ip4 4.2.2.2 port 53 (len 16)\n", 71) = 71
[pid 10346] getpid()                    = 8890
[pid 10346] write(2, "[1672280502] libunbound[8890:0] debug:    ip4 192.168.5.2 port 53 (len 16)\n", 75) = 75
[pid 10346] getpid()                    = 8890
[pid 10346] write(2, "[1672280502] libunbound[8890:0] debug: attempt to get extra 3 targets\n", 70) = 70
{code}
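
To compare the order libunbound reports against /etc/resolv.conf, the addresses can be pulled out of a capture like the one above. This is only a rough sketch: the regular expression is fitted to the debug lines quoted in this report, and the names IP4_LINE, SAMPLE and server_order are made up for illustration.

```python
import re

# Matches the "ip4 <address> port <port>" debug lines quoted in this report.
IP4_LINE = re.compile(r"libunbound\[\d+:\d+\] debug:\s+ip4 (\d+\.\d+\.\d+\.\d+) port (\d+)")

# Three lines copied from the capture above (raw string keeps the literal \n).
SAMPLE = r'''
[pid 10346] write(2, "[1672280502] libunbound[8890:0] debug:    ip4 8.8.8.8 port 53 (len 16)\n", 71) = 71
[pid 10346] write(2, "[1672280502] libunbound[8890:0] debug:    ip4 4.2.2.2 port 53 (len 16)\n", 71) = 71
[pid 10346] write(2, "[1672280502] libunbound[8890:0] debug:    ip4 192.168.5.2 port 53 (len 16)\n", 75) = 75
'''

def server_order(log_text):
    """Return (address, port) pairs in the order they appear in the log."""
    return [(ip, int(port)) for ip, port in IP4_LINE.findall(log_text)]

if __name__ == "__main__":
    for ip, port in server_order(SAMPLE):
        print(ip, port)
```

For the capture in this report the extracted order is the reverse of the nameserver order in resolv.conf.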


Take this example:
{code}
Timestamp 12:00:00: DNS lookup of foo.vpn.lan using 8.8.8.8 .. fails because vpn.lan exists only on 192.168.5.2; the cached DNS result for the endpoint contact is deleted and the host is marked unreachable
Timestamp 12:01:00: DNS lookup of foo.vpn.lan using 4.2.2.2 .. fails because vpn.lan exists only on 192.168.5.2; the cached DNS result for the endpoint contact is deleted and the host is marked unreachable
Timestamp 12:02:00: DNS lookup of foo.vpn.lan using 192.168.5.2 .. succeeds; the endpoint DNS result is stored and the host is marked reachable
Timestamp 12:03:00: DNS lookup of foo.vpn.lan using 4.2.2.2 .. fails because vpn.lan exists only on 192.168.5.2; the cached DNS result for the endpoint contact is deleted and the host is marked unreachable
Timestamp 12:04:00: DNS lookup of foo.vpn.lan using 8.8.8.8 .. fails because vpn.lan exists only on 192.168.5.2; the cached DNS result for the endpoint contact is deleted and the host is marked unreachable
{code}
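
The timeline above can be reproduced with a small simulation. This is a sketch of the reported behavior, not Asterisk or libunbound code. It assumes each qualify cycle sends its query to exactly one server, picked at random (a simplification: the report only establishes that a single server is used and that resolv.conf order is not respected), and that only 192.168.5.2 knows vpn.lan. The address 10.8.0.5 and the names RECORDS, lookup and qualify_cycles are hypothetical.

```python
import random

NAMESERVERS = ["192.168.5.2", "4.2.2.2", "8.8.8.8"]
# Assumption for illustration: only the local server can answer for vpn.lan.
RECORDS = {"192.168.5.2": {"foo.vpn.lan": "10.8.0.5"}}  # hypothetical address

def lookup(server, name):
    """Return the address `server` knows for `name`, or None if it has no answer."""
    return RECORDS.get(server, {}).get(name)

def qualify_cycles(name, cycles, rng):
    """Model one lookup per qualify cycle against a single randomly chosen server."""
    states = []
    for _ in range(cycles):
        server = rng.choice(NAMESERVERS)
        states.append("Reachable" if lookup(server, name) else "Unreachable")
    return states

if __name__ == "__main__":
    rng = random.Random(30381)
    for state in qualify_cycles("foo.vpn.lan", 6, rng):
        print(f"Contact wombat/sip:foo.vpn.lan is now {state}.")
```

Under these assumptions roughly two cycles in three land on a server that cannot answer, so the contact state flips even though the host itself never goes away.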

If you change resolver_unbound.conf to the following:
{code}
[general]
hosts = /etc/hosts
nameserver = 192.168.5.2
{code}

This does not fix the issue. Unbound does not treat this as the complete nameserver list; it still uses all 3 nameservers from /etc/resolv.conf.


The ideal behavior here would be:
1) Don't treat a contact as unreachable when DNS suddenly fails but SIP OPTIONS still works to the last-known IP.
2) Try all DNS servers until one lookup succeeds or all servers have failed.
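
The two behaviors above could look roughly like this. It is a minimal sketch under stated assumptions, not a patch for res_resolver_unbound: resolve_with_fallback, next_contact_state, and the query and options_ok callbacks are hypothetical stand-ins for the real DNS query and SIP OPTIONS check, and 10.8.0.5 is a made-up address.

```python
from typing import Callable, Optional, Sequence, Tuple

def resolve_with_fallback(name: str, servers: Sequence[str],
                          query: Callable[[str, str], Optional[str]]) -> Optional[str]:
    """Ask each server in order; return the first answer, or None if all fail."""
    for server in servers:
        address = query(server, name)
        if address is not None:
            return address
    return None

def next_contact_state(resolved: Optional[str], last_known: Optional[str],
                       options_ok: Callable[[str], bool]) -> Tuple[str, Optional[str]]:
    """Return (state, address to remember) for one qualify cycle."""
    # Prefer a fresh DNS answer; otherwise fall back to the last-known address.
    target = resolved or last_known
    if target is not None and options_ok(target):
        return "Reachable", target
    return "Unreachable", last_known

if __name__ == "__main__":
    records = {"192.168.5.2": {"foo.vpn.lan": "10.8.0.5"}}  # hypothetical data
    query = lambda server, name: records.get(server, {}).get(name)
    # The public servers are asked first, but the lookup still succeeds.
    addr = resolve_with_fallback("foo.vpn.lan", ["8.8.8.8", "4.2.2.2", "192.168.5.2"], query)
    print(next_contact_state(addr, None, lambda ip: True))
    # DNS fails entirely, yet OPTIONS to the last-known IP still works.
    print(next_contact_state(None, "10.8.0.5", lambda ip: True))
```

Keeping the lookup and the reachability decision in separate functions mirrors the two separate requests: either change alone would already reduce the flapping.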


The only workaround for this is to noload res_resolver_unbound.so (for example, in modules.conf).

  was:
Using what's probably a fairly standard DNS server list containing a local DNS server and some backups, using the unbound DNS resolver  will result in non-deterministic lookup failures.

Given resolv.conf:
{code}
options attempts:3 timeout:1
nameserver 192.168.5.2
nameserver 4.2.2.2
nameserver 8.8.8.8
{code}

Given resolver_unbound.conf
{code}
[general]
hosts = /etc/hosts
resolv = /etc/resolv.conf
{code}

Given pjsip_wizard.conf
{code}
[wombat]
type = wizard
remote_hosts = foo.vpn.lan
---snip--- ... other settings here
{code}

You wind up with contacts flapping in reachability due to DNS but not due to lack of SIP OPTIONS.  (The foo.vpn.lan host was responding to SIP OPTIONS this entire time, but we had intermittent DNS failures):
{code}
Contact wombat/sip:foo.vpn.lan is now Reachable.  RTT: 37.946 msec
Contact wombat/sip:foo.vpn.lan is now Unreachable.  RTT: 0.000 msec
Contact wombat/sip:foo.vpn.lan is now Reachable.  RTT: 37.946 msec
Contact wombat/sip:foo.vpn.lan is now Unreachable.  RTT: 0.000 msec
Contact wombat/sip:foo.vpn.lan is now Reachable.  RTT: 37.946 msec
Contact wombat/sip:foo.vpn.lan is now Unreachable.  RTT: 0.000 msec
{code}

The reason for this is two fold:
Unbound does not query more than one DNS server to get the result for a given request.
Unbound does not respect the order of DNS servers in /etc/resolv.conf

Unbound debug logging shows the dns server order:
{code}
[pid 10346] write(2, "[1672280502] libunbound[8890:0] info: DelegationPoint<.>: 0 names (0 missing), 3 addrs (0 result, 3 avail) parentNS\n", 116) = 116
[pid 10346] getpid()                    = 8890
[pid 10346] write(2, "[1672280502] libunbound[8890:0] debug:    ip4 8.8.8.8 port 53 (len 16)\n", 71) = 71
[pid 10346] getpid()                    = 8890
[pid 10346] write(2, "[1672280502] libunbound[8890:0] debug:    ip4 4.2.2.2 port 53 (len 16)\n", 71) = 71
[pid 10346] getpid()                    = 8890
[pid 10346] write(2, "[1672280502] libunbound[8890:0] debug:    ip4 192.168.5.2 port 53 (len 16)\n", 75) = 75
[pid 10346] getpid()                    = 8890
[pid 10346] write(2, "[1672280502] libunbound[8890:0] debug: attempt to get extra 3 targets\n", 70) = 70
{code}


Take this example:
{code}
Timestamp 12:00:00: DNS Lookup foo.vpn.lan using 8.8.8.8 .. fails due to vpn.lan only exists on 192.168.5.2... local cached dns for endpoint contact is deleted, host marked unreachable
Timestamp 12:01:00: DNS Lookup foo.vpn.lan using 4.2.2.2 .. fails due to vpn.lan only exists on 192.168.5.2... local cached dns for endpoint contact is deleted, host marked unreachable
Timestamp 12:02:00: DNS Lookup foo.vpn.lan using 192.168.5.2 .. success! endpoint dns is stored, host is marked reachable
Timestamp 12:03:00: DNS Lookup foo.vpn.lan using 4.2.2.2 .. fails due to vpn.lan only exists on 192.168.5.2... local cached dns for endpoint contact is deleted, host marked unreachable
Timestamp 12:04:00: DNS Lookup foo.vpn.lan using 8.8.8.8 .. fails due to vpn.lan only exists on 192.168.5.2... local cached dns for endpoint contact is deleted, host marked unreachable
{code}

If you change resolver_unbound.conf to the following:
{code}
[general]
hosts = /etc/hosts
nameserver = 192.168.5.2
{code}

This does not fix the issue.  Unbound does not respect this as the full nameserver list and still uses /etc/resolv.conf for the 3 nameservers specified


The ideal behavior here would be:
1) Don't treat a contact as unreachable if the DNS suddenly fails, but SIP OPTIONS is still working to the last-known IP
2) Try all DNS servers until we get a successful lookup, or all servers have failed lookups


The only workaround for this is to noload res_resolver_unbound.so


> Using unbound, queries do not try all available nameservers, and contacts will flap
> -----------------------------------------------------------------------------------
>
>                 Key: ASTERISK-30381
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-30381
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Resources/res_resolver_unbound
>    Affects Versions: 18.15.1, 19.7.1, 20.0.1
>            Reporter: Mark Murawski



--
This message was sent by Atlassian JIRA
(v6.2#6252)


