[asterisk-bugs] [JIRA] (ASTERISK-28171) res_resolver_unbound: DNS issue when under load, can't make outgoing calls

Sean Bright (JIRA) noreply at issues.asterisk.org
Wed Oct 23 08:08:48 CDT 2019


    [ https://issues.asterisk.org/jira/browse/ASTERISK-28171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=248508#comment-248508 ] 

Sean Bright edited comment on ASTERISK-28171 at 10/23/19 8:08 AM:
------------------------------------------------------------------

OK. So here is what is going on...

{{libunbound2 1.5.8-1ubuntu1.1}} on Xenial _does not_ use libevent. You can confirm this by running {{apt show libunbound2}} and checking that the {{Depends:}} line does not contain {{libevent}}. This means that {{libunbound2}} instead uses an internal {{libevent}}-like implementation called {{mini_event}}, which they describe as:

bq. fake libevent implementation. Less broad in functionality, and only supports select(2).

So I did a bit of digging into the source code of the {{libunbound2}} v1.5.8 package and determined that:

* The only way that {{event_add}} can fail (as indicated in the logs) is due to this statement: {{if(ev->ev_fd != -1 && ev->ev_fd >= ev->ev_base->capfd)}}
* ...and {{ev->ev_base->capfd}} is set to {{MAX_FDS}} which is hard-coded as {{1024}}

This means that once file descriptor {{1024}} is allocated (anywhere in the asterisk process, not just in {{libunbound2}}'s code), {{event_add}} will start to fail with the error you are seeing.
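
To make the failure condition concrete, here is a minimal sketch of that check in isolation. It is not the {{mini_event}} source: the {{sketch_*}} names and struct layouts are hypothetical, and only the fields mentioned above ({{ev_fd}}, {{ev_base}}, {{capfd}}) and the {{MAX_FDS}} value of {{1024}} are taken from the analysis.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the mini_event structures. Only the fields
 * named in the comment above are modelled; the real structs differ. */
#define MAX_FDS 1024

struct sketch_event_base {
    int capfd;                          /* descriptors >= capfd are rejected */
};

struct sketch_event {
    struct sketch_event_base *ev_base;
    int ev_fd;                          /* -1 means "no descriptor" (e.g. a timer) */
};

/* Mirrors the quoted condition: returns 0 if the event would be accepted,
 * -1 if its descriptor is at or beyond the base's cap. */
static int sketch_event_add(const struct sketch_event *ev)
{
    assert(ev != NULL && ev->ev_base != NULL);
    if (ev->ev_fd != -1 && ev->ev_fd >= ev->ev_base->capfd)
        return -1;
    return 0;
}

/* Convenience wrapper: would descriptor fd be accepted by a base whose
 * cap is MAX_FDS? Returns 1 for accepted, 0 for rejected. */
static int sketch_fd_accepted(int fd)
{
    struct sketch_event_base base = { MAX_FDS };
    struct sketch_event ev = { &base, fd };
    return sketch_event_add(&ev) == 0;
}
```

Under these assumptions, {{sketch_fd_accepted(1023)}} succeeds while {{sketch_fd_accepted(1024)}} does not, regardless of how high the process's open-file limit is set. That is why a busy process that already holds 1024 or more descriptors hits the error even when ulimits look fine.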

A couple of options:

# Build and install {{libunbound2}} from source, making sure to specify {{--with-libevent}} when running its {{configure}} script
# Upgrade to 18.04 Bionic or better (the {{libunbound2}} package in Bionic uses {{libevent}})

In either case, this is not an Asterisk bug. Sorry that it took so long to determine that.


> res_resolver_unbound: DNS issue when under load, can't make outgoing calls
> --------------------------------------------------------------------------
>
>                 Key: ASTERISK-28171
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-28171
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Resources/res_resolver_unbound
>    Affects Versions: 15.4.0, 16.0.0, 16.6.1
>         Environment: Linux, Ubuntu 16.04 (kernel 4.4.0-1060-aws)
>            Reporter: Cyril Ramière
>            Assignee: Unassigned
>              Labels: pjsip
>         Attachments: asterisk.zip, console.log, fd.txt, locks.txt, taskprocessor.txt, threads.txt
>
>
> Hello,
> We have an issue with Asterisk, let me try to explain the setup and the problem:
> h5. Setup
> An instance on AWS (C5.2XLARGE) : 8CPU & 16GB RAM.
> Ubuntu 16.04, kernel 4.4.
> Tried with Amazon Linux 2 AMI and got same issue. 
> Only Asterisk is installed (compiled from source, tested on v15.4.0 & 16.0).
> Using PJSIP (bundled).
> h5. The problem
> The problem is with outgoing calls (from the US to our trunk). Our trunk is configured with a URL (so it uses DNS, which is very important).
> When Asterisk is processing hundreds of calls (400 to 500) and we try to make an outbound call using our trunk, it doesn't work.
> We checked the configuration and everything seems fine. We replaced the instance and recompiled Asterisk, with the same result.
> We tested on the latest Asterisk 16.0, with the same result.
> We are running out of ideas. The instance is really big and can easily handle the load. If we use an IP address instead of a DNS name for our trunk, it works, so the problem seems to be DNS related.
> We checked everything we can; there are no DNS issues on the instance, and resolution is absolutely fine.
> I'm attaching the configuration that we use, and the outputs (console).
> There are some messages related to the unbound library.
> Ulimits are fine.
> Realtime does not seem to be part of the issue; for the sake of simplicity I removed the realtime configuration and kept a light configuration.
> h5. Steps to reproduce
> Start asterisk
> Use SIPP to call extension 1000 with 400/500 calls and wait for the calls to be handled.
> Try to make an outgoing call through the "outgoing" endpoint
> Please tell me if you want more information/logs.


