[asterisk-bugs] [JIRA] (ASTERISK-26903) Listening TCP/TLS sockets stop when temporarily out of open files

Walter Doekes (JIRA) noreply at issues.asterisk.org
Sun Apr 2 04:24:10 CDT 2017


    [ https://issues.asterisk.org/jira/browse/ASTERISK-26903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=236245#comment-236245 ] 

Walter Doekes commented on ASTERISK-26903:
------------------------------------------

All recent versions: 13, 14, master

If I diff ast_tcptls_server_root between 11 and master, the only relevant change I see is WARNING->ERROR.

The listening socket (accept_fd) is not closed until ast_tcptls_server_stop() is called, which generally happens only on module/usage shutdown or when the config changes from enabled to disabled. In the meantime, a thread that stopped because of this problem sits in a zombie state, because nothing actively monitors it.

Ah, after some git blame, I find that the break/exit was first introduced in 2015.
{noformat}
commit c7591ef6bc6ad4c4e1c7f6a66de78b6ff70dc913
Author: Kevin Harwell <kharwell at digium.com>
Date:   Tue Jan 27 22:58:44 2015 +0000

    tcptls: Bad file descriptor error when reloading chan_sip
...
    valid. This is probably happening because unloading of chan_sip is not atomic.
    That however is outside the scope of this patch. This patch simply stops the
    logging of multiple occurrences of that message.
...
    ASTERISK-24728
{noformat}
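
To make the trade-off concrete: that commit wanted to stop the repeated "Bad file descriptor" logging, while this issue is about resource exhaustion that can clear up by itself. A minimal sketch of how the two cases could be told apart follows. It is not the Asterisk code: classify_accept_errno(), accept_loop(), the enum names and the 100 ms back-off are all made up for illustration, and the choice of which errno values count as temporary is an assumption.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <sys/socket.h>

/* Hypothetical names, for illustration only. */
enum accept_action { ACCEPT_RETRY_NOW, ACCEPT_RETRY_LATER, ACCEPT_GIVE_UP };

/* Decide what the listener should do after a failed accept(). */
static enum accept_action classify_accept_errno(int err)
{
    switch (err) {
    case EAGAIN:
#if EWOULDBLOCK != EAGAIN
    case EWOULDBLOCK:
#endif
    case EINTR:
    case ECONNABORTED:
        /* Same set the existing loop already ignores. */
        return ACCEPT_RETRY_NOW;
    case EMFILE:   /* per-process fd limit reached */
    case ENFILE:   /* system-wide fd limit reached */
    case ENOBUFS:
    case ENOMEM:
        /* Assumed temporary: back off a little, then try again. */
        return ACCEPT_RETRY_LATER;
    default:
        /* E.g. EBADF after the socket was closed under us: stop. */
        return ACCEPT_GIVE_UP;
    }
}

/* Accept loop that survives temporary failures and says so when it exits. */
static void accept_loop(int listen_fd, void (*handle)(int fd))
{
    for (;;) {
        int fd = accept(listen_fd, NULL, NULL);
        if (fd >= 0) {
            handle(fd); /* the handler owns (and closes) fd */
            continue;
        }

        int err = errno;
        switch (classify_accept_errno(err)) {
        case ACCEPT_RETRY_NOW:
            continue;
        case ACCEPT_RETRY_LATER: {
            /* Sleep so a full fd table does not turn into a busy loop. */
            struct timespec delay = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 };
            fprintf(stderr, "Accept failed: %s; will retry\n", strerror(err));
            nanosleep(&delay, NULL);
            continue;
        }
        case ACCEPT_GIVE_UP:
            fprintf(stderr, "Accept failed: %s; listener thread exiting\n",
                    strerror(err));
            return;
        }
    }
}
```

With this split, a bad descriptor still ends the loop after a single message, so the repeated logging from ASTERISK-24728 should not come back, while "Too many open files" only pauses the listener.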

> Listening TCP/TLS sockets stop when temporarily out of open files
> -----------------------------------------------------------------
>
>                 Key: ASTERISK-26903
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-26903
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Core/General
>            Reporter: Walter Doekes
>
> Just now a misconfigured Asterisk lost its AMI socket because it ran out of open files.
> Asterisk uses ast_tcptls_server_root for several TCP/TLS listening sockets. If an accept() there fails, the entire listening thread just stops without more than a WARNING. (Okay, ERROR on Asterisk 13+.)
> Example:
> {noformat}
> [2017-03-29 06:34:50] ERROR[6513] cel_custom.c: Unable to re-open master file /var/log/asterisk/cel-custom/full.csv : Too many open files
> [2017-03-29 06:34:50] WARNING[6519] tcptls.c: Accept failed: Too many open files
> [2017-03-29 06:34:50] ERROR[6513] cel_custom.c: Unable to re-open master file /var/log/asterisk/cel-custom/full.csv : Too many open files
> {noformat}
> At 06:34:50 AMI was trying to accept() an incoming connection. It failed here:
> {code}
> void *ast_tcptls_server_root(void *data)
> {
> // ...
>         for (;;) {
> // ...
>                 fd = ast_accept(desc->accept_fd, &addr);
>                 if (fd < 0) {
>                         if ((errno != EAGAIN) && (errno != EWOULDBLOCK) && (errno != EINTR) && (errno != ECONNABORTED)) {
>                                 ast_log(LOG_WARNING, "Accept failed: %s\n", strerror(errno));
>                                 break;
> // ...
>         return NULL;
> }
> {code}
> That is, with just that WARNING (ERROR), the listening thread dies.
> There is no cleanup when the thread ends, so the socket keeps listening, but no one is accept()ing any connections. Because the OS takes care of the TCP handshake, it appears as though Asterisk has hung before it can do a write. (You can connect to the port, but nothing happens.) But the problem is even earlier.
> This function is used here:
> {noformat}
> asterisk-rw-13.git$ wgrep . accept_fn.*ast_tcptls_server_root
> ./main/manager.c:	.accept_fn = ast_tcptls_server_root,	/* thread doing the accept() */
> ./main/manager.c:	.accept_fn = ast_tcptls_server_root,	/* thread doing the accept() */
> ./main/http.c:	.accept_fn = ast_tcptls_server_root,
> ./main/http.c:	.accept_fn = ast_tcptls_server_root,
> ./channels/chan_sip.c:	.accept_fn = ast_tcptls_server_root,
> ./channels/chan_sip.c:	.accept_fn = ast_tcptls_server_root,
> {noformat}
> (once for TCP, once for SSL)
> In a 2014 commit that has made it to Asterisk 13+, the message has been changed from WARNING to ERROR.
> {noformat}
> commit 7c276f9fef945b644566533ddbcb72a2ec8ff821
> Author: Olle Johansson <oej at edvina.net>
> Date:   Sun Apr 27 19:29:27 2014 +0000
>     tcptls.c : Log errors as ERROR, not warning or something else.
> {noformat}
> But there still is no indication that the thread has ended.
> Suggestions for improvement:
> - add another ERROR before {{return NULL}} that says the thread has ended (prematurely)
> - or, don't end the thread just because a single accept() failed and stay in the for-loop instead; that would make Asterisk more resilient against temporary problems
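
The "you can connect to the port, but nothing happens" symptom from the description can be reproduced outside Asterisk. The sketch below is a standalone illustration, not Asterisk code: connect_without_accept() is a made-up helper, and it assumes an IPv4 loopback interface is available. It opens a listening socket, never calls accept(), and checks that a client connect() still succeeds because the kernel completes the handshake and parks the connection in the backlog.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if a client can connect to a listening
 * socket on which accept() is never called, 0 otherwise.
 */
static int connect_without_accept(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int ok = 0;

    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0) {
        return 0;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */

    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) == 0
        && listen(lfd, 8) == 0
        /* Read back the port the kernel chose. */
        && getsockname(lfd, (struct sockaddr *)&addr, &len) == 0) {
        int cfd = socket(AF_INET, SOCK_STREAM, 0);
        if (cfd >= 0) {
            /* Nobody accepts, yet the handshake completes. */
            ok = connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) == 0;
            close(cfd);
        }
    }

    close(lfd);
    return ok;
}
```

This is why a dead listener thread looks like a hung server from the client side: the connection is established at the TCP level, but no application code ever picks it up.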


