[asterisk-bugs] [JIRA] (ASTERISK-19277) endlessly repeating error: "poll failed: Bad file descriptor"

Walter Doekes (JIRA) noreply at issues.asterisk.org
Tue Jun 30 06:09:33 CDT 2015


    [ https://issues.asterisk.org/jira/browse/ASTERISK-19277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=226704#comment-226704 ] 

Walter Doekes edited comment on ASTERISK-19277 at 6/30/15 6:09 AM:
-------------------------------------------------------------------

Okay, so this apparently still exists (in Asterisk GIT-11-96bbcf4M).

I've found a way to reproduce it too:
- ulimit -n 128  # to speed it up
- asterisk -c
- start a bunch of calls so we run out of FDs
- press a key in the console
- result: a flood of CLI> prompts, and the console is unusable

At that point we arrive here (this is not a patch, so you shall not remove it -- and if it we're, I hereby grant you permission to use it):
{noformat}
--- a/main/asterisk.c
+++ b/main/asterisk.c
@@ -2530,8 +2530,21 @@ static int ast_el_read_char(EditLine *editline, char *cp)
                        fprintf(stderr, "poll failed: %s\n", strerror(errno));
                        break;
                }
+               
+               /* revents can contains POLLIN, but also POLLERR, POLLHUP and POLLNVAL.
+                * If the process is out of file descriptors, we get POLLNVAL. At this
+                * point, the console is busted. */
+               if ((fds[0].revents & POLLNVAL) == POLLNVAL ||
+                               (max > 1 && (fds[1].revents & POLLNVAL) == POLLNVAL)) {
+                       fprintf(stderr, "Poll returned POLLNVAL, are you out of file descriptors?\n");
+                       fflush(stderr);
+                       usleep(500000);
+                       return 0;
+               }
 
-               if (!ast_opt_exec && fds[1].revents) {
+               if (max > 1 && fds[1].revents) {
                        num_read = read(STDIN_FILENO, cp, 1);
                        if (num_read < 1) {
                                break;
{noformat}

Unfortunetely, that isn't enough. At this point the console is busted.

So, it would seem that STDIN_FILENO is close()d by someone. Except, my GDB cannot find out who.

I tried this:

{noformat}
$ sudo gdb `which asterisk`
...
(gdb) run -c
...
...........................................................................................................................................................Asterisk Ready.
[New Thread 0x7fffa70f4700 (LWP 10377)]
*CLI> 
{noformat}

At this point, the FDs look like this:

{noformat}
# ls /proc/`pidof asterisk`/fd -lv
total 0
lrwx------ 1 root root 64 jun 30 12:18 0 -> /dev/pts/37
lrwx------ 1 root root 64 jun 30 12:18 1 -> /dev/pts/37
lrwx------ 1 root root 64 jun 30 12:18 2 -> /dev/pts/37
lr-x------ 1 root root 64 jun 30 12:18 3 -> /dev/urandom
lrwx------ 1 root root 64 jun 30 12:18 4 -> socket:[1511689]
lrwx------ 1 root root 64 jun 30 12:18 5 -> /var/lib/asterisk/astdb.sqlite3
...
{noformat}

Add some breakpoints:
{noformat}
*CLI> 
Program received signal SIGINT, Interrupt.
0x00007ffff78ff12d in poll () at ../sysdeps/unix/syscall-template.S:81
81	../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) break close
Breakpoint 1 at 0x7ffff67a63f0: close. (2 locations)
(gdb) cond 1 $rbx == 0

(gdb) break dup2
Breakpoint 2 at 0x7ffff78fdfe0: file ../sysdeps/unix/syscall-template.S, line 81.
(gdb) cond 2 $rcx == 0

(gdb) break dup3
Breakpoint 3 at 0x7ffff78fe010: file ../sysdeps/unix/syscall-template.S, line 81.
(gdb) cond 3 $rcx == 0

(gdb) info break
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   <MULTIPLE>         
	stop only if $rbx == 0
1.1                         y     0x00007ffff67a63f0 ../sysdeps/unix/syscall-template.S:81
1.2                         y     0x00007ffff78fdf50 ../sysdeps/unix/syscall-template.S:81
3       breakpoint     keep y   0x00007ffff78fdfe0 ../sysdeps/unix/syscall-template.S:81
	stop only if $rcx == 0
3       breakpoint     keep y   0x00007ffff78fe010 ../sysdeps/unix/syscall-template.S:81
	stop only if $rcx == 0
{noformat}

Carry on and start flooding the thing:
{noformat}
(gdb) cont
Continuing.
...
[New Thread 0x7fffa6ffc700 (LWP 10529)]
Poll returned POLLNVAL, are you out of file descriptors?
[Thread 0x7fffa6ffc700 (LWP 10529) exited]

Breakpoint 1, close () at ../sysdeps/unix/syscall-template.S:81
81	../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0  close () at ../sysdeps/unix/syscall-template.S:81
#1  0x000000000042c7f7 in ast_ouraddrfor (them=them at entry=0x17b78d8, us=us at entry=0x17b7ba8) at acl.c:972
#2  0x00007fffb9907c5c in ast_sip_ouraddrfor (them=0x17b78d8, us=us at entry=0x17b7ba8, p=0x17b3178) at chan_sip.c:3801
#3  0x00007fffb9934f8f in sip_alloc (callid=callid at entry=0x179b0b2 "163-10403 at 127.0.0.1", addr=addr at entry=0x7fffa73dbd20, useglobal_nat=useglobal_nat at entry=1, intended_method=intended_method at entry=5, 
    req=req at entry=0x7fffa73db2b0, logger_callid=logger_callid at entry=0x1513df0) at chan_sip.c:8584
...
(gdb) up
#1  0x000000000042c7f7 in ast_ouraddrfor (them=them at entry=0x17b78d8, us=us at entry=0x17b7ba8) at acl.c:972
972		close(s);
(gdb) print s
$2 = 0
(gdb) list 960
955		if ((s = socket(ast_sockaddr_is_ipv6(them) ? AF_INET6 : AF_INET,
956				SOCK_DGRAM, 0)) < 0) {
957			ast_log(LOG_ERROR, "Cannot create socket\n");
958			return -1;
...
{noformat}

And indeed, fd 0 has been replaced. It has been (silently) closed by someone/something, and then a socket() call got hold of it.

{noformat}
# ls /proc/`pidof asterisk`/fd -lv
total 0
lrwx------ 1 root root 64 jun 30 12:18 0 -> socket:[1513038]
lrwx------ 1 root root 64 jun 30 12:18 1 -> /dev/pts/37
lrwx------ 1 root root 64 jun 30 12:18 2 -> /dev/pts/37
lr-x------ 1 root root 64 jun 30 12:18 3 -> /dev/urandom
lrwx------ 1 root root 64 jun 30 12:18 4 -> socket:[1511689]
lrwx------ 1 root root 64 jun 30 12:18 5 -> /var/lib/asterisk/astdb.sqlite3
...
{noformat}

Weird no? I would have expected a breakpoint on either close, dup2 or dup3 the first time around. I cannot figure out how this happens.


was (Author: wdoekes):
Okay, so this apparently still exists (in Asterisk GIT-11-96bbcf4M).

I've found a way to reproduce it too:
- ulimit -n 128  # to speed it up
- asterisk -c
- start a bunch of calls so we run out of FDs

At that point we arrive here (this is not a patch, so you shall not remove it -- and if it we're, I hereby grant you permission to use it):
{noformat}
--- a/main/asterisk.c
+++ b/main/asterisk.c
@@ -2530,8 +2530,21 @@ static int ast_el_read_char(EditLine *editline, char *cp)
                        fprintf(stderr, "poll failed: %s\n", strerror(errno));
                        break;
                }
+               
+               /* revents can contains POLLIN, but also POLLERR, POLLHUP and POLLNVAL.
+                * If the process is out of file descriptors, we get POLLNVAL. At this
+                * point, the console is busted. */
+               if ((fds[0].revents & POLLNVAL) == POLLNVAL ||
+                               (max > 1 && (fds[1].revents & POLLNVAL) == POLLNVAL)) {
+                       fprintf(stderr, "Poll returned POLLNVAL, are you out of file descriptors?\n");
+                       fflush(stderr);
+                       usleep(500000);
+                       return 0;
+               }
 
-               if (!ast_opt_exec && fds[1].revents) {
+               if (max > 1 && fds[1].revents) {
                        num_read = read(STDIN_FILENO, cp, 1);
                        if (num_read < 1) {
                                break;
{noformat}

Unfortunetely, that isn't enough. At this point the console is busted.

So, it would seem that STDIN_FILENO is close()d by someone. Except, my GDB cannot find out who.

I tried this:

{noformat}
$ sudo gdb `which asterisk`
...
(gdb) run -c
...
...........................................................................................................................................................Asterisk Ready.
[New Thread 0x7fffa70f4700 (LWP 10377)]
*CLI> 
{noformat}

At this point, the FDs look like this:

{noformat}
# ls /proc/`pidof asterisk`/fd -lv
total 0
lrwx------ 1 root root 64 jun 30 12:18 0 -> /dev/pts/37
lrwx------ 1 root root 64 jun 30 12:18 1 -> /dev/pts/37
lrwx------ 1 root root 64 jun 30 12:18 2 -> /dev/pts/37
lr-x------ 1 root root 64 jun 30 12:18 3 -> /dev/urandom
lrwx------ 1 root root 64 jun 30 12:18 4 -> socket:[1511689]
lrwx------ 1 root root 64 jun 30 12:18 5 -> /var/lib/asterisk/astdb.sqlite3
...
{noformat}

Add some breakpoints:
{noformat}
*CLI> 
Program received signal SIGINT, Interrupt.
0x00007ffff78ff12d in poll () at ../sysdeps/unix/syscall-template.S:81
81	../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) break close
Breakpoint 1 at 0x7ffff67a63f0: close. (2 locations)
(gdb) cond 1 $rbx == 0

(gdb) break dup2
Breakpoint 2 at 0x7ffff78fdfe0: file ../sysdeps/unix/syscall-template.S, line 81.
(gdb) cond 2 $rcx == 0

(gdb) break dup3
Breakpoint 3 at 0x7ffff78fe010: file ../sysdeps/unix/syscall-template.S, line 81.
(gdb) cond 3 $rcx == 0

(gdb) info break
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   <MULTIPLE>         
	stop only if $rbx == 0
1.1                         y     0x00007ffff67a63f0 ../sysdeps/unix/syscall-template.S:81
1.2                         y     0x00007ffff78fdf50 ../sysdeps/unix/syscall-template.S:81
3       breakpoint     keep y   0x00007ffff78fdfe0 ../sysdeps/unix/syscall-template.S:81
	stop only if $rcx == 0
3       breakpoint     keep y   0x00007ffff78fe010 ../sysdeps/unix/syscall-template.S:81
	stop only if $rcx == 0
{noformat}

Carry on and start flooding the thing:
{noformat}
(gdb) cont
Continuing.
...
[New Thread 0x7fffa6ffc700 (LWP 10529)]
Poll returned POLLNVAL, are you out of file descriptors?
[Thread 0x7fffa6ffc700 (LWP 10529) exited]

Breakpoint 1, close () at ../sysdeps/unix/syscall-template.S:81
81	../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0  close () at ../sysdeps/unix/syscall-template.S:81
#1  0x000000000042c7f7 in ast_ouraddrfor (them=them at entry=0x17b78d8, us=us at entry=0x17b7ba8) at acl.c:972
#2  0x00007fffb9907c5c in ast_sip_ouraddrfor (them=0x17b78d8, us=us at entry=0x17b7ba8, p=0x17b3178) at chan_sip.c:3801
#3  0x00007fffb9934f8f in sip_alloc (callid=callid at entry=0x179b0b2 "163-10403 at 127.0.0.1", addr=addr at entry=0x7fffa73dbd20, useglobal_nat=useglobal_nat at entry=1, intended_method=intended_method at entry=5, 
    req=req at entry=0x7fffa73db2b0, logger_callid=logger_callid at entry=0x1513df0) at chan_sip.c:8584
...
(gdb) up
#1  0x000000000042c7f7 in ast_ouraddrfor (them=them at entry=0x17b78d8, us=us at entry=0x17b7ba8) at acl.c:972
972		close(s);
(gdb) print s
$2 = 0
(gdb) list 960
955		if ((s = socket(ast_sockaddr_is_ipv6(them) ? AF_INET6 : AF_INET,
956				SOCK_DGRAM, 0)) < 0) {
957			ast_log(LOG_ERROR, "Cannot create socket\n");
958			return -1;
...
{noformat}

And indeed, fd 0 has been replaced. It has been (silently) closed by someone/something, and then a socket() call got hold of it.

{noformat}
# ls /proc/`pidof asterisk`/fd -lv
total 0
lrwx------ 1 root root 64 jun 30 12:18 0 -> socket:[1513038]
lrwx------ 1 root root 64 jun 30 12:18 1 -> /dev/pts/37
lrwx------ 1 root root 64 jun 30 12:18 2 -> /dev/pts/37
lr-x------ 1 root root 64 jun 30 12:18 3 -> /dev/urandom
lrwx------ 1 root root 64 jun 30 12:18 4 -> socket:[1511689]
lrwx------ 1 root root 64 jun 30 12:18 5 -> /var/lib/asterisk/astdb.sqlite3
...
{noformat}

Weird no? I would have expect a breakpoint on either close, dup2 or dup3 the first time around. I cannot figure out how this happens.

> endlessly repeating error: "poll failed: Bad file descriptor"
> -------------------------------------------------------------
>
>                 Key: ASTERISK-19277
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-19277
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Core/SQLite3
>    Affects Versions: 1.8.9.0
>         Environment: OS X 10.5.8 on PPC Mac (G5)
>            Reporter: Barry Chern
>            Assignee: Barry Chern
>            Severity: Minor
>         Attachments: onePointEightIssueLog
>
>
> Attempting to move to 1.8 from a long-time working, not very complex, 1.6.x installation. Started with a download of the last release (1.8.8), then tried latest builds via SVN twice. Latest attempt with Asterisk SVN-branch-1.8-r353175.
> After apparently executing all of the steps of a dialplan without a problem, and hanging up, there is one error that seems to be incorrect, saying the csv couldn't be written to a readonly database, but the csv data HAS been written when I check on it. there follows a fast stream of these errors (after a slight pause) ad infinitum until I either cancel or the process kills itself by crashing:
> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> *CLI> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> [Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> *[Jan 30 04:32:14] ERROR[13942]: asterisk.c:2218 ast_el_read_char: poll failed: Bad file descriptor
> etc.etc. etc. (edited for brevity)



--
This message was sent by Atlassian JIRA
(v6.2#6252)



More information about the asterisk-bugs mailing list