[asterisk-dev] [Code Review] Don't miss control frames if a call is answered and hung up very quickly

Wed Oct 13 02:07:56 CDT 2010

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/740/
-----------------------------------------------------------

(Updated 2010-10-13 02:07:56.629037)

Review request for Asterisk Developers, Russell Bryant and David Vossel.

Changes
-------

Ok, so here is another go at fixing this, but first an overview.

We want to make sure if there are frames to be read when _softhangup is set, that those frames get read because they may be important (like an ANSWER frame so that CDRs show up correctly). We also *have* to make sure that ast_read() eventually returns NULL because ast_read is often called from code that looks like:

while (ast_waitfor(chan, -1) > -1) {
    f = ast_read(chan);
    if (!f)
        break;
    /* do stuff */
}

We can't just remove the ast_check_hangup() from ast_read() and continue reading frames, because code that calls ast_soft_hangup() or sets chan->_softhangup directly means that we may never return NULL from __ast_read().

The previous attempt at solving this by adding a new AST_SOFTHANGUP_END_READ flag and checking for it later doesn't work because if _softhangup was set w/o an AST_CONTROL_HANGUP eventually being sent, we ended up not getting to the code that set AST_SOFTHANGUP_END_READ. Another problem with it was the addition of the ast_check_hangup() check in __ast_queue_frame. This would prevent frames from being written which would wake up ast_waitfor() so we'd just wait around forever in some circumstances.

An easy way to create the situation where we get a softhangup before an AST_CONTROL_HANGUP is with the cli "soft hangup" command. With the last try at a fix, channels wouldn't hang up with the cli command. 

This attempt, instead of using a new flag for softhangup, instead appends a new control frame AST_CONTROL_END_READ to the queue in the case that ast_read() detects that _softhangup has been set (and then continuing to read queued frames). We should eventually read this frame and return NULL (unless there is already an AST_CONTROL_HANGUP in which case ast_read will return NULL anyway).

There are places in the code that directly set _softhangup w/o queuing a frame to ensure that any waitfor loops exit (like ast_softhangup() does). If these instances are not safe, then the existing code would be broken regardless of this change. Since much of that code is very, very old and no bugs have been opened about channels not hanging up (I hope) we can pretty safely assume that the code eventually queues another frame somewhere. In any case, since the change we are making is confined to ast_read(), it can't have any negative effect on that situation since ast_read() most likely wouldn't be called if there was a problem because ast_waitfor wouldn't return. If ast_read() is called by some fluke of some other frame being queued we will still see it in ast_read and queue the END_READ control frame so that we will eventually exit.

Tests done:
1) sipp test which immediately answers and hangs up: CDR shows answered whereas before it showed no answer
2) Using Alec Davis' example of a loop of local channels eventually calling an Echo: Both "console hangup" and "soft hangup" from the CLI result in all channels being hung up.

Summary
-------

When an outgoing call is answered and hung up by the far end *very* quickly, we may not read any frames and therefor end up with a call that displays the wrong disposition/DIALSTATUS. The reason is because ast_queue_hangup() immediately sets the _softhangup flag on the channel and then queues the HANGUP control frame, but __ast_read refuses to read any frames if ast_check_hangup() indicates that a hangup request has been made (which it will if _softhangup is set). So, we end up losing control frames.

This change makes __ast_read continue to read frames even if a soft hangup has been requested. I believe this should be safe as we actually check farther down in __ast_read for HANGUP frames and return NULL just like we would have if we skipped the read via the goto done statement, and we don't call into any of the channel tech callbacks unless the readq is empty.

This addresses bug 16946.
    https://issues.asterisk.org/view.php?id=16946

Diffs (updated)
-----

  /branches/1.4/include/asterisk/frame.h 291108 
  /branches/1.4/main/channel.c 291108 

Diff: https://reviewboard.asterisk.org/r/740/diff

Testing
-------

I have verified via sipp that when placing an outbound call that is answered and immediately hung up that the call shows up as answered (where it doesn't w/o the patch). I have also run the testsuite and verified that all tests continue to pass.

Thanks,

Terry