[Asterisk-Dev] Help Debugging Dropped Call Audio - Possibly Fixed

Matt Roth mroth at imminc.com
Sat Dec 24 14:53:01 MST 2005


I posted this yesterday, but it looks like it got dropped somewhere 
along the way.  If it didn't, my apologies for the repost.

==================================================

Kevin P. Fleming wrote:

> Matt Roth wrote:
>
>> Thank you for the information.  It seems that ast_rtp_bridge() is not 
>> being used to bridge the channels, but that did get me on the correct 
>> trail.
>
>
> It should be; if the peers are both SIP/RTP, then ast_rtp_bridge() 
> should be called unless you are doing something to stop native 
> bridging from occurring (recording, DTMF features, etc). From what I 
> understand from previous conversations, you have these audio problems 
> even when a simple call with no extra features is being made.

These particular tests were being recorded via Monitor().  In the
future, I'll have to look into how calls that aren't being recorded are
handled.

>> I don't believe this swap properly prioritizes the channels.  
>> Whichever order they are in, it is ultimately random which one 
>> ast_waitfor_n() will return back to be read from.  This means that 
>> the same channel can be read multiple times in a row, starving the 
>> other one.  That causes dropped audio on the call, in one direction.  
>> It simultaneously unbalances the chan->insmpl and chan->outsmpl 
>> member variables and causes ast_read() and ast_write() to perform an 
>> ast_seekstream() on the files if the call is being recorded.  These 
>> ast_seekstream() calls jump the file pointer forward, leaving a gap 
>> in the data which comes across as a pop in the PCM format.
>
>
> But what you are missing is that frames only arrive 50 times per 
> second. Even if the loop _always_ found a frame from channel 1 first, 
> it would immediately loop around again and find the frame from channel 
> 2 without blocking on channel 1. Assuming it does not take 20ms to 
> actually process the frame from channel 1 out to channel 2, there is 
> plenty of time to handle both channels during each 20ms cycle 
> regardless of the priority they are handed to poll().
>
> There is no possibility of 'starving' a channel, unless one channel is 
> feeding frames to Asterisk so fast that it can't do anything but write 
> them out and loop around again.

I reverted the code back to the CVS version, except for some debugging
code and a couple of ast_log() statements.  I kept the ast_log()
statements in ast_read() and ast_write() to identify when
ast_seekstream() is being called on the leg files.  I added some
debugging code to ast_generic_bridge() that keeps track of the number of
times each channel wins, and an ast_log() statement that reports those
values every time the same channel wins twice in a row.  Here is an excerpt:

static enum ast_bridge_result ast_generic_bridge(struct ast_channel *c0,
struct ast_channel *c1, struct ast_bridge_config *config, struct
ast_frame **fo, struct ast_channel **rc, int toms)
{
    /* Copy voice back and forth between the two channels. */
    struct ast_channel *cs[3];
    struct ast_frame *f;
    struct ast_channel *who = NULL;
    enum ast_bridge_result res = AST_BRIDGE_COMPLETE;
    int o0nativeformats;
    int o1nativeformats;
    int watch_c0_dtmf;
    int watch_c1_dtmf;
    void *pvt0, *pvt1;

    /* Debug variables */
    struct ast_channel *lastWho = NULL;
    int c0won = 0;
    int c1won = 0;

    cs[0] = c0;
    cs[1] = c1;
    pvt0 = c0->tech_pvt;
    pvt1 = c1->tech_pvt;
    o0nativeformats = c0->nativeformats;
    o1nativeformats = c1->nativeformats;
    watch_c0_dtmf = config->flags & AST_BRIDGE_DTMF_CHANNEL_0;
    watch_c1_dtmf = config->flags & AST_BRIDGE_DTMF_CHANNEL_1;

    for (;;) {
        if ((c0->tech_pvt != pvt0) || (c1->tech_pvt != pvt1) ||
            (o0nativeformats != c0->nativeformats) ||
            (o1nativeformats != c1->nativeformats)) {

            /* Check for Masquerade, codec changes, etc */
            res = AST_BRIDGE_RETRY;
            break;
        }

        who = ast_waitfor_n(cs, 2, &toms);

       /* Keep track of the number of wins for each channel */
        if (who == c0)
            c0won++;
        else
            c1won++;

       /* Log when a channel wins twice in a row */
        if (who == lastWho) {
            ast_log(LOG_NOTICE, "%s = %d wins   %s = %d wins\n", 
c0->name, c0won, c1->name, c1won);
        }

       /* Keep track of the last winner */
       lastWho = who;

       /* Rest of function unchanged */
}

A test call produced the following output:

Dec 23 13:32:34 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 0 wins   SIP/116666-f4bc = 2 wins
Dec 23 13:32:34 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 0 wins   SIP/116666-f4bc = 3 wins
Dec 23 13:32:34 WARNING[19205]: channel.c:1914 ast_read: Performed seek
in monitoring read stream to synchronize the leg files
Dec 23 13:35:19 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 8260 wins   SIP/116666-f4bc = 8264 wins
Dec 23 13:35:19 WARNING[19205]: channel.c:1914 ast_read: Performed seek
in monitoring read stream to synchronize the leg files
Dec 23 13:35:19 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 8262 wins   SIP/116666-f4bc = 8264 wins
Dec 23 13:35:25 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 8560 wins   SIP/116666-f4bc = 8564 wins
Dec 23 13:35:25 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 8560 wins   SIP/116666-f4bc = 8565 wins
Dec 23 13:35:25 WARNING[19205]: channel.c:1914 ast_read: Performed seek
in monitoring read stream to synchronize the leg files
Dec 23 13:35:25 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 8562 wins   SIP/116666-f4bc = 8565 wins
Dec 23 13:35:28 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 8716 wins   SIP/116666-f4bc = 8721 wins
Dec 23 13:35:28 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 8716 wins   SIP/116666-f4bc = 8722 wins
Dec 23 13:35:28 WARNING[19205]: channel.c:1914 ast_read: Performed seek
in monitoring read stream to synchronize the leg files
Dec 23 13:35:28 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 8718 wins   SIP/116666-f4bc = 8722 wins
Dec 23 13:35:32 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 8926 wins   SIP/116666-f4bc = 8932 wins
Dec 23 13:35:32 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 8928 wins   SIP/116666-f4bc = 8932 wins
Dec 23 13:35:36 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 9135 wins   SIP/116666-f4bc = 9141 wins
Dec 23 13:35:36 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 9135 wins   SIP/116666-f4bc = 9142 wins
Dec 23 13:35:36 WARNING[19205]: channel.c:1914 ast_read: Performed seek
in monitoring read stream to synchronize the leg files
Dec 23 13:35:36 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 9137 wins   SIP/116666-f4bc = 9142 wins
Dec 23 13:35:36 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 9138 wins   SIP/116666-f4bc = 9142 wins
Dec 23 13:35:36 WARNING[19205]: channel.c:2262 ast_write: Performed seek
in monitoring write stream to synchronize the leg files
Dec 23 13:35:41 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 9345 wins   SIP/116666-f4bc = 9351 wins
Dec 23 13:35:41 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 9345 wins   SIP/116666-f4bc = 9352 wins
Dec 23 13:35:41 WARNING[19205]: channel.c:1914 ast_read: Performed seek
in monitoring read stream to synchronize the leg files
Dec 23 13:35:41 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 9347 wins   SIP/116666-f4bc = 9352 wins
Dec 23 13:35:45 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 9554 wins   SIP/116666-f4bc = 9561 wins
Dec 23 13:35:45 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 9554 wins   SIP/116666-f4bc = 9562 wins
Dec 23 13:35:45 WARNING[19205]: channel.c:1914 ast_read: Performed seek
in monitoring read stream to synchronize the leg files
Dec 23 13:35:45 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 9556 wins   SIP/116666-f4bc = 9562 wins
Dec 23 13:35:45 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 9557 wins   SIP/116666-f4bc = 9562 wins
Dec 23 13:35:45 WARNING[19205]: channel.c:2262 ast_write: Performed seek
in monitoring write stream to synchronize the leg files
Dec 23 13:35:49 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 9765 wins   SIP/116666-f4bc = 9772 wins
Dec 23 13:35:49 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 9767 wins   SIP/116666-f4bc = 9772 wins
Dec 23 13:35:53 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 9975 wins   SIP/116666-f4bc = 9982 wins
Dec 23 13:35:53 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 9977 wins   SIP/116666-f4bc = 9982 wins
Dec 23 13:35:57 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 10185 wins   SIP/116666-f4bc = 10192 wins
Dec 23 13:35:57 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 10187 wins   SIP/116666-f4bc = 10192 wins
Dec 23 13:36:02 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 10395 wins   SIP/116666-f4bc = 10402 wins
Dec 23 13:36:06 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 10604 wins   SIP/116666-f4bc = 10612 wins
Dec 23 13:36:06 WARNING[19205]: channel.c:1914 ast_read: Performed seek
in monitoring read stream to synchronize the leg files
Dec 23 13:36:06 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 10606 wins   SIP/116666-f4bc = 10612 wins
Dec 23 13:39:22 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 20431 wins   SIP/116666-f4bc = 20439 wins
Dec 23 13:39:22 NOTICE[19205]: channel.c:3248 ast_generic_bridge:
SIP/116667-88eb = 20433 wins   SIP/116666-f4bc = 20439 wins

Note that while each "NOTICE" from ast_generic_bridge() is not
necessarily accompanied by a "WARNING" from ast_read() or ast_write(),
each "WARNING" does have a corresponding "NOTICE" that happens at the
same second.  Each "NOTICE"/"WARNING" pair is also accompanied by an
audible pop in the PCM format recording of the call.  It's also worthy
of note that the "NOTICE" immediately precedes the "WARNING."  This
indicates a possible causal relationship between the two.  This almost
always happens at the beginning of the call, presumably because the
caller's channel is ready for reading before the callees.  Consequently,
almost all of our recordings begin with a pop.

Maybe the most significant outcome of this testing is the proof that the
reads from each channel do, in fact, fall out of synchronization and
that this leads to a defect in the recording.  I understand that the
cause of this may be something in our configuration, but so far the only
fix that I've found (and I've tried a great number) is the change that I
documented in my last post.

That change passed only the first element of the array of channels to
ast_waitfor_n(), guaranteeing that it would win the race to be read.  In
conjunction with the swapping of the channels in the array on each pass
of the bridging loop, this guarantees that each channel gets read an
equal number of times.  I tested this by leaving the additional logging
in place, making the change, and placing a test call.  The call produced
no "WARNING" or "NOTICE" messages and the recording had no pops.

>> Now the swapping of the channels guarantees that each gets its fair 
>> share of reads.  This should be okay, since Asterisk already relies 
>> on silence transmission.  I tested this with a number of calls and 
>> the results were good.  I understand that it may be an overly simple 
>> fix with possible side effects, which is why I'm submitting it for 
>> review.
>
>
> I don't understand how you think this could have that effect... there 
> are two possibilities:
>
> 1) the list of fds provided to poll() is always processed by the 
> kernel in a random order, in which case all this priority-swapping 
> makes no difference whatsoever

If this were the case, the following code in ast_generic_bridge would be
meaningless and the channel being read would be random:

        /* Swap who gets priority */
        cs[2] = cs[0];
        cs[0] = cs[1];
        cs[1] = cs[2];

> 2) the list of fds provided to poll() is processed in the supplied 
> order (or in the reverse order), in which case swapping them around 
> will guarantee each channel gets equal priority, since it won't matter 
> that the second/third/etc. channel in the list are even present at all

If that were the case, the ability to pass an array of channels instead
of a pointer to a single channel, is superfluous.

ast_waitfor_n() is just a wrapper around ast_waitfor_nandfds().  It
passes it the channel related variables, nulls or zeros out the file
descriptor related variables, and returns the result.  I am not clear on
the intricacies of ast_waitfor_nandfds(), but the name of the variable
that's returned is "winner."  This indicated to me that there is some
sort of race deciding which channel will get read, and that the order of
them in the array doesn't necessarily make a difference on the outcome.

Even if it's random which channel gets read, each has a 50/50 chance so
things would appear to work the majority of the time.  I believe this is
the behavior we're experiencing.  The fact that forcing the reads to
alternate seems to solve our problem lends credence to this theory.  Do
you know why that prioritization code exists in the first place when a
simple swap looks to be more effective?

> I understand that you are having a problem, but I just don't see how 
> it could be something so basic without affecting many (hundreds or 
> thousands) of other users. There has to be _something_ unique about 
> your configuration that is causing this behavior to be exhibited, 
> regardless of what turns out to be the cause. Until we can identify 
> what that source is, we can't be absolutely sure that your fix (or any 
> other fix) is the right change to make. 

I agree and I appreciate any time that I get from a person as high up
the ladder as yourself.  Please understand that the problem is sometimes
difficult to perceive on the call itself (the dropped audio is sometimes
of short duration, and I believe it's only in one direction and could
occur during a period of silence) and that the format that exposes it in
the digital recordings (PCM) is not widely used due to its lack of
compression.  I'd be interested to know if adding the log statements to
code on one of your boxes would cause it to produce the same messages.
We have tried swapping out just about every piece of the equation with
no change in the results.  The only fix we've found is the change to
ast_generic_bridge().  As far as that is concerned, if it works for us,
do you see any harm in it?  If there are no obvious nasty side effects,
it makes a good band-aid until we can resolve any problems in our
configuration.

Happy Holidays,

Matthew Roth
InterMedia Marketing Solutions
Software Engineer and Systems Developer



More information about the asterisk-dev mailing list