[Asterisk-Dev] Help Debugging Dropped Call Audio - Possibly Fixed

Fri Dec 23 14:38:29 MST 2005

Matt Roth wrote:

> These particular tests were being recorded via Monitor().  In the 
> future, I'll have to look into how calls that aren't being recorded are 
> handled.

Monitor() is likely the source of the problem, then; it is known to cause 
audio path inconsistencies because it does all of its writes to the 
filesystem synchronously, and if the filesystem does not respond quickly 
enough, the audio path will be disrupted.

> Maybe the most significant outcome of this testing is the proof that the 
> reads from each channel do, in fact, fall out of synchronization and 
> that this leads to a defect in the recording.  I understand that the 
> cause of this may be something in our configuration, but so far the only 
> fix that I've found (and I've tried a great number) is the change that I 
> documented in my last post.

Understood.

> That change passed only the first element of the array of channels to 
> ast_waitfor_n(), guaranteeing that it would win the race to be read.  In 
> conjunction with the swapping of the channels in the array on each pass 
> of the bridging loop, this guarantees that each channel gets read an 
> equal number of times.  I tested this by leaving the additional logging 
> in place, making the change, and placing a test call.  The call produced 
> no "WARNING" or "NOTICE" messages and the recording had no pops.

But it doesn't actually guarantee that at all... you have to keep in 
mind that packet delivery from the channels is not consistent (jitter 
and packet loss come into play), and in this configuration it's very 
possible to lose packets from one channel because you have decided to 
wait for a packet from the other one, and the 'current' packet from 
that channel is late and/or lost. In that case, you will read from this 
channel, then read from the other channel, then go back to this one, 
even though the other channel still has a pending packet to be read.

This is why it is necessary to block on _all_ the channels every time 
through the loop, so that a channel that does not have a packet 
available does not hold up the one that does.

> If this were the case, the following code in ast_generic_bridge would be 
> meaningless and the channel being read would be random:
> 
>        /* Swap who gets priority */
>        cs[2] = cs[0];
>        cs[0] = cs[1];
>        cs[1] = cs[2];

That is correct. This code exists because a 'random poll()' 
implementation may or may not exist on any given platform. Note that the 
manpage for poll() (at least on Linux) makes no comment on the priority, 
or lack thereof, of the file descriptor array.

> If that were the case, the ability to pass an array of channels instead 
> of a pointer to a single channel is superfluous.

No, it's not (see above). In the case where there is _always_ data 
available, you are correct. However, in the vast majority of cases, the 
thread will block waiting for one of the channels to have data 
available, in which case it is vital to be blocked on both of them 
simultaneously.

> ast_waitfor_n() is just a wrapper around ast_waitfor_nandfds().  It 
> passes it the channel related variables, nulls or zeros out the file 
> descriptor related variables, and returns the result.  I am not clear on 
> the intricacies of ast_waitfor_nandfds(), but the name of the variable 
> that's returned is "winner."  This indicated to me that there is some 
> sort of race deciding which channel will get read, and that the order of 
> them in the array doesn't necessarily make a difference on the outcome.

That is absolutely correct. The only reason that 'priority swapping' 
code exists at all is because _some_ poll() implementations do give 
priority to the file descriptors in the order they are placed in the 
array. For implementations that don't, it makes no difference.

> ast_generic_bridge().  As far as that is concerned, if it works for us, 
> do you see any harm in it?  If there are no obvious nasty side effects, 
> it makes a good band-aid until we can resolve any problems in our 
> configuration.

I certainly won't begrudge you a solution that works for you, but I'd 
still like us all to be able to understand exactly what causes this 
problem.

It would be helpful to know:

1) Does the problem happen when recording is not in place?
2) Does the problem happen when there is no transcoding going on 
(between the phones or to the recording file)?
3) Does the problem happen if the phones are allowed to talk directly to 
each other (SIP media path re-INVITE)?
4) What sort of disk subsystem and filesystem are being used to capture 
the recording files?