[Asterisk-Dev] Help Debugging Dropped Call Audio - Possibly Fixed
Kevin P. Fleming
kpfleming at digium.com
Fri Dec 23 14:38:29 MST 2005
Matt Roth wrote:
> These particular tests were being recorded via Monitor(). In the
> future, I'll have to look into how calls that aren't being recorded are
> handled.
Monitor() is likely the source of the problem, then; it is known to cause
audio path inconsistencies because it does all of its writes to the
filesystem synchronously, and if the filesystem does not respond quickly
enough, the audio path will be disrupted.
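To make the timing argument concrete, here is a toy model (not Asterisk code) of a loop that must pick up one frame per packetization period and then writes that frame synchronously before it can look at the next one. The 20 ms period and the function name late_frames() are assumptions for illustration only. A single slow write delays every frame that arrives while it is still in progress:

```c
/* Toy model, NOT Asterisk code: one frame arrives every period_ms
 * milliseconds, and the same thread writes each frame synchronously
 * (write_ms[i] is how long the write for frame i takes).
 * Returns how many frames arrived while the thread was still busy
 * writing an earlier frame, i.e. frames whose handling was delayed. */
static int late_frames(const int *write_ms, int n, int period_ms)
{
    int t = 0;      /* time at which the thread becomes free */
    int late = 0;
    int i;

    for (i = 0; i < n; i++) {
        int arrival = i * period_ms;

        if (t > arrival)
            late++;        /* still blocked in a previous write */
        else
            t = arrival;   /* thread was idle; start when the frame arrives */

        t += write_ms[i];  /* synchronous write blocks the loop */
    }
    return late;
}
```

With a 20 ms period and writes of {1, 1, 60, 1, 1, 1} ms, the one 60 ms stall makes the next three frames late, while writes of 1 ms each cause no delay at all.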
> Maybe the most significant outcome of this testing is the proof that the
> reads from each channel do, in fact, fall out of synchronization and
> that this leads to a defect in the recording. I understand that the
> cause of this may be something in our configuration, but so far the only
> fix that I've found (and I've tried a great number) is the change that I
> documented in my last post.
Understood.
> That change passed only the first element of the array of channels to
> ast_waitfor_n(), guaranteeing that it would win the race to be read. In
> conjunction with the swapping of the channels in the array on each pass
> of the bridging loop, this guarantees that each channel gets read an
> equal number of times. I tested this by leaving the additional logging
> in place, making the change, and placing a test call. The call produced
> no "WARNING" or "NOTICE" messages and the recording had no pops.
But it doesn't actually guarantee that at all... you have to keep in
mind that packet delivery from the channels is not consistent
(jitter and packet loss come into play), and it's very possible in this
configuration to lose packets from a channel because you have decided to
wait for a packet from the other one, and the 'current' packet from that
channel is late and/or lost. In that case, you will read from this
channel, then read from the other channel, then go back to this one,
even though the other channel still has a pending packet to be read.
This is why it is necessary to block on _all_ the channels every time
through the loop, so that a channel that does not have a packet
available does not hold up the one that does.
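The difference can be sketched with plain POSIX poll() and two pipes standing in for the two channels. This is only an illustration, assuming that a channel's readiness maps to a readable descriptor; the helpers wait_one() and wait_any() are hypothetical and are not ast_waitfor_n() or ast_waitfor_nandfds(). Channel A has a packet pending while channel B's packet is late:

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: wait on a single descriptor only, like passing
 * just one channel. Returns 1 if it became readable, 0 otherwise. */
static int wait_one(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };

    return poll(&p, 1, timeout_ms) > 0 && (p.revents & POLLIN) != 0;
}

/* Hypothetical helper: block on ALL descriptors at once and return the
 * index of the first readable one (the "winner"), or -1 on timeout. */
static int wait_any(const int *fds, int n, int timeout_ms)
{
    struct pollfd pfds[8];
    int i;

    if (n < 1 || n > 8)
        return -1;
    for (i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    if (poll(pfds, (nfds_t)n, timeout_ms) <= 0)
        return -1;
    for (i = 0; i < n; i++)
        if (pfds[i].revents & POLLIN)
            return i;
    return -1;
}
```

If a byte is written only to pipe A and the array is { B, A }, wait_one() on B sits out its whole timeout and returns 0, even though A has data; wait_any() on both returns index 1 (A) immediately.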
> If this were the case, the following code in ast_generic_bridge would be
> meaningless and the channel being read would be random:
>
> /* Swap who gets priority */
> cs[2] = cs[0];
> cs[0] = cs[1];
> cs[1] = cs[2];
That is correct. This code exists because the 'random poll()'
implementation may or may not exist on any given platform. Note that the
manpage for poll() (at least on Linux) makes no comment about whether
the entries in the file descriptor array are given any priority.
> If that were the case, the ability to pass an array of channels instead
> of a pointer to a single channel, is superfluous.
No, it's not (see above). In the case where there is _always_ data
available, you are correct. However, in the vast majority of cases, the
thread will block waiting for one of the channels to have data
available, in which case it is vital to be blocked on both of them
simultaneously.
> ast_waitfor_n() is just a wrapper around ast_waitfor_nandfds(). It
> passes it the channel related variables, nulls or zeros out the file
> descriptor related variables, and returns the result. I am not clear on
> the intricacies of ast_waitfor_nandfds(), but the name of the variable
> that's returned is "winner." This indicated to me that there is some
> sort of race deciding which channel will get read, and that the order of
> them in the array doesn't necessarily make a difference on the outcome.
That is absolutely correct. The only reason that 'priority swapping'
code exists at all is because _some_ poll() implementations do give
priority to the file descriptors in the order they are placed in the
array. For implementations that don't, it makes no difference.
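A small simulation shows why the swap matters only on order-favoring platforms. It assumes (hypothetically) a poll() that always reports the lowest-index ready entry first, and both channels always having data ready; the swap itself copies the quoted cs[0]/cs[1]/cs[2] code. The names first_ready() and count_wins() are made up for this sketch:

```c
#include <stddef.h>

/* Hypothetical model of an order-favoring poll(): return the id of the
 * first entry in array order whose channel is ready, or -1 if none. */
static int first_ready(const int *ids, const int *ready, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (ready[ids[i]])
            return ids[i];
    return -1;
}

/* Run `passes` iterations with both channels (ids 0 and 1) always ready
 * and count how often each one wins. If `swap` is non-zero, swap the
 * first two entries after every pass, as in the quoted bridge loop. */
static void count_wins(int passes, int swap, int wins[2])
{
    int cs[3] = { 0, 1, 0 };        /* cs[2] is scratch space */
    const int ready[2] = { 1, 1 };  /* both channels always have data */
    int p;

    wins[0] = wins[1] = 0;
    for (p = 0; p < passes; p++) {
        int w = first_ready(cs, ready, 2);

        if (w >= 0)
            wins[w]++;
        if (swap) {
            /* Swap who gets priority */
            cs[2] = cs[0];
            cs[0] = cs[1];
            cs[1] = cs[2];
        }
    }
}
```

Over 10 passes without the swap, channel 0 wins all 10 times; with the swap, each channel wins 5. On a poll() that picks among ready descriptors without regard to order, the swap changes nothing.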
> ast_generic_bridge(). As far as that is concerned, if it works for us,
> do you see any harm in it? If there are no obvious nasty side effects,
> it makes a good band-aid until we can resolve any problems in our
> configuration.
I certainly won't begrudge you a solution that works for you, but I'd
still like us all to be able to understand exactly what is the cause of
this problem.
It would be helpful to know:
1) Does the problem happen when recording is not in place?
2) Does the problem happen when there is no transcoding going on
(between the phones or to the recording file)?
3) Does the problem happen if the phones are allowed to talk directly to
each other (SIP media path re-INVITE)?
4) What sort of disk subsystem and filesystem are being used to capture
the recording files?