[Asterisk-Dev] Help Debugging Dropped Call Audio - Possibly Fixed
Matthew Roth
mroth_imm at hotmail.com
Mon Dec 26 12:30:57 MST 2005
All,
First, I'd like to apologize if these posts are off topic. I posted the
original messages to the users list as well and didn't get much of a
response. I'm new to the dev list, so if I need to be nudged in one direction
or the other, please do so.
That said, the responses here have been extremely helpful. Thank you. I'm
going to attempt to summarize what I've learned in this post, so that I have
a solid foundation for progress when I return to the office tomorrow.
I believe my assertion that call audio is dropped at the same time as the
pops in the recordings is correct, but it is difficult to prove. I based it
on listening to live calls via ChanSpy, noting any drops in audio, then
listening to the recordings. I also listened to some recordings where pops
seemed to accompany a communication breakdown between the callers.
This seemed to be good evidence of a relationship between dropped audio and
pops in the recordings, but I'm realizing it may be more productive to work
on this problem from the other direction. I will now approach it as a
digital recording problem on calls with channels that are bridged via
ast_generic_bridge(). Once that is solved, I will investigate the audio
quality of the calls themselves. When that time comes, I will have to
determine the best call quality analysis method. It's possible that the
same problem that was causing pops in the recordings was affecting what I
heard via ChanSpy, but not the call audio itself. It's also possible that
it was simply a coincidence that the breakdowns in communication happened at
the same time as the pops. I think that is unlikely, but I'm trying to keep
an open mind.
Thanks to Kevin, I now understand the logic in ast_waitfor_nandfds() much
more clearly. The only thing I was right about was that my fix was an
overly simple solution. I checked the man page for poll(), and it does not
document whether the order of the file descriptors in the array affects the
priority they are given. However, I did notice something interesting about
the return value from poll():
On success, a positive number is returned, where the number returned
is the number of structures which have non-zero revents fields (in
other words, those descriptors with events or errors reported).
It appears that poll() can report that more than one file descriptor has
events ready to be handled, but ast_waitfor_nandfds() only returns a single
"winner." Is ast_waitfor_nandfds() properly handling the case of multiple
file descriptors signaling at once? Forgive me if that is a
naive question. I've been working on our implementation for about half a
year on many diverse aspects, but I've only recently started to inspect the
code. It is well written and well documented, but some of it is still
difficult to comprehend without asking questions.
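
To make the poll() return value concrete, here is a minimal sketch in plain POSIX C. It is not Asterisk code: collect_ready() and demo_two_ready() are hypothetical names invented for this illustration. It makes two pipes readable, polls them together, and records every descriptor whose revents field is non-zero rather than stopping at the first one.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Poll once and record the index of every descriptor whose revents is
 * non-zero. Returns how many were ready, 0 on timeout, or -1 on error.
 * (Hypothetical helper, not part of Asterisk.)
 */
static int collect_ready(struct pollfd *pfds, int nfds, int timeout_ms, int *ready)
{
    int found = 0;
    int n;

    assert(pfds != NULL && ready != NULL && nfds > 0);
    n = poll(pfds, (nfds_t)nfds, timeout_ms);
    if (n <= 0)
        return n;
    for (int i = 0; i < nfds; i++) {
        if (pfds[i].revents != 0)
            ready[found++] = i;
    }
    return found;
}

/* Hypothetical demo: make two pipes readable, then poll both at once. */
static int demo_two_ready(void)
{
    int a[2], b[2], ready[2], n = -1;

    if (pipe(a) != 0)
        return -1;
    if (pipe(b) != 0) {
        close(a[0]);
        close(a[1]);
        return -1;
    }
    /* One byte in each pipe, so both read ends have data pending. */
    if (write(a[1], "x", 1) == 1 && write(b[1], "y", 1) == 1) {
        struct pollfd pfds[2] = {
            { .fd = a[0], .events = POLLIN },
            { .fd = b[0], .events = POLLIN },
        };
        n = collect_ready(pfds, 2, 1000, ready);
    }
    close(a[0]);
    close(a[1]);
    close(b[0]);
    close(b[1]);
    return n;
}
```

Calling demo_two_ready() should report both descriptors from a single poll() call. Note that poll() is level-triggered: a descriptor that is reported ready but not read is reported again on the next call, so servicing one "winner" per call delays the other descriptor rather than losing its event.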
I know first hand that Monitor() introduces a significant I/O bottleneck
with its synchronous writes. We solved this a while back by writing the leg
files to a RAM disk as documented in these locations:
http://thread.gmane.org/gmane.comp.telephony.pbx.asterisk.user/118497
http://lists.digium.com/pipermail/asterisk-users/2005-October/127919.html
I'm curious to know if writing to some memory structure such as a queue has
been considered. A separate writer thread could then be responsible for
servicing this queue in parallel with Asterisk's handling of the call itself. We
have a huge amount of memory to play with in our machine (20 GB), but some
sort of dynamic queue structure and a writer thread may help on lighter
servers.
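
As a rough sketch of what I mean, assuming POSIX threads and a fixed-size ring buffer: the call-handling thread copies each frame into memory and returns immediately, while a separate writer thread drains the buffer to disk. Every name here (frame_queue, fq_push, fq_writer, demo_record) and both constants are made up for illustration; none of this exists in Asterisk.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define FRAME_BYTES 320  /* assumption: 20 ms of 8 kHz, 16-bit audio */
#define QUEUE_SLOTS 64   /* assumption: arbitrary queue depth */

struct frame_queue {
    unsigned char slots[QUEUE_SLOTS][FRAME_BYTES];
    int head, tail, count, closed;
    long written;              /* frames flushed to disk by the writer */
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
    FILE *out;
};

static void fq_init(struct frame_queue *q, FILE *out)
{
    assert(q != NULL && out != NULL);
    memset(q, 0, sizeof(*q));
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    q->out = out;
}

/* Call-handling side: copy the frame and return without touching the disk.
 * Returns 0 on success, or -1 if the queue is full and the frame is dropped. */
static int fq_push(struct frame_queue *q, const unsigned char *data)
{
    int rc = -1;

    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_SLOTS) {
        memcpy(q->slots[q->tail], data, FRAME_BYTES);
        q->tail = (q->tail + 1) % QUEUE_SLOTS;
        q->count++;
        rc = 0;
        pthread_cond_signal(&q->not_empty);
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* Tell the writer that no more frames are coming. */
static void fq_close(struct frame_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->closed = 1;
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Writer thread: drain frames until the queue is closed and empty. */
static void *fq_writer(void *arg)
{
    struct frame_queue *q = arg;
    unsigned char buf[FRAME_BYTES];

    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (q->count == 0 && !q->closed)
            pthread_cond_wait(&q->not_empty, &q->lock);
        if (q->count == 0) {           /* closed and fully drained */
            pthread_mutex_unlock(&q->lock);
            break;
        }
        memcpy(buf, q->slots[q->head], FRAME_BYTES);
        q->head = (q->head + 1) % QUEUE_SLOTS;
        q->count--;
        pthread_mutex_unlock(&q->lock);
        /* The slow disk write happens outside the lock. */
        if (fwrite(buf, 1, FRAME_BYTES, q->out) == FRAME_BYTES)
            q->written++;
    }
    return NULL;
}

/* Hypothetical demo: queue some frames, then wait for the writer to flush.
 * Returns the number of frames written, or -1 on failure. */
static long demo_record(int frames)
{
    static struct frame_queue q;
    unsigned char frame[FRAME_BYTES];
    pthread_t tid;
    long pushed = 0;
    FILE *out = tmpfile();

    if (out == NULL)
        return -1;
    fq_init(&q, out);
    memset(frame, 0x55, sizeof(frame));
    if (pthread_create(&tid, NULL, fq_writer, &q) != 0) {
        fclose(out);
        return -1;
    }
    for (int i = 0; i < frames; i++)
        if (fq_push(&q, frame) == 0)
            pushed++;
    fq_close(&q);
    pthread_join(tid, NULL);
    fclose(out);
    return q.written == pushed ? q.written : -1;
}
```

The trade-off in this sketch is that a full queue drops frames instead of blocking the call path. A dynamically sized queue, as suggested above, would trade that for memory growth when the disk falls behind.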
Out of all of this information, one thing seems clear to me. The servicing
of the channels in ast_generic_bridge() and the Monitor() related code in
ast_read() and ast_write() don't play well with each other. Both ast_read()
and ast_write() use chan->insmpl and chan->outsmpl to determine if they need
to call ast_seekstream() to jump the file pointer ahead in one of the leg
files to keep them synchronized. It seems that the problem centers around
ast_seekstream() being called based on an assumption that the chan->insmpl
and chan->outsmpl variables are being incremented on a one-to-one basis,
while ast_generic_bridge() calls ast_read() and ast_write() on a schedule
that is nearly, but not quite, one-to-one. Every so often, this leads to
ast_seekstream() being called to fill in the gap for data that it assumes is
missing, but that will actually arrive within the next couple of iterations
of the loop in ast_generic_bridge().
How the gap in the audio file sounds is dependent on its format. We're
using PCM, so for us it's a pop. I know this problem seems minor, but for
us it is not. We are handling inbound calling for multiple clients and they
will want to hear these recordings on a regular basis. The pops may lead
them to question the quality of the calls themselves and could ultimately
lead to a loss in business for us. That is why I'm so interested in getting
rid of them, as they appear in nearly every call.
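
To illustrate why a gap is audible in raw PCM, here is a small sketch, assuming a POSIX system and assuming the seek moves the file position past the data written so far. record_with_gap() is a hypothetical helper and the sample values are arbitrary. On POSIX, seeking beyond the end of a file and then writing leaves zero bytes in the hole, so a steady signal drops abruptly to zero and back.

```c
#include <assert.h>
#include <stdio.h>

#define BURST 4  /* samples per burst; arbitrary for illustration */

/* Write one burst, seek forward by one burst's worth of bytes, write a
 * second burst, then read the whole file back into out[]. Returns the
 * number of samples read, or -1 on failure. */
static int record_with_gap(short out[3 * BURST])
{
    const short burst[BURST] = { 1000, 1000, 1000, 1000 };
    size_t n;
    FILE *f;

    assert(out != NULL);
    f = tmpfile();
    if (f == NULL)
        return -1;
    if (fwrite(burst, sizeof(short), BURST, f) != BURST ||
        fseek(f, (long)(BURST * sizeof(short)), SEEK_CUR) != 0 ||
        fwrite(burst, sizeof(short), BURST, f) != BURST) {
        fclose(f);
        return -1;
    }
    rewind(f);
    n = fread(out, sizeof(short), 3 * BURST, f);
    fclose(f);
    return (int)n;
}
```

Reading the file back gives four samples at 1000, four at 0, then four at 1000 again. In a real recording that kind of step discontinuity is what I would expect to hear as a pop.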
MixMonitor() seems like it may be better suited to solving this issue with
some sort of buffer, but until it can initiate recordings out of queues, I
don't think we can use it. It's possible that adding the option to
Monitor() that waits until the call is bridged to begin recording may help
minimize the issue. I will look into this tomorrow.
We're using Asterisk for business purposes, but have no intention of hiding
our implementation. Once we get all of these little gotchas worked out, I
believe we'll be the largest single-server installation (~500
agents/simultaneous calls with digital recording) and an excellent case to
cite when discussing Asterisk as an inbound call center solution. Once a
mature predictive dialer is available, we intend to use Asterisk for our
outbound calling as well. This is exciting, because our other options are
traditional PBXs such as Nortel, but I believe Asterisk can provide the same
service at a fraction of the cost.
I apologize if I'm a little overzealous in trying to address what are
relatively minor issues, but I feel as if I've been running a marathon at a
sprint for the last few months.
Thank you,
Matthew Roth
InterMedia Marketing Solutions
Software Engineer and Systems Developer