[Asterisk-Dev] Help Debugging Dropped Call Audio - Possibly Fixed

Matthew Roth mroth_imm at hotmail.com
Mon Dec 26 12:30:57 MST 2005


All,

First, I'd like to apologize if these posts are off topic.  I posted the 
original messages to the users list as well and didn't get much of a 
response.  I'm new to the dev list, so if I need nudged in one direction or 
the other, please do so.

That said, the responses here have been extremely helpful.  Thank you.  I'm 
going to attempt to summarize what I've learned in this post, so that I have 
a solid foundation for progress when I return to the office tomorrow.

I believe my assertion that call audio is dropped at the same time as the 
pops in the recordings is correct, but it is difficult to prove.  I based it 
on listening to live calls via ChanSpy, noting any drops in audio, then 
listening to the recordings.  I also listened to some recordings where pops 
seemed to accompany a communication breakdown between the callers.

This seemed to be good evidence of a relationship between dropped audio and 
pops in the recordings, but I'm realizing it may be more productive to work 
on this problem from the other direction.  I will now approach it as a 
digital recording problem on calls with channels that are bridged via 
ast_generic_bridge().  Once that is solved, I will investigate the audio 
quality of the calls themselves.  When that time comes, I will have to 
determine the best call quality analysis method.  It's possible that the 
same problem that was causing pops in the recordings was affecting what I 
heard via ChanSpy, but not the call audio itself.  It's also possible that 
it was simply a coincidence that the breakdowns in communication happened at 
the same time as the pops.  I think that is unlikely, but I'm trying to keep 
an open mind.

Thanks to Kevin, I now understand the logic in ast_waitfor_nandfds() much 
more clearly.  The only thing I was right about was that my fix was an 
overly simple solution.  I checked the man page for poll(), and it does not 
say whether the order of the file descriptors in the array affects the 
priority they are given.  However, I did notice something interesting about 
the return value from poll():

   On  success,  a positive number is returned, where the number returned
   is the number of structures which have  non-zero  revents  fields  (in
   other  words,  those  descriptors  with events or errors reported).

It appears that a single call to poll() can report events on more than one 
file descriptor at once, but ast_waitfor_nandfds() only returns a single 
"winner."  Is ast_waitfor_nandfds() properly handling 
the case of multiple file descriptors signaling?  Forgive me if that is a 
naive question.  I've been working on our implementation for about half a 
year on many diverse aspects, but I've only recently started to inspect the 
code.  It is well written and well documented, but some of it is still 
difficult to comprehend without asking questions.
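
To make the multiple-descriptor case concrete, here is a minimal sketch in 
plain POSIX C.  It is not Asterisk code: collect_ready() and demo_two_ready() 
are hypothetical names I made up for illustration.  It only shows that one 
poll() call can flag several descriptors, and that a caller wanting all of 
them has to walk the whole array rather than stop at the first hit.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Asterisk): poll the descriptors once and
 * store the index of every entry with a non-zero revents field in
 * ready_idx[].  Returns the number of ready descriptors, 0 on timeout, or
 * -1 on error.
 */
int collect_ready(struct pollfd *pfds, int nfds, int timeout_ms, int *ready_idx)
{
    int res = poll(pfds, (nfds_t)nfds, timeout_ms);
    if (res <= 0)
        return res;

    int found = 0;
    for (int i = 0; i < nfds && found < res; i++) {
        if (pfds[i].revents != 0)
            ready_idx[found++] = i;
    }
    return found;
}

/*
 * Demonstration: make two pipes readable before polling, so that a single
 * poll() call reports both of them.  Returns the count from
 * collect_ready(), or -1 if the setup fails.
 */
int demo_two_ready(void)
{
    int a[2], b[2];
    int n = -1;

    if (pipe(a) == -1)
        return -1;
    if (pipe(b) == -1) {
        close(a[0]);
        close(a[1]);
        return -1;
    }

    /* Put one byte in each pipe so both read ends are ready. */
    if (write(a[1], "x", 1) == 1 && write(b[1], "y", 1) == 1) {
        struct pollfd pfds[2] = {
            { .fd = a[0], .events = POLLIN },
            { .fd = b[0], .events = POLLIN },
        };
        int ready_idx[2];
        n = collect_ready(pfds, 2, 0, ready_idx);
    }

    close(a[0]);
    close(a[1]);
    close(b[0]);
    close(b[1]);
    return n;
}
```

Calling demo_two_ready() exercises the case I am asking about: both read 
ends are ready at the moment poll() is called.  Whether returning only one 
"winner" per call matters in ast_waitfor_nandfds() depends on how quickly 
the caller loops back around, which is the part I am unsure of.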

I know first hand that Monitor() introduces a significant I/O bottleneck 
with its synchronous writes.  We solved this a while back by writing the leg 
files to a RAM disk as documented in these locations:

   http://thread.gmane.org/gmane.comp.telephony.pbx.asterisk.user/118497
   http://lists.digium.com/pipermail/asterisk-users/2005-October/127919.html

I'm curious to know whether writing to an in-memory structure such as a 
queue has been considered.  A separate writer thread could then be 
responsible for servicing this queue in parallel with Asterisk's handling of 
the call itself.  We have a huge amount of memory to play with in our 
machine (20 GB), but some sort of dynamic queue structure and a writer 
thread may help on more modest servers.
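
As a rough sketch of the queue-plus-writer-thread idea, assuming POSIX 
threads: the call-handling thread copies each frame into a linked list and 
returns immediately, while a separate thread drains the list to disk.  None 
of these names (write_queue, wq_push, wq_writer, FRAME_BYTES) exist in 
Asterisk; they are placeholders, and the frame size is an assumption.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FRAME_BYTES 320  /* assumed: 20 ms of 8 kHz, 16-bit PCM */

struct frame_node {
    struct frame_node *next;
    size_t len;
    unsigned char data[FRAME_BYTES];
};

struct write_queue {
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
    struct frame_node *head, *tail;
    int closed;   /* set once no more frames will be pushed */
    FILE *out;    /* leg file the writer thread appends to */
};

void wq_init(struct write_queue *q, FILE *out)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    q->head = q->tail = NULL;
    q->closed = 0;
    q->out = out;
}

/* Called from the call-handling thread: copies the frame into the queue
 * and returns without touching the disk.  Returns 0 on success, -1 if the
 * frame is too large or memory is exhausted. */
int wq_push(struct write_queue *q, const void *buf, size_t len)
{
    if (len > FRAME_BYTES)
        return -1;
    struct frame_node *n = malloc(sizeof(*n));
    if (!n)
        return -1;
    n->next = NULL;
    n->len = len;
    memcpy(n->data, buf, len);

    pthread_mutex_lock(&q->lock);
    if (q->tail)
        q->tail->next = n;
    else
        q->head = n;
    q->tail = n;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Tell the writer thread to exit once the queue has been drained. */
void wq_close(struct write_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->closed = 1;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}

/* Writer thread body: the blocking fwrite() happens here, outside the
 * lock, so it never stalls the thread that is pushing frames. */
void *wq_writer(void *arg)
{
    struct write_queue *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (!q->head && !q->closed)
            pthread_cond_wait(&q->nonempty, &q->lock);
        struct frame_node *n = q->head;
        if (n) {
            q->head = n->next;
            if (!q->head)
                q->tail = NULL;
        }
        pthread_mutex_unlock(&q->lock);

        if (!n)
            break;  /* queue is closed and fully drained */
        fwrite(n->data, 1, n->len, q->out);
        free(n);
    }
    fflush(q->out);
    return NULL;
}
```

The intended use is wq_init() once per leg file, pthread_create() with 
wq_writer, wq_push() for every frame, then wq_close() and pthread_join() at 
hangup.  The queue here is unbounded, so a slow disk shows up as growing 
memory use instead of delayed audio; a real implementation would probably 
want a cap.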

Out of all of this information, one thing seems clear to me.  The servicing 
of the channels in ast_generic_bridge() and the Monitor() related code in 
ast_read() and ast_write() don't play well with each other.  Both ast_read() 
and ast_write() use chan->insmpl and chan->outsmpl to determine if they need 
to call ast_seekstream() to jump the file pointer ahead in one of the leg 
files to keep them synchronized.  It seems that the problem centers around 
ast_seekstream() being called based on an assumption that the chan->insmpl 
and chan->outsmpl variables are being incremented on a one-to-one basis, 
while ast_generic_bridge() calls ast_read() and ast_write() on a schedule 
that is nearly, but not quite, one-to-one.  Every so often, this leads to 
ast_seekstream() being called to fill in the gap for data it assumes is 
missing, but will be received within the next couple of loops of 
ast_generic_bridge().
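
The following toy model shows the kind of interaction I suspect.  It is 
emphatically not the arithmetic in ast_read() or ast_write(): the struct 
leg, leg_record(), simulate(), and the catch-up rule itself are all 
assumptions I invented to illustrate the idea.  Each leg keeps a sample 
counter, and when a leg sees that its peer is more than one frame ahead, it 
assumes audio was lost and skips forward.

```c
#include <stddef.h>

/* Toy stand-in for one recording leg (hypothetical, not an Asterisk type). */
struct leg {
    long samples;      /* sample counter, playing the role of insmpl/outsmpl */
    long gap_samples;  /* total samples skipped by simulated seeks */
    int seeks;         /* number of simulated seeks */
};

/*
 * Record one frame on `self`.  Assumed rule: if the peer's counter is more
 * than one frame ahead, treat the difference as lost audio and jump the
 * file position forward by that amount before writing the frame.
 */
void leg_record(struct leg *self, long peer_samples, long frame_samples)
{
    long lag = peer_samples - self->samples - frame_samples;
    if (lag > 0) {
        self->gap_samples += lag;  /* this is the hole left in the file */
        self->samples += lag;
        self->seeks++;
    }
    self->samples += frame_samples;
}

/*
 * Drive both legs from a schedule string: 'r' records a frame on the
 * inbound leg, 'w' records a frame on the outbound leg.  Any other
 * character is ignored.
 */
void simulate(const char *schedule, long frame_samples,
              struct leg *in, struct leg *out)
{
    for (const char *p = schedule; *p != '\0'; p++) {
        if (*p == 'r')
            leg_record(in, out->samples, frame_samples);
        else if (*p == 'w')
            leg_record(out, in->samples, frame_samples);
    }
}
```

With a strictly alternating schedule such as "rwrwrwrw" and 160-sample 
frames, this model never seeks.  With "rwrrwwrw", where one write is merely 
late and arrives on the next pass, the outbound leg performs one seek and 
leaves a 160-sample hole even though no audio was actually lost.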

How the gap in the audio file sounds is dependent on its format.  We're 
using PCM, so for us it's a pop.  I know this problem seems minor, but for 
us it is not.  We are handling inbound calling for multiple clients and they 
will want to hear these recordings on a regular basis.  The pops may lead 
them to question the quality of the calls themselves and could ultimately 
lead to a loss in business for us.  That is why I'm so interested in getting 
rid of them, as they appear in nearly every call.
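
For what it's worth, here is a small sketch of why a skipped region would 
be audible in a raw PCM file.  It assumes a POSIX system and assumes the 
seek simply moves the file position past the end of the file, which I have 
not confirmed is what ast_seekstream() does for this format.  The function 
name and the sample level are made up for the example.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Writes `before` samples at a constant non-zero level, seeks forward by
 * `gap` samples without writing anything, then writes `after` more samples.
 * Reads the file back and returns how many samples are zero, or -1 on
 * error.  On POSIX, the skipped region reads back as zero bytes.
 */
long count_gap_zeros(long before, long gap, long after)
{
    const int16_t level = 1000;  /* arbitrary non-zero amplitude */
    FILE *f = tmpfile();
    if (!f)
        return -1;

    for (long i = 0; i < before; i++)
        fwrite(&level, sizeof level, 1, f);

    /* Jump ahead as if `gap` samples were missing. */
    if (fseek(f, gap * (long)sizeof level, SEEK_CUR) != 0) {
        fclose(f);
        return -1;
    }

    for (long i = 0; i < after; i++)
        fwrite(&level, sizeof level, 1, f);

    rewind(f);
    long zeros = 0;
    int16_t s;
    while (fread(&s, sizeof s, 1, f) == 1) {
        if (s == 0)
            zeros++;
    }
    fclose(f);
    return zeros;
}
```

In signed linear PCM a run of zeros is digital silence, so the waveform 
steps abruptly from the signal level to zero and back again.  A 
discontinuity like that is typically heard as a click or pop.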

MixMonitor() seems like it may be better suited to solving this issue with 
some sort of buffer, but until it can initiate recordings out of queues, I 
don't think we can use it.  It's possible that adding the option to 
Monitor() that waits until the call is bridged to begin recording may help 
minimize the issue.  I will look into this tomorrow.

We're using Asterisk for business purposes, but have no intention of hiding 
our implementation.  Once we get all of these little gotchas worked out, I 
believe we'll be the largest single-server installation (~500 
agents/simultaneous calls with digital recording) and an excellent case to 
cite when discussing Asterisk as an inbound call center solution.  Once a 
mature predictive dialer is available, we intend to use Asterisk for our 
outbound calling as well.  This is exciting, because our other options are 
traditional PBXs such as Nortel, but I believe Asterisk can provide the same 
service at a fraction of the cost.

I apologize if I'm a little overzealous in trying to address what are 
relatively minor issues, but I feel as if I've been running a marathon at a 
sprint for the last few months.

Thank you,

Matthew Roth
InterMedia Marketing Solutions
Software Engineer and Systems Developer




