[Asterisk-Dev] Help Debugging Dropped Call Audio - Possibly Fixed

Wed Dec 28 11:14:47 MST 2005

All,

This post is an attempt to address any issues that I missed over the 
weekend.

Kevin P. Fleming wrote:

 > 1) Does the problem happen when recording is not in place?
I *believe* it does, but I'm going to resolve the issue when recording 
is in place and work my way backwards.  It's much easier to address 
parts of the problem individually when approaching it this way.

 > 2) Does the problem happen when there is no transcoding going
 > on (between the phones or to the recording file)?
Yes.  We are trying to scale up as large as possible so we perform no 
transcoding or DSP on the Asterisk server itself.  The calls are in the 
u-Law codec (all other codecs are noloaded) and are being recorded to 
the PCM format.

 > 3) Does the problem happen if the phones are allowed to talk
 > directly to each other (SIP media path re-INVITE)?
I don't know, but I'd assume it doesn't since Asterisk wouldn't perform 
any bridging and our research indicates that the network itself is not 
an issue.  Of course, removing Asterisk from the audio path would 
eliminate the ability to record the calls, so I probably won't look into 
this until after the Monitor()-related problems are resolved.  If you 
have a specific interest in knowing the answer to this question, I can 
adjust my plans.  Alternatively, our test system is a day from arrival 
at Digium (one of our test servers, a switch, and two VoIP phones), so 
your tech team could investigate this in parallel to my research.

 > 4) What sort of disk subsystem and filesystem are being used
 > to capture the recording files?
A RAM disk, formatted as ext2 with 1 KB blocks.

Our production server is a Dell PowerEdge 6850.  It has two 73 GB SCSI 
drives configured in a RAID 1 and an onboard PERC 4e RAID controller.  
The megaraid drivers are up to date (2.20).  We are using the ext3 
filesystem on these disks.

Our test server will be arriving shortly, so in the interest of brevity 
I'm omitting the details about it.  Would you like to know any other 
specific information about our production server?  Both servers 
experience the same problem.

 > Writing to a RAM disk helps, but it does _not_ reduce the overhead
 > completely. You are still using ext2, which means it needs to do
 > filesystem manipulations to create/remove/extend files, the same as it
 > would on a regular disk system. They will take slightly less time, but
 > not greatly so... keep in mind that with the amount of RAM you have, a
 > regular on-disk filesystem would be mostly in cache anyway (the metadata
 > parts) and would perform nearly the same as a RAM disk.

I was under the impression that the physical movement of the disk heads 
was our bottleneck, and that writing to RAM would be much faster.  I 
understand your point about cache, but we did see a significant increase 
in the number of calls that we could record concurrently once we 
implemented the RAM disk.  Why do you think that is?  Does a synchronous 
write return after the data is written to cache or does it wait until 
it's committed to disk?  How does a journaling filesystem, such as 
ext3, affect this?
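
For reference, the two kinds of write being contrasted in that question 
can be sketched roughly as follows.  This is only an illustration in 
Python, not how Monitor() writes its files; the function names and the 
scratch file are made up for the example.  A plain write() normally 
returns once the data has been copied into the kernel's cache, while 
fsync() does not return until the kernel reports the data as written to 
the underlying device.

```python
import os
import tempfile

def write_buffered(path, data):
    """Plain write(): normally returns once the data is in the kernel's cache."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        return os.write(fd, data)
    finally:
        os.close(fd)

def write_durable(path, data):
    """write() followed by fsync(): returns only after the kernel has
    flushed the file's data to the underlying device."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = os.write(fd, data)
        os.fsync(fd)  # block until the data has been handed to stable storage
        return written
    finally:
        os.close(fd)

# Hypothetical payload: 160 bytes, i.e. 20 ms of 8 kHz u-Law audio.
payload = b"\xff" * 160
with tempfile.TemporaryDirectory() as scratch:
    buffered_path = os.path.join(scratch, "buffered.raw")
    durable_path = os.path.join(scratch, "durable.raw")
    buffered_count = write_buffered(buffered_path, payload)
    durable_count = write_durable(durable_path, payload)
    with open(buffered_path, "rb") as f:
        buffered_bytes = f.read()
    with open(durable_path, "rb") as f:
        durable_bytes = f.read()
```

Both functions leave identical file contents; they differ only in when 
the call returns.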

Do you still believe that Monitor() and its synchronous writes are the 
likely source of the problem?  If so, would moving to MixMonitor() solve 
it?

 > For experimentation purposes, I would try tmpfs instead of ext2, and
 > don't actually create a ramdisk device at all... just mount tmpfs (with
 > a size limit) at your /var/spool/asterisk/monitor directory (or wherever
 > you are recording).

Thanks for the suggestion.  I'll have to look into the differences 
between tmpfs and our current method and try some experiments.  If 
nothing else, this project is keeping me busy.  = )

 > You are correct that only a single 'winner' is returned, but the next
 > call into ast_waitfor() will still find the non-winner as 'ready' and
 > will return it. The file descriptors will stay marked 'ready' until the
 > data has been read, regardless of how many times poll() is called on
 > them.
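
The poll() behaviour described above can be seen in a small standalone 
sketch.  It is written in Python rather than against the Asterisk 
source, and it uses two pipes as stand-ins for the two channels of a 
bridged call, which is purely an assumption for illustration (it needs 
a platform that provides poll(), such as Linux).  The helper name 
ready_fds is made up.

```python
import os
import select

def ready_fds(poller, timeout_ms=0):
    """Return the set of descriptors that poll() currently reports as readable."""
    return {fd for fd, events in poller.poll(timeout_ms) if events & select.POLLIN}

# Two pipes stand in for the two legs of a call (illustrative assumption).
a_read, a_write = os.pipe()
b_read, b_write = os.pipe()

poller = select.poll()
poller.register(a_read, select.POLLIN)
poller.register(b_read, select.POLLIN)

# Data arrives on both legs before either one is serviced.
os.write(a_write, b"frame-a")
os.write(b_write, b"frame-b")

first = ready_fds(poller)    # both descriptors are reported as ready
os.read(a_read, 1024)        # service only one "winner"
second = ready_fds(poller)   # the other descriptor is still reported as ready
os.read(b_read, 1024)
third = ready_fds(poller)    # nothing left to read on either descriptor

for fd in (a_read, a_write, b_read, b_write):
    os.close(fd)
```

The descriptor that was not serviced on the first pass shows up again on 
the second, so no data is lost by handling one channel per call.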

I'm going to focus my research on ast_generic_bridge(), 
ast_waitfor_nandfds(), ast_read(), and ast_write().  I'll look primarily 
at the effects of the channel that is chosen for reading on the quality 
of the digital recording.  I think the most likely area for improvement 
is in the code that triggers the seeks.  It may be a bit aggressive.  I 
understand that Monitor() may be deprecated in favor of MixMonitor() in 
the near future, but this may also lead to an improvement in the 
bridging logic.  At the least, it's improving my understanding of what 
Asterisk is doing under the hood.

 > Have you tried recording in another format which may be more resistant
 > to these seeks? It would be interesting to see if there is a difference.

Yes.  GSM interprets the gaps as a small (sometimes imperceptible) 
silence.  I don't know about the other codecs.  Plugging specific data 
into the gaps based on the codec being used, instead of just seeking 
ahead, may mask the problem more effectively.  Unfortunately, we're 
trying to avoid transcoding every recording and we'd like to stick with 
the u-Law codec to keep the quality of the call audio as good as 
possible.  If you have any suggestions for a compromise, I'd be 
interested in hearing them.
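
As a rough illustration of plugging codec-specific data into the gaps, 
the sketch below pads a gap with u-Law silence instead of seeking over 
it.  It is written in Python and is not the Monitor() code; it assumes 
a raw recording with one u-Law byte per sample, and the function names 
and sample values are made up.  In G.711 u-Law a zero-amplitude sample 
is encoded as 0xFF, whereas the 0x00 bytes that a hole reads back as 
decode to a full-scale sample.

```python
import io

# G.711 u-Law byte for a zero-amplitude sample.  A hole left by seeking
# reads back as 0x00, which decodes to a full-scale sample instead.
ULAW_SILENCE = b"\xff"

def write_frame_seek(recording, offset, frame):
    """Seek straight to the frame's offset, leaving any gap as a hole."""
    recording.seek(offset)
    recording.write(frame)

def write_frame_padded(recording, offset, frame):
    """Write the frame at offset, filling any gap before it with u-Law silence."""
    end = recording.seek(0, io.SEEK_END)
    if offset > end:
        recording.write(ULAW_SILENCE * (offset - end))
    recording.seek(offset)
    recording.write(frame)

frame = b"\x55" * 4  # made-up payload bytes standing in for audio

holey = io.BytesIO()
write_frame_seek(holey, 0, frame)
write_frame_seek(holey, 8, frame)      # skips 4 samples, leaving zero bytes

padded = io.BytesIO()
write_frame_padded(padded, 0, frame)
write_frame_padded(padded, 8, frame)   # the same gap, filled with silence
```

Both recordings end up the same length, so timing is preserved; only the 
contents of the gap differ.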

 > We all are very interested in making sure that Asterisk performs as well
 > as it can in every situation where it makes sense... Your ability to
 > diagnose and follow the code has been helpful, since you now understand
 > where the problems are coming from :-)

Thank you,

Matthew Roth
InterMedia Marketing Solutions
Software Engineer and Systems Developer