[asterisk-dev] Associating IAX2 audio frames with a particular call?

Sun May 18 11:59:02 CDT 2008

On 18 May 2008, at 17:12, Dan Mills wrote:

> On Sun, 2008-05-18 at 11:47 -0400, Jay R. Ashworth wrote:
>
>>
>>> Now, another telephony question, does anyone have some good ASRC code
>>> out there?
>>> I have the audio interface running at one rate and the telephony bit
>>> running at another (and they are not phase locked), so some buffering
>>> and clever resampling will be needed, but the delay locked control loop
>>> is giving me fits at the moment.
>>
>> Libsamplerate gets, I gather, mixed reviews.  Is it on point?
>
> Libsamplerate is fine as far as it goes, but I have two essentially
> asynchronous processes: the telephony operating at an 8 kHz sample rate
> and delivering 160 samples nominally every 20 ms, and the output system
> running at 44.1 kHz, 48 kHz or whatever, delivering 128, 256, 512 or
> some other power-of-two number of samples at whatever rate the sound
> card clock is actually running at.
>
> The problem is deriving what the real ratio is so that the resampler can
> be set up correctly (and this ratio is time varying). It is essentially a
> control theory problem and is more than a little hairy.
>
> I am sort of tempted to first resample the telephony to the nominal
> output rate, then stick that into a ring buffer. Then as part of the
> jackd callback (which should be low jitter) run a servo loop to resample
> a second time by a ratio that is almost (but probably not exactly) 1.0,
> so as to avoid over or under run, but it feels horribly inelegant.
>
> So far I have not seen the source for a softphone that I consider to
> really get this right (most use masking to fill in for underrun or just
> drop frames to compensate for overrun; that won't fly for this
> application).

We used an algorithm that drops (or duplicates) a single sample when
needed, as opposed to a whole frame. Provided you randomize where
in a frame you do this, it is (to my ears) inaudible.
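
A minimal sketch of the single-sample idea, in Python for brevity. The
function name `adjust_frame` and its `correction` argument are hypothetical,
chosen for illustration only, and a frame is assumed to be a plain sequence
of PCM sample values:

```python
import random


def adjust_frame(frame, correction, rng=random):
    """Return a copy of `frame` that is one sample shorter (correction < 0),
    one sample longer (correction > 0), or unchanged (correction == 0).

    The position that is dropped or duplicated is picked at random so that
    the edit does not land at the same offset in every frame.
    """
    samples = list(frame)
    if correction == 0 or not samples:
        return samples
    pos = rng.randrange(len(samples))  # random offset inside this frame
    if correction < 0:
        del samples[pos]                   # drop one sample
    else:
        samples.insert(pos, samples[pos])  # duplicate one sample
    return samples
```

A caller would pass -1 when the buffer between the two clocks is filling up,
+1 when it is draining, and 0 otherwise.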

The hard part is to get the ratio reasonably accurate, since both
the arrival time of the packets and the consumption of the audio data
are subject to jitter (for network and scheduling reasons respectively).
The only way we could get close to working was counting bytes over
several seconds.
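
One way the counting approach could look, as a rough sketch rather than the
code we actually ran. The class name `RatioEstimator`, the 5 second default
window and the choice to count samples instead of bytes are all assumptions
made for the example:

```python
from collections import deque


class RatioEstimator:
    """Estimate the produced/consumed sample ratio over a sliding window."""

    def __init__(self, window_seconds=5.0):
        self.window = window_seconds
        self.history = deque()  # entries: (timestamp, total_in, total_out)
        self.total_in = 0
        self.total_out = 0

    def update(self, now, n_in=0, n_out=0):
        """Record samples that arrived (n_in) and were consumed (n_out)."""
        self.total_in += n_in
        self.total_out += n_out
        self.history.append((now, self.total_in, self.total_out))
        # Discard old entries, but keep one that is at least a window old.
        while len(self.history) > 1 and now - self.history[1][0] >= self.window:
            self.history.popleft()

    def ratio(self):
        """Return in/out over the window, or None until a full window exists."""
        if len(self.history) < 2:
            return None
        t0, in0, out0 = self.history[0]
        t1, in1, out1 = self.history[-1]
        consumed = out1 - out0
        if t1 - t0 < self.window or consumed == 0:
            return None
        return (in1 - in0) / consumed
```

The result is compared with the nominal ratio (for example 8000/48000 for
8 kHz telephony feeding a 48 kHz card); the difference is the drift to
correct. A longer window averages out more jitter but reacts more slowly.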

Meetme takes the view that the only way to reduce the jitter is to
do the audio processing on a hardware interrupt (basically zero jitter).

You might want to look at some of the stuff Apple are doing in AudioUnits;
they have a nice low-jitter audio processing environment.

If you are taking your telephony from the PSTN over PRI you should
have pretty accurate (and low drift) clocks. With anything else (especially
softphones) you have to expect a percent or so inaccuracy and a fair bit of
drift as the devices heat up and cool down.
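
To put a number on "a percent or so", here is the arithmetic for the 8 kHz,
160-sample framing discussed above, as a small sketch. The helper names are
made up for the example, and the 1% figure is just the rough estimate from
the previous paragraph, not a measurement:

```python
def drift_samples_per_second(sample_rate_hz, clock_error):
    """Samples gained or lost per second for a fractional clock error."""
    return sample_rate_hz * clock_error


def drift_samples_per_frame(frame_samples, clock_error):
    """Samples gained or lost per frame for a fractional clock error."""
    return frame_samples * clock_error


if __name__ == "__main__":
    rate_hz, frame_samples, error = 8000, 160, 0.01  # 1% clock error
    print(drift_samples_per_second(rate_hz, error))
    print(drift_samples_per_frame(frame_samples, error))
```

At 1% that works out to about 80 samples per second, or about 1.6 samples
per 20 ms frame.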

Tim.