[Asterisk-Dev] app_echo, "voice timestamp prediction", and long latency..
Steve Kann
stevek at stevek.com
Wed Dec 22 10:55:39 MST 2004
Steve Kann wrote:
>
> This is of general asterisk/IAX2 interest, although I discovered it
> while chasing down reports and issues I've seen testing the new iaxclient
> jitterbuffer against asterisk:
>
> In debugging some issues I've seen with the new jitterbuffer
> (currently, I'm working on this in iaxclient, but I'd like this to get
> into asterisk as well), I've run into an issue with app_echo.
> The setup is that I have iaxclient calling asterisk, which runs
> app_echo. The issue is that occasionally the timestamps coming
> from asterisk will jump forward in time by a large amount (~640 ms),
> i.e. I'll get timestamps like this:
>
> n n+20 n+40 n+657 n+677 n+697, etc.
>
> iaxclient deals with this as we'd expect: it thinks it lost all the
> packets in the middle, plays roughly 640 ms of silence, then eventually
> shrinks its jitterbuffer to match the new timebase. No _big_ harm is
> done, as audio isn't _lost_, but the lag increases by about 640 ms
> until the jitterbuffer shrinks back down.
>
> This is caused by two things happening:
>
> 1) app_echo does _not_ reflect timestamps back to the caller, as you'd
> expect it to; it clears each frame's delivery time before writing it back:
>     /* Do our thing here */
>     while(ast_waitfor(chan, -1) > -1) {
>         f = ast_read(chan);
>         if (!f)
>             break;
>         f->delivery.tv_sec = 0;
>         f->delivery.tv_usec = 0;
>         if (f->frametype == AST_FRAME_VOICE) {
>             if (ast_write(chan, f))
>                 break;
>             /* ... */
>
> Actually, in my opinion, app_echo is one of the apps that shouldn't
> use the incoming jitterbuffer at all, but should just bounce
> frames back at you, with the same timestamps it receives. If
> you've come in from a VoIP channel, you'll get dejittered on the
> other end, and if you came in from a local channel (soundcard or PSTN
> interface), your channel driver should be able to deal with the
> "internal jitter" (scheduling latency) just fine.
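A minimal sketch of the "bounce frames back with the same timestamps" idea. The `echo_frame` type and `echo_reflect()` function are hypothetical simplifications for illustration, not Asterisk's `struct ast_frame` or app_echo's actual code; the sketch also echoes only voice frames, which is a simplification. The point is that copying the incoming timestamp keeps a gap left by lost or late frames visible to the caller instead of hiding it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified frame; not Asterisk's struct ast_frame. */
struct echo_frame {
    int is_voice;  /* nonzero for voice frames */
    long ts_ms;    /* timestamp the frame arrived with, in ms */
};

/* Copy voice frames from in[] to out[] without touching their timestamps,
 * so a gap left by lost or late frames is passed back to the caller
 * rather than papered over. Returns the number of frames written. */
static size_t echo_reflect(const struct echo_frame *in, size_t n,
                           struct echo_frame *out)
{
    size_t m = 0;
    assert(n == 0 || (in != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        if (!in[i].is_voice)
            continue;      /* simplification: non-voice frames are dropped */
        out[m++] = in[i];  /* struct copy keeps ts_ms exactly as received */
    }
    return m;
}
```

Given voice frames stamped 0, 20, 40 and 100 ms (the 60 and 80 ms frames never arrived), the echoed frames carry the same 40 to 100 ms gap, so the caller's jitterbuffer sees the loss where it actually happened.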
>
> 2) In chan_iax2.c, we do this on "predicted" voice timestamps going out:
>
>     #define MAX_TIMESTAMP_SKEW 640
>
>     if (voice) {
>         /* On a voice frame, use predicted values if appropriate */
>         if (abs(ms - p->nextpred) <= MAX_TIMESTAMP_SKEW) {
>             if (!p->nextpred) {
>                 p->nextpred = ms; /*f->samples / 8;*/
>                 if (p->nextpred <= p->lastsent)
>                     p->nextpred = p->lastsent + 3;
>             }
>             ms = p->nextpred;
>         } else
>             p->nextpred = ms;
>     } else {
>
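To make the drift mechanism concrete, here is a standalone C sketch of the chan_iax2 predictor quoted above. The `tx_state` struct and `voice_timestamp()` function are hypothetical; the branch logic mirrors the excerpt, but the final step that advances `nextpred` by one frame duration after every voice frame is an assumption about code outside the quoted lines. `ms` is the sender's wall-clock time since the call started, in milliseconds.

```c
#include <assert.h>
#include <stdlib.h>

/* Value quoted from chan_iax2.c; everything else here is hypothetical. */
#define MAX_TIMESTAMP_SKEW 640

/* Hypothetical per-call transmit state (field names follow the excerpt). */
struct tx_state {
    int nextpred;  /* predicted timestamp for the next voice frame, in ms */
    int lastsent;  /* last timestamp actually sent, in ms */
};

/* Return the timestamp to put on an outgoing voice frame.
 * ms:       sender's wall-clock time since call start, in ms
 * frame_ms: duration of this frame, in ms (samples / 8 at 8 kHz) */
static int voice_timestamp(struct tx_state *p, int ms, int frame_ms)
{
    assert(frame_ms > 0);
    if (abs(ms - p->nextpred) <= MAX_TIMESTAMP_SKEW) {
        if (!p->nextpred) {
            /* nothing predicted yet: seed the predictor from the clock */
            p->nextpred = ms;
            if (p->nextpred <= p->lastsent)
                p->nextpred = p->lastsent + 3;
        }
        ms = p->nextpred;      /* within the window: use the prediction */
    } else {
        p->nextpred = ms;      /* too far off: resync, i.e. the "jump" */
    }
    p->lastsent = ms;
    p->nextpred += frame_ms;   /* assumed: advance by one frame */
    return ms;
}
```

Feeding it 20 ms frames on a steady clock and then skipping 32 frame slots reproduces the loss scenario described later in this message: the next timestamp is still the contiguous prediction, now a full 640 ms behind real time, and one more missing frame forces a resync, here a forward jump of 680 ms.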
> I've actually changed this (well, the equivalent area) in libiax2,
> based on conversations at AstriCon, to do this:
>
>     if(voice) {
> #ifdef USE_VOICE_TS_PREDICTION
>         /* If we haven't most recently sent silence, and we're
>          * close in time, use predicted time */
>         if(session->notsilenttx && abs(ms - session->nextpred) <= 240) {
>             /* Adjust our txcore, keeping voice and non-voice
>              * synchronized */
>             add_ms(&session->offset, (int)(ms - session->nextpred)/10);
>
>             if(!session->nextpred)
>                 session->nextpred = f->samples;
>             ms = session->nextpred;
>         } else {
>             /* in this case, just use the actual time, since
>              * we're either way off (shouldn't happen), or we're
>              * ending a silent period -- and seed the next predicted
>              * time. Also, round ms to the next multiple of
>              * frame size (so our silent periods are multiples
>              * of frame size too) */
>             int diff = ms % (f->samples / 8);
>             if(diff)
>                 ms += f->samples/8 - diff;
>             session->nextpred = ms;
>         }
> #else
>         if(ms <= session->lastsent)
>             ms = session->lastsent + 3;
> #endif
>         session->notsilenttx = 1;
>     } else {
>
>
> Basically, this does the following:
>
> 1) It keeps track of silent periods, based on sending an AST_FRAME_CNG
> to indicate silence. This is something we ought to do throughout
> asterisk, so we can do DTX properly everywhere. Things still
> generally work without this optimization, though.
>
> 2) When we're not in a silent period and we are "predicting"
> timestamps, we adjust our "txcore" via a low-pass filter (moving it by
> 1/10 of the prediction error on each frame), so that we can keep
> sending continuous, clean timestamps even if the timebase of our
> audio and gettimeofday() don't match.
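A similar standalone sketch of the libiax2 variant, again with hypothetical names (`pred_session`, `predict_voice_ts()`, `sent_silence()`). Assumptions: `offset_ms` stands in for the `struct timeval` offset (the "txcore") that `add_ms()` adjusts, raw clock readings are plain millisecond counts, and `nextpred` is advanced by one frame duration after each voice frame (that step is outside the quoted lines). The excerpt's `if(!session->nextpred)` seeding is omitted because, with `notsilenttx` starting at 0, the first frame always takes the resync path.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified session state; times are in ms. */
struct pred_session {
    long offset_ms;   /* stands in for the txcore offset */
    long nextpred;    /* predicted timestamp of the next voice frame */
    int notsilenttx;  /* 1 if the last frame sent was voice, 0 after CNG */
};

/* raw_ms:   local clock reading
 * frame_ms: frame duration (samples / 8 at 8 kHz)
 * Returns the timestamp to send on this voice frame. */
static long predict_voice_ts(struct pred_session *s, long raw_ms, long frame_ms)
{
    long ms = raw_ms - s->offset_ms;  /* call time as seen through txcore */
    assert(frame_ms > 0);
    if (s->notsilenttx && labs(ms - s->nextpred) <= 240) {
        /* low-pass filter: shift txcore by 1/10 of the error so future
         * clock readings drift back toward the prediction */
        s->offset_ms += (ms - s->nextpred) / 10;
        ms = s->nextpred;
    } else {
        /* ending silence (or way off): use real time, rounded up to a
         * whole number of frames, and reseed the prediction */
        long diff = ms % frame_ms;
        if (diff)
            ms += frame_ms - diff;
        s->nextpred = ms;
    }
    s->notsilenttx = 1;
    s->nextpred += frame_ms;  /* assumed: advance by one frame */
    return ms;
}

/* Call when a CNG (silence) frame is sent instead of voice. */
static void sent_silence(struct pred_session *s)
{
    s->notsilenttx = 0;
}
```

With a local clock that gains 0.2 ms per 20 ms frame, the 1/10 correction keeps the prediction error at about 10 ms, so the sketch emits perfectly contiguous timestamps; without the correction the error would pass the 240 ms window after about 1,200 frames (24 s) and force a resync. After `sent_silence()`, the next voice frame is stamped with real time rounded up to a frame boundary.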
>
>
> ================
>
> In this case, both iaxclient and the asterisk box are running ntp, and
> I've seen that the skew on the iaxclient side is tiny (due to some
> under/overruns on the audio input side). The problem actually
> happens because there's some packet loss. After a few loss events
> (maybe 5), we get to the point where we've dropped 32 frames (out of
> about 15,000); they could have been lost, or late.
>
> Since app_echo erases the received timestamp, it has no way to know, or
> to tell chan_iax2, that it isn't sending contiguous frames, so the
> "predictor" logic chugs along until the "predicted" timestamps are
> 640 ms behind real time, and then we get the big jump, which buggers
> things up.
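The arithmetic, assuming 20 ms frames and a predictor that advances by one frame for each frame actually sent, so that every frame that never reaches chan_iax2 leaves the predicted timestamps another 20 ms behind real time after $k$ missing frames:

```latex
\[
\mathrm{lag}(k) = 20k~\mathrm{ms}, \qquad
\mathrm{lag}(k) \le \texttt{MAX\_TIMESTAMP\_SKEW} = 640~\mathrm{ms}
\iff k \le 32 .
\]
```

So after 32 missing frames the outgoing stream still looks contiguous while sitting a full 640 ms behind real time, and the next missing frame pushes the error past the window and triggers the resync.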
>
> A simple, short-term solution to this would be:
>
> a) port the txcore-follows-prediction logic from libiax2 to chan_iax2,
> and/or
Just as an FYI: (a) has been added to asterisk CVS (bug #3119).
-SteveK