[Asterisk-Dev] app_echo, "voice timestamp prediction", and long latency..

Wed Dec 22 10:55:39 MST 2004

Steve Kann wrote:

>
> This is of general asterisk/IAX2 interest, although I discovered it 
> chasing down reports and issues I've seen testing the new iaxclient 
> jitterbuffer against asterisk:
>
> In debugging some issues I've seen with the new jitterbuffer 
> (currently, I'm working on this in iaxclient, but I'd like this to get 
> into asterisk as well), I've run into an issue with app_echo.
> The setup is that I have an iaxclient calling asterisk, and running 
> app_echo on asterisk.  The issue is that occasionally, the timestamps 
> coming from asterisk will have a big (~640ms) jump forwards in time.  
> i.e. I'll get timestamps like this:
>
> n n+20 n+40 n+657 n+677 n+697, etc.
>
> iaxclient deals with this as we'd expect:  It thinks it lost all those 
> packets in the middle, plays 640ms (+-) of silence, then eventually 
> shrinks its jitterbuffer to match the new timebase.  No _big_ harm 
> done, as audio isn't _lost_, but the lag increases by about 640ms 
> until it shrinks back down.
>
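
The receiver's reaction described above can be sketched roughly as follows. This is only an illustration, not iaxclient's actual jitterbuffer code: frames_missing is a hypothetical helper, and the 20ms frame size is an assumption read off the example timestamps.

```c
#include <assert.h>

/* Hypothetical helper (not part of iaxclient): estimate how many frames a
 * receiver would treat as missing between two consecutive voice timestamps,
 * assuming fixed-size frames of frame_ms milliseconds. */
static long frames_missing(long prev_ts, long cur_ts, long frame_ms)
{
    assert(frame_ms > 0);
    long gap = cur_ts - prev_ts;
    if (gap <= frame_ms)
        return 0;   /* contiguous, duplicate, or reordered frame */
    /* Round the gap to the nearest whole frame, then subtract the frame
     * that actually arrived. */
    return (gap + frame_ms / 2) / frame_ms - 1;
}
```

For the jump from n+40 to n+657, frames_missing(40, 657, 20) gives 30, so a receiver working this way would conceal roughly 600ms before real audio resumes.
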
> This is caused by two things happening:
>
> 1) app_echo does _not_ reflect timestamps back to the caller, as you'd 
> expect it to.  Instead, it zeroes the delivery time on every frame:
>         /* Do our thing here */
>         while(ast_waitfor(chan, -1) > -1) {
>                 f = ast_read(chan);
>                 if (!f)
>                         break;
>                 f->delivery.tv_sec = 0;
>                 f->delivery.tv_usec = 0;
>                 if (f->frametype == AST_FRAME_VOICE) {
>                         if (ast_write(chan, f))
>                                 break;
>
>
>
> Actually, in my opinion, app_echo is one of the apps that shouldn't 
> use the incoming jitterbuffer at all, but should just bounce frames 
> back at you, and it should use the same timestamps it gets.  If 
> you've come in from a VoIP channel, you'll get dejittered on the 
> other end, and if you came in from a local channel (soundcard or PSTN 
> interface), your channel driver should be able to deal with the 
> "internal jitter" (scheduling latency) just fine.
>
> 2) In chan_iax2.c, we do this, on "predicted" voice timestamps going out:
>
> #define MAX_TIMESTAMP_SKEW      640
>
>                 if (voice) {
>                         /* On a voice frame, use predicted values if appropriate */
>                         if (abs(ms - p->nextpred) <= MAX_TIMESTAMP_SKEW) {
>                                 if (!p->nextpred) {
>                                         p->nextpred = ms; /*f->samples / 8;*/
>                                         if (p->nextpred <= p->lastsent)
>                                                 p->nextpred = p->lastsent + 3;
>                                 }
>                                 ms = p->nextpred;
>                         } else
>                                 p->nextpred = ms;
>                 } else {
>
> I've actually changed this (well, the equivalent area) in libiax2, 
> based on conversations from astricon, to do this:
>
>         if(voice) {
> #ifdef USE_VOICE_TS_PREDICTION
>                 /* If we haven't most recently sent silence, and we're
>                  * close in time, use predicted time */
>                 if(session->notsilenttx && abs(ms - session->nextpred) <= 240) {
>                     /* Adjust our txcore, keeping voice and non-voice
>                      * synchronized */
>                     add_ms(&session->offset, (int)(ms - session->nextpred)/10);
>
>                     if(!session->nextpred)
>                         session->nextpred = f->samples;
>                     ms = session->nextpred;
>                 } else {
>                     /* in this case, just use the actual time, since
>                      * we're either way off (shouldn't happen), or we're
>                      * ending a silent period -- and seed the next predicted
>                      * time.  Also, round ms to the next multiple of
>                      * frame size (so our silent periods are multiples
>                      * of frame size too) */
>                     int diff = ms % (f->samples / 8);
>                     if(diff)
>                         ms += f->samples/8 - diff;
>                     session->nextpred = ms;
>                 }
> #else
>                 if(ms <= session->lastsent)
>                         ms = session->lastsent + 3;
> #endif
>                 session->notsilenttx = 1;
>         } else {
>
>
> Basically, this does the following:
>
> 1) it keeps track of silent periods, based on sending an AST_FRAME_CNG 
> to indicate silence.  This is something we ought to do throughout 
> asterisk, so we can do DTX properly everywhere.  Things still 
> generally work without this optimization, though.
>
> 2) When we're not in a silent period, and we are "predicting" 
> timestamps, we adjust our "txcore" (via a low-pass filter), so that we 
> can keep sending continuous, clean timestamps, even if the timebase of 
> our audio and gettimeofday don't match.
>
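
The two points above can be modelled in a few lines. This is a sketch under stated assumptions, not the libiax2 source: struct tx_state and voice_timestamp are hypothetical names, the wall clock is passed in as plain milliseconds instead of a struct timeval, and nextpred is assumed to advance by one frame duration after every voice frame (that step is not part of the excerpt).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the per-session transmit state. */
struct tx_state {
    long offset_ms;    /* wall-clock time of call start (the "txcore") */
    long nextpred;     /* predicted timestamp of the next voice frame */
    int  notsilenttx;  /* nonzero if the last thing sent was voice */
};

/* Return the timestamp to send for a voice frame of frame_ms milliseconds
 * handed to us at wall-clock time now_ms. */
static long voice_timestamp(struct tx_state *s, long now_ms, long frame_ms)
{
    assert(frame_ms > 0);
    long ms = now_ms - s->offset_ms;

    if (s->notsilenttx && labs(ms - s->nextpred) <= 240) {
        /* Low-pass filter: move txcore by a tenth of the error, so the
         * wall-clock timebase slowly follows the audio timebase. */
        s->offset_ms += (ms - s->nextpred) / 10;
        ms = s->nextpred;
    } else {
        /* Way off, or ending a silent period: use the real time, rounded
         * up to the next multiple of the frame duration. */
        long diff = ms % frame_ms;
        if (diff)
            ms += frame_ms - diff;
    }
    /* Assumption: the prediction advances by one frame per voice frame. */
    s->nextpred = ms + frame_ms;
    s->notsilenttx = 1;
    return ms;
}
```

In this model, if 20ms frames are handed over every 21ms of wall-clock time (an audio clock running about 5% slow), the error settles at about 10ms, because the integer division by 10 ignores anything smaller, and the timestamps sent stay exactly 20ms apart with no jump.
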
>
> ================
>
> In this case, both iaxclient and the asterisk box are running ntp, and 
> I've seen that the skew on the iaxclient side is tiny (due to some 
> under/overruns from the input side of the audio).  The problem 
> actually happens because there's some packet loss.  After a few minutes 
> (maybe 5), we get to the point where we've dropped 32 frames (out of 
> about 15,000), which at 20ms per frame is 640ms.  (They could have been 
> lost, or late.)
>
> Since app_echo erases the received timestamp, it has no way to know or 
> tell chan_iax2 that it isn't sending contiguous frames, and the 
> "predictor" logic chugs along, until the "predicted" timestamps are 
> more than 640ms behind real time, and then we get the big jump which 
> buggers things up.
>
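
The failure mode can be reproduced with a small model of the chan_iax2 predictor. This is a sketch, not the chan_iax2 source: struct predictor and predicted_timestamp are hypothetical names, and the two lines marked as assumptions (advancing nextpred by one frame and recording lastsent) are not part of the excerpt quoted earlier.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_TIMESTAMP_SKEW 640

/* Hypothetical, simplified stand-in for the per-call predictor state. */
struct predictor {
    long nextpred;  /* predicted timestamp for the next voice frame */
    long lastsent;  /* last timestamp actually sent */
};

/* ms is the real elapsed time since call start; the return value is the
 * timestamp that would go out on the wire for this voice frame. */
static long predicted_timestamp(struct predictor *p, long ms, long frame_ms)
{
    assert(frame_ms > 0);
    if (labs(ms - p->nextpred) <= MAX_TIMESTAMP_SKEW) {
        if (!p->nextpred) {
            p->nextpred = ms;
            if (p->nextpred <= p->lastsent)
                p->nextpred = p->lastsent + 3;
        }
        ms = p->nextpred;      /* trust the prediction, ignore real time */
    } else {
        p->nextpred = ms;      /* too far off: snap to real time */
    }
    p->nextpred += frame_ms;   /* assumption: one frame per voice frame */
    p->lastsent = ms;          /* assumption: caller-side bookkeeping */
    return ms;
}
```

Feeding this 20ms frames with gaps where frames were lost shows the behaviour: after 32 lost frames the prediction is exactly 640ms behind and is still used, and the next lost frame pushes the difference to 660ms, so the outgoing timestamp snaps forward to real time in a single step.
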
> A short-term, simple solution to this would be:
>
> a) adopt the txcore-follows-prediction logic from libiax2 to chan_iax2,
> and/or

Just as an FYI:  (a) has been added to asterisk-CVS.  bug #3119.

-SteveK