[Asterisk-Dev] app_echo, "voice timestamp prediction", and long latency..

Mon Dec 20 16:40:10 MST 2004

This is of general asterisk/IAX2 interest, although I discovered it 
while chasing down reports and issues from testing the new iaxclient 
jitterbuffer against asterisk.  (Currently I'm working on the 
jitterbuffer in iaxclient, but I'd like it to get into asterisk as 
well.)  Along the way, I've run into an issue with app_echo.

The setup is that I have an iaxclient calling asterisk, with asterisk 
running app_echo.  The issue is that occasionally, the timestamps 
coming from asterisk will have a big (~640ms) jump forwards in time.  
For example, I'll get timestamps like this:

n n+20 n+40 n+657 n+677 n+697, etc.

iaxclient deals with this as we'd expect:  It thinks it lost all those 
packets in the middle, plays roughly 640ms of silence, then eventually 
shrinks its jitterbuffer to match the new timebase.  No _big_ harm 
done, as audio isn't _lost_, but the lag increases by about 640ms until 
the jitterbuffer shrinks back down.

This is caused by two things happening:

1) app_echo does _not_ reflect timestamps back to the caller, as you 
might expect it to.  Instead, it zeroes the delivery time on every 
frame it reads:

    /* Do our thing here */
    while(ast_waitfor(chan, -1) > -1) {
            f = ast_read(chan);
            if (!f)
                    break;
            f->delivery.tv_sec = 0;
            f->delivery.tv_usec = 0;
            if (f->frametype == AST_FRAME_VOICE) {
                    if (ast_write(chan, f))
                            break;

Actually, in my opinion, app_echo is one of the apps that shouldn't 
use the incoming jitterbuffer at all; it should just bounce frames 
back at you, using the same timestamps it gets.  If you've come in 
from a VoIP channel, you'll get dejittered on the other end, and if 
you came in from a local channel (soundcard or PSTN interface), your 
channel driver should be able to deal with the "internal jitter" 
(scheduling latency) just fine.

2) In chan_iax2.c, we do this, on "predicted" voice timestamps going out:

    #define MAX_TIMESTAMP_SKEW      640

    if (voice) {
            /* On a voice frame, use predicted values if appropriate */
            if (abs(ms - p->nextpred) <= MAX_TIMESTAMP_SKEW) {
                    if (!p->nextpred) {
                            p->nextpred = ms; /*f->samples / 8;*/
                            if (p->nextpred <= p->lastsent)
                                    p->nextpred = p->lastsent + 3;
                    }
                    ms = p->nextpred;
            } else
                    p->nextpred = ms;
    } else {
   2861                 } else {

I've actually changed this (well, the equivalent area) in libiax2, based 
on conversations from astricon, to do this:

    if(voice) {
#ifdef USE_VOICE_TS_PREDICTION
        /* If we haven't most recently sent silence, and we're
         * close in time, use predicted time */
        if(session->notsilenttx && abs(ms - session->nextpred) <= 240) {
            /* Adjust our txcore, keeping voice and non-voice
             * synchronized */
            add_ms(&session->offset, (int)(ms - session->nextpred)/10);

            if(!session->nextpred)
                session->nextpred = f->samples;
            ms = session->nextpred;
        } else {
            /* in this case, just use the actual time, since
             * we're either way off (shouldn't happen), or we're
             * ending a silent period -- and seed the next predicted
             * time.  Also, round ms to the next multiple of
             * frame size (so our silent periods are multiples
             * of frame size too) */
            int diff = ms % (f->samples / 8);
            if(diff)
                ms += f->samples/8 - diff;
            session->nextpred = ms;
        }
#else
        if(ms <= session->lastsent)
            ms = session->lastsent + 3;
#endif
        session->notsilenttx = 1;
    } else {

Basically, this does the following:

1) it keeps track of silent periods, based on sending an AST_FRAME_CNG 
to indicate silence.  This is something we ought to do throughout 
asterisk, so we can do DTX properly everywhere.  Things still generally 
work without this optimization, though.

2) When we're not in a silent period, and we are "predicting" 
timestamps, we adjust our "txcore" (via a low-pass filter: each frame 
moves it by a tenth of the difference between actual and predicted 
time), so that we can keep sending continuous, clean timestamps even 
if the timebase of our audio and gettimeofday don't match.
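
To make point 2 concrete, here is a small standalone model of the 
libiax2 logic above.  It is only a sketch: struct tx_state, voice_ts() 
and largest_step() are made-up names, the struct timeval offset is 
reduced to a millisecond counter, the transmit time is assumed to be 
"now minus offset", frames are assumed to be 20ms, and nextpred is 
assumed to advance by one frame after each send (the excerpt doesn't 
show that part).  The zero-nextpred seeding is left out because the 
first frame always takes the "actual time" branch here.

```c
#include <assert.h>
#include <stdlib.h>

#define FRAME_MS   20   /* assumed frame duration */
#define PRED_LIMIT 240  /* prediction threshold from the libiax2 excerpt */

/* Hypothetical, simplified stand-in for the libiax2 session state. */
struct tx_state {
    long offset_ms;   /* models session->offset as plain milliseconds */
    long nextpred;    /* next predicted voice timestamp */
    int  notsilenttx; /* nonzero once we have sent voice */
    long max_skew;    /* largest |actual - predicted| seen (for inspection) */
};

/* Return the timestamp to send for a voice frame produced at now_ms. */
static long voice_ts(struct tx_state *s, long now_ms)
{
    long ms = now_ms - s->offset_ms;   /* assumed: tx time = now - offset */
    long skew = ms - s->nextpred;

    if (s->notsilenttx && labs(skew) <= PRED_LIMIT) {
        if (labs(skew) > s->max_skew)
            s->max_skew = labs(skew);
        s->offset_ms += skew / 10;     /* low-pass: absorb 1/10 of the error */
        ms = s->nextpred;              /* send the clean, predicted value */
    } else {
        /* way off, or ending silence: use actual time, rounded up */
        long diff = ms % FRAME_MS;
        if (diff)
            ms += FRAME_MS - diff;
        s->nextpred = ms;
    }
    s->notsilenttx = 1;
    s->nextpred += FRAME_MS;           /* assumption: advance after each send */
    return ms;
}

/* Feed nslots frame slots, skipping every drop_every-th one (a lost
 * frame), and return the largest step between consecutive timestamps. */
static long largest_step(struct tx_state *s, int nslots, int drop_every)
{
    long prev = -1, worst = 0;

    assert(drop_every > 1);
    for (int i = 1; i <= nslots; i++) {
        if (i % drop_every == 0)
            continue;                  /* this frame never reaches us */
        long ts = voice_ts(s, (long)i * FRAME_MS);
        if (prev >= 0 && ts - prev > worst)
            worst = ts - prev;
        prev = ts;
    }
    return worst;
}
```

In this model, calling largest_step() on a zeroed struct tx_state with 
15000 slots and one drop per 100 never produces a step bigger than one 
frame: the offset soaks up each missing 20ms a little at a time, so the 
gap between actual and predicted time stays far below the 240ms limit.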

================

In this case, both iaxclient and the asterisk box are running ntp, and 
the skew I've seen on the iaxclient side is tiny (due to some 
under/overruns on the input side of the audio).  The problem actually 
happens because there's some packet loss.  After a few minutes (maybe 
5), we get to the point where we've dropped 32 frames (out of about 
15,000); they could have been lost, or late.  At 20ms per frame, 32 
frames is 640ms, which is right at MAX_TIMESTAMP_SKEW.

Since app_echo erases the received timestamp, it has no way to know, 
or to tell chan_iax2, that it isn't sending contiguous frames.  The 
"predictor" logic chugs along until the "predicted" timestamps are 
640ms behind real time, and then we get the big jump which buggers 
things up.
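
The failure can be reproduced with a toy model of the chan_iax2 
predictor quoted earlier.  This is a sketch under assumptions, not the 
real code: struct pred_state, predicted_ts() and drops_before_jump() 
are made-up names, frames are taken to be 20ms, and nextpred is assumed 
to advance by one frame after each send (the excerpt doesn't show 
where that happens).

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_TIMESTAMP_SKEW 640  /* value from chan_iax2.c */
#define FRAME_MS           20   /* assumed frame duration */

/* Hypothetical stand-in for the two fields the predictor uses. */
struct pred_state {
    int nextpred;
    int lastsent;
};

/* Same branch structure as the chan_iax2 excerpt; ms is wall-clock time. */
static int predicted_ts(struct pred_state *p, int ms)
{
    if (abs(ms - p->nextpred) <= MAX_TIMESTAMP_SKEW) {
        if (!p->nextpred) {
            p->nextpred = ms;
            if (p->nextpred <= p->lastsent)
                p->nextpred = p->lastsent + 3;
        }
        ms = p->nextpred;          /* keep sending the predicted value */
    } else {
        p->nextpred = ms;          /* too far off: snap to real time */
    }
    p->lastsent = ms;
    p->nextpred += FRAME_MS;       /* assumption: advance after each send */
    return ms;
}

/* Echo nslots frame slots, losing every drop_every-th frame.  Returns how
 * many frames had been lost when the first timestamp jump appeared (or -1
 * if none did) and stores the size of that jump in *jump_ms. */
static int drops_before_jump(int nslots, int drop_every, int *jump_ms)
{
    struct pred_state p = {0, 0};
    int prev = -1, dropped = 0;

    assert(drop_every > 1);
    for (int i = 1; i <= nslots; i++) {
        if (i % drop_every == 0) {
            dropped++;             /* frame lost or late: nothing to echo */
            continue;
        }
        int ts = predicted_ts(&p, i * FRAME_MS);
        if (prev >= 0 && ts - prev > FRAME_MS) {
            *jump_ms = ts - prev;
            return dropped;
        }
        prev = ts;
    }
    return -1;
}
```

With perfectly regular 20ms arrivals, the model keeps emitting 
contiguous timestamps while the gap to real time grows by one frame 
per loss, and it only snaps forward once the gap exceeds 640ms, i.e. 
on the loss just past the 32nd.  Real arrival jitter would move the 
exact trip point around a little.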

A short-term, simple, solution to this would be:

a) adopt the txcore-follows-prediction logic from libiax2 to chan_iax2,
and/or
b) have app_echo reflect rxstamps back

In the long term, (c) adopting the jitterbuffer mechanism I'm working on 
in iaxclient/libiax2 would also solve this, because it generates 
interpolation frames when it finds lost/late frames, so you still get 
the right number of frames passing through. 

I'm pretty sure (a) is the right thing to do in all cases, though, and 
should go in soon. 

I'm not sure why Mark added the delivery-clearing code to app_echo 
(the code that (b) would change).  The CVS log just says this:

2004-03-27 01:50  markster

        * app_echo.c: Make read/write mode have a lock parameter and use it
          properly.

(c) is probably still a ways off, because integrating it into asterisk 
will take a bit of work.

-SteveK