[Asterisk-Dev] app_echo, "voice timestamp prediction", and long latency..
Steve Kann
stevek at stevek.com
Mon Dec 20 16:40:10 MST 2004
This is of general asterisk/IAX2 interest, although I discovered it
chasing down reports and issues I've seen testing the new iaxclient
jitterbuffer against asterisk:
In debugging some issues I've seen with the new jitterbuffer (currently,
I'm working on this in iaxclient, but I'd like this to get into asterisk
as well), I've run into an issue with app_echo.
The setup is that I have an iaxclient calling asterisk, and running
app_echo on asterisk. The issue is that occasionally, the timestamps
coming from asterisk will have a big (~640ms) jump forwards in time.
i.e. I'll get timestamps like this:
n n+20 n+40 n+657 n+677 n+697, etc.
iaxclient deals with this as we'd expect: It thinks it lost all those
packets in the middle, plays 640ms (give or take) of silence, then
eventually shrinks its jitterbuffer to match the new timebase. No _big_
harm done, as audio isn't _lost_, but the lag increases by about 640ms
until it shrinks back down.
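To make the receiver's view concrete, here is a minimal sketch of how a jump like that reads as "lost" frames. The helper name frames_assumed_lost is hypothetical (it is not taken from iaxclient), and it assumes fixed 20ms frames as in the sequence above.

```c
#include <assert.h>

/* Hypothetical helper (not from iaxclient): number of frames a receiver
 * would assume missing between two consecutive voice timestamps, given
 * a fixed frame duration in milliseconds. */
static int frames_assumed_lost(long prev_ts, long ts, int frame_ms)
{
    assert(frame_ms > 0);
    long gap = ts - prev_ts;
    if (gap <= frame_ms)
        return 0;   /* contiguous (or reordered): nothing to fill in */
    /* Everything beyond one frame duration looks like missing audio. */
    return (int)((gap - frame_ms) / frame_ms);
}
```

With the timestamps above, frames_assumed_lost(40, 657, 20) comes out as 29 frames, i.e. roughly 600ms that the jitterbuffer has to cover with silence even though nothing was actually lost at that point.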
This is caused by two things happening:
1) app_echo does _not_ reflect timestamps back to the caller, as you'd
expect.
    /* Do our thing here */
    while(ast_waitfor(chan, -1) > -1) {
        f = ast_read(chan);
        if (!f)
            break;
        f->delivery.tv_sec = 0;
        f->delivery.tv_usec = 0;
        if (f->frametype == AST_FRAME_VOICE) {
            if (ast_write(chan, f))
                break;
In my opinion, app_echo is one of the apps that shouldn't use the
incoming jitterbuffer at all, but should just bounce frames back at
you, using the same timestamps it gets. If you've come in from a VoIP
channel, you'll get dejittered on the other end, and if you came in
from a local channel (soundcard or PSTN interface), your channel driver
should be able to deal with the "internal jitter" (scheduling latency)
just fine.
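As a rough illustration of the difference (this is not app_echo code; echo_timestamps and its parameters are made up for the example), the sketch below echoes a list of received timestamps either unchanged or regenerated as a contiguous series, which is effectively what happens once the received timestamp is thrown away.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an echo: rx[] holds the received timestamps
 * (ms) of the frames that actually arrived. With reflect != 0 they are
 * passed through unchanged; otherwise the output is regenerated as a
 * contiguous series starting from the first timestamp. */
static void echo_timestamps(const long *rx, size_t n, int frame_ms,
                            int reflect, long *out)
{
    assert(frame_ms > 0);
    for (size_t i = 0; i < n; i++) {
        if (reflect || i == 0)
            out[i] = rx[i];                 /* keep the sender's timebase */
        else
            out[i] = out[i - 1] + frame_ms; /* gap from a lost frame vanishes */
    }
}
```

If the frame stamped 60 is lost on the way in, the reflected output still shows the hole (0, 20, 40, 80, 100), so the far end can account for it. The regenerated output (0, 20, 40, 60, 80) hides it and quietly falls one frame behind real time.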
2) In chan_iax2.c, we do this, on "predicted" voice timestamps going out:
    #define MAX_TIMESTAMP_SKEW 640

    ...

    if (voice) {
        /* On a voice frame, use predicted values if appropriate */
        if (abs(ms - p->nextpred) <= MAX_TIMESTAMP_SKEW) {
            if (!p->nextpred) {
                p->nextpred = ms; /*f->samples / 8;*/
                if (p->nextpred <= p->lastsent)
                    p->nextpred = p->lastsent + 3;
            }
            ms = p->nextpred;
        } else
            p->nextpred = ms;
    } else {
I've actually changed this (well, the equivalent area) in libiax2, based
on conversations from astricon, to do this:
    if(voice) {
    #ifdef USE_VOICE_TS_PREDICTION
        /* If we haven't most recently sent silence, and we're
         * close in time, use predicted time */
        if(session->notsilenttx && abs(ms - session->nextpred) <= 240) {
            /* Adjust our txcore, keeping voice and non-voice
             * synchronized */
            add_ms(&session->offset, (int)(ms - session->nextpred)/10);

            if(!session->nextpred)
                session->nextpred = f->samples;
            ms = session->nextpred;
        } else {
            /* in this case, just use the actual time, since
             * we're either way off (shouldn't happen), or we're
             * ending a silent period -- and seed the next predicted
             * time. Also, round ms to the next multiple of
             * frame size (so our silent periods are multiples
             * of frame size too) */
            int diff = ms % (f->samples / 8);
            if(diff)
                ms += f->samples/8 - diff;
            session->nextpred = ms;
        }
    #else
        if(ms <= session->lastsent)
            ms = session->lastsent + 3;
    #endif
        session->notsilenttx = 1;
    } else {
Basically, this does the following:
1) It keeps track of silent periods, based on sending an AST_FRAME_CNG
to indicate silence. This is something we ought to do throughout
asterisk, so we can do DTX properly everywhere. Things still generally
work without this optimization, though.
2) When we're not in a silent period, and we are "predicting"
timestamps, we adjust our "txcore" (via a low-pass filter), so that we
can keep sending continuous, clean timestamps even if the timebase of
our audio and gettimeofday() don't match.
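The effect of that low-pass adjustment can be sketched with a small simulation. This is a simplified model, not libiax2 code: max_prediction_skew is a made-up name, a plain millisecond counter stands in for session->offset, and the 240ms threshold and silence tracking are left out. It feeds in frames whose audio clock runs slightly off the wall clock and reports the worst gap between real and predicted time.

```c
#include <assert.h>
#include <stdlib.h>

/* Simulates `frames` voice frames arriving every wall_step_ms of
 * wall-clock time, each carrying frame_ms of audio. Returns the largest
 * |ms - nextpred| seen. With use_filter set, the offset is nudged by
 * 1/10 of the difference per frame, as in the quoted add_ms() call. */
static long max_prediction_skew(int frames, int wall_step_ms,
                                int frame_ms, int use_filter)
{
    long offset = 0;    /* stand-in for session->offset, in ms */
    long nextpred = 0;  /* next predicted timestamp */
    long worst = 0;

    assert(frames > 0 && frame_ms > 0);
    for (int k = 0; k < frames; k++) {
        long wall = (long)k * wall_step_ms;
        long ms = wall - offset;        /* "real" timestamp for this frame */
        long diff = ms - nextpred;

        if (labs(diff) > worst)
            worst = labs(diff);
        if (use_filter)
            offset += diff / 10;        /* txcore follows the prediction */
        nextpred += frame_ms;           /* the predicted timestamp is what gets sent */
    }
    return worst;
}
```

With 20ms frames arriving every 21ms, max_prediction_skew(1000, 21, 20, 1) stays at 10ms, while the unfiltered case max_prediction_skew(1000, 21, 20, 0) drifts to 999ms, far past any resync threshold.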
================
In this case, both iaxclient and the asterisk box are running ntp, and
I've seen that the skew on the iaxclient side is tiny (due to some
under/overruns from the input side of the audio). The problem actually
happens because there's some packet loss. After a few (maybe 5) minutes,
we get to the point where we've dropped 32 frames (out of about 15,000);
they could have been lost, or late. At 20ms per frame, 32 frames is
640ms, which is exactly MAX_TIMESTAMP_SKEW.
Since app_echo erases the received timestamp, it has no way to know or
tell chan_iax2 that it isn't sending contiguous frames, and the
"predictor" logic chugs along until the "predicted" timestamps are
640ms behind real time, and then we get the big jump which buggers
things up.
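Here is a minimal simulation of that drift, assuming ideal 20ms arrival spacing and assuming the predicted timestamp advances by one frame duration per frame sent (that step is not in the excerpt quoted above). The function name drops_before_jump and the drop_every parameter are invented for the example, and the first-frame special case of the predictor is left out.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_TIMESTAMP_SKEW 640

/* Frames arrive every frame_ms of real time; every drop_every-th frame
 * is lost before it reaches the echo. Each surviving frame is stamped
 * with the predicted time while the prediction is within
 * MAX_TIMESTAMP_SKEW of real time. Returns how many frames had been
 * dropped when the predictor first resyncs, and stores the size of the
 * resulting timestamp jump in *jump_ms. */
static int drops_before_jump(int frame_ms, int drop_every, long *jump_ms)
{
    long nextpred = 0, last_out = 0;
    int drops = 0;

    assert(frame_ms > 0 && drop_every >= 2);
    for (long i = 0; ; i++) {
        long ms = i * frame_ms;             /* real elapsed time */
        if (i > 0 && i % drop_every == 0) { /* this frame never arrives */
            drops++;
            continue;
        }
        if (labs(ms - nextpred) > MAX_TIMESTAMP_SKEW) {
            *jump_ms = ms - last_out;       /* prediction abandoned: big jump */
            return drops;
        }
        last_out = nextpred;                /* send the predicted timestamp */
        nextpred += frame_ms;               /* assumed per-frame advance */
    }
}
```

With 20ms frames and one loss every 500 frames, the prediction falls 20ms further behind per loss, and the resync happens on the first frame after the 33rd drop, with a 680ms step in the outgoing timestamps. With real arrival jitter on top, the threshold can be crossed a frame or so earlier, which fits the 32 dropped frames observed.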
A short-term, simple, solution to this would be:
a) adopt the txcore-follows-prediction logic from libiax2 to chan_iax2,
and/or
b) have app_echo reflect rxstamps back
In the long term, (c) adopting the jitterbuffer mechanism I'm working on
in iaxclient/libiax2 would also solve this, because it generates
interpolation frames when it finds lost/late frames, so you still get
the right number of frames passing through.
I'm pretty sure (a) is the right thing to do in all cases, though, and
should go in soon.
I'm not sure why Mark added (b) to app_echo. The CVS log just says this:
2004-03-27 01:50 markster
* app_echo.c: Make read/write mode have a lock parameter and use it
properly.
(c) is probably a bit off, because integrating it into asterisk will
take a bit of work.
-SteveK