[Asterisk-Dev] Re: [Iaxclient-devel] jitterbuffer

Steve Kann stevek at stevek.com
Wed Apr 13 10:34:28 MST 2005


Jesse Kaijen wrote:

> Hi,
>
> Basic definition for reference: "Jitter is a measure of the 
> variability over time of the latency across a network. A very low 
> amount of jitter is important for real-time applications using voice 
> and video."


> Jitter calculation is the standard deviation of the historical delay 
> of packets during the conversation.



> Because calculating std. dev. is very CPU intensive, especially when 
> you have to predict which distribution you're dealing with (Gauss, 
> Poisson, etc.), the jitter may be approximated by taking the IQR 
> (inter-quartile range).


In the case of "jitterbuf.c", the definition you're using above isn't 
necessarily helpful. I don't know where you are getting these 
authoritative definitions from, but regardless of whether jitter may be 
defined that way somewhere, what we really need to do is predict the 
delay of future packets based on the delays we see in previous packets.

Much of what I've done is based on this paper and its references (the 
"E-MOS" paper): http://tinyurl.com/6ylww




>
> I am sending this to the asterisk-dev as well because the underlying 
> implementation is currently also being worked into Asterisk (-head). 
> Hopefully my notes will help all implementations evolve.
>
> ----- Original Message ----- From: "Steve Kann" <stevek at stevek.com>
>
>> OK, I'm trying to understand what you're doing here. Here's some 
>> comments/questions..
>>
>> Jesse Kaijen wrote:
>>
>>> Hello,
>>>
>>> I did some modifications on the jitterbuffer. Please put it to the 
>>> test, here is a short overview of the changes:
>>>
>>> - history is kept in two buckets: bucket 1 keeps actual history and 
>>> bucket 2 keeps a sorted list of bucket 1.
>>>
>>> - jitter is calculated as being the Inter Quartile Range of bucket 
>>> 2. (more RFC-like than the jitter calc at the moment).*
>>
>>
>> Being "more RFC-like" isn't really the goal of the jitterbuffer, 
>> though. It might be a goal for stats, (for SIP at least), where we 
>> report jitter in RTCP RR's the way the RTP RFC specifies. Also, we're
>
>
> I thought that IAX2 can make use of that too, within the PONG (as in 
> iax_send_pong). 

Yes, it does use this. It presently uses the result of the "jitter" 
calculation done in the jitterbuf. This, of course, could change to use 
any other measure of jitter; I think a useful value to use here is the 
actual max-min change in delay over a measurement period. Then you can 
meaningfully compare "delay" and "jitter", and see how the jitterbuffer 
is delaying your packets by less than the absolute jitter.
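
Just to illustrate what I mean, here's a rough sketch (the names are 
made up; this is not the actual jitterbuf.c or IAX2 code):

#include <limits.h>

/* Sketch: measure "jitter" as the max-min spread of per-frame delays
 * seen during one measurement period (e.g. between two PONGs).
 * Hypothetical names, not the real API. */
struct delay_window {
	long min_delay;
	long max_delay;
	int count;
};

static void window_reset(struct delay_window *w)
{
	w->min_delay = LONG_MAX;
	w->max_delay = LONG_MIN;
	w->count = 0;
}

/* delay = arrival time minus frame timestamp, in ms, per incoming frame */
static void window_update(struct delay_window *w, long delay)
{
	if (delay < w->min_delay)
		w->min_delay = delay;
	if (delay > w->max_delay)
		w->max_delay = delay;
	w->count++;
}

/* value to report at the end of the measurement period */
static long window_spread(const struct delay_window *w)
{
	return w->count ? w->max_delay - w->min_delay : 0;
}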


> Also the E-MOS uses the std.dev.

The E-MOS paper calculates its measure of what we are calling "jitter" 
by using the same method they describe in their "Loss Control" playout 
buffer paper. There, for each frame, they estimate a delay (which we'd 
call "max") based on the previous 500 frames, fitted into a Pareto 
distribution estimation function, with a specified loss percentage.

I spent a lot of time trying to code this exact thing, and I found that 
it seems to be much less expensive to just calculate the same thing 
directly from the history, rather than to fit it into this distribution 
and then estimate it. But either way, what you're trying to do is look 
at the previous "n" packets, and find a "max" where n*(losspct/100) 
frames are above it, and n*(100-losspct)/100 frames are below it.
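
In code, that idea boils down to something like this (a simplified 
sketch; the real jitterbuf.c keeps the history sorted incrementally 
rather than re-sorting it on every frame):

#include <stdlib.h>
#include <string.h>

static int cmp_long(const void *a, const void *b)
{
	long la = *(const long *)a, lb = *(const long *)b;
	return (la > lb) - (la < lb);
}

/* Sketch: pick "max" so that roughly losspct percent of the delays in
 * the history would land above it (i.e. arrive too late to play). */
static long estimate_max(const long *history, int n, int losspct)
{
	long *sorted = malloc(n * sizeof(long));
	int drop = n * losspct / 100;	/* frames we're willing to drop */
	long max;

	if (!sorted)
		return 0;
	memcpy(sorted, history, n * sizeof(long));
	qsort(sorted, n, sizeof(long), cmp_long);

	/* the "drop" largest delays sit above this value */
	max = sorted[n - drop - 1];
	free(sorted);
	return max;
}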



>> less interested in measuring historical jitter in any meaningful way 
>> than we are in predicting future delays. But for the jitterbuffer, 
>> the calculation's goal is really to predict future delays.
>>
> A prediction of future delays can be done in a meaningful way if the 
> actual jitter is known.

You keep using the words "actual jitter" here, as though the definitions 
you gave above were written in stone. If you look at the research, there 
are lots of methods that can be used to try to predict future delays 
based on past delays, and each has cases where it works well, and cases 
where it doesn't.

>
>> So, it seems what you're doing, is from your sorted history buffer, 
>> you're throwing away the earliest 25% of delays, and the last 25% of 
>> delays, and calling that "jitter", and then calling "min" the median 
>> delay.
>>
>> Later, you're setting the jitterbuffer target to (jitter)*5/2 + min + 
>> JB_TARGET_EXTRA;
>>
>> The only thing that's real important for the jitterbuffer is how you 
>> set target. So, let's compare the two:
>>
>> present implementation: jb->info.jitter + jb->info.min + 
>> JB_TARGET_EXTRA;
>> = max - min + min + EXTRA
>> = max + extra
>> = 96%ile + extra (since we throw away 4% of max) *1
>>
>> This is designed to approximate the E-MOS calculation, by using a 
>> constant EXTRA over the straight "loss-control" method.
>>
>
> Indeed the E-MOS calculation says take the
> middle + 2*std.dev. + extra =~ 96% + extra

Where is this equation? I can't find this..

>
> you take 96% of the history; this is not equal to 96% of the E-MOS 
> calculation.

I take a number which is (exactly) the number which, if future packets 
look exactly like the present packets, will result in 4% of packets 
being late.

>
> where your extra = 40ms. (why 40ms??)

Then, I add 40ms, and this is where I've greatly simplified the E-MOS 
method. If you look at the curves that they're showing, you'll see that 
when the delay is low (< 200ms), they use a low losspct, and when delay 
is higher (> 200ms), they use a much higher losspct.

Using losspct = 4, and then setting the delay to be 40ms higher than 
that, seems to fit that same model: when jitter is low, the 40ms extra 
will generally cover almost all of the frames that are in the 4% range 
we are discarding, but when jitter is high, the 40ms constant has much 
less meaning, and as jitter goes much higher than 200ms, we approach an 
actual losspct of 4%.
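
So the effective target calculation in the current code comes down to 
roughly this (a sketch of the effect, expressed in terms of the 
estimate_max() idea above rather than the literal jitterbuf.c internals):

#define JB_TARGET_EXTRA 40	/* ms of headroom on top of the ~96th percentile */

/* Sketch: "jitter" is max - min over the (trimmed) history, so the
 * target works out to the ~96th-percentile delay plus a constant 40ms. */
static long compute_target(long max_delay, long min_delay)
{
	long jitter = max_delay - min_delay;

	return jitter + min_delay + JB_TARGET_EXTRA;	/* == max_delay + 40 */
}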



> Especially if that is in every *-box jitterbuffer. Sometimes there are 
> 3 or 4 *-boxes in a path and the delay is increased by more than 
> 40*3=120ms or 40*4=160ms. Unneeded delay... E-MOS states that with 
> longer delay the perceived audio quality will drop. So why add more 
> potentially unnecessary delay?


We don't. The jitterbuffer only runs on the last hop; it is 
automatically disabled when a call is bridged from VoIP to VoIP in 
Asterisk.

To quote from the paper:
====
 From Figure 1, we can observe that the effect of
introducing the playout delay is quite limited when the delay
is small (less than 200 msec). In this region, it is effective to
prevent packet loss by lengthening the playout delay. However,
as the one-way delay becomes larger, E-MOS tries to
intentionally accommodate the increasing PLR to reduce the
playout delay. This is a good solution to improve the users’
perceived quality.
===

Their curves are there to draw the same conclusions.


>>
>> your implementation:
>> jb->info.target = (jb->info.jitter)*5/2 + jb->info.min + 
>> JB_TARGET_EXTRA;
>> = (75%ile - 25%ile) * 5/2 + 50%ile + extra
>>
>>
>> I'm not sure I understand the logic behind this..
>
>
> target = jitter*5/2 + jb->info.min + JB_TARGET_EXTRA;
>
> = jitter*2 + middle + 1/2*jitter + extra
> =~ std.dev*2 + middle + 1/2*std.dev + extra
> = 96% + 1/2*std.dev + 0
>
> the 1/2*std.dev. is my extra, because I stated JB_TARGET_EXTRA == 0;
> so the extra grows with the jitter, and the unneeded delay is kept 
> low.

My stats are rusty, so I'll take your word on it w.r.t. standard 
deviations, but this all assumes that delays _do_ follow a distribution 
where those standard-deviation relationships hold, which is questionable.
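
For comparison, your proposal as I read it would look something like 
this (again just a sketch, with made-up helper names, taking the 
percentiles from the sorted history bucket):

/* Sketch of the proposed IQR-based target: the sorted history gives the
 * 25th/50th/75th percentile delays, "jitter" is the inter-quartile range,
 * "min" is the median, and JB_TARGET_EXTRA is 0. */
static long compute_target_iqr(const long *sorted, int n)
{
	long q1 = sorted[n / 4];		/* 25th percentile */
	long median = sorted[n / 2];		/* 50th percentile ("min") */
	long q3 = sorted[(3 * n) / 4];		/* 75th percentile */
	long iqr = q3 - q1;			/* proposed "jitter" */

	return iqr * 5 / 2 + median;		/* + JB_TARGET_EXTRA (== 0) */
}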

>>
>> One thing that I do notice, though, is that you never take into 
>> account the top 25% of delays at all. That means, if delay is 
>> increasing, you may not notice it for up to 25% of history size 
>> packets (i.e. for your 250-packet history, that's 62.5 packets, or 
>> about 3 seconds).
>>
> first of all, 62.5 * 20 = 1250ms
> second, the current jitterbuffer size has to be played out to the 
> end-user entirely before packets will be dropped and the end-user 
> notices.
> third, jitter normally doesn't increase explosively, but grows steadily 
> (other than spikes of course, see next).
> fourth, a jitter spike is rare and not predictable (a spike is the only 
> case of explosive increase, see previous ;-))


I have documented cases where there are regular spikes like this, of 
about 200ms, that can occur every 5-10s.

[skipping out-of-order patch]

>>> - JB_HISTORY_SZ is set to 250. This is because a jitter calculation 
>>> is best done over 5 - 8 seconds. 250 is 5 seconds with 20ms frames and 
>>> 7.5 seconds with 30ms frames and will cover it. **
>>
>>
>> I chose 500 based on the E-MOS paper.
>>
> ok, do you have the link for me? Can't find it in my history any more.

It's above, or you can type "playout E-MOS" into Google, Google Scholar, 
or CiteSeer to find it.

>>>
>>> - jb_info is only valid after half JB_HISTORY_SZ.
>>>
>>> - jb->info.target = (jb->info.jitter)*5/2 + jb->info.min + 
>>> JB_TARGET_EXTRA;
>>> with JB_TARGET_EXTRA ==0;
>>> In theory this would be sufficient to handle over 99% of the packets 
>>> and still keep the jitterbuffer small.
>>
>>
>> see above -- I'd like to understand the derivation of this..
>>
>> In the larger context, it was my understanding that (one of?) your 
>> goals is to use the jitterbuffer to detect congestion, and then 
>> change codecs or something to try to avoid the congestion.
>>
> the jitterbuffer is only for creating the correct stats; a monitor in 
> the application layer will handle codec changes, because you may 
> prefer fax detection and stuff like that. However, an application like 
> that is only as good as its foundation, hence the work on the 
> underlying jitterbuffer :)
>
>> As I said before, the idea of congestion avoidance (and increasing 
>> call quality by avoiding loss), is interesting, but I don't think 
>> that changing codecs alone is really the best solution, since you 
>> have the constant 12kb/s overhead for sending 50 packets per second. 
>> However, if you do figure out how to detect congestion well, there's 
>> several things you can do:
>>
>> 1) change codecs.
>> 2) Change codec settings for variable/adjustable bitrate codecs (i.e. 
>> speex).
>> 3) Change the packetization interval. (going from 20ms frames to 40ms 
>> frames saves you 6kb/s without touching the payload).
>
>
> This can be done by the monitor as well, and I certainly take it into 
> account.
>
>> 4) If you're doing video, you have a lot more bandwidth to reduce.
>>
> That's why I only look at voice. I don't know the requirements of 
> video and I think that loss handling is totally different with video.

Well, you don't want to throw away _any_ frames with video, nor do you 
do any kind of interpolation. But you can reduce the bandwidth that 
video sends in many ways (generally, by increasing the amount of 
quantization, changing the frame rate, etc).

-SteveK




