[asterisk-dev] Suggestion on Packet Loss Concealment Algorithm

Fri May 19 10:50:42 MST 2006

Hi Steve,

Comments inline.

On 5/19/06, Steve Underwood <steveu at coppice.org> wrote:
>
> ColinZuo at viatech.com.cn wrote:
>
> > Hi,
> >
> > The theory is not based on the music, it's based on that given by the
> > ITU G.711 Appendix I (BTW: the music is converted to 8K/mono/16bit by
> > CoolEdit).
> >
> What works well for music is very different from what works well for
> voice.

Yes, but I don't think the difference is that big, unless you can give me a
voice file that proves me wrong.
Again, the reason I prolong it is the theory given in G.711 Appendix I, which
is said to be derived from experimentation at Bell.

> G.711 Appendix 1 and my code fade to silence over 50ms. For music
> much greater sustain to fill in the gaps works much better. With speech,
> that badly affects intelligibility.

I didn't change this. BTW, G.711 Appendix I fades to silence over 60 ms
because it doesn't fade during the first erasure frame, whereas your code
does. Since you can't know whether the waveform is going to rise or fall, I
think it is better to keep the same level for the first erasure frame.

////////////////////////////////////////////////////////////////////////////////////////////////
G.711 Appendix I
I.2.4 Synthetic signal generation for first 10 ms
For the first 10 ms of the erasure, the best results are obtained by
generating the synthesized signal from the last pitch period with no
attenuation.
/////////////////////////////////////////////////////////////////////////////////////////////////////
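
A minimal sketch of the two fade schedules being compared here, written in
Python for brevity. The function names are made up for illustration and are
not taken from Asterisk, spandsp, or the ITU reference code. The first holds
the level for the first 10 ms and then ramps linearly to zero at 60 ms; the
second starts ramping immediately and reaches zero at 50 ms.

```python
def gain_appendix_i(t_ms: float) -> float:
    """Gain at t_ms into the erasure: flat for the first 10 ms,
    then a linear fade that reaches zero at 60 ms."""
    if t_ms < 10.0:
        return 1.0
    if t_ms >= 60.0:
        return 0.0
    return 1.0 - (t_ms - 10.0) / 50.0


def gain_immediate_fade(t_ms: float) -> float:
    """Gain for a variant that starts fading at once and
    reaches zero at 50 ms."""
    if t_ms >= 50.0:
        return 0.0
    return 1.0 - max(t_ms, 0.0) / 50.0
```

Both ramps have the same slope; the only difference in this sketch is whether
the first 10 ms frame is left unattenuated.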

> I used the Appendix 1 approach
> without experimenting. I suspect something other than linear attenuation
> would behave better.

From my experiments, I think that as long as the algorithm aims at generic
linear concealment, you probably can't find one much better than this, unless
you analyse some voice parameters from the previous samples.

> > And the current plc algorithm is similar to the G.711 Appendix I except:
> > 1. The pitch detection algorithm : G.711 Appendix I uses cross
> > correlation, but Asterisk uses AMDF which is simpler and also performs
> > well
> >
> Correct.
>
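
For readers who have not met AMDF (average magnitude difference function), a
rough Python sketch of the idea follows. It is only an illustration: the
function name, the parameters, and the search strategy are assumptions and
are not copied from the Asterisk source. The estimator compares the most
recent `window` samples with the samples `lag` earlier and picks the lag
whose summed absolute difference is smallest.

```python
def amdf_pitch(history, min_pitch, max_pitch, window):
    """Return the lag (in samples) within [min_pitch, max_pitch] that gives
    the smallest summed magnitude difference over the last `window` samples."""
    n = len(history)
    if n < max_pitch + window:
        raise ValueError("history too short for the requested search range")
    start = n - window
    best_lag = min_pitch
    best_acc = float("inf")
    for lag in range(min_pitch, max_pitch + 1):
        acc = 0.0
        for i in range(start, n):
            acc += abs(history[i] - history[i - lag])
        # Strict '<' keeps the shortest lag when several lags tie.
        if acc < best_acc:
            best_acc = acc
            best_lag = lag
    return best_lag
```

Unlike cross-correlation, the inner loop needs only subtractions and absolute
values, with no multiplications, which is why it counts as the simpler of
the two.
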
> > 2. The OLA window: G.711 update the OLA window length when burst loss
> > occurs, but Asterisk didn't
> >
> Wrong. They both use the same OLA strategy - 1/4 pitch period overlap.

G.711 lengthens the OLA window by 4 ms per 10 ms of erasure until it reaches
10 ms, but the Asterisk one doesn't, does it?

////////////////////////////////////////////////////////////////////////////////////////////////
G.711 Appendix I
I.2.7 First good frame after an erasure
At the first good frame after an erasure, a smooth transition is needed
between the synthesized erasure speech and the real signal. To do this, the
synthesized speech from the pitch buffer is continued beyond the end of the
erasure, and then mixed with the real signal using an OLA. The length of the
OLA depends on both the pitch period and the length of the erasure. For
short, 10 ms erasures, a 1/4 wavelength window is used. For longer erasures
the window is increased by 4 ms per 10 ms of erasure, up to a maximum of the
frame size, 10 ms.
////////////////////////////////////////////////////////////////////////////////////////////////
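
The window-length rule quoted above can be sketched as follows. This is one
reading of the text, assuming 8 kHz sampling and that each erased 10 ms frame
after the first adds 4 ms to the window; the function name and parameters are
hypothetical.

```python
def ola_window_samples(pitch_samples, erased_frames, sample_rate=8000):
    """OLA length (in samples) used at the first good frame after an erasure
    of `erased_frames` consecutive 10 ms frames."""
    if erased_frames < 1:
        raise ValueError("erased_frames must be at least 1")
    frame = sample_rate * 10 // 1000   # 10 ms frame: 80 samples at 8 kHz
    step = sample_rate * 4 // 1000     # 4 ms growth: 32 samples at 8 kHz
    length = pitch_samples // 4 + (erased_frames - 1) * step
    return min(length, frame)          # never longer than one frame
```

With a pitch period of 80 samples this gives 20 samples after a single lost
frame, 52 after two, and the 80-sample cap from three frames onwards. An
implementation that always uses 1/4 of the pitch period corresponds to the
first case only.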

> > 3. The nearby field of the first erasure: G.711 delays the output for
> > 3.75 ms to compensate the probable loss, but Asterisk just use the
> > symmetrical part before the lost to do the OLA. The one G.711 Appendix I
> > utilized should be better, but it's not very important as human being's
> > ears are really anti-jamming.
> >
> >
> That 3.75ms delay is so the Appendix 1 algorithm can do a 1/4 pitch
> period of OLA when erasure commences. However, it incurs lots of buffer
> copying when there are no lost packets. What my code does is time
> reverse the last 1/4 pitch period and OLA with that. It sounds nasty,
> but listening tests with speech showed it was very close to the sound of
> the G.711 appendix 1 algorithm, and improves efficiency a lot in the
> common case - no packets being lost.
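
A simplified Python sketch of the time-reversed overlap described in the
quoted paragraph. It is not the spandsp or Asterisk code: the function name
is invented, the linear cross-fade weights are an assumption, and the pitch
buffer here is just the last pitch period without the extra quarter period of
smoothing. The point is that no output delay is needed, because the samples
that are faded out are taken from history that has already been played,
read backwards.

```python
def start_of_erasure(history, pitch, count):
    """Synthesize `count` samples for the start of an erasure by repeating
    the last pitch period, cross-fading the first quarter period with the
    time-reversed tail of the real signal."""
    overlap = pitch // 4
    period = history[-pitch:]
    # Last `overlap` real samples, newest first (time reversed).
    tail_reversed = history[-1:-overlap - 1:-1]
    out = []
    for i in range(count):
        sample = period[i % pitch]
        if i < overlap:
            new_w = (i + 1) / overlap      # weight of the synthetic signal
            old_w = 1.0 - new_w            # weight of the reversed real tail
            sample = old_w * tail_reversed[i] + new_w * sample
        out.append(sample)
    return out
```

The Appendix I approach instead holds back the last 3.75 ms of output so the
same cross-fade can be applied to samples that have not yet been played.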

Yes, the results are similar, and the only difference is the 3.75 ms delay.
I didn't see more buffer copying than necessary: both algorithms save the
same history (although G.711 keeps a longer one and delays by 3.75 ms).
BTW: packet loss is very common, at least in China, and burst loss can last
a very long time. For example, because the bandwidth between the two major
carriers is very low, two users on different carriers will experience packet
loss very often if they use the public Internet rather than a softswitch
network.

> > 4. whether prolong the pitch period during burst loss: G.711 Appendix
> > I prolong the pitch period to a maximum of 3 pitch period, but
> > Asterisk only uses one which saves memory but behave bad at burst loss.
> >
> >
> For prolonged erasures G.711 Appendix 1 and my code act in exactly the
> same way. They linearly attenuate to zero over the first 50ms. In that
> period they repeat the last 1.25 pitch periods of real speech, with a
> quarter pitch period of overlap. When real speech restarts they both do
> a 1/4 pitch period of OLA, based on the last known pitch. The algorithms
> are identical beyond the initial 1/4 pitch period of OLA. Why would
> anyone want to save memory here? It only uses a small amount. The
> algorithmic changes were to reduce the buffer manipulation in the common
> case.

Not the same.

////////////////////////////////////////////////////////////////////////////////////////////////
G.711 Appendix I
I.2.5 Synthetic signal generation after 10 ms
If the next frame is also erased, the erasure will be at least 20 ms long
and further action is required. While repeating a single pitch period works
well for short erasures (e.g. 10 ms), on long erasures it introduces
unnatural harmonic artifacts (beeps). This is especially noticeable if the
erasure lands in an unvoiced region of speech, or in a region of rapid
transition such as a stop. It was discovered by experimentation that these
artifacts are significantly reduced by increasing the number of pitch periods
used to synthesize the signal as the erasure progresses. Playing more pitch
periods increases the variation in the signal. Although the pitch periods are
not played in the order they occurred in the original signal, the resulting
output still sounds natural. At 10 ms into the erasure the number of pitch
periods used to synthesize the speech is increased to two, and at 20 ms a
third pitch period is added. For erasures longer than 20 ms no additional
modifications to the pitch buffer are made.
////////////////////////////////////////////////////////////////////////////////////////////////
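
As a small illustration of the schedule in the quoted section, the number of
pitch periods feeding the synthesis can be written as a function of how far
the erasure has progressed. The names below are hypothetical, and the sketch
leaves out the overlap-add that is needed whenever the pitch buffer changes.

```python
def pitch_periods_used(erasure_ms):
    """Number of pitch periods used for synthesis at `erasure_ms` into the
    erasure: one at first, two from 10 ms, three from 20 ms onwards."""
    if erasure_ms < 10:
        return 1
    if erasure_ms < 20:
        return 2
    return 3


def synthesis_source(history, pitch, erasure_ms):
    """Slice of the most recent real samples that the synthetic signal is
    drawn from at this point in the erasure."""
    return history[-pitch_periods_used(erasure_ms) * pitch:]
```

An implementation that keeps repeating a single pitch period corresponds to
`pitch_periods_used` always returning 1, which is the case the quoted text
says produces beeps on long erasures.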

> I think the documentation for my PLC code is missing from the Asterisk
> source code, but you can find it at
> http://www.soft-switch.org/spandsp-doc/plc_page.html

No, it's available in plc.h under asterisk/include. :)

>
> Regards,
> Steve
>