[asterisk-bugs] [Asterisk 0008126]: [patch] G.711 codec woes
noreply at bugs.digium.com
Fri Aug 10 04:34:51 CDT 2007
A NOTE has been added to this issue.
======================================================================
http://bugs.digium.com/view.php?id=8126
======================================================================
Reported By: fossil
Assigned To: murf
======================================================================
Project: Asterisk
Issue ID: 8126
Category: Core/CodecInterface
Reproducibility: always
Severity: minor
Priority: normal
Status: ready for testing
Asterisk Version: SVN
SVN Branch (only for SVN checkouts, not tarball releases): 1.2
SVN Revision (number only!): 44743
Disclaimer on File?: Yes
Request Review:
======================================================================
Date Submitted: 10-09-2006 20:21 CDT
Last Modified: 08-10-2007 04:34 CDT
======================================================================
Summary: [patch] G.711 codec woes
Description:
There are a *number* of problems in the a-law and u-law core transcoders
(listed most severe first):
1. The a-law decoder does not add the rounding offset (half a quantization
step) to the linear samples it outputs;
This results in a constant amplitude drop in the decoded signal overall,
but the negative half of the signal is affected even more: there the
amplitude drop accumulates with consecutive transcodings (see attached test
patch). If the call encounters 127 tandem a-law transcodings
(a-law -> slin -> a-law -> slin -> ...), the entire negative portion will
be reduced to 0.
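The effect described in item 1 can be reproduced with a small standalone sketch. This is not the alaw.c code: lin2alaw(), alaw2lin(), tandem() and the with_offset flag are hypothetical names, and the encoder is assumed to follow the usual G.711 reference structure (13-bit segments, one's complement magnitude for negative samples). with_offset switches the decoder between returning the lower edge of a quantization interval and returning its middle.

```c
#include <assert.h>
#include <stdint.h>

/* Segment end points of the 13-bit a-law magnitude (G.711). */
static const unsigned seg_aend[8] = {0x1F, 0x3F, 0x7F, 0xFF, 0x1FF, 0x3FF, 0x7FF, 0xFFF};

/* slin -> a-law (sketch); negative samples use the one's complement magnitude ~s. */
static unsigned char lin2alaw(int16_t s)
{
	unsigned mask = (s >= 0) ? 0xD5 : 0x55;                      /* 0x80 set = positive */
	unsigned mag = ((s >= 0) ? (unsigned)s : (unsigned)~s) >> 3; /* 0..4095 */
	unsigned seg = 0;
	while (seg < 7 && mag > seg_aend[seg])
		seg++;
	unsigned q = (mag >> (seg < 2 ? 1 : seg)) & 0x0F;
	return (unsigned char)(((seg << 4) | q) ^ mask);
}

/* a-law -> slin (sketch). with_offset != 0 returns the middle of the
   quantization interval; with_offset == 0 returns its lower edge. */
static int16_t alaw2lin(unsigned char a, int with_offset)
{
	a ^= 0x55;
	int t = (a & 0x0F) << 4;
	int seg = (a & 0x70) >> 4;
	if (with_offset)
		t += 8;        /* half of the 16-unit step of segments 0 and 1 */
	if (seg >= 1)
		t += 0x100;    /* implicit leading bit */
	if (seg >= 2)
		t <<= seg - 1; /* step and offset both scale with the segment */
	return (int16_t)((a & 0x80) ? t : -t);
}

/* Apply 'passes' tandem a-law -> slin -> a-law conversions to one code. */
static unsigned char tandem(unsigned char code, int passes, int with_offset)
{
	for (int i = 0; i < passes; i++)
		code = lin2alaw(alaw2lin(code, with_offset));
	return code;
}
```

With the offset, every one of the 256 codes survives a tandem pass unchanged. Without it, positive codes are still stable, but each non-zero negative code loses one quantization step per pass, so the largest negative code (0x2A) ends up as 0x55, which decodes to 0, after 127 passes.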
2. Lookup table-driven slin->law coding rounds negative values the wrong
way;
The breaks in the linear value sequences do not happen where the
table-driven slin->law system expects them. As a result, certain negative
linear values are encoded incorrectly (see attached test patch), which
isn't such a *big* problem, but a problem nonetheless.
There is no one-liner fix for this issue. One option is to generate only
half the slin->law table, for positive values only. This table would
contain half-cooked law bytes, so that the sign could be added to the
values later, along with the post-coding transform (bitwise NOT for u-law,
XOR 0x55 for a-law). In this case, AST_LIN2MU() would look something like
this:
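extern unsigned char __ast_lin2mu[8192];  /* positive half only */

static inline unsigned char AST_LIN2MU(short sample)
{
	unsigned sign = ((unsigned)sample & 0x8000) >> 8;  /* 0x80 if negative */
	/* one's complement maps -1..-32768 onto 0..32767 */
	unsigned mag = (sample < 0) ? (unsigned)~sample : (unsigned)sample;
	unsigned char law = __ast_lin2mu[mag >> 2];        /* sign-less, pre-NOT */
	return (unsigned char)~(law | sign);               /* add sign, then NOT */
}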
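The half-table idea can be checked end to end with a standalone sketch, assuming 16-bit slin input and the G.711 reference convention of using the one's complement (~s) as the magnitude of a negative sample. All names here (ulaw_halfcooked(), lin2mu_direct(), lin2mu_lookup(), lin2mu_twos() and both tables) are hypothetical. The old-style layout is modelled, as an assumption, as a 16384-entry table indexed by ((uint16_t)s) >> 2 and filled by encoding every sample in ascending order with a two's complement encoder.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-less ("half-cooked") u-law code for a 15-bit magnitude 0..32767:
   segment << 4 | quantization bits, before the final bitwise NOT. */
static unsigned char ulaw_halfcooked(unsigned mag)
{
	unsigned v = (mag >> 2) + 33;   /* 14-bit magnitude plus bias (0x84 >> 2) */
	if (v > 0x1FFF)
		v = 0x1FFF;                 /* clip */
	unsigned seg = 0;
	while (seg < 7 && v >= (0x40u << seg))
		seg++;
	return (unsigned char)((seg << 4) | ((v >> (seg + 1)) & 0x0F));
}

/* Direct encoder: negative samples use the one's complement magnitude ~s. */
static unsigned char lin2mu_direct(int16_t s)
{
	unsigned sign = (s < 0) ? 0x80 : 0;
	unsigned mag = (s < 0) ? (unsigned)~s : (unsigned)s;
	return (unsigned char)~(sign | ulaw_halfcooked(mag));
}

/* Proposed layout: half-cooked codes for the positive half only. */
static unsigned char lin2mu_half[8192];

static void lin2mu_half_init(void)
{
	for (unsigned i = 0; i < 8192; i++)
		lin2mu_half[i] = ulaw_halfcooked(i << 2);
}

static unsigned char lin2mu_lookup(int16_t s)
{
	unsigned sign = (s < 0) ? 0x80 : 0;
	unsigned mag = (s < 0) ? (unsigned)~s : (unsigned)s;
	return (unsigned char)~(lin2mu_half[mag >> 2] | sign); /* add sign, then NOT */
}

/* Old-style layout for comparison: two's complement magnitude, full table. */
static unsigned char lin2mu_twos(int16_t s)
{
	unsigned sign = (s < 0) ? 0x80 : 0;
	unsigned mag = (s < 0) ? (unsigned)(-(int)s) : (unsigned)s; /* up to 32768 */
	return (unsigned char)~(sign | ulaw_halfcooked(mag));
}

static unsigned char lin2mu_full[16384];

static void lin2mu_full_init(void)
{
	for (int i = -32768; i <= 32767; i++)
		lin2mu_full[(uint16_t)i >> 2] = lin2mu_twos((int16_t)i);
}
```

Because ulaw_halfcooked() depends only on mag >> 2, the half-table lookup matches the direct one's complement encoder for all 65536 inputs. In the old-style table, the slot covering -4..-1 keeps the code written for -1 (0x7F), whereas the two's complement encoder puts -4 into the next interval (0x7E). Under the one's complement convention -4 also encodes to 0x7F, and the step to 0x7E happens at -5.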
3. The slin->a-law and slin->u-law functions handle the value -32768
incorrectly;
This is not really a problem with a lookup-table system, because the table
slot for -32768 is overwritten later, but for the sake of correctness...
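For item 3, a common way for -32768 to go wrong is negating it in a 16-bit variable: +32768 does not fit, so on typical two's complement targets the value stays negative and can slip past a later "sample > CLIP" style check. Whether the current functions fail in exactly this way is an assumption here. The sketch shows two safe ways to get the magnitude (the helper names are hypothetical) and an a-law encoder built on the one's complement variant.

```c
#include <assert.h>
#include <stdint.h>

/* 1) Widen to int before negating, then clip to the 15-bit range. */
static unsigned mag_widened(int16_t s)
{
	int v = s;
	if (v < 0)
		v = -v;                       /* 32768 fits in an int */
	return (v > 32767) ? 32767u : (unsigned)v;
}

/* 2) One's complement, as in the G.711 reference quantizer: ~s == -s - 1. */
static unsigned mag_ones_complement(int16_t s)
{
	return (s < 0) ? (unsigned)~s : (unsigned)s;
}

/* slin -> a-law using the one's complement magnitude (sketch). */
static unsigned char lin2alaw(int16_t s)
{
	static const unsigned seg_end[8] = {0x1F, 0x3F, 0x7F, 0xFF, 0x1FF, 0x3FF, 0x7FF, 0xFFF};
	unsigned mask = (s >= 0) ? 0xD5 : 0x55;
	unsigned mag = mag_ones_complement(s) >> 3;   /* 0..4095, no overflow */
	unsigned seg = 0;
	while (seg < 7 && mag > seg_end[seg])
		seg++;
	unsigned q = (mag >> (seg < 2 ? 1 : seg)) & 0x0F;
	return (unsigned char)(((seg << 4) | q) ^ mask);
}
```

Either way, -32768 lands on the largest negative code (0x2A for a-law) instead of producing a wrapped-around magnitude.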
4. alaw.c:linear2alaw() is less than optimal;
5. slin->law lookup table generation code is less than optimal;
There is no reason to enumerate all the possible values between -32768 and
32767 when most of the results are overwritten later.
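For item 5, assume the existing init code runs the encoder over every sample from -32768 to 32767 and stores each result at index ((unsigned short)i) >> 2. Each of the 16384 slots is then written four times, and only the last write survives. The sketch below (all names hypothetical; linear2ulaw() here is a compact two's complement encoder written for this example, not the Asterisk source) fills the same table with one encode per slot and checks that the result is identical.

```c
#include <assert.h>
#include <stdint.h>

/* Example encoder for this sketch: two's complement magnitude, bias 0x84,
   clip 32635 (the classic CCITT-style formulation). */
static unsigned char linear2ulaw(int16_t s)
{
	int v = s, sign = 0;
	if (v < 0) {
		sign = 0x80;
		v = -v;                         /* v is an int, so 32768 fits */
	}
	if (v > 32635)
		v = 32635;
	v += 0x84;
	int exponent = 0;
	while (exponent < 7 && (v >> (exponent + 8)) != 0)
		exponent++;
	int mantissa = (v >> (exponent + 3)) & 0x0F;
	return (unsigned char)~(sign | (exponent << 4) | mantissa);
}

/* Enumerate-everything approach: 65536 encodes, each slot written 4 times. */
static void fill_all(unsigned char *t)
{
	for (int i = -32768; i <= 32767; i++)
		t[(uint16_t)i >> 2] = linear2ulaw((int16_t)i);
}

/* One encode per slot: slot k covers 16-bit patterns (k << 2) .. (k << 2) | 3,
   and the ascending loop above leaves the last of them, (k << 2) | 3, behind. */
static void fill_per_slot(unsigned char *t)
{
	for (int k = 0; k < 16384; k++) {
		int bits = (k << 2) | 3;                  /* 0..65535 */
		t[k] = linear2ulaw((int16_t)(bits >= 32768 ? bits - 65536 : bits));
	}
}
```

fill_per_slot() does 16384 encodes instead of 65536 and produces the same table; going further to the half-size table from item 2 would halve that again.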
======================================================================
----------------------------------------------------------------------
fossil - 08-10-07 04:34
----------------------------------------------------------------------
I have repeated the benchmark experiment above on my system, with some
changes.
System: Celeron 2.53GHz, 256K cache, gcc 4.1.2
Changes made for better accuracy:
1) in translate.c:
MAX_RECALC = 1000
ast_translator_dir.cost changed to *micro*seconds
2) ulaw_slin_ex.h:
table changed to contain 100 ms worth of an assortment of values in the
range 0x00-0xff
3) slin_ulaw_ex.h:
copied half the table from slin_speex_ex (250 ms of data); this provides a
better spread of memory accesses to the __ast_lin2X tables
4) in codecs.conf, [plc] section:
genericplc => false (to prevent cost skewing due to extra data copying)
Tests were run with "show translation recalc 300".
Old algos:
law -> slin = 26 usec/sec
slin -> u-law = 21 usec/sec (slower than a-law: the table is bigger)
slin -> a-law = 19 usec/sec
(I have to rethink my earlier "ulaw->slin cannot be slower than
slin->ulaw" comment. Clearly, the extra PLC branches in the law->slin
translators are having an effect here.)
New algos:
law -> slin = 26 usec/sec (unchanged, as expected)
slin -> u-law = 74 usec/sec (several runs, lowest value)
slin -> a-law = 78 usec/sec (why slower than u-law -- you got me)
(For comparison, with original all-zero sample data, slin->law is 54
usec/sec)
New algos with G711_REDUCED_BRANCHING:
slin -> law = 47 usec/sec (several runs, 46-47)
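The "usec/sec" figures appear to be microseconds of processing per second of 8 kHz audio, given that the translator cost was switched to microseconds. A minimal standalone harness for the same kind of number is sketched below; it is not the translate.c recalc code, and usec_per_audio_sec() and time_encoder() are hypothetical names.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define SAMPLE_RATE 8000   /* narrowband slin: 8000 samples per second */

/* Convert measured time into microseconds per second of audio. */
static double usec_per_audio_sec(double elapsed_usec, long samples)
{
	double audio_sec = (double)samples / SAMPLE_RATE;
	return elapsed_usec / audio_sec;
}

/* Time an slin -> law function over n samples with a monotonic clock. */
static double time_encoder(unsigned char (*fn)(int16_t), const int16_t *in,
                           unsigned char *out, long n)
{
	struct timespec a, b;
	clock_gettime(CLOCK_MONOTONIC, &a);
	for (long i = 0; i < n; i++)
		out[i] = fn(in[i]);
	clock_gettime(CLOCK_MONOTONIC, &b);
	double usec = (b.tv_sec - a.tv_sec) * 1e6 + (b.tv_nsec - a.tv_nsec) / 1e3;
	return usec_per_audio_sec(usec, n);
}
```

For scale, 47 usec/sec means 47 microseconds of CPU time per second of audio, roughly 0.005% of one core.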
Obviously, the new algos do quite a bit more processing per linear sample
(if that's not obvious, take my word for it; I've compared the asm code),
but the reduction in level 1 cache pressure seems to make itself felt on
my system. G711_REDUCED_BRANCHING clearly provides a visible improvement
with my builds -- it's only double the original time. I've checked, and
for some reason gcc is not using CMOV opcodes to eliminate the branches in
codec_xlaw.so, possibly because of register pressure. You may get quite
different results with gcc 3.x.
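What G711_REDUCED_BRANCHING changes in the patch is not shown here, so the following is only an illustration of the general technique: deriving sign and one's complement magnitude from an all-ones mask instead of an if/else. The function names are hypothetical, and whether a given compiler actually emits branch-free code (SETcc, CMOV) for either version depends on the compiler and flags, as the note above observes.

```c
#include <assert.h>
#include <stdint.h>

/* Branchy version: one conditional per sample. */
static unsigned char sign_mag_branchy(int16_t s, unsigned *mag)
{
	if (s < 0) {
		*mag = (unsigned)~s;
		return 0x80;
	}
	*mag = (unsigned)s;
	return 0;
}

/* Mask version: neg is 0 for s >= 0 and all ones for s < 0; XOR with it
   yields the one's complement magnitude without an if/else. */
static unsigned char sign_mag_branchless(int16_t s, unsigned *mag)
{
	unsigned neg = 0u - (unsigned)(s < 0);
	*mag = ((unsigned)(uint16_t)s ^ neg) & 0x7FFFu;
	return (unsigned char)(neg & 0x80u);
}
```

Both versions agree for every 16-bit input; the mask form is the kind of rewrite that leaves fewer data-dependent branches for the branch predictor.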
This benchmark is still flawed, however, since we would not transcode this
much data in one go during normal processing. To get better results, we
need to flush the CPU cache after every 20 msecs of data or so to simulate
normal operating conditions. However, I fear this is going too far. ;-)
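If one did want to approximate normal operating conditions, a rough sketch is to encode in 20 ms frames (160 samples at 8 kHz) and walk a buffer larger than the cache between frames. The buffer size, the 64-byte line size and all function names below are assumptions for illustration; stub_encoder() is a placeholder for a real slin -> law function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_SAMPLES 160          /* 20 ms of 8 kHz slin */
#define EVICT_BYTES (1024 * 1024)  /* assumption: well above the 256K cache */
#define LINE_BYTES 64              /* assumption: cache line size */

static unsigned char evict_buf[EVICT_BYTES];
static volatile unsigned char evict_sink;

/* Touch one byte per cache line of a large buffer so that earlier data
   (e.g. the slin->law tables) is likely to be evicted. */
static void evict_cache(void)
{
	unsigned char acc = 0;
	for (size_t i = 0; i < EVICT_BYTES; i += LINE_BYTES) {
		evict_buf[i]++;
		acc ^= evict_buf[i];
	}
	evict_sink = acc;              /* keep the loop from being optimized out */
}

/* Placeholder encoder (hypothetical): keeps the high byte of the sample. */
static unsigned char stub_encoder(int16_t s)
{
	return (unsigned char)((uint16_t)s >> 8);
}

/* Encode n samples frame by frame, evicting the cache between frames. */
static void encode_framed(unsigned char (*enc)(int16_t), const int16_t *in,
                          unsigned char *out, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (i != 0 && i % FRAME_SAMPLES == 0)
			evict_cache();
		out[i] = enc(in[i]);
	}
}
```

Timing encode_framed() and subtracting a separately measured cost of evict_cache() would give a cold-cache figure per frame.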
For reference, the whole slin -> XXX translation line (all in microsecs):
        g723  gsm   ulaw  alaw  g726  adpcm  slin  lpc10  g729   speex  ilbc
slin    -     2142  47    46    1340  248    -     4756   17277  -      15161
And just for completeness' sake, I eliminated the slin->law translation
tables and used the linear2Xlaw functions directly:
slin -> u-law = 142 usec/sec
slin -> a-law = 121 usec/sec
And with G711_REDUCED_BRANCHING:
slin -> u-law = 116 usec/sec
slin -> a-law = 108 usec/sec
Issue History
Date Modified Username Field Change
======================================================================
08-10-07 04:34 fossil Note Added: 0068701
======================================================================