[asterisk-users] Use the NEW ulaw/alaw codecs (slower, but cleaner)

Matthew Fredrickson creslin at digium.com
Mon Nov 17 17:52:41 CST 2008


Steve Underwood wrote:
> Matthew Fredrickson wrote:
>> Actually, with the way caching is done on nearly all modern processors,
>> it is debatable whether a lookup table is the optimal way to do the
>> conversion, at least for a codec as simple as ulaw or alaw.  In fact,
>> the time it takes to service a cache miss can easily ruin the
>> single-element lookup performance of a lookup table.  And if you have
>> large tables (such as the linear to ulaw or alaw table), the tradeoff
>> of having to service a cache miss versus a few cached instructions
>> executing at native CPU clock speed makes it almost a no-brainer
>> (IMHO).
>>
>> You'll pay a cache miss the first time you run the routine, but the
>> instructions for the routine will take up much less CPU cache space
>> than the lookup tables, decreasing the likelihood of them being evicted
>> (whereas the lookup table, taking up a lot more space, has a much better
>> chance of causing a cache miss whenever you access it).
>>
>> Obviously, if you're running on a CPU with no cache, a lookup table is
>> a good way to do it.  I'm just saying that very few of the processors
>> running Asterisk lack a processor cache.
>>
>> Matthew Fredrickson
>> Digium, Inc.
>>   
> In spandsp I do the G.711 conversions algorithmically. Most modern 
> processors have a "where is the top 1" instruction, and that reduces the 
> calculations to something very fast. When I first did this it was a lot 
> slower than a lookup if I tested it on its own, but faster in a real 
> workload where the cache was working hard. That was in the days of 256k 
> caches, though. Now that the latest Intels have 12M, the picture may be
> different. That 12M is L3 cache, which is a lot slower than the small L1
> cache, but I suspect it may mean the lookup approach is as good as
> calculation with any workload.

That's a pretty good point too.  A lot of this is speculation until an 
actual workload is put through the mix.

I would suspect, though, that with the algorithmic approach you
mentioned you're more likely to be faster across the larger range of
processors in use at the moment (the bulk, my guess is, wouldn't have
12 MB L3 caches).  And if it's just a few instructions, it quite
possibly could be faster than a combined L1 and L2 cache miss (IMHO :-) ).
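
To make the comparison concrete, here is a rough sketch in C of what an
algorithmic ulaw conversion using a "top 1" instruction could look like.
This is not the spandsp or Asterisk code; the function names and the
ULAW_BIAS/ULAW_CLIP constants are my own illustrative choices, following
the usual textbook G.711 mu-law layout (sign bit, 3-bit segment, 4-bit
mantissa, all bits inverted). It assumes GCC or Clang for
__builtin_clz, with a portable loop as the fallback. A small helper
that builds the 256-entry decode table is included so both approaches
can be timed against the same data.

```c
#include <stdint.h>

#define ULAW_BIAS 0x84   /* 132, added before finding the segment */
#define ULAW_CLIP 32635  /* largest magnitude that survives the bias */

/* Index of the highest set bit; x must be nonzero. */
static int top_bit(unsigned int x)
{
#if defined(__GNUC__)
    /* Typically compiles to a single instruction (e.g. BSR on x86). */
    return (int)(sizeof(x) * 8) - 1 - __builtin_clz(x);
#else
    int n = 0;
    while (x >>= 1)
        n++;
    return n;
#endif
}

/* 16-bit linear PCM to 8-bit ulaw, computed rather than looked up. */
uint8_t linear_to_ulaw(int16_t pcm)
{
    int sample = pcm;
    int sign = 0;

    if (sample < 0) {
        sample = -sample;
        sign = 0x80;
    }
    if (sample > ULAW_CLIP)
        sample = ULAW_CLIP;
    sample += ULAW_BIAS;

    /* After the bias, the top bit is always in positions 7..14. */
    int segment = top_bit((unsigned int)sample) - 7;
    int mantissa = (sample >> (segment + 3)) & 0x0F;

    return (uint8_t)~(sign | (segment << 4) | mantissa);
}

/* 8-bit ulaw to 16-bit linear PCM, computed rather than looked up. */
int16_t ulaw_to_linear(uint8_t u)
{
    u = (uint8_t)~u;
    int segment = (u >> 4) & 0x07;
    int mantissa = u & 0x0F;
    int sample = (((mantissa << 3) + ULAW_BIAS) << segment) - ULAW_BIAS;

    return (int16_t)((u & 0x80) ? -sample : sample);
}

/* Fill a 256-entry decode table (512 bytes) for the lookup approach. */
void build_ulaw_decode_table(int16_t table[256])
{
    for (int i = 0; i < 256; i++)
        table[i] = ulaw_to_linear((uint8_t)i);
}
```

The decode table is tiny, so it's the encode direction where the
cache argument bites: a linear to ulaw table indexed by a full 16-bit
sample needs 65536 one-byte entries (64 KB), against a handful of
instructions here. Which one wins would still have to be measured
under a real workload.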

Matthew Fredrickson
Digium, Inc.


