[asterisk-dev] GPU Audio Codec Transcoding within Asterisk PBX

Fri Jan 2 06:33:14 CST 2009

Joseph Benden wrote:
> On Jan 1, 2009, at 10:28 PM, Felipe Bergo wrote:
>
>   
>> The project started with transcoding g.711u to signed linear and the
>> reverse. It was thought that performing other transcoding operations
>> would be reasonably represented by this.
>>
>> Huh ?
>>
>> G711.u / G711.a <=> signed linear transcoding is a trivial table  
>> lookup with no arithmetic workload whatsoever. It does not represent  
>> the CPU-intensive transcoding operations such as G729 / iLBC /  
>> speex, which could benefit from GPU implementations, nor any video  
>> codec.
>>
>> A GPU implementation of G711<=>linear transcoding is probably  
>> wasting more instructions to transfer data from and to the GPU board  
>> than required to perform the transcoding in loco.
>>     
>
>
> The purpose was not to look at vectorizing g.711u, but rather to find  
> the architectural issues surrounding CUDA for advanced audio/video  
> transcoding. The amount of work required to properly support CUDA API  
> is quite large. By re-reading the quick outline of CUDA API  
> requirements in the original message, you'll see what I'm referring  
> to. If CUDA is a potential candidate for addition to Asterisk, these  
> architectural issues will have to be pursued.
>
> The research shows that audio transcoding currently gains the most
> benefit from SIMD CPU instructions. This is because SIMD does not
> require the architectural changes; however, SIMD may require a
> different codec algorithm. SIMD requires operating on vectors of
> floating-point or integer data, in a similar way to CUDA, so
> rewriting codecs using SIMD offers an easier route to CUDA in the
> future.
>
> Side note: I am working on x86 and x86_64 SSE2 SIMD support within  
> DAHDI for usage by conferencing and echo cancelers. The initial  
> results are very promising, but further testing needs to be done.
>   
Nothing you've said so far says that audio codecs are a problem on a 
CUDA machine, unless you use an extremely dumb way of managing them. 
4096-byte chunks are no problem if you run many concurrent codec 
channels, and you wouldn't bother with CUDA unless you were doing many 
in parallel.
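
A minimal sketch of the batching idea, in plain C rather than CUDA: one 
frame from each active channel is packed into a single contiguous 
4096-byte staging buffer, so that one transfer covers many channels. 
The 160-byte frame size (20 ms of 8 kHz G.711) is an assumption, and 
struct batch, batch_reset(), batch_add() and batch_bytes_used() are 
hypothetical names for illustration, not Asterisk or CUDA APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_BYTES 160   /* assumption: 20 ms of 8 kHz G.711, 1 byte per sample */
#define BATCH_BYTES 4096  /* chunk size discussed in this thread */
#define FRAMES_PER_BATCH (BATCH_BYTES / FRAME_BYTES)  /* 25 whole frames */

/* Hypothetical staging buffer: frames from many channels, back to back. */
struct batch {
    uint8_t data[BATCH_BYTES];
    int channel_id[FRAMES_PER_BATCH];  /* which channel owns each slot */
    size_t count;                      /* number of frames queued so far */
};

void batch_reset(struct batch *b)
{
    assert(b != NULL);
    b->count = 0;
}

/* Queue one frame; returns 1 on success, 0 if the batch is already full. */
int batch_add(struct batch *b, int channel_id, const uint8_t *frame)
{
    assert(b != NULL && frame != NULL);
    if (b->count == FRAMES_PER_BATCH)
        return 0;
    memcpy(b->data + b->count * FRAME_BYTES, frame, FRAME_BYTES);
    b->channel_id[b->count] = channel_id;
    b->count++;
    return 1;
}

/* Bytes that a single transfer of this batch would have to move. */
size_t batch_bytes_used(const struct batch *b)
{
    assert(b != NULL);
    return b->count * FRAME_BYTES;
}
```

A caller would call batch_add() once per active channel and hand the 
buffer off in one piece when it returns 0 or when the frame interval 
expires. The channel_id array is what lets the results be routed back 
to the right channel afterwards.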

Felipe is right. If you want to see how well audio codecs will run, take 
a serious one for which floating-point code is readily available - e.g. 
iLBC - and see how far it can really be pushed.
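
For reference, this is roughly what the G.711 mu-law to signed linear 
path looks like on a CPU: a 256-entry table filled once, then one 
lookup per sample. The expansion below is the standard G.711 mu-law 
decode; the names ulaw_init() and ulaw_to_slin() are hypothetical and 
are not Asterisk's own functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ULAW_BIAS 0x84  /* bias of 132 added by the mu-law encoder */

static int16_t ulaw_table[256];
static int ulaw_table_ready = 0;

/* Expand one mu-law byte arithmetically; used only to fill the table. */
static int16_t ulaw_expand(uint8_t u)
{
    int t;

    u = (uint8_t)~u;                    /* mu-law bytes are stored inverted */
    t = ((u & 0x0F) << 3) + ULAW_BIAS;  /* 4-bit mantissa plus bias */
    t <<= (u & 0x70) >> 4;              /* scale by the 3-bit exponent */
    return (int16_t)((u & 0x80) ? (ULAW_BIAS - t) : (t - ULAW_BIAS));
}

/* Fill the lookup table once at startup. */
void ulaw_init(void)
{
    int i;

    for (i = 0; i < 256; i++)
        ulaw_table[i] = ulaw_expand((uint8_t)i);
    ulaw_table_ready = 1;
}

/* Per-sample work is a single table read: no arithmetic at all. */
void ulaw_to_slin(const uint8_t *in, int16_t *out, size_t n)
{
    size_t i;

    assert(ulaw_table_ready);
    for (i = 0; i < n; i++)
        out[i] = ulaw_table[in[i]];
}
```

With one load and one store per sample, there is nothing here for a 
GPU to accelerate, which is why it says little about how iLBC or G.729 
would behave.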

Regards,
Steve