[asterisk-dev] OpenCL for improved performance transcoding and conferencing

Thu Sep 23 01:15:21 CDT 2010

  On 09/23/2010 07:03 AM, Chris Coleman wrote:
>>>   You probably already know very well about OpenCL -- Apple's open
>>>   standard Open Computing Language that lets you use any supported
>>>   general-purpose GPU for much faster math computing....
>>
>>   Sometimes faster. Often slower. It depends how well the problem fits the
>>   hardware.
> Agreed. I'm citing the Wikipedia article on GPGPU. It states that one of the (approx. 30) problems suited to GPGPU is Audio Signal Processing.
>
> One type it says it's been used for is Audio Speech Processing.
>
> http://en.wikipedia.org/wiki/GPGPU
You do realise that page was written by the marketing dept, don't you? 
Try to find real audio speech processing that worked out well.
>>>   Dual-core Atom D510 mini-ITX motherboards with the OpenCL-compatible GPU
>>>   Nvidia G210 (ION) are available for a slight cost increase over boards
>>>   without the ION. For example: the Jetway NC98.
>>>
>>>   Theoretically, the GPU is 10x more efficient at transcoding math than
>>>   the CPU... so we could get up to 10x as many transcoded and
>>>   conferenced channels...
>>>
>>>   ....resulting in lower energy consumption, and higher number of
>>>   transcoded + conferenced channels, per PBX server, before hitting the
>>>   limit of CPU math processing horsepower.
>>
>>   CUDA is far more mature than OpenCL right now. Have a look around the
>>   internet at attempts to use CUDA to accelerate audio and speech codecs.
>>   The results aren't good. I have only seen one attempt - a poorly
>>   performing MP3 encoder - where the developers didn't get disheartened
>>   and abandon work before completion.
> I don't doubt CUDA is ahead of OpenCL: OpenCL wraps CUDA (Nvidia's GP-GPU libraries that work ONLY on their GPUs) AND ATI Stream (ATI's GP-GPU libraries that work ONLY on ATI's GPUs).
>
> Nvidia has BILLIONS of dollars of incentives to lock as many users into CUDA as possible, and not be compatible with ATI. So they are trying to innovate and not interoperate.
>
> We in the Linux world, however, are trying to interoperate to the MAX. This is why I suggested OpenCL.
>
> OpenCL should currently have all the functionality needed to accelerate the relatively simple (compared to 3D graphics) Audio and Speech Processing.
All completely irrelevant to what I said. CUDA has the upper hand today, 
and it *still* has no proven record of performing well.
>>>   Does anyone else think it'd be brilliant of the * developers to update
>>>   the transcoding and conferencing code for 2010, to use OpenCL (which
>>>   uses any available, compatible GPU) for absolutely awesome performance ??
>>>
>>>
>>
>>   Awesome? Really? You have supporting evidence?
> The "evidence" I have is from general knowledge and experience.
Can you share some of this general knowledge and experience? The general 
knowledge the rest of us have is good results for a few algorithms, and 
dismal failure for most others.
> High-ratio audio compression (such as G.729, as well as MP3 and MPEG-4 AAC) uses statistics to intelligently drop some of the audio information.
Where did you get that drivel from? Compression is based on what the ear 
can't detect, and speech compression is also based on what the ear can't 
produce. There is no statistical processing in any of the practical 
compression algorithms.
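To illustrate the "what the ear can't detect" point, here is a minimal Python sketch using Terhardt's widely quoted approximation of the absolute threshold of hearing (the threshold in quiet, in dB SPL). Spectral components whose level falls below that curve are inaudible and can simply be discarded. This is only the simplest piece of a perceptual model; practical codecs also exploit masking by louder nearby components, which is not modelled here. The function names and the example tone levels are illustrative assumptions, not taken from any codec.

```python
import math


def threshold_in_quiet_db(freq_hz: float) -> float:
    """Terhardt's approximation of the absolute threshold of hearing (dB SPL).

    freq_hz must be positive; the formula is expressed in kHz.
    """
    f = freq_hz / 1000.0
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def audible_components(components):
    """Keep only (freq_hz, level_db_spl) pairs lying above the threshold in quiet.

    Anything below the curve can be dropped without the listener noticing.
    Real perceptual codecs go much further and also model masking.
    """
    return [(f, level) for f, level in components
            if level > threshold_in_quiet_db(f)]


if __name__ == "__main__":
    # Hypothetical tones: (frequency in Hz, level in dB SPL)
    tones = [(100.0, 15.0), (1000.0, 15.0), (3300.0, 0.0), (15000.0, 20.0)]
    print(audible_components(tones))  # -> [(1000.0, 15.0), (3300.0, 0.0)]
```

The ear is most sensitive around 3 to 4 kHz, so a 0 dB tone there survives, while a 15 dB tone at 100 Hz or a 20 dB tone at 15 kHz is below threshold and can be discarded.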
> And to run those stats, the algorithm has to do floating-point vector math: something called an FFT (Fast Fourier Transform) on the incoming raw audio stream.
Things like MP3, AAC and G.722.1 use the MDCT (a DCT variant), which is not a million miles 
from an FFT, but audio compression never uses an FFT. The key speech 
compression codecs today, like G.729, don't do anything remotely like an 
FFT. DCTs are also the basis of most video codecs, and the GPUs are 
designed to run those well. This might lead one to assume that at least 
MP3 will run very well on a GPU. The only attempt I know of produced 
very poor results, but that was with an older GPU. I suspect the Fermi 
will do a lot better.
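The remark that the DCT family is "not a million miles from an FFT" can be made concrete. The sketch below, in plain Python with no third-party libraries, computes an unnormalized DCT-II of length N both directly and via one N-point complex FFT of the reordered input (Makhoul's method), and the two agree. It assumes N is a power of two so a simple radix-2 FFT can be used; the function names are illustrative. MP3 and AAC actually use the MDCT, a lapped relative of this transform, so this shows the family resemblance rather than any codec's code.

```python
import cmath
import math


def fft(a):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even = fft(a[0::2])
    odd = fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dct2_direct(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi * (2n + 1) * k / (2N))."""
    n_len = len(x)
    return [sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_len))
                for n in range(n_len))
            for k in range(n_len)]


def dct2_via_fft(x):
    """The same DCT-II computed with a single N-point FFT (Makhoul's reordering)."""
    n_len = len(x)
    # Even-indexed samples in order, followed by odd-indexed samples reversed.
    v = [0.0] * n_len
    for n in range(n_len // 2):
        v[n] = x[2 * n]
        v[n_len - 1 - n] = x[2 * n + 1]
    spectrum = fft(v)
    # Apply a per-bin phase rotation of -pi*k/(2N) and keep the real part.
    return [(cmath.exp(-1j * math.pi * k / (2 * n_len)) * spectrum[k]).real
            for k in range(n_len)]


if __name__ == "__main__":
    x = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 1.5]
    direct = dct2_direct(x)
    fast = dct2_via_fft(x)
    # Differences are floating-point rounding only.
    print(max(abs(a - b) for a, b in zip(direct, fast)))
```

The direct form costs O(N^2) operations and the FFT route O(N log N), which is why DCT-based codecs and the hardware built for them lean on FFT-style butterflies.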

> With 40, 56, 128, and 192 stream processing cores, a GP-GPU is going to be able to get that done a LOT quicker, on a LOT more simultaneous channels, than the simple onboard floating point unit built into the Atom processor, which can handle only a couple of math calculations per clock cycle per core.
They do really well on long vectors, which a lot of high performance 
computing is heavily based on. They do really badly with masses of short 
vectors and a lot of decision making, which is exactly what the typical 
speech codec is filled with.
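To make "masses of short vectors, and a lot of decision making" concrete: LPC-based speech codecs such as G.729 fit a 10th-order linear-prediction filter to each 10 ms frame (80 samples at 8 kHz), and G.729 derives those coefficients with the Levinson-Durbin recursion, sketched below in Python. Each of its at most 10 iterations works on vectors of length 10 or less and depends on the result of the previous one, with data-dependent branches for silent or ill-conditioned frames. This is a simplified floating-point sketch; G.729's actual windowing, lag windowing and fixed-point arithmetic are omitted, and the function names are illustrative.

```python
import math


def autocorrelation(frame, order):
    """Biased autocorrelation r[0..order] of one frame (autocorrelation method)."""
    n = len(frame)
    return [sum(frame[i] * frame[i - lag] for i in range(lag, n))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve for LPC coefficients a[0..order], with a[0] = 1.

    Convention: the residual is e[n] = sum_k a[k] * x[n - k].
    Returns (a, prediction_error_energy).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    if err <= 0.0:
        # Data-dependent branch: a silent frame has nothing to predict.
        return a, 0.0
    for i in range(1, order + 1):
        # Stage i needs every coefficient produced by stages 1..i-1.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this stage
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
        if err <= 0.0:
            # Another branch: stop if rounding makes the recursion unstable.
            break
    return a, err


if __name__ == "__main__":
    # One 10 ms frame at 8 kHz is 80 samples; a pure tone stands in for speech.
    frame = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(80)]
    r = autocorrelation(frame, 10)
    a, err = levinson_durbin(r, 10)
    print("LPC coefficients:", [round(c, 3) for c in a])
    print("residual energy / frame energy:", err / r[0])
```

Compare this with a GPU-friendly job, such as a single operation over thousands of independent elements. Feeding a GPU would mean running many channels' frames in lockstep, and the serial stage-to-stage dependency and per-frame branches above are what make that awkward.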
>>   The nVidia Fermi may change things quite a bit, as it is a more general
>>   purpose compute engine than the earlier GPUs. I still haven't seen
>>   anything impressive for any computation in the ballpark of a speech
>>   codec, though. I bought a GTX460 card a couple of weeks ago to do some
>>   experiments, but I'm struggling to find the time to work on it.
> While your GTX460 is a totally butt-kicking card from today, general-purpose GPU computing has been available for a few years now; I think it truly started with the DirectX 10.1 cards.
The raw speed of the GTX460 is largely irrelevant here. It's the improved 
architecture of the Fermi design, which makes it a lot more general 
purpose, that has made me take a renewed interest.
>>   The Toshiba SpursEngine card from Leadtek is certainly capable of
>>   accelerating speech codecs. The now defunct HowlerTech company was using
>>   them. Not too expensive. Fairly low power. The snag is they seem to be a
>>   dead end. Leadtek supply a 32-bit Linux SDK, but say a 64-bit SDK will
>>   not appear. To me, that says abandonware.
>>   Steve
>
> The reason I suggested GP-GPU rather than an add-on card is that it's available on all the mini-ITX motherboards with a cheap DirectX 10.1 or higher graphics chip: Nvidia ION and, soon to come, ATI Fusion.
>
> No extra PCI slot required (many mini-ITX boards don't even have one), same power consumption.
>
> Basically, anyone who buys that simple motherboard or better (and it's available now) can use it.
>
> It's the simplest and cheapest option, and the best bang for the buck.
>
> Plus, servers rarely need fancy graphics, so it's not like you're taking the GPU power from something else that needs it.
>
> Also, add-on graphics cards are available in PCIe x1 format (this GT210, for example) for about 30.00.
Free hardware is great if it performs. Do you have any evidence the ION 
ever performs well on anything but the video codecs it has been highly 
tailored for?

Steve