[asterisk-dev] nVidia Cuda

Stelios Koroneos skoroneos at digital-opsis.com
Wed Mar 7 00:57:23 MST 2007


G.729 (like most codecs) spends a great amount of time in a few functions.
In the G.729 encoder, one function takes close to 40% of the required cycles,
while three more take another 40%.
The situation is similar with the decoder.
You don't need to move the entire codec to the GPU, just the parts that make
a difference.
I have been working on G.729 porting/acceleration for close to 5 months now
(mainly fixed-point, for CPUs without floating point) and also got into the
CUDA dev program, but have not found any time to do any actual work with it.
I was able to get G.729 accelerated on a soft-core/FPGA combination with very
good results.
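
To put rough numbers on the cycle breakdown above, here is a minimal sketch in C
that applies Amdahl's law to the 40% and 80% figures. The function names and the
per-function speedup factor are assumptions made for illustration; nothing here
was measured on a GPU or an FPGA.

```c
#include <stdio.h>

/* Amdahl's law: overall speedup when a fraction `accelerated` (0..1) of the
 * cycles is sped up by `factor` and the remaining cycles are unchanged. */
double overall_speedup(double accelerated, double factor)
{
    return 1.0 / ((1.0 - accelerated) + accelerated / factor);
}

/* Print the estimate for offloading only the top function (40% of the
 * cycles) and for offloading all four hot functions (80% of the cycles).
 * `factor` is a hypothetical per-function speedup, not a measurement. */
void print_estimates(double factor)
{
    printf("top function only:  %.2fx\n", overall_speedup(0.40, factor));
    printf("four hot functions: %.2fx\n", overall_speedup(0.80, factor));
}
```

With an assumed factor of 10, print_estimates(10.0) reports about 1.56x and
3.57x. Even an infinitely fast accelerator cannot get past 5x while the
remaining 20% of the cycles stay on the CPU.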

I would also suggest, as others have already mentioned, starting "low": take
a simple standalone encoder/decoder and see if you can speed it up, as it's a
bitch to try to debug codecs inside Asterisk.
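
As an illustration of starting "low", here is a minimal sketch in C of a
standalone check that runs outside Asterisk: a reference multiply-accumulate
loop and a candidate faster version are compared bit for bit on pseudo-random
fixed-point input. The kernel and every name in it (mac_ref, mac_fast,
count_mismatches) are hypothetical stand-ins, not routines from any G.729
source.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_LEN 80 /* 10 ms of audio at 8 kHz */

/* Reference kernel: plain 16-bit x 16-bit multiply-accumulate. */
int64_t mac_ref(const int16_t *x, const int16_t *y, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)x[i] * y[i];
    return acc;
}

/* Candidate kernel: same arithmetic, unrolled by four with a tail loop.
 * This stands in for whatever accelerated version is under test. */
int64_t mac_fast(const int16_t *x, const int16_t *y, size_t n)
{
    int64_t a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += (int32_t)x[i]     * y[i];
        a1 += (int32_t)x[i + 1] * y[i + 1];
        a2 += (int32_t)x[i + 2] * y[i + 2];
        a3 += (int32_t)x[i + 3] * y[i + 3];
    }
    for (; i < n; i++)
        a0 += (int32_t)x[i] * y[i];
    return a0 + a1 + a2 + a3;
}

/* Deterministic pseudo-random 16-bit sample (simple LCG), so that a
 * failing run can be reproduced exactly. */
static int16_t next_sample(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return (int16_t)((int32_t)(*state >> 16) - 32768);
}

/* Run `trials` comparisons over lengths 0..FRAME_LEN and return how many
 * times the two kernels disagreed. */
unsigned count_mismatches(unsigned trials)
{
    int16_t x[FRAME_LEN], y[FRAME_LEN];
    uint32_t state = 1u;
    unsigned bad = 0;
    for (unsigned t = 0; t < trials; t++) {
        size_t n = t % (FRAME_LEN + 1);
        for (size_t i = 0; i < FRAME_LEN; i++) {
            x[i] = next_sample(&state);
            y[i] = next_sample(&state);
        }
        if (mac_ref(x, y, n) != mac_fast(x, y, n))
            bad++;
    }
    return bad;
}
```

A small main() that calls count_mismatches(1000) and exits non-zero on any
mismatch is enough to catch arithmetic differences before the kernel ever
gets near a live channel.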

Stelios S. Koroneos

> -----Original Message-----
> From: asterisk-dev-bounces at lists.digium.com
> [mailto:asterisk-dev-bounces at lists.digium.com]On Behalf Of Matthew
> Rubenstein
> Sent: Monday, March 05, 2007 6:08 PM
> To: Wai Wu
> Cc: Asterisk Developers Mailing List
> Subject: RE: [asterisk-dev] nVidia Cuda
>
>
> 	G.729 reportedly consumes about 16-25 MIPS, though the "I" (instruction
> set) determines the rate:
> http://lists.xiph.org/pipermail/speex-dev/2003-December/002131.html .
> We're talking now about distributing stream MIPS between CPU and GPU. I
> don't know whether G.729 can be factored that way, or what the
> IPC/dataflow dependencies are. Depending on the performance of the
> factored processes, the GPU might wait quite a bit, which might mean
> that a CPU could best be used to feed network queues of RPC to the GPU,
> with the logic-intensive factored processes running on multiple CPUs on
> the network. If only one CPU is feeding one GPU, and the GPU waits on
> the CPU, then there aren't going to be any extra channels to feed stream
> data to the GPU.
>
> 	It might be a better architecture overall to port these codecs to
> OpenSER, which is the architecture for scalable channel loads, feeding
> Asterisk G.711, or something even more raw like SLINEAR etc.
>
>
> On Mon, 2007-03-05 at 10:27 -0500, Wai Wu wrote:
> > Someone once told me that G.729 takes 35 MHz per channel. The GPU wasting
> > cycles is not a concern, since while the GPU is waiting for the CPU to
> > complete its task, the GPU can take on other tasks or do the
> > transformation for other channels. This is exactly why I think the GPU
> > is perfect for an Asterisk type of system. I am not trying to speed up
> > coding/decoding here, but rather to do it for all channels at the same
> > time. As a matter of fact, if one has a 3 GHz CPU, transcoding can be
> > done faster on the CPU than on the G80 if the code is a straight port.
> >
> > -----Original Message-----
> > From: asterisk-dev-bounces at lists.digium.com
> > [mailto:asterisk-dev-bounces at lists.digium.com] On Behalf Of Matthew
> > Rubenstein
> > Sent: Monday, March 05, 2007 8:32 AM
> > To: CA DM
> > Cc: Asterisk-Dev
> > Subject: Re: [asterisk-dev] nVidia Cuda
> >
> > 	Do you know the relative MIPS consumed by G.729 in each of the
> > decoding and the transformation phases? If the decoding is a significant
> > percentage, then it will block the CPU process while the GPU waits a
> > long time. A really fast GPU, though, can afford to waste cycles in
> > inefficient logic if it's still faster than the CPU.
> >
> >
> > On Mon, 2007-03-05 at 09:13 +0100, CA DM wrote:
> > > You don't need to port the entire codec code to a GPU. GPUs aren't good
> > > general-purpose processors, just as CPUs aren't good stream processors.
> > >
> > > Code needs to be cleverly distributed across the specific abilities of
> > > the CPU and GPU (e.g. decoding and unpacking the data stream is a task
> > > suited to the CPU, while data block transformations suit the GPU)
> > > if you want to get the most from both.
> > >
> > > At 20.38 04/03/2007, you wrote:
> > > >         Compression algorithms have generally not been ported to
> > > >GPUs like the G80. They're usually more logic and branch oriented
> > > >than just brute force multiply-accumulates that GPUs specialize in. I
> > > >also haven't seen any of the popular Asterisk codecs, like G.729 or
> > > >GSM, ported to any GPU. Is there a source for codecs ported to GPUs?
> > > >Or any research showing a good approach?
> > > >
> > > >
> > > >On Sun, 2007-03-04 at 10:12 -0700,
> > > >asterisk-dev-request at lists.digium.com
> > > >wrote:
> > > > > Date: Sat, 03 Mar 2007 22:05:19 -0500
> > > > > From: Wai Wu <wkwu at calltrol.com>
> > > > > Subject: [asterisk-dev] nVidia Cuda
> > > > > To: asterisk-dev at lists.digium.com
> > > > > Message-ID:
> > > > > <B0430B20D208514CB2AFF57E81645C3101B970 at k3-1.Calltrol.com>
> > > > > Content-Type: text/plain; charset="iso-8859-1"
> > > > >
> > > > > Hi devs,
> > > > >
> > > > > Has anyone looked into CUDA to see if the CPU-intensive parts of
> > > > > Asterisk (like codecs and conferencing) can be moved to the G80
> > > > > processor? I found that the GF8800 cards are very inexpensive
> > > > > (around 400 USD each). I have ported some of our financial
> > > > > applications to this board and found almost a 10x performance
> > > > > improvement over the 3 GHz C2D host processor. I would like to do
> > > > > the same for Asterisk.
> > > > >
> > > > > To start, I have been looking into the Asterisk code and have no
> > > > > clue how it is structured, except for the addon applications. Here I
> > > > > have two questions.
> > > > >
> > > > > 1) Does each channel use its own thread when a codec is used?
> > > > > 2) Which part of the Asterisk code actually makes the call to the
> > > > > necessary codec? I notice that the applications save and set
> > > > > the frame format, then read from the channel. So I traced the read
> > > > > function, but all it does is read from a file descriptor. So I
> > > > > need some help here.
> > > >--
> > > >
> > > >(C) Matthew Rubenstein
> > > >
> > > >_______________________________________________
> > > >--Bandwidth and Colocation provided by Easynews.com --
> > > >
> > > >asterisk-dev mailing list
> > > >To UNSUBSCRIBE or update options visit:
> > > >    http://lists.digium.com/mailman/listinfo/asterisk-dev
> > >
> --
>
> (C) Matthew Rubenstein
>
>



