[asterisk-dev] nVidia Cuda

Matthew Rubenstein email at mattruby.com
Mon Mar 5 09:07:44 MST 2007


	G.729 reportedly consumes about 16-25 MIPS, though the "I" (the
instruction set) determines the actual rate:
http://lists.xiph.org/pipermail/speex-dev/2003-December/002131.html .
What we're talking about now is distributing those stream MIPS between CPU
and GPU. I don't know whether G.729 can be factored that way, or what the
IPC/dataflow dependencies between the parts are. Depending on how the
factored processes perform, the GPU might wait quite a bit. In that case
a CPU might best be used to feed network queues of RPCs to the GPU,
with the logic-intensive factored processes running on multiple CPUs on
the network. If only one CPU is feeding one GPU, and the GPU waits on
the CPU, then there aren't going to be any extra channels of stream
data to keep the GPU busy.
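
	As a rough illustration of that stall, here is a toy two-stage
pipeline model in Python. It is only a sketch: the stage times are
made-up numbers, not G.729 measurements, and the function name
pipeline_utilization is hypothetical. It assumes the CPU stage
parallelizes perfectly across workers and ignores transfer costs.

```python
def pipeline_utilization(cpu_us, gpu_us, cpu_workers=1):
    """Toy model of a CPU stage feeding a GPU stage.

    cpu_us      -- assumed CPU time per frame (microseconds, illustrative)
    gpu_us      -- assumed GPU time per frame (microseconds, illustrative)
    cpu_workers -- number of CPUs feeding the single GPU

    Returns (frames_per_second, fraction_of_time_gpu_is_busy).
    """
    cpu_rate = cpu_workers * 1_000_000 / cpu_us  # frames/s the CPUs can deliver
    gpu_rate = 1_000_000 / gpu_us                # frames/s the GPU can absorb
    throughput = min(cpu_rate, gpu_rate)         # the slower stage sets the pace
    gpu_busy = throughput / gpu_rate             # the rest of the time the GPU waits
    return throughput, gpu_busy


if __name__ == "__main__":
    for workers in (1, 2, 4):
        fps, busy = pipeline_utilization(400, 100, cpu_workers=workers)
        print(f"{workers} CPU(s): {fps:.0f} frames/s, GPU busy {busy:.0%}")
```

With these invented numbers, a single CPU keeps the GPU busy only a
quarter of the time, and it takes four feeding CPUs before the GPU
becomes the bottleneck. Whether real codecs split anything like this is
exactly the open question.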

	It might be a better architecture overall to port these codecs to
OpenSER, which is the architecture for scalable channel loads, and have it
feed Asterisk G.711 or something even more raw, like SLINEAR.
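
	For a sense of how little codec work is left once Asterisk is fed
G.711: expanding a mu-law byte to a 16-bit signed linear sample (the
kind of data SLINEAR carries) is a few shifts and adds. Below is a
minimal Python sketch of the standard G.711 mu-law expansion; the
function name ulaw_to_linear is mine for illustration and is not
Asterisk's API.

```python
def ulaw_to_linear(u):
    """Expand one G.711 mu-law byte (0..255) to a signed 16-bit linear sample."""
    u = ~u & 0xFF                    # mu-law bytes are stored bit-inverted
    sign = u & 0x80                  # top bit: set means negative
    exponent = (u >> 4) & 0x07       # 3-bit segment number
    mantissa = u & 0x0F              # 4-bit step within the segment
    # 0x84 (132) is the bias added at encode time; restore then remove it.
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample


if __name__ == "__main__":
    frame = bytes([0xFF, 0x80, 0x00])          # silence, max positive, max negative
    print([ulaw_to_linear(b) for b in frame])
```

In practice this is usually a 256-entry lookup table, so the per-channel
cost is negligible next to a codec like G.729.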


On Mon, 2007-03-05 at 10:27 -0500, Wai Wu wrote:
> Someone once told me that G.729 takes 35 MHz per channel. The GPU wasting
> cycles is not a concern, since while the GPU is waiting for the CPU to
> complete its task, the GPU can take on other tasks or do the
> transformation for other channels. This is exactly why I think the GPU
> is perfect for an Asterisk type of system. I am not trying to speed up
> coding/decoding here, but rather to do it for all channels at the same
> time. As a matter of fact, if one has a 3 GHz CPU, transcoding can be
> done faster on the CPU than on the G80 if the code is a straight port.
> 
> -----Original Message-----
> From: asterisk-dev-bounces at lists.digium.com
> [mailto:asterisk-dev-bounces at lists.digium.com] On Behalf Of Matthew
> Rubenstein
> Sent: Monday, March 05, 2007 8:32 AM
> To: CA DM
> Cc: Asterisk-Dev
> Subject: Re: [asterisk-dev] nVidia Cuda
> 
> 	Do you know the relative MIPS consumed by G.729 in each of the
> decoding and the transformation phases? If the decoding is a significant
> percentage, then it will block the CPU process while the GPU waits a
> long time. A really fast GPU, though, can afford to waste cycles on
> inefficient logic if it's still faster than the CPU.
> 
> 
> On Mon, 2007-03-05 at 09:13 +0100, CA DM wrote:
> > You don't need to port the entire codec code to a GPU. GPUs aren't good
> > general-purpose processors, just as CPUs aren't good stream processors.
> > 
> > Code needs to be cleverly distributed across the specific abilities of
> > the CPU and GPU (e.g., decoding and unpacking the data stream is a task
> > suited to the CPU, while data block transformations suit the GPU)
> > if you want to get the most from both.
> > 
> > At 20.38 04/03/2007, you wrote:
> > >         Compression algorithms have generally not been ported to
> > >GPUs like the G80. They're usually more logic and branch oriented
> > >than just brute force multiply-accumulates that GPUs specialize in. I
> > >also haven't seen any of the popular Asterisk codecs, like G.729 or
> > >GSM, ported to any GPU. Is there a source for codecs ported to GPUs?
> > >Or any research showing a good approach?
> > >
> > >
> > >On Sun, 2007-03-04 at 10:12 -0700, 
> > >asterisk-dev-request at lists.digium.com
> > >wrote:
> > > > Date: Sat, 03 Mar 2007 22:05:19 -0500
> > > > From: Wai Wu <wkwu at calltrol.com>
> > > > Subject: [asterisk-dev] nVidia Cuda
> > > > To: asterisk-dev at lists.digium.com
> > > > Message-ID: 
> > > > <B0430B20D208514CB2AFF57E81645C3101B970 at k3-1.Calltrol.com>
> > > > Content-Type: text/plain; charset="iso-8859-1"
> > > >
> > > > Hi devs,
> > > >
> > > > Has anyone looked into CUDA to see if the CPU-intensive parts of
> > > > Asterisk (like codecs and conferencing) can be moved to the G80
> > > > processor? I found that the GF8800 cards are very inexpensive
> > > > (around 400 USD each). I have ported some of our financial
> > > > applications to this board and found almost a 10x performance
> > > > improvement over the 3 GHz C2D host processor. I would like to do
> > > > the same for Asterisk.
> > > >
> > > > To start, I have been looking into the Asterisk code and have no
> > > > clue how it is structured, except for the addon applications. Here I
> > > > have two questions.
> > > >
> > > > 1) Does each channel use its own thread if a codec is used?
> > > > 2) Which part of the Asterisk code actually makes the call to the
> > > > necessary codec? I notice that the applications save and set
> > > > the frame format, then read from the channel. So I traced the read
> > > > function, but all it does is read from a file descriptor. So I
> > > > need some help here.
> > >--
> > >
> > >(C) Matthew Rubenstein
> > >
> > 
-- 

(C) Matthew Rubenstein


