[asterisk-dev] nVidia Cuda

Matthew Rubenstein email at mattruby.com
Mon Mar 5 12:38:40 MST 2007


On Mon, 2007-03-05 at 11:26 -0700, asterisk-dev-request at lists.digium.com
wrote:
> Date: Mon, 5 Mar 2007 18:40:44 +0200
> From: Tzafrir Cohen <tzafrir.cohen at xorcom.com>
> Subject: Re: [asterisk-dev] nVidia Cuda
> To: asterisk-dev at lists.digium.com
> Message-ID: <20070305164044.GR21923 at xorcom.com>
> Content-Type: text/plain; charset=us-ascii
> 
> On Mon, Mar 05, 2007 at 11:07:44AM -0500, Matthew Rubenstein wrote:
> >       G.729 reportedly consumes about 16-25 MIPS, though the "I"
> > (instruction set) determines the rate:
> > http://lists.xiph.org/pipermail/speex-dev/2003-December/002131.html .
> > We're talking now about distributing stream MIPS between CPU and GPU. I
> > don't know whether G.729 can be factored that way, or what the
> > IPC/dataflow dependencies are. Depending on the performance of the
> > factored processes, the GPU might wait quite a bit, which might mean
> > that a CPU could best be used to feed network queues of RPC to the GPU,
> > with the logic-intensive factored processes running on multiple CPUs on
> > the network. If only one CPU is feeding one GPU, and the GPU waits on
> > the CPU, then there aren't going to be any extra channels to feed stream
> > data to the GPU.
> > 
> >       It might be a better architecture overall to port these codecs to
> > OpenSER, which is the architecture for scalable channel loads, feeding
> > Asterisk G.711, or something even more raw like SLINEAR etc.
> 
> Start from something simpler:
> 
> 100 concurrent runs of sox transcoding speex, with no Asterisk
> threading to mind about and with no patent issue to muddy the water.

	Is sox/speex less complex and easier to port than G.729, while still
being a worthwhile product of the dev effort? Does OpenSER codec
processing have Asterisk threading dependencies? If it's just a start
(not for distribution/sale), patents do not inhibit research.
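
	For a concrete baseline along those lines, a throwaway harness could
be as simple as the sketch below: fork 100 sox processes, each
transcoding the same WAV to Speex, and time the batch. The filenames are
made up and it assumes a sox build with Speex write support; this is
plain host-side C, nothing GPU about it yet.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

#define RUNS 100  /* concurrent transcodes, per the suggestion above */

int main(void)
{
	int i;

	for (i = 0; i < RUNS; i++) {
		if (fork() == 0) {
			char out[64];
			/* child: transcode input.wav to Speex (hypothetical files) */
			snprintf(out, sizeof(out), "out%02d.spx", i);
			execlp("sox", "sox", "input.wav", out, (char *)NULL);
			_exit(127);  /* exec failed */
		}
	}
	for (i = 0; i < RUNS; i++)
		wait(NULL);
	return 0;
}

	Run that under time(1) on an N-core box and the wall-clock time is the
CPU-only number any GPU port would have to beat.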


> My personal feeling is that the price of the GPU would be better spent
> on a stonger CPU. Not to mention the fact that the GPU architecture
> seem to scale very badly on a standard CPU system. But I'd be gladly
> proven wrong.

	Not only are GPUs cheap compared with CPUs per MFLOPS (GFLOPS), but
multiple GPUs can run in a single host across a PCI-e bus. Mapping many
parallel media streams onto the GPU's parallel compute units, over that
low-latency, high-bandwidth bus, scales in ways a CPU alone cannot.
Adding GPUs also doesn't require adding the rest of the host HW (power
supply, IO, chassis, etc.), which is shared across the machine. What
makes you say that "the GPU architecture seem[s] to scale very badly on
[a] standard CPU system"?
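
	To make "parallel streams" concrete, here is a minimal CUDA-style
sketch (hypothetical names, with a trivial per-sample transform standing
in for real codec math, since nobody has factored G.729 for the GPU):
the host batches one 10 ms frame (80 samples at 8 kHz) from each of 1024
channels, pushes the whole batch across PCI-e in one copy, and runs one
thread block per channel.

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define FRAME    80    /* one 10 ms frame at 8 kHz */
#define CHANNELS 1024  /* hypothetical number of concurrent streams */

/* One block per channel, one thread per sample.  Real G.729
   analysis/synthesis would replace the placeholder transform. */
__global__ void process_frames(const short *in, short *out)
{
	int i = blockIdx.x * FRAME + threadIdx.x;
	out[i] = in[i] / 2;  /* placeholder for per-channel codec work */
}

int main(void)
{
	size_t bytes = (size_t)CHANNELS * FRAME * sizeof(short);
	short *h_in  = (short *)calloc(CHANNELS * FRAME, sizeof(short));
	short *h_out = (short *)calloc(CHANNELS * FRAME, sizeof(short));
	short *d_in, *d_out;

	cudaMalloc((void **)&d_in, bytes);
	cudaMalloc((void **)&d_out, bytes);

	/* ... fill h_in with one frame per channel, e.g. from RTP ... */

	/* One bus transfer and one launch covers every channel's frame. */
	cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
	process_frames<<<CHANNELS, FRAME>>>(d_in, d_out);
	cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

	printf("processed %d channels\n", CHANNELS);
	cudaFree(d_in); cudaFree(d_out);
	free(h_in); free(h_out);
	return 0;
}

	The point is only the shape: frames from many channels ride one bus
transfer and the per-channel math runs in parallel. Whether G.729's
inter-frame state lets itself be factored that way is exactly the open
question above.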


> -- 
>                Tzafrir Cohen        
-- 

(C) Matthew Rubenstein


