[Asterisk-Dev] IBM/SGI implementations

Mon Nov 22 16:47:13 MST 2004

At 1:56 PM -0500 on 11/22/04, Steve Kann wrote:
>>[snip]
>
>I think a great "trick" to do DSP operations 
>like codecs on cheap hardware is to use the 
>power of the GPU.  This is something that Mark's 
>been talking about in the past.
>
>Much of the work of audio DSP stuff seems to be 
>similar to the work that your off-the-shelf GPU 
>can do much faster than GP CPUs can, and the 
>price/performance is great.  You'd need to 
>rewrite the codecs, and other DSP routines for 
>this, but programming to a meta-library like Sh 
>(http://libsh.sourceforge.net/) 
>can allow your code to be portable amongst 
>several GPU backends.
>
>"The new GPUs are very fast. Stanford's Ian Buck calculates that the
>current GeForce FX 5900's performance peaks at 20 Gigaflops, the
>equivalent of a 10-GHz Pentium - with, according to Nvidia, even more
>speed on the horizon. Performance growth has multiplied at a rate of 2.8
>times per year since 1993, a pace analysts expect the industry to
>maintain for another five years. At this rate, GPU performance will move
>inexorably into the teraflop range by 2005."
>
>While moving to bigger, more flexible iron is 
>always a strategy that can be made to work, it's 
>usually not the most cost-effective.
>
>If instead, we designed * such that you could 
>build a cluster of cheaper boxes, and have them 
>operate as a cohesive unit such that if one box 
>fails, others could seamlessly take over, you'd 
>still get much better price/performance than if 
>you used scalable/fault-tolerant (read: 
>expensive and complicated) hardware like this 
>thread has been discussing.
>
>The first step towards a true clustered solution 
>might be designed such that if a box fails, you 
>lose the calls that were presently on the box, 
>but otherwise the system can keep running.   As 
>long as you can have a cluster working together, 
>and scaling to tens or hundreds of machines 
>operating as a seamless unit, that might be an 
>acceptable solution, and able to give you a 
>decent number of 9's.
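
A rough sketch of the "first step" cluster described above, where a failed 
box drops its own calls but new calls keep landing on the surviving boxes. 
Everything here (the Cluster class, place_call, fail_node, the box names) 
is hypothetical and for illustration only; none of it is Asterisk code, 
and real failure detection is left out entirely.

```python
# Hypothetical sketch: Cluster, place_call and fail_node are illustrative
# names, not part of Asterisk.
class Cluster:
    def __init__(self, nodes):
        # Map each healthy node to the set of call ids it is carrying.
        self.calls = {node: set() for node in nodes}

    def place_call(self, call_id):
        """Put a new call on the least-loaded healthy node."""
        if not self.calls:
            raise RuntimeError("no healthy nodes left")
        node = min(self.calls, key=lambda n: len(self.calls[n]))
        self.calls[node].add(call_id)
        return node

    def fail_node(self, node):
        """A box died: its calls are lost, the other boxes keep running."""
        return self.calls.pop(node, set())


cluster = Cluster(["box1", "box2", "box3"])
for call_id in range(6):
    cluster.place_call(call_id)

lost = cluster.fail_node("box2")
print(len(lost), "calls dropped when box2 failed")
print("next call goes to", cluster.place_call(99))
```

Only the calls on the failed box are lost; seamless takeover of calls in 
progress would need call state shared between boxes, which this sketch 
does not attempt.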

This would be a tremendous coup for Asterisk, and 
codec processing in general.  Does anyone on the 
list have the time and (more importantly) the 
ability to undertake such an endeavor?   It's a 
serious challenge, and not something that can be 
undertaken in a weekend of work (I'm not implying 
that anyone said it would be easy.)  Let me be 
the first to put $100 towards the bounty.  :-)

Building a generic interface to GPU processors 
would be a worthwhile CS PhD project, I think. 
The crypto that I so desperately want would also 
be much more possible with this type of numeric 
processing power, if it can be harnessed.

I can imagine the derision when I put my 2U box 
into a co-lo, with what are obviously three video 
adapters sticking out the back.  "Hey, dude, 
you're wasting your money with those video cards 
- there aren't any monitors in this cabinet!"

Speaking of off-board crypto processing, does 
anyone know if RTP encryption (AES or otherwise) 
could easily be offloaded to the Broadcom or Hifn 
cards that are supported under some OSes (notably 
OpenBSD)?  Are the abilities of those cards useful 
for VoIP applications?  S/MIME or TLS aren't 
a major issue as far as I'm concerned, because 
there typically is enough horsepower on the main 
CPU to handle a reasonable number of requests per 
second of just signalling.  Though processing SIP 
control messages in a very large scale 
environment is a problem even for the signalling, 
I think a more important issue is the audio 
stream.  Perhaps both can be sped up by one of 
these third-party boards.  Just a thought...
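
To put rough numbers on the audio-stream side, here is a 
back-of-the-envelope sketch.  It assumes G.711 at 20 ms packetization 
(160 payload bytes, 50 packets per second per direction) and counts only 
payload bytes, ignoring RTP/UDP/IP headers and any authentication tag. 
The function name crypto_load is made up for this example.

```python
# Assumptions (not from the thread): G.711 audio, 20 ms per packet.
PAYLOAD_BYTES = 160    # 8000 samples/s * 1 byte * 0.020 s
PACKETS_PER_SEC = 50   # 1000 ms / 20 ms
AES_BLOCK_BYTES = 16   # AES always works on 128-bit blocks


def crypto_load(calls, directions=2):
    """Return (packets/s, payload Mbit/s, AES blocks/s) for a call count."""
    streams = calls * directions
    packets = streams * PACKETS_PER_SEC
    payload_bytes = packets * PAYLOAD_BYTES
    aes_blocks = payload_bytes // AES_BLOCK_BYTES
    return packets, payload_bytes * 8 / 1e6, aes_blocks


for calls in (100, 1000):
    packets, mbits, blocks = crypto_load(calls)
    print(f"{calls} calls: {packets} packets/s, "
          f"{mbits:.1f} Mbit/s payload, {blocks} AES blocks/s")
```

Under these assumptions the raw bit rate is modest, but the packet count 
is high and each packet is tiny, so per-packet overhead of handing work 
to a card may matter more than its headline throughput.  That would have 
to be measured on the actual hardware.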

JT