[asterisk-dev] [RFC] optimising RTP traffic?

Thu Jul 20 19:57:17 MST 2006

On Jul 20, 2006, at 10:05 PM, Roy Sigurd Karlsbakk wrote:

>> I'd like to offer my 2c on the issue:  Before you look really hard  
>> at what it takes to try to move RTP forwarding up from the  
>> application and into the kernel, it would make sense to:
>>
>> a) Profile the present situation:  You can use oprofile or other  
>> tools to profile the entire system, and see where most of the time  
>> is being spent.   If the goal is to optimize this process, this  
>> would point out the places to look.  I don't know if this has been  
>> done or not yet, but it seems the original idea was based on  
>> intuition, not on actual results.
>
> I've done so, which was the reason I started digging into this. I  
> tried setting up two boxes sending SIP/RTP calls between them  
> without reinvites, and with a single xeon ~3GHz, 500 calls kills  
> the system. About 70% of the time is spent in kernel, as I can see  
> from context switching between kernel- and userspace. This makes  
> Asterisk unusable to do such bridging in large traffic amounts. I'm  
> told, though, that this is not a problem since this is not what  
> asterisk's built to do. However, I find it hard to believe software
> shouldn't be allowed to get better.

Are you sure that the limitations you see come from context  
switching, sendto/recvfrom, though?  I think it might be more likely  
that it's simply select() or poll() scalability problems you're  
seeing.  Because I've had that many users in IAX2 calls, going into  
app_conference, on hardware not much bigger than that (either
Dual-P3-Ghz or so machines, or Dual-Xeon 3.2 range boxes).  However, I
have had problems with a similar program, that basically just pumps  
packets through, where it uses a single thread to handle all the  
traffic with poll();  the problem is that every poll() call scans the
whole fd list, so with n busy sockets the total work grows as O(n^2).
Take a look at www.kegel.com/c10k.html.  The solution is to use
/dev/poll or epoll under Linux, or kqueue under BSD (or use libevent,
which provides a common API for each OS:
http://www.monkey.org/~provos/libevent/).
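
A minimal sketch of the readiness-notification approach described
above, written in Python only for brevity. It assumes Python's standard
selectors module, whose DefaultSelector picks epoll on Linux and kqueue
on BSD, which is roughly the role libevent plays for C programs. The
names make_udp_socket, relay_once and demo are made up for this
illustration and are not part of Asterisk or libevent.

```python
import selectors
import socket


def make_udp_socket():
    """Bind a non-blocking UDP socket to an ephemeral loopback port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("127.0.0.1", 0))
    s.setblocking(False)
    return s


def relay_once(sel, timeout=1.0):
    """Wait for readable sockets and forward each datagram to its peer.

    Each registered key carries the destination address in key.data.
    Only sockets that actually have data are returned by select(), so
    the idle ones cost nothing per wakeup. Returns the number of
    datagrams forwarded.
    """
    forwarded = 0
    for key, _events in sel.select(timeout):
        payload, _src = key.fileobj.recvfrom(2048)
        key.fileobj.sendto(payload, key.data)
        forwarded += 1
    return forwarded


def demo():
    """Register 200 mostly idle sockets, send to one, relay the packet."""
    sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD
    receiver = make_udp_socket()
    sender = make_udp_socket()
    legs = [make_udp_socket() for _ in range(200)]
    for leg in legs:
        sel.register(leg, selectors.EVENT_READ, data=receiver.getsockname())

    # Only one of the 200 registered sockets receives traffic.
    sender.sendto(b"rtp-ish payload", legs[42].getsockname())
    forwarded = relay_once(sel)

    receiver.settimeout(1.0)  # block briefly for the relayed datagram
    data, _src = receiver.recvfrom(2048)

    sel.close()
    for s in legs + [receiver, sender]:
        s.close()
    return forwarded, data


if __name__ == "__main__":
    print(demo())
```

The point of the design is that the set of watched sockets is
registered once, and each wakeup reports only the sockets with data,
instead of the whole fd list being handed to the kernel on every call.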

If you look into papers about this, the scalability thresholds you'll
see in most of them are for HTTP requests and such, where you are
pushing large packets.  It's much worse for VoIP, where you are
sending 30-100pps per connection.
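
To put rough numbers on that, here is a back-of-the-envelope sketch for
the 500-call case mentioned in the quoted text. The assumptions are
mine, not measurements: two RTP streams per bridged call (one in each
direction), one recvfrom plus one sendto per forwarded packet, and no
accounting for poll() wakeups or RTCP. The function name bridge_load is
hypothetical.

```python
def bridge_load(calls, pps_per_stream, streams_per_call=2):
    """Estimate packets/s received and data syscalls/s for bridged calls.

    Assumes every received packet costs one recvfrom and one sendto;
    wakeup calls such as poll() come on top of this.
    """
    packets_in = calls * streams_per_call * pps_per_stream
    syscalls = packets_in * 2  # one recvfrom + one sendto per packet
    return packets_in, syscalls


if __name__ == "__main__":
    # The 30-100pps range from above, plus 50pps (20 ms packets).
    for pps in (30, 50, 100):
        print(pps, bridge_load(500, pps))
```

Under these assumptions, 500 calls at 50pps means 50,000 small packets
per second to receive and 100,000 recvfrom/sendto calls per second,
before counting a single poll().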

ast_waitfor_nandfds() (which is where all the ast_waitXXX calls end
up) calls poll(), and thus, if you're waiting on lots of fds, it
will have the same scalability problems.

I'm not an expert on the RTP stack, though, so I'm not sure if it  
finds itself in the case where there is a thread handling many RTP  
sockets at once, but I can say for sure, that when I had a single  
thread handling many sockets doing VoIP packet switching in another  
application, I hit a scalability limit when using poll() in the
hundreds of sockets, and cut down CPU load a hundredfold by
switching to libevent.
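
For anyone who wants to check this kind of effect on their own box,
here is a small, hedged measurement sketch. It is not the program
referred to above; it uses Python's selectors module on a Unix-like
system, with PollSelector standing in for a poll() loop and
DefaultSelector standing in for epoll/kqueue. The function name
time_waits and the socket counts are made up for the example, and
Python's own overhead will blur the numbers, so treat the output as
indicative only.

```python
import selectors
import socket
import time


def time_waits(selector_cls, n_idle, rounds=200):
    """Time `rounds` wakeups on one busy socket among n_idle silent ones.

    Returns (elapsed_seconds, wakeups).
    """
    sel = selector_cls()
    socks = []
    for _ in range(n_idle + 1):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind(("127.0.0.1", 0))
        s.setblocking(False)
        sel.register(s, selectors.EVENT_READ)
        socks.append(s)
    busy = socks[-1]  # the only socket that will ever receive data
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    wakeups = 0
    start = time.perf_counter()
    for _ in range(rounds):
        tx.sendto(b"x", busy.getsockname())
        for key, _events in sel.select(1.0):
            key.fileobj.recvfrom(64)  # drain so the socket goes idle again
            wakeups += 1
    elapsed = time.perf_counter() - start

    sel.close()
    for s in socks + [tx]:
        s.close()
    return elapsed, wakeups


if __name__ == "__main__":
    for cls in (selectors.PollSelector, selectors.DefaultSelector):
        secs, _ = time_waits(cls, n_idle=300)
        print(cls.__name__, f"{secs * 1e3:.1f} ms")
```

Varying n_idle while keeping a single busy socket shows how much of the
wait cost comes from the idle fds rather than from the traffic itself.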

For reference, here's the CPU load on a system (this is a
dual-Xeon-3.6), with 110 users, all in app_conference.  asterisk is
obviously our friend.  perl is another program on the system, and  
"myswitcher" is the program I refer to (now using libevent;   
previously, it would be using a lot more CPU than asterisk,  
actually). asterisk in this case is doing a lot more than switching
these calls: they're all speex, and they're all in app_conference.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
15077 foo       10 -10  125m  27m 2776 S  9.2  0.7 148:58.63 asterisk
14813 foo       16   0 20908  12m 1712 S  4.9  0.3 247:37.73 perl
18890 foo       11  -5 41292 6596  600 S  2.3  0.2   0:33.43 myswitcher

-SteveK

>
>> b) Build a system that you can at least use for profiling, which  
>> works using sendfile() or similar, even if it doesn't actually  
>> work properly, and benchmark it.  The point here is to just get a  
>> ballpark of the improvement you expect to see, to determine if  
>> it's worth actually making it all work.  You don't need to dot the  
>> i's and cross the t's, just make it something you can set up calls  
>> with, and benchmark.
>
> It really is not necessary to do this. Looking at the oprofile
> output shows all the time is spent within recvfrom/sendto and
> related calls and their respective context switches. Sendfile
> is plain and doesn't involve such things. Doing the sendfile test
> will be a complete waste of time, believe me. sendfile() just takes
> two file descriptors and connects them, without involving either
> userspace or kernel checksumming, so I'd guess something like 60%+
> of the time taken to bridge RTP will be gone.
>
> roy
> --
> Roy Sigurd Karlsbakk
> roy at karlsbakk.net
> (+47) 98013356
> ---
> In space, loud sounds, like explosions, are even louder because  
> there is no air to get in the way.