[asterisk-dev] [RFC] optimising RTP traffic?
SteveK
stevek at stevek.com
Thu Jul 20 19:57:17 MST 2006
On Jul 20, 2006, at 10:05 PM, Roy Sigurd Karlsbakk wrote:
>> I'd like to offer my 2c on the issue: Before you look really hard
>> at what it takes to try to move RTP forwarding up from the
>> application and into the kernel, it would make sense to:
>>
>> a) Profile the present situation: You can use oprofile or other
>> tools to profile the entire system, and see where most of the time
>> is being spent. If the goal is to optimize this process, this
>> would point out the places to look. I don't know if this has been
>> done or not yet, but it seems the original idea was based on
>> intuition, not on actual results.
>
> I've done so, which was the reason I started digging into this. I
> tried setting up two boxes sending SIP/RTP calls between them
> without reinvites, and with a single Xeon ~3GHz, 500 calls kill
> the system. About 70% of the time is spent in kernel, as I can see
> from context switching between kernel- and userspace. This makes
> Asterisk unusable to do such bridging in large traffic amounts. I'm
> told, though, that this is not a problem since this is not what
> asterisk's built to do. However, I find it hard to believe software
> shouldn't be allowed to get better.

Are you sure that the limitations you see come from context
switching and sendto/recvfrom, though? I think it is more likely
that you're simply seeing select() or poll() scalability problems.
I've had that many users in IAX2 calls, going into app_conference,
on hardware not much bigger than that (either Dual-P3-Ghz or so
machines, or Dual-Xeon 3.2 range boxes). However, I have had
problems with a similar program that basically just pumps packets
through, using a single thread to handle all the traffic with
poll(). The problem is that each poll() call is O(n) in the number
of descriptors it watches, so with n busy sockets the total work
grows as O(n^2). Take a look at www.kegel.com/c10k.html. The
solution is to use /dev/poll or epoll under Linux, or kqueue under
BSD (or use libevent, which provides a common API for each OS:
http://www.monkey.org/~provos/libevent/).
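
To make the idea concrete, here is a minimal sketch of a
single-threaded UDP packet pump built on readiness notification. It
is not libevent and not Asterisk code: it uses Python's standard
selectors module, which picks epoll on Linux and kqueue on BSD, as a
stand-in for the same approach. The names make_udp_socket, Relay,
add_leg and run_once are made up for this illustration.

```python
import selectors
import socket


def make_udp_socket():
    """Create a non-blocking UDP socket bound to an ephemeral loopback port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("127.0.0.1", 0))
    s.setblocking(False)
    return s


class Relay:
    """Forward every datagram received on a leg to that leg's peer address."""

    def __init__(self):
        # DefaultSelector uses epoll/kqueue where available, so the cost of
        # one wakeup depends on the number of ready sockets, not on the
        # total number of registered sockets.
        self.sel = selectors.DefaultSelector()
        self.socks = []

    def add_leg(self, forward_to):
        """Register a new listening socket; return the address to send to."""
        sock = make_udp_socket()
        self.sel.register(sock, selectors.EVENT_READ, data=forward_to)
        self.socks.append(sock)
        return sock.getsockname()

    def run_once(self, timeout=1.0):
        """Handle one batch of ready sockets; return packets forwarded."""
        forwarded = 0
        for key, _events in self.sel.select(timeout):
            try:
                payload, _src = key.fileobj.recvfrom(2048)
            except BlockingIOError:
                continue  # spurious wakeup, nothing to read
            key.fileobj.sendto(payload, key.data)
            forwarded += 1
        return forwarded

    def close(self):
        for s in self.socks:
            self.sel.unregister(s)
            s.close()
        self.sel.close()
```

A caller would register one leg per endpoint with add_leg() and then
call run_once() in a loop. The per-packet recvfrom/sendto syscalls
are still there; what changes is that the wait step no longer walks
the whole descriptor list on every wakeup.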

If you look into papers about this, the scalability thresholds
you'll see in most of them are for HTTP requests and such, where you
are pushing large packets. It's much worse for VoIP, where you are
sending 30-100pps per connection.
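
As a rough back-of-the-envelope check on those packet rates, the
sketch below multiplies them out for the 500-call case from the
quoted message. It assumes each bridged call carries two inbound
streams (one from each endpoint); bridge_packet_rate is a made-up
helper name, not anything from Asterisk.

```python
def bridge_packet_rate(calls, pps_per_stream, streams_per_call=2):
    """Packets per second the bridge must receive (and, equally, send).

    streams_per_call=2 assumes one inbound stream from each endpoint of
    a bridged call.
    """
    return calls * streams_per_call * pps_per_stream


# 500 calls, across the 30-100 pps range mentioned above.
for pps in (30, 50, 100):
    rate = bridge_packet_rate(500, pps)
    print(f"{pps:>3} pps/stream -> {rate:,} packets/s in, {rate:,} out")
```

Under those assumptions, 500 calls at 50 pps means 50,000 packets in
and 50,000 out every second, each of them small, which is a very
different load from a few thousand large HTTP responses.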

ast_waitfor_nandfds() (which is where all the ast_waitXXX calls end
up) calls poll(), and thus, if there are lots of fds that you're
waiting on, it will have the same scalability problems.
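
Here is a crude cost model of why a poll()-based wait loop degrades
with many fds. It is only a worst-case sketch: it assumes every
packet triggers its own wakeup and that each poll() wakeup scans the
full descriptor array, whereas in practice one wakeup may report
several ready fds. The function names are invented for this example.

```python
def poll_entries_scanned_per_sec(n_fds, pps_per_fd):
    """Worst case for poll(): one wakeup per packet, each scanning all fds."""
    wakeups = n_fds * pps_per_fd
    return wakeups * n_fds


def ready_events_per_sec(n_fds, pps_per_fd):
    """Readiness notification (epoll/kqueue): work tracks ready events only."""
    return n_fds * pps_per_fd


def scan_overhead_ratio(n_fds, pps_per_fd):
    """How many times more entries poll() touches than there are real events."""
    return (poll_entries_scanned_per_sec(n_fds, pps_per_fd)
            // ready_events_per_sec(n_fds, pps_per_fd))
```

In this model the ratio is simply n_fds: with 500 sockets at 50 pps,
poll() touches 12,500,000 descriptor entries per second to deliver
25,000 real events. Real numbers will be lower, but the quadratic
shape is the point.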

I'm not an expert on the RTP stack, though, so I'm not sure whether
it ends up with one thread handling many RTP sockets at once. I can
say for sure that when I had a single thread handling many sockets
doing VoIP packet switching in another application, I hit a
scalability limit with poll() in the hundreds of sockets, and cut
CPU load a hundredfold by switching to libevent.

For reference, here's the CPU load on a system (a dual-Xeon-3.6)
with 110 users, all in app_conference. asterisk is obviously our
friend, perl is another program on the system, and "myswitcher" is
the program I refer to (now using libevent; previously, it used a
lot more CPU than asterisk, actually). asterisk in this case is
doing a lot more than switching these calls: they're all speex, and
they're all in app_conference.

  PID USER PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
15077 foo  10 -10  125m  27m 2776 S  9.2  0.7 148:58.63 asterisk
14813 foo  16   0 20908  12m 1712 S  4.9  0.3 247:37.73 perl
18890 foo  11  -5 41292 6596  600 S  2.3  0.2   0:33.43 myswitcher

-SteveK
>
>> b) Build a system that you can at least use for profiling, which
>> works using sendfile() or similar, even if it doesn't actually
>> work properly, and benchmark it. The point here is to just get a
>> ballpark of the improvement you expect to see, to determine if
>> it's worth actually making it all work. You don't need to dot the
>> i's and cross the t's, just make it something you can set up calls
>> with, and benchmark.
>
> It really is not necessary to do this. Looking at the oprofile
> output shows all the time is spent within recvfrom/sendto and
> related calls and their respective switches of contexts. Sendfile
> is plain and doesn't involve such things. Doing the sendfile test
> will be a complete waste of time, believe me. sendfile() just takes
> two file descriptors and connects them, without involving either
> userspace or kernel checksumming, so I'd guess something like 60%+
> of the time taken to bridge RTP will be gone.
>
> roy
> --
> Roy Sigurd Karlsbakk
> roy at karlsbakk.net
> (+47) 98013356
> ---
> In space, loud sounds, like explosions, are even louder because
> there is no air to get in the way.