[asterisk-dev] RTP trunking - 58% savings on media bandwidth?
John Todd
jtodd at digium.com
Thu Dec 18 03:13:59 CST 2008
[While I should be doing many, many other things this late evening
I'll spend some time and write about improvements that will only be
useful to a few people to the tune of tens of thousands of dollars. I
probably should be doing something else, but I have a hard time
letting go of a concept once it gets in my head, so putting it down in
a -dev posting for sad review years from now is really what I need to
do.]
This started on the thread about cRTP, but that quickly turned to a
dead end since cRTP seems to be link or interface specific and has
nothing to do really with RTP at endpoints. So I mulled it over a
bit, and came up with my idea for a multiplexed RTP (or what I'm now
calling "trunked RTP", or "TRTP", to be more consistent with the IAX2
trunking concepts on which this is based). The numbers I
calculated far below are interesting, especially with lower-rate
codecs - it looks like one can get around twice the number of channels
(more as the channel count grows) into the same bandwidth by getting
rid of IP overhead by multiplexing RTP sessions into a single UDP
packet stream. The
details are not excruciating, and I think the use of existing concepts
and code makes this probably fairly do-able. Any reasonably-sized
carrier that does international traffic on high-price links might find
this useful - it provides some of the IAX2 trunking benefits without
having to shift over entirely to IAX2, and maybe it would be quickly
implementable by other SIP/RTP stacks that wanted to see decreases in
bandwidth usage between high-traffic nodes.
"New" overhead would have to be introduced into each trunked RTP
packet, I think. A two-byte field at the very end of the packet would
need to describe how many sessions are embedded in it - it appears that
a single byte (255) would not cover all circumstances. This is what
I'm calling the "padding size". (For those of you who are reading the
RFC on RTP - the last byte in the RTP packet (aka: the padding byte)
is too small to cover the number of possible streams that could be
contained in a single UDP packet, so the last 2 bytes would be used,
which is slightly different from the intended use of the last byte in
the packet if the padding bit is set, but oh well.) The padding size
would include its own two bytes in the byte count, so for N
sub-packets it is (N*2)+2. While UDP datagrams are typically small,
they are not _always_ small, so it is possible to have a larger number
of payloads than 8 bits could represent, and a 16-bit number is
required to describe how far back from the end of the packet we should
go to find the internal RTP header/payload size fields. Each RTP
header and payload pair would then be described by a 2-byte size
value. So: the last two bytes give the padding size, and then we have
2*(number of sub-packets) bytes before them in the packet that we need
to keep track of, each pair of which gives the length of one
sub-packet.
Whew!
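The trailer layout can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 2-byte fields are taken to be in network byte order, each per-sub-packet field is taken to hold a length (as in the packet diagram further down) rather than an offset, and the names trtp_pack and trtp_unpack are made up here - nothing below is existing Asterisk code.

```python
import struct

def trtp_pack(subpackets):
    """Concatenate whole RTP packets and append the TRTP size trailer."""
    body = b"".join(subpackets)
    # One 2-byte length per sub-packet, in the same order as the body.
    sizes = b"".join(struct.pack("!H", len(p)) for p in subpackets)
    # The padding size counts the length fields plus its own two bytes.
    padding_size = len(sizes) + 2
    return body + sizes + struct.pack("!H", padding_size)

def trtp_unpack(datagram):
    """Split a TRTP UDP payload back into its RTP sub-packets."""
    if len(datagram) < 2:
        raise ValueError("too short for a TRTP trailer")
    (padding_size,) = struct.unpack("!H", datagram[-2:])
    if padding_size < 2 or padding_size % 2 or padding_size > len(datagram):
        raise ValueError("bad padding size")
    count = (padding_size - 2) // 2
    trailer_start = len(datagram) - padding_size
    sizes = struct.unpack("!%dH" % count, datagram[trailer_start:-2])
    if sum(sizes) != trailer_start:
        raise ValueError("sub-packet sizes do not match datagram length")
    subpackets, pos = [], 0
    for size in sizes:
        subpackets.append(datagram[pos:pos + size])
        pos += size
    return subpackets
```

Reading from the end means the receiver needs no out-of-band state to find the sub-packet boundaries: the last two bytes locate the length fields, and the length fields locate each RTP header.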
I'm hoping that just embedding the whole RTP header for each stream
would be sufficiently descriptive so that after unpacking the receiver
would be able to determine what session was being described - isn't
that the purpose of the SSRC? Or would there need to be a new marker
on a per-payload basis? Does UDP port number factor into what session
is receiving the stream, or is SSRC the "canonical" data? I know that
cRTP compresses a lot of the RTP headers, but I didn't want to bite
off more than necessary here at once - it seems that just including
the full RTP headers is an easy (though inefficient) method to get the
code done by not having to create a table or cache of mappings or in/
out of band protocol notifiers to relay RTP settings per stream.
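As a rough sketch of what "unpack and look at the SSRC" could mean in practice, the Python below reads the fixed 12-byte RTP header and hands each sub-packet to a per-SSRC handler. The field layout is the standard RTP fixed header; the dispatch table keyed on SSRC is purely an assumption made to illustrate the question above, and parse_rtp_header and dispatch are hypothetical names. In plain RTP the receiver normally finds the session from the destination port, so whether SSRC alone is enough for a trunk is exactly the open point.

```python
import struct
from collections import namedtuple

RtpHeader = namedtuple(
    "RtpHeader", "version payload_type sequence timestamp ssrc")

def parse_rtp_header(packet):
    """Decode the fixed 12-byte RTP header at the start of a packet."""
    if len(packet) < 12:
        raise ValueError("shorter than the fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    # Top two bits of byte 0 are the version; low 7 bits of byte 1
    # are the payload type (the top bit is the marker).
    return RtpHeader(b0 >> 6, b1 & 0x7F, seq, ts, ssrc)

def dispatch(subpackets, sessions):
    """Pass each sub-packet to the handler registered for its SSRC.

    Returns the SSRCs that had no handler, so the caller can decide
    whether to drop them or create a new session.
    """
    unknown = []
    for pkt in subpackets:
        hdr = parse_rtp_header(pkt)
        handler = sessions.get(hdr.ssrc)
        if handler is None:
            unknown.append(hdr.ssrc)
        else:
            handler(hdr, pkt)
    return unknown
```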
By embedding the whole RTP header and payload and making it
possible to pass them off entirely to the RTP processor, it is
possible to have a large mix of different media types in a single IP/
UDP datagram header. Codec choice is not relevant, nor is the size of
the payload/header fixed in the multiplexed packet so all sorts of
session combinations would work in a single TRTP transport stream. It
would be possible (though unlikely) to multiplex any type of media
that is RTP compliant - video, audio, or text that is being
transported between two hosts. Different sample rates would be
handled/buffered at each end, but hopefully administrators would try
to optimize the timing to get as many sub-packets into a single TRTP
packet as possible.
SDP will need to change a bit, but not much. An additional "m="
value will need to be added to the SDP to indicate the new possible
media flow, something like this (shown are both m= lines - the
"original" RTP/AVP line, and the new RTP/TRTP line which would both be
sent in the same SDP):
...
m=audio 18972 RTP/AVP 0 8 101
m=audio 19332 RTP/TRTP 0 8 101
...
This SDP modification is the one part of this that I'm not entirely
sure is the right way to do it. RFC 4566 says it's OK to have
multiple m= lines for the same destination, and explicitly says "The
semantics of multiple "m=" lines using the same transport address are
undefined." so this seems to be OK. The unregistered RTP/TRTP
protocol identifier (the <proto> field of the m= line) indicates that
the protocol is RTP running over Trunked RTP. This could I suppose
also be "UDP/TRTP" given that it is unlikely that a new protocol
identifier will find its way into the "RTP/" tree, but that is a
political issue outside the scope of this
discussion. It would also make RTP/AVP "assumed" for any RTP/TRTP
format streams, though I can think of an unnecessarily complex way to
use the "a=" attribute setting to confirm that a stream is RTP/AVP.
This shouldn't break any non-TRTP endpoints to which invites are sent,
and it will cue any TRTP-compatible platforms that they should use the
TRTP port for a new RTP sub-payload on any existing TRTP sessions that
have room, or should create a new TRTP session if one does not exist.
The port number in the RTP/TRTP line would be the new (or pre-
existing) UDP port that identifies the trunk channel that is available
for inbound communication.
http://www.rfc-editor.org/rfc/rfc4566.txt
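To make the fallback behaviour concrete, here is a small Python sketch of how an endpoint might choose between the two m= lines. It assumes the unregistered "RTP/TRTP" token proposed above, and media_lines and pick_audio are hypothetical helper names; a real SDP parser handles far more than this.

```python
def media_lines(sdp):
    """Return (media, port, proto, formats) for each m= line in an SDP body."""
    found = []
    for line in sdp.splitlines():
        if line.startswith("m="):
            media, port, proto, *fmts = line[2:].split()
            # The port may be written as "port/count"; keep the base port.
            found.append((media, int(port.split("/")[0]), proto, fmts))
    return found

def pick_audio(sdp, supports_trtp):
    """Prefer the TRTP line when we support it, else fall back to RTP/AVP."""
    audio = [m for m in media_lines(sdp) if m[0] == "audio"]
    if supports_trtp:
        for m in audio:
            if m[2] == "RTP/TRTP":
                return m
    for m in audio:
        if m[2] == "RTP/AVP":
            return m
    return None
```

A non-TRTP endpoint never looks for the new token, which is the property the proposal relies on; how such an endpoint treats an m= line with an unknown protocol is up to its own SDP handling.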
RTCP will need to be tweaked a bit to reflect the correct stats,
but it seems to my untrained eye to not be an overwhelming task. The
jitter, latency, packet loss on any RTP payloads will simply be shared
for the duration of the flow. If a trunked packet is lost, then each
RTCP-tracked payload is decremented by what was lost. Packet counts
are taken from the actual packet counts of the TRTP stream, and the
only thing that will change are bytes, though that can be derived on-
the-fly pretty easily too by just counting the bytes for that
particular RTP payload flow. It may be necessary to create a single
RTCP "dummy" session that communicates between the two endpoints with
aggregate data in a way that is parse-able by intermediate systems
that sniff RTCP data, or maybe we just ditch RTCP entirely on TRTP
flows for now to avoid biting off more than can be chewed on this
project.
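The per-flow byte counting mentioned above could look roughly like this in Python. It is a sketch only: it assumes sub-packets carry no RTP header extension or padding, counts payload bytes after the fixed header and any CSRC entries, and uses made-up names (new_stats, account).

```python
from collections import defaultdict

def new_stats():
    """Per-SSRC counters of the kind a sender report would need."""
    return defaultdict(lambda: {"packets": 0, "octets": 0})

def account(subpackets, stats):
    """Add one trunk packet's worth of sub-packets to the per-flow totals."""
    for pkt in subpackets:
        if len(pkt) < 12:
            continue  # not a complete RTP header, ignore it
        csrc_count = pkt[0] & 0x0F
        header_len = 12 + 4 * csrc_count
        ssrc = int.from_bytes(pkt[8:12], "big")
        stats[ssrc]["packets"] += 1
        # Payload bytes only, header excluded.
        stats[ssrc]["octets"] += max(len(pkt) - header_len, 0)
    return stats
```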
I can't help but think that this has already been done, but I don't
see where. I've found references to RTP multiplexing, but nobody
seems to use it, or they all point back to the cRTP RFC (2508) which
talks about tunneling through something like L2TP or PPP, which seems
overly burdensome. All of these drafts are expired and probably not
exactly useful: http://www.cs.columbia.edu/~hgs/rtp/mux.html
I believe that multiplexing of media streams (specifically low-
bandwidth audio) may have significant advantages in real-world
circumstances. The current hacks for RTP multiplexing seem complex,
or rarely used, or are at the wrong layer. Cost trumps protocol
purity, and cost is the factor which very often dictates features that
find their way into Asterisk. If a hypothetical network is paying
$10k per month for bandwidth, and can reduce their overall bandwidth
usage by 40% by implementing this type of functionality, then that's
$48,000 per year in savings, or more appropriately: $48,000 per year
in money not wasted. Even $5,000 would possibly cover the development
of this code, from a single company. I think many more than a single
company would be interested in this type of functionality. Of course,
there are many projects waiting to be completed with Asterisk, and
this one is not the most pressing. But I'll throw it out there to see
if anyone is game for creating interesting new features with Asterisk
that don't exist anywhere else - that's what the project seems to be
good at. Thanks for reading!
I enclose a comparison of standard RTP versus my TRTP concept.
Standard RTP @ 50pps
--------------------
Single RTP G.729 (ethernet) (*)
codec (G.729)        = 20 bytes/packet =  8.0 kbps
RTP overhead         = 12 bytes/packet =  4.8 kbps
UDP overhead         =  8 bytes/packet =  3.2 kbps
IP overhead          = 40 bytes/packet = 16.0 kbps
Ethernet L2 overhead = 18 bytes/packet =  7.2 kbps
Total                                  = 39.2 kbps total
Multiply by 2 for two standard RTP streams = 78.4 kbps
Trunked RTP with 3 G.729 streams @ 50pps
----------------------------------------
Trunked RTP G.729 (ethernet)
chan1 RTP             = 12 bytes/packet  =  4.8 kbps
codec (G.729) chan1   = 20 bytes/packet  =  8.0 kbps
chan2 RTP             = 12 bytes/packet  =  4.8 kbps
codec (G.729) chan2   = 20 bytes/packet  =  8.0 kbps
chan3 RTP             = 12 bytes/packet  =  4.8 kbps
codec (G.729) chan3   = 20 bytes/packet  =  8.0 kbps
TRTP padding overhead = (3*2)+2 B/packet =  3.2 kbps
UDP overhead          =  8 bytes/packet  =  3.2 kbps
IP overhead           = 40 bytes/packet  = 16.0 kbps
Ethernet L2 overhead  = 18 bytes/packet  =  7.2 kbps
Total for three G.729 streams            = 68.0 kbps total
So for 10kbps less bandwidth than TWO channels of G.729 with regular
RTP, I can get THREE channels of TRTP. Let's see how this plays out
over more channels:
G.729 kbps comparison on number of channels:
 #     RTP    TRTP   %age of RTP
--   -----   -----   -----------
 1    39.2    40.8     104.1%
 2    78.4    54.4      69.4%
 3   117.6    68.0      57.8%
 4   156.8    81.6      52.0%
 5   196.0    95.2      48.6%
 6   235.2   108.8      46.3%
 7   274.4   122.4      44.6%
 8   313.6   136.0      43.4%
 9   352.8   149.6      42.4%
10   392.0   163.2      41.6%
With 10 channels, there is a 58.4% bandwidth savings. Not bad! At
100 channels in this same model, RTP is at 3.92 megabits per second
and TRTP is at 1.39 megabits per second, or 35.4% of the RTP
bandwidth - almost 2/3rds savings.
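The table can be reproduced with a few lines of Python. The per-packet byte counts are the ones used in the comparison above, taken as-is; in particular the 40-byte IP figure is kept as a parameter (an IPv4 header without options is 20 bytes, 40 matches an IPv6 fixed header), so pass ip=20 to see how the ratios move. The function names are made up for this sketch.

```python
def rtp_kbps(channels, payload=20, rtp=12, udp=8, ip=40, l2=18, pps=50):
    """Bandwidth of N independent RTP streams, one packet each per tick."""
    per_packet = payload + rtp + udp + ip + l2
    return channels * per_packet * 8 * pps / 1000

def trtp_kbps(channels, payload=20, rtp=12, udp=8, ip=40, l2=18, pps=50):
    """Bandwidth of one trunked stream carrying N sub-packets per tick."""
    trailer = 2 * channels + 2  # one 2-byte size per channel + padding size
    per_packet = channels * (payload + rtp) + trailer + udp + ip + l2
    return per_packet * 8 * pps / 1000

if __name__ == "__main__":
    for n in (1, 2, 3, 5, 10, 100):
        r, t = rtp_kbps(n), trtp_kbps(n)
        print(f"{n:3d} {r:8.1f} {t:8.1f} {100 * t / r:6.1f}%")
```

Per channel, trunking replaces the UDP, IP and layer 2 overhead with a 2-byte size field, which is why the ratio keeps improving as channels are added and flattens out once the shared headers are spread thin.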
(*) from http://www.cisco.com/en/US/tech/tk652/tk698/technologies_tech_note09186a0080094ae2.shtml
(which may be off, actually - their math doesn't seem to work, and
it appears they didn't add the codec to the final Bandwidth
Ethernet (Kbps) column - if 20 bytes per packet equals 8 kbps,
then something is wrong with their equation, since computing a
kbps rate from that ratio and the byte sizes above leads to
39.2 kbps for a G.729 stream on Ethernet. It doesn't really
matter - as long as I've been consistent with my math, the ratios
between "standard" and "TRTP" streams are relevant.)
Packet concept example:
                  +------------------+------------------+------------------+------------------------------+
                  |     x bytes      |     x bytes      |     x bytes      |2bytes|2bytes|2bytes| 2 bytes |
                  +------------------+------------------+------------------+------+------+------+---------+
                  | <- RTP1 size ->  | <- RTP2 size ->  | <- RTPx size ->  |      <- Padding size ->      |
+--------+--------+--------+---------+--------+---------+--------+---------+------+------+------+---------+
| IP     | UDP    | RTP1   | RTP1    | RTP2   | RTP2    | RTPx   | RTPx    | RTP1 | RTP2 | RTPx | Padding |
| Header | Header | Header | Payload | Header | Payload | Header | Payload | Size | Size | Size | Size    |
+--------+--------+--------+---------+--------+---------+--------+---------+------+------+------+---------+
Possible config file items:
sip.conf:
trunking=on
; options:
; on = try to send and receive TRTP SDP extensions on all RTP sessions
; receive = accept inbound TRTP from others if signaled, but do not offer
; transmit = originate TRTP but do not accept inbound requests
; off = neither send nor receive TRTP
trunk-streammax=10
; Indicate how many streams should be multiplexed into a single
; TRTP packet. Above this number, and a new UDP TRTP stream will
; be started.
trunk-sizemax=1500
; Maximum size of a packet (IP, UDP, RTP headers plus RTP payload)
; that will be sent. Note this does not include Layer 2 framing.
; Standard ethernet carries at most 1500 bytes of IP packet per frame
; (1518 bytes including the layer 2 header and FCS), so 1500 is a safe
; number but MUCH bigger is possible depending on your network
; specifics.
trunk-maxwait=20
; milliseconds to wait to fill up a trunk packet before sending.
; If you have mixed frequency encodings (10ms, 20ms, 30ms) that are
; sharing a TRTP trunk, how long should we wait to fill up a queue
; before transmitting? Default is 20ms. Setting this lower may
; significantly increase bandwidth usage. [how does this work in
; IAX2?]
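How the three limits might interact on the sending side can be sketched as a small queue in Python. This is one possible reading, not a description of any existing code: the stream limit is treated here as "flush once this many sub-packets are queued" (the option text describes it as a per-trunk session limit), the IP+UDP overhead default of 28 bytes assumes IPv4 without options, and TrunkBatcher is a made-up name.

```python
class TrunkBatcher:
    """Collects RTP packets for one trunk and decides when to send them."""

    def __init__(self, stream_max, size_max, max_wait_ms, ip_udp_overhead=28):
        self.stream_max = stream_max            # cf. trunk-streammax
        self.size_max = size_max                # cf. trunk-sizemax
        self.max_wait_ms = max_wait_ms          # cf. trunk-maxwait
        self.ip_udp_overhead = ip_udp_overhead  # assumption: IPv4 + UDP
        self.queue = []
        self.first_ms = None

    def _packet_size(self, packets):
        trailer = 2 * len(packets) + 2
        return self.ip_udp_overhead + sum(len(p) for p in packets) + trailer

    def add(self, packet, now_ms):
        """Queue one RTP packet; return any batches that are now ready."""
        ready = []
        # Send what we have first if this packet would exceed the size cap.
        if self.queue and self._packet_size(self.queue + [packet]) > self.size_max:
            ready.append(self.flush())
        if not self.queue:
            self.first_ms = now_ms
        self.queue.append(packet)
        if len(self.queue) >= self.stream_max:
            ready.append(self.flush())
        return ready

    def poll(self, now_ms):
        """Call from a timer; returns a batch once the oldest packet is due."""
        if self.queue and now_ms - self.first_ms >= self.max_wait_ms:
            return self.flush()
        return None

    def flush(self):
        batch, self.queue = self.queue, []
        return batch
```

With mixed 10ms/20ms/30ms streams the wait timer is what bounds added latency: a lower value sends emptier trunk packets sooner, which is where the extra bandwidth would come from.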
JT
---
John Todd email:jtodd at digium.com
Digium, Inc. | Asterisk Open Source Community Director
445 Jan Davis Drive NW - Huntsville AL 35806 - USA
direct: +1-256-428-6083 http://www.digium.com/