[asterisk-dev] RTP trunking - 58% savings on media bandwidth?

Thu Dec 18 03:13:59 CST 2008

[While I should be doing many, many other things this late evening  
I'll spend some time and write about improvements that will only be  
useful to a few people to the tune of tens of thousands of dollars.  I  
probably should be doing something else, but I have a hard time  
letting go of a concept once it gets in my head, so putting it down in  
a -dev posting for sad review years from now is really what I need to  
do.]

   This started on the thread about cRTP, but that quickly turned to a  
dead end since cRTP seems to be link or interface specific and has  
nothing to do really with RTP at endpoints.  So I mulled it over a  
bit, and came up with my idea for a multiplexed RTP (or what I'm now
calling "trunked RTP", or "TRTP", to be more consistent with the IAX2
trunking concepts on which this is based).  The numbers I calculated
far below are interesting, especially with lower-rate codecs - it
looks like one can get roughly twice the number of channels (or more,
as the channel count grows) into the same bandwidth by multiplexing
RTP sessions into a single UDP packet stream and so paying the IP
overhead only once.  The details are not excruciating, and I think
the use of existing concepts and code makes this fairly do-able.  Any
reasonably-sized
carrier that does international traffic on high-price links might find  
this useful - it provides some of the IAX2 trunking benefits without  
having to shift over entirely to IAX2, and maybe it would be quickly  
implementable by other SIP/RTP stacks that wanted to see decreases in  
bandwidth usage between high-traffic nodes.

   "New" overhead would have to be introduced into each trunked RTP  
packet, I think.  A two-byte value at the very end of the packet would
describe how many sessions are embedded in it - a single byte (255)
would not cover all circumstances.  This is what I'm calling the
"padding size".  (For those of you who are reading the RFC on RTP -
the last byte of an RTP packet (aka: the padding byte) cannot count
the number of streams that could be contained in a single UDP packet,
so the last 2 bytes would be used, which is slightly different from
the intended use of the last byte when the padding bit is set, but oh
well.)  The padding size would include its own two bytes in the
count.  While UDP packets are typically small, they are not _always_
small, so a packet could hold more payloads than 8 bits can represent,
and a 16-bit number is required to describe how far back from the end
of the packet the per-sub-packet size fields begin.  Each RTP
header-plus-payload would then be described by its own 2-byte size
value.  So: the last two bytes give the padding size, which for N RTP
sub-packets is (2*N)+2 bytes; walking backwards from there we find N
2-byte values, each of which gives the length of one sub-packet.
Whew!
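
   To make the trailer layout concrete, here is a minimal Python
sketch of packing and unpacking.  It is only an illustration under
assumptions that are mine, not part of the proposal: network byte
order for the 2-byte fields, per-sub-packet sizes (rather than
offsets) as in the packet diagram further down, and the hypothetical
function names pack_trtp/unpack_trtp.

```python
import struct

def pack_trtp(sub_packets):
    """Concatenate whole RTP packets and append the size trailer.

    Trailer layout (assumed): one 2-byte size per sub-packet, then the
    2-byte "padding size", which counts the whole trailer including itself.
    """
    sizes = [len(p) for p in sub_packets]
    padding_size = 2 * len(sizes) + 2
    if any(s > 0xFFFF for s in sizes) or padding_size > 0xFFFF:
        raise ValueError("value does not fit in a 2-byte field")
    trailer = struct.pack("!%dH" % (len(sizes) + 1), *sizes, padding_size)
    return b"".join(sub_packets) + trailer

def unpack_trtp(datagram):
    """Split a trunked UDP payload back into its RTP sub-packets."""
    if len(datagram) < 2:
        raise ValueError("datagram too short")
    (padding_size,) = struct.unpack("!H", datagram[-2:])
    if padding_size < 2 or padding_size % 2 or padding_size > len(datagram):
        raise ValueError("bad padding size")
    count = (padding_size - 2) // 2
    trailer_start = len(datagram) - padding_size
    sizes = struct.unpack("!%dH" % count, datagram[trailer_start:-2])
    if sum(sizes) != trailer_start:
        raise ValueError("sub-packet sizes do not match datagram length")
    packets, offset = [], 0
    for size in sizes:
        packets.append(datagram[offset:offset + size])
        offset += size
    return packets
```

   Each element returned by unpack_trtp() is a complete RTP packet
(header plus payload) that could be handed to the existing RTP code
unchanged.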

   I'm hoping that just embedding the whole RTP header for each stream  
would be sufficiently descriptive so that after unpacking the receiver  
would be able to determine what session was being described - isn't  
that the purpose of the SSRC?  Or would there need to be a new marker  
on a per-payload basis?  Does UDP port number factor into what session  
is receiving the stream, or is SSRC the "canonical" data?  I know that  
cRTP compresses a lot of the RTP headers, but I didn't want to bite  
off more than necessary here at once - it seems that just including  
the full RTP headers is an easy (though inefficient) method to get the  
code done by not having to create a table or cache of mappings or in/ 
out of band protocol notifiers to relay RTP settings per stream.
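
   If the SSRC does turn out to be enough to pick the session, the
receive side could look something like the sketch below.  The 12-byte
fixed header layout is the standard RTP one; everything else (the
function names, the dict of sessions keyed by SSRC) is a hypothetical
arrangement for illustration only, and it ignores CSRC lists and
header extensions.

```python
import struct

def parse_rtp_header(packet):
    """Decode the 12-byte fixed RTP header at the front of a sub-packet."""
    if len(packet) < 12:
        raise ValueError("shorter than an RTP header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    return {
        "marker": bool(b1 >> 7),
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }

def dispatch(sub_packets, sessions):
    """Append each sub-packet to the session queue matching its SSRC.

    `sessions` is an assumed dict of SSRC -> list.  Packets with an
    unknown SSRC are returned so the caller can decide what to do.
    """
    unknown = []
    for packet in sub_packets:
        ssrc = parse_rtp_header(packet)["ssrc"]
        if ssrc in sessions:
            sessions[ssrc].append(packet)
        else:
            unknown.append(packet)
    return unknown
```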

   By embedding the whole RTP header and payload and making it  
possible to pass them off entirely to the RTP processor, it is  
possible to have a large mix of different media types in a single IP/ 
UDP datagram header.  Codec choice is not relevant, nor is the size of  
the payload/header fixed in the multiplexed packet so all sorts of  
session combinations would work in a single TRTP transport stream.  It  
would be possible (though unlikely) to multiplex any type of media  
that is RTP compliant - video, audio, or text that is being  
transported between two hosts.  Different sample rates would be  
handled/buffered at each end, but hopefully administrators would try  
to optimize the timing to get as many sub-packets into a single TRTP  
packet as possible.

   SDP will need to change a bit, but not much.  An additional "m="  
value will need to be added to the SDP to indicate the new possible  
media flow, something like this (shown are both m= lines - the  
"original" RTP/AVP line, and the new RTP/TRTP line which would both be  
sent in the same SDP):

  ...
  m=audio 18972 RTP/AVP 0 8 101
  m=audio 19332 RTP/TRTP 0 8 101
  ...

   This SDP modification is the only thing that I can think of that  
I'm not entirely clear is the right way to do it.  RFC 4566 says it's  
OK to have multiple m= lines for the same destination, and explicitly  
says "The semantics of multiple "m=" lines using the same transport  
address are undefined." so this seems to be OK.  The unregistered RTP/ 
TRTP protocol identifier indicates that the protocol is RTP running
over Trunked RTP.  This could I suppose also be "UDP/TRTP" given that
it is unlikely that a new protocol identifier will find its way into
the "RTP/" tree, but that is a political issue outside the scope of
this discussion.  It would also make RTP/AVP "assumed" for any RTP/TRTP
streams, though I can think of an unnecessarily complex way to use the
"a=" attribute setting to confirm that a stream is RTP/AVP.
This shouldn't break any non-TRTP endpoints to which invites are sent,  
and it will cue any TRTP-compatible platforms that they should use the  
TRTP port for a new RTP sub-payload on any existing TRTP sessions that  
have room, or should create a new TRTP session if one does not exist.
   The port number in the RTP/TRTP line would be the new (or pre- 
existing) UDP port that identifies the trunk channel that is available  
for inbound communication.
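
   As a rough sketch of how a receiver might act on such an SDP body:
prefer the RTP/TRTP line when trunking is enabled, and fall back to
the ordinary RTP/AVP line otherwise.  The function names and the
trtp_enabled flag are made up for this example, and "RTP/TRTP" is of
course the unregistered identifier proposed above.

```python
def parse_media_lines(sdp):
    """Return one dict per "m=" line: media, port, proto, formats."""
    media = []
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith("m="):
            continue
        fields = line[2:].split()
        if len(fields) < 4:
            continue  # malformed m= line, skip it
        media.append({
            "media": fields[0],
            # the port field may be written as "port/count"
            "port": int(fields[1].split("/")[0]),
            "proto": fields[2],
            "formats": fields[3:],
        })
    return media

def choose_audio_transport(sdp, trtp_enabled=True):
    """Pick the trunked audio line if offered and enabled, else RTP/AVP."""
    audio = [m for m in parse_media_lines(sdp) if m["media"] == "audio"]
    if trtp_enabled:
        for m in audio:
            if m["proto"] == "RTP/TRTP":
                return m
    for m in audio:
        if m["proto"] == "RTP/AVP":
            return m
    return None
```

   An endpoint that knows nothing about TRTP would simply never look
for the second line, which is the backwards-compatibility hope
described above.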

   http://www.rfc-editor.org/rfc/rfc4566.txt

   RTCP will need to be tweaked a bit to reflect the correct stats,  
but it seems to my untrained eye to not be an overwhelming task.  The  
jitter, latency, packet loss on any RTP payloads will simply be shared  
for the duration of the flow.   If a trunked packet is lost, then each  
RTCP-tracked payload is decremented by what was lost.  Packet counts  
are taken from the actual packet counts of the TRTP stream, and the  
only thing that will change are bytes, though that can be derived on- 
the-fly pretty easily too by just counting the bytes for that  
particular RTP payload flow.  It may be necessary to create a single  
RTCP "dummy" session that communicates between the two endpoints with  
aggregate data in a way that is parse-able by intermediate systems  
that sniff RTCP data, or maybe we just ditch RTCP entirely on TRTP  
flows for now to avoid biting off more than can be chewed on this  
project.
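
   A very rough sketch of the bookkeeping this implies: one packet
counter shared by everything on the trunk, and a per-stream byte
counter kept on the fly.  The class and method names are hypothetical,
and the sketch assumes plain 12-byte RTP headers (no CSRC list,
extension, or padding) and that every stream has been on the trunk
since the first packet.

```python
import struct
from collections import defaultdict

class TrunkStats:
    """Shared packet count for the trunk, per-SSRC payload byte counts."""

    def __init__(self):
        self.trunk_packets = 0
        self.octets_by_ssrc = defaultdict(int)

    def record(self, sub_packets):
        """Account for one received trunk packet and its RTP sub-packets."""
        self.trunk_packets += 1
        for packet in sub_packets:
            (ssrc,) = struct.unpack("!I", packet[8:12])
            # count payload bytes only, assuming a bare 12-byte header
            self.octets_by_ssrc[ssrc] += len(packet) - 12

    def report(self, ssrc):
        """Figures that could feed an RTCP report for one stream."""
        return {"packets": self.trunk_packets,
                "octets": self.octets_by_ssrc[ssrc]}
```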

   I can't help but think that this has already been done, but I don't  
see where.  I've found references to RTP multiplexing, but nobody  
seems to use it, or they all point back to the cRTP RFC (2508) which  
talks about tunneling through something like L2TP or PPP, which seems  
overly burdensome.  All of these drafts are expired and probably not  
exactly useful:  http://www.cs.columbia.edu/~hgs/rtp/mux.html

   I believe that multiplexing of media streams (specifically low- 
bandwidth audio) may have significant advantages in real-world  
circumstances.  The current hacks for RTP multiplexing seem complex,  
or rarely used, or are at the wrong layer.  Cost trumps protocol  
purity, and cost is the factor which very often dictates features that  
find their way into Asterisk.  If a hypothetical network is paying  
$10k per month for bandwidth, and can reduce their overall bandwidth  
usage by 40% by implementing this type of functionality, then that's  
$48,000 per year in savings, or more appropriately: $48,000 per year  
in money not wasted.  Even $5,000 would possibly cover the development  
of this code, from a single company.  I think many more than a single  
company would be interested in this type of functionality.  Of course,  
there are many projects waiting to be completed with Asterisk, and  
this one is not the most pressing.  But I'll throw it out there to see  
if anyone is game for creating interesting new features with Asterisk  
that don't exist anywhere else - that's what the project seems to be  
good at.  Thanks for reading!

I enclose a comparison of standard RTP versus my TRTP concept.

Standard RTP @ 50pps
--------------------
Single RTP G.729 (ethernet) (*)
   codec (G.729)        = 20 bytes/packet   =  8.0 kbps
   RTP overhead         = 12 bytes/packet   =  4.8 kbps
   UDP overhead         =  8 bytes/packet   =  3.2 kbps
   IP overhead          = 40 bytes/packet   = 16.0 kbps
   Ethernet L2 overhead = 18 bytes/packet   =  7.2 kbps
   Total                                    = 39.2 kbps total

Multiply by 2 for two standard RTP streams  = 78.4 kbps

Trunked RTP with 3 G.729 streams @ 50pps
----------------------------------------
Trunked RTP G.729 (ethernet)
   chan1 RTP            = 12 bytes/packet   =  4.8 kbps
   codec (G.729) chan1  = 20 bytes/packet   =  8.0 kbps
   chan2 RTP            = 12 bytes/packet   =  4.8 kbps
   codec (G.729) chan2  = 20 bytes/packet   =  8.0 kbps
   chan3 RTP            = 12 bytes/packet   =  4.8 kbps
   codec (G.729) chan3  = 20 bytes/packet   =  8.0 kbps
   TRTP padding overhead= (3*2)+2 B/packet  =  3.2 kbps
   UDP overhead         =  8 bytes/packet   =  3.2 kbps
   IP overhead          = 40 bytes/packet   = 16.0 kbps
   Ethernet L2 overhead = 18 bytes/packet   =  7.2 kbps

Total for three G.729 streams              = 68.0 kbps total

So for about 10 kbps less bandwidth than TWO channels of G.729 with
regular RTP, I can get THREE channels of TRTP.  Let's see how this
plays out over more channels:

G.729 kbps comparison on number of channels:
#        RTP     TRTP    %age of RTP
--       ---     ----    -----------
1        39.2    40.8    104.1%
2        78.4    54.4     69.4%
3       117.6    68.0     57.8%
4       156.8    81.6     52.0%
5       196.0    95.2     48.6%
6       235.2   108.8     46.3%
7       274.4   122.4     44.6%
8       313.6   136.0     43.4%
9       352.8   149.6     42.4%
10      392.0   163.2     41.6%

With 10 channels, there is a 58.4% bandwidth savings.  Not bad!  At
100 channels in this same model, RTP is at 3.92 megabits per second
and TRTP is at 1.39 megabits per second, so 35.4% of the RTP
bandwidth - almost 2/3rds savings.
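
For anyone who wants to check or extend the table, the arithmetic
behind it can be sketched in a few lines of Python.  The per-packet
byte counts are taken as-is from the comparison above (including the
40-byte IP figure); the function names are just for this example.

```python
PPS = 50                 # packets per second, as in the comparison above
G729_PAYLOAD = 20        # bytes of codec data per packet
RTP_HDR, UDP_HDR, IP_HDR, L2_HDR = 12, 8, 40, 18   # figures used above

def kbps(bytes_per_packet, pps=PPS):
    """Convert a per-packet byte count into kilobits per second."""
    return bytes_per_packet * 8 * pps / 1000

def rtp_kbps(channels, payload=G729_PAYLOAD):
    """Standard RTP: every channel pays the full header stack."""
    per_packet = payload + RTP_HDR + UDP_HDR + IP_HDR + L2_HDR
    return channels * kbps(per_packet)

def trtp_kbps(channels, payload=G729_PAYLOAD):
    """Trunked RTP: one UDP/IP/L2 stack plus a (2*N)+2 byte trailer."""
    trailer = 2 * channels + 2
    per_packet = (channels * (payload + RTP_HDR) + trailer
                  + UDP_HDR + IP_HDR + L2_HDR)
    return kbps(per_packet)

for n in (1, 2, 3, 10, 100):
    r, t = rtp_kbps(n), trtp_kbps(n)
    print(f"{n:>3}  {r:7.1f}  {t:7.1f}  {100 * t / r:5.1f}%")
```

Swapping in a different payload size or packet rate gives the same
comparison for other codecs.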

(*) from http://www.cisco.com/en/US/tech/tk652/tk698/technologies_tech_note09186a0080094ae2.shtml
    (which may be off, actually - their math doesn't work, and it
     appears they didn't add the codec to the final Bandwidth
     Ethernet (Kbps) column - if 20 bytes per packet equals 8kbps,
     then something is wrong with their equation, since deriving a
     kbps rate from that ratio and byte size leads to 39.2 kbps
     for a G.729 stream on Ethernet.  It doesn't really matter -
     as long as I've been consistent with my math, the ratios
     between "standard" and "TRTP" streams still hold.)

Packet concept example:

                 +-----------------+------------------+------------------+------------------------------+
                 |      x bytes    |      x bytes     |     x bytes      |2bytes|2bytes|2bytes| 2 bytes |
                 +--------+--------+--------+---------+--------+---------+------+------+------+---------+
                 | <- RTP1 size -> | <- RTP2 size  -> | <- RTPx size  -> | <-       Padding size     -> |
+-------+--------+--------+--------+--------+---------+--------+---------+------+------+------+---------+
|IP     | UDP    | RTP1   |RTP1    | RTP2   | RTP2    | RTPx   | RTPx    | RTP1 | RTP2 | RTPx | Padding |
|Header | Header | Header |Payload | Header | Payload | Header | Payload | Size | Size | Size | Size    |
+-------+--------+--------+--------+--------+---------+--------+---------+------+------+------+---------+

Possible config file items:

sip.conf:
  trunking=on
    ; options:
    ;  on = try to send and receive TRTP SDP extensions on all RTP
    ;       sessions
    ;  receive = accept inbound TRTP from others if signaled, but do
    ;       not offer
    ;  transmit = originate TRTP but do not accept inbound requests
    ;  off = neither send nor receive TRTP
  trunk-streammax=10
    ; Indicate how many streams should be multiplexed into a single
    ;  TRTP packet.  Above this number, a new UDP TRTP stream will
    ;  be started.
  trunk-sizemax=1500
    ; Maximum size of a packet (IP, UDP, RTP headers plus RTP payload)
    ;  that will be sent.  Note this does not include Layer 2 framing.
    ;  Standard ethernet carries at most a 1500-byte IP packet (1518
    ;  bytes on the wire including layer 2), so 1500 is a safe number,
    ;  but MUCH bigger is possible depending on your network specifics.
  trunk-maxwait=20
    ; milliseconds to wait to fill up a trunk packet before sending.
    ; If you have mixed frequency encodings (10ms, 20ms, 30ms) that are
    ; sharing a TRTP trunk, how long should we wait to fill up a queue
    ; before transmitting?  Default is 20ms.  Setting this lower may
    ; significantly increase bandwidth usage.
    ; [how does this work in IAX2?]
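
To show how the three trunk-* limits might interact on the sending
side, here is a small Python sketch of a batching queue.  It is a
guess at one possible behaviour, not a description of how IAX2 or
Asterisk does it: the class name, the caller-supplied millisecond
clock, and the header_bytes default (the IP and UDP figures from the
comparison above) are all assumptions, and stream_max is simplified
to "sub-packets per trunk packet".

```python
class TrunkBatcher:
    """Queue RTP packets and decide when a trunk packet must be sent."""

    def __init__(self, stream_max, size_max, max_wait_ms, header_bytes=48):
        self.stream_max = stream_max      # cf. trunk-streammax
        self.size_max = size_max          # cf. trunk-sizemax
        self.max_wait_ms = max_wait_ms    # cf. trunk-maxwait
        self.header_bytes = header_bytes  # assumed IP + UDP overhead
        self.queue = []
        self.first_queued_ms = None

    def _trunk_size(self, packets):
        # headers + sub-packets + (2 bytes per sub-packet + 2) trailer
        return (self.header_bytes + sum(len(p) for p in packets)
                + 2 * len(packets) + 2)

    def _flush(self):
        batch, self.queue = self.queue, []
        self.first_queued_ms = None
        return batch

    def add(self, packet, now_ms):
        """Queue one RTP packet; return any batches that must go out now."""
        ready = []
        # send what we have if this packet would push us over the size cap
        if self.queue and self._trunk_size(self.queue + [packet]) > self.size_max:
            ready.append(self._flush())
        if not self.queue:
            self.first_queued_ms = now_ms
        self.queue.append(packet)
        if len(self.queue) >= self.stream_max:
            ready.append(self._flush())
        return ready

    def poll(self, now_ms):
        """Called from a timer: flush if the oldest packet waited too long."""
        if self.queue and now_ms - self.first_queued_ms >= self.max_wait_ms:
            return [self._flush()]
        return []
```

Each batch returned is a list of whole RTP packets ready to be
combined into one trunk packet and sent.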

JT

---
John Todd                       email:jtodd at digium.com
Digium, Inc. | Asterisk Open Source Community Director
445 Jan Davis Drive NW -  Huntsville AL 35806  -   USA
direct: +1-256-428-6083         http://www.digium.com/