[Asterisk-Dev] IAX I-D comments
Steve Kann
stevek at stevek.com
Wed Apr 27 15:03:05 MST 2005
Firstly, _THANKS SO MUCH_ for starting this! I think it's great, and I
think that keeping this up-to-date (and requiring that we update this
specification _first_ before changing or adding things in the code) will
be a great help to implementors, and can help improve the process greatly.
- My take on the IAX2 vs IAX business: I think the protocol should be
called IAX2 or IAX version 2.0 or something. Just because IAX(1) has
been deprecated for some time doesn't mean it's not still out there;
there are Linux distributions still shipping libiax, for example.
- IAX can use a well-known port, but it is not a requirement; several
introductory paragraphs (which, I imagine, are non-normative) seem to
imply that it is.
- "The bandwidth efficiency for other stream types is sacrificed for the
sake of individual voice calls." I'm not sure I would agree with this;
it uses equally low overhead for video streams, images, URLs, etc.
- "Meta frames are used for call trunking or video stream
transmission." (XXX check.)
- Security: unencrypted IAX is subject to a variety of DoS attacks, at
the very least. (It should be trivial to send INVAL and kill sessions,
for example).
- "Full frames are sent reliably, so all full frames require an
immediate acknowledgment upon receipt." Actually, it ought to work
similarly to TCP, where an acknowledgement may be sent up to 1 RTT after
receipt (delayed ACK), to enable more ACKs to be piggybacked
(implicitly). The delayed ack functionality should be described as a
SHOULD in the specification, since it is OK (but not optimal) for a peer
to send immediate, explicit ACK frames whenever it receives a full frame
(libiax2 does this presently).
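Something like this, as a rough sketch (all the names here are mine,
not the draft's):

    #include <stdbool.h>
    #include <stdint.h>

    /* Rough sketch of delayed ACKs.  When a full frame arrives, we
     * owe the peer an acknowledgement; we can piggyback it on the
     * next outbound full frame, or send an explicit ACK once ~1 RTT
     * has passed without one. */
    struct call_state {
        bool     ack_pending;
        uint32_t ack_deadline_ms;
    };

    static void note_full_frame_received(struct call_state *c,
                                         uint32_t now_ms, uint32_t rtt_ms)
    {
        if (!c->ack_pending) {
            c->ack_pending = true;
            c->ack_deadline_ms = now_ms + rtt_ms; /* at most 1 RTT later */
        }
    }

    /* Called periodically; true when an explicit ACK frame must be
     * sent because no outbound full frame carried the implicit
     * acknowledgement in time. */
    static bool explicit_ack_due(const struct call_state *c, uint32_t now_ms)
    {
        return c->ack_pending && now_ms >= c->ack_deadline_ms;
    }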
- "This 15-bit value specifies the call number the transmitting
client uses to identify this call. The source call number for an
active call MUST not be in use by another call on the same
client." -- should probably say "from the same peer" instead of "on the
same client" -- generally, we should avoid using "client" and "server",
I think.
- "IAX does not specify a retransmit timeout; this is
left to the implementor." -- we should probably specify with a
SHOULD how these timers should work.
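For example, the SHOULD could look something like this (the constants
here are made up, just to illustrate the shape of it):

    #include <stdint.h>

    /* One plausible retransmit schedule for unacknowledged full
     * frames: start near the measured RTT and back off exponentially
     * up to a cap.  None of these constants come from the draft. */
    static uint32_t retransmit_interval_ms(uint32_t rtt_ms, int attempt)
    {
        uint32_t interval = rtt_ms ? 2 * rtt_ms : 500; /* no RTT sample yet */

        while (attempt-- > 0 && interval < 10000)
            interval *= 2;                             /* cap at 10 seconds */
        return interval < 10000 ? interval : 10000;
    }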
- Timestamp
The Timestamp field contains a 32-bit timestamp maintained by an
IAX peer for a given call. The timestamp is an incrementally
increasing representation of the number of milliseconds since the
first transmission of the call.
Based on my experience, we really ought to write a bunch more about
timestamps, and how they work, as they are one of the trickiest areas to
get right in an implementation. I can definitely help with this.
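As a starting point, the basic computation is just this (sketch,
assuming a POSIX clock):

    #include <stdint.h>
    #include <sys/time.h>

    /* Sketch: the full-frame timestamp is the number of milliseconds
     * since the first transmission of the call.  call_start is
     * captured when the first frame of the call is sent. */
    static uint32_t call_timestamp_ms(const struct timeval *call_start)
    {
        struct timeval now;

        gettimeofday(&now, NULL);
        return (uint32_t)((now.tv_sec - call_start->tv_sec) * 1000 +
                          (now.tv_usec - call_start->tv_usec) / 1000);
    }

(The tricky parts are the things the draft doesn't cover yet: keeping
these monotonic when the wall clock jumps, reconciling them with the
codec's frame clock, the mini-frame wraparound below, and so on.)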
- Mini Frames are so named because their header is a minimal 4 octets.
Mini frames carry no control or signaling data; their sole purpose is
to carry a media stream on an already-established IAX call. They are
sent unreliably. This decision was made because VOIP calls typically
can miss several frames without significant degradation in call
quality while the incurred overhead in ensuring reliability increases
bandwidth requirements and decreases throughput. Further, because
voice calls are typically sent in real time, lost frames are too old
to be reintegrated into the audio stream by the time they can be
retransmitted.
Actually, mini frames can carry only audio stream data, not the
"media stream"; it would help to clarify this a bit. I think I might
skip the whole discussion of why they are sent unreliably -- but that's
just my opinion (well, this whole reply is mostly just my opinion).
- Timestamp
Mini frames carry a 16-bit timestamp, which is the lower 16 bits
of the transmitting peer's full 32-bit timestamp for the call.
The timestamp allows synchronization of incoming frames so that
they may be processed in chronological order instead of the
(possibly different) order in which they are received. The 16-bit
timestamp wraps after 65.536 seconds, at which point a full frame
SHOULD be sent to notify the remote peer that its timestamp has
been reset. A call must continue to send mini frames starting
with timestamp 0 even if acknowledgment of the resynchronization
is not received.
There's some subtlety here that comes into play when DTX (discontinuous
transmission) happens. Example:
o You're going along, sending mini frames from the beginning of the
call, for 30 seconds, and then you stop sending audio.
o 5 minutes pass, while you're not sending audio (perhaps you're just
listening silently to a conference call, waiting on hold, etc.).
o You then begin sending audio again; possibly sending a FULL voice
frame, and then miniframes.
In this case even if you send the full frame first, the receiver might
not receive it before it receives the next miniframe(s). In that case,
if the only means of updating the top 16 bits of the receiver's idea of
your timestamps is FULL voice frames, it's going to totally blow up
reconstructing the timestamps on your miniframes.
For this reason, what I've done in iaxclient/libiax2, and what I have a
patch in mantis to do, is to update the top 16 bits on all full frames,
and ensure that PING frames are sent every 10-30 seconds, in order to
ensure that when coming out of a silent state like this, the timestamps
on miniframes can be appropriately reconstructed.
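To make the problem concrete, the receiver-side reconstruction looks
roughly like this (sketch; names are mine):

    #include <stdint.h>

    /* Sketch: rebuild a full 32-bit timestamp from a mini frame's
     * 16-bit timestamp, using the most recent full 32-bit timestamp
     * we've seen for this call.  Note that this can only account for
     * a single 16-bit wrap; if the high bits haven't been refreshed
     * by a full frame for several minutes (the DTX case above), the
     * result is garbage. */
    static uint32_t rebuild_ts(uint32_t last_full_ts, uint16_t mini_ts)
    {
        uint32_t ts = (last_full_ts & 0xffff0000u) | mini_ts;

        if (ts < last_full_ts && (last_full_ts - ts) > 0x8000u)
            ts += 0x10000u; /* the 16-bit timestamp wrapped since then */
        return ts;
    }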
- Mini frames are implicitly defined to be of type 'voice frame'
(frametype 2; see Section 6). The subclass is implicitly defined by
the most recent full voice frame of a call (i.e. the subclass for a
voice frame specifies the codec used with the stream). The first
voice frame of a call should be sent using the codec agreed upon in
the initial codec negotiation. On-the-fly codec negotiation is
permitted by sending a full voice frame specifying the new codec to
use in the subclass field.
I think some note is in order describing the condition that can occur
when a codec changes but mini frames (with the new codec) arrive before
the full frame which specifies the new codec. In practice, I think that
the decoders will either (a) ignore the invalid data, or (b) produce a
short burst of gibberish when they receive it.
- Command Data
This 8-bit field specifies flags for options which apply to a
trunked call. The least significant bit of the field is the
'trunk timestamps' flag. A value of 0 indicates that the calls in
the trunk do not include their individual timestamps. A value of
1 indicates that the calls do each include their own timestamp.
All other bits are reserved for future use.
Because the only presently defined states for this field are 0x00 and
0x01, we could define the field to be either a bitmap (as you have), or
an 8 bit integer. I'm not sure it matters until other things use this
field, though.
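FWIW, under the bitmap reading, a receiver would just test the low bit
(trivial sketch):

    #include <stdbool.h>
    #include <stdint.h>

    /* Sketch: the 'trunk timestamps' flag is the least significant
     * bit of the meta trunk frame's command data octet. */
    static bool trunk_has_timestamps(uint8_t command_data)
    {
        return (command_data & 0x01) != 0;
    }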
- Timestamp
Meta trunk frames carry a 32-bit timestamp, which represents the
actual time of transmission of the trunk frame. This is distinct
from the timestamps of the calls included in the trunk.
I think "actual time of transmission" should be replaced with "number of
milliseconds since the beginning of the trunk session" or something like
that. "actual time" seems to imply some relation to the time of day.
- IAX allows multiple media exchanges between the same 2 peers to be
multiplexed into a single trunk call. This decreases bandwidth
usage, as there are fewer total packets being transmitted. [...]
I'd add that this decreases the amount of overhead due to UDP, IP, and
underlying protocols (because otherwise, if you just look at the IAX
layer and above, trunking actually uses more bits).
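A quick back-of-the-envelope comparison makes the point (the meta trunk
header and per-call entry sizes below are from memory, so treat them as
assumptions to double-check against the draft):

    #include <stdio.h>

    int main(void)
    {
        const int calls      = 10;
        const int ip_udp     = 20 + 8; /* IPv4 + UDP headers */
        const int mini_hdr   = 4;      /* mini frame header */
        const int meta_hdr   = 8;      /* assumed meta trunk header */
        const int call_entry = 6;      /* assumed per-call entry w/ timestamps */

        int untrunked = calls * (ip_udp + mini_hdr);            /* 320 octets */
        int trunked   = ip_udp + meta_hdr + calls * call_entry; /*  96 octets */

        printf("header octets per 20ms interval: untrunked=%d trunked=%d\n",
               untrunked, trunked);
        return 0;
    }

So trunking spends more octets per call at the IAX layer (6 vs. 4
here), but sharing one IP/UDP header more than makes up for it.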
I think this description of trunking might better go right before the
description of the wire protocol for trunk frames?
Also, it should be clarified that there is _no_ negotiation in the
protocol for whether to use trunking, or the particular trunk mode, and
this must be done out-of-band (although, we could add something for
peers to advertise their trunking support in some kind of
IAX-capabilities IE at the beginning of the call). Presently, some IAX
implementations support trunking (chan_iax2), and some do not (libiax2),
while only CVS-HEAD supports trunk timestamps.
- 6.10 Comfort Noise Frame
The frame carries comfort noise.
The subclass is the level of comfort noise in -dBov.
Hmm, in this case, dead silence should be maxint, right? I've been
sending zero, which is a lot of noise :)
We should also specify that, after sending audio data, implementations
SHOULD (or maybe MUST) send a Comfort Noise Frame to indicate the end of
a sequence of voice frames (although Asterisk presently does not
comply with this).
| 0x12 | 18 | VNAK | Video/Voice retransmit request |
Is this correct? I thought VNAK was sent when a full frame is received
before a preceding full frame (i.e. when you're expecting sequence
number 2, and you receive something > 2). Yup, that's what seems to
happen as I see it in the code.
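i.e., roughly this (sketch, with made-up names; the half-window
comparison on the 8-bit counters is my assumption):

    #include <stdbool.h>
    #include <stdint.h>

    /* Sketch: decide whether an arriving full frame should trigger a
     * VNAK.  expected is the inbound sequence number we want next;
     * oseqno is the sequence number on the arriving frame.  Both are
     * 8-bit counters, so "skipped ahead" means within half the
     * modulo-256 window. */
    static bool should_send_vnak(uint8_t expected, uint8_t oseqno)
    {
        uint8_t delta = (uint8_t)(oseqno - expected);

        return delta != 0 && delta < 128; /* frame from the "future" */
    }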
- (ACK)
the sequence number counters, and return the same timestamp it
received. This allows the originating peer to determine to which
message the ACK is responding. Receipt of an ACK requires no
action.
Presently, while (both) implementations of IAX put the same timestamp in
ACK packets that they receive, it serves no real purpose to do so; the
sequence numbers in the ACK packet actually do the acknowledgement.
It would require less special-case code if ACK packets actually sent the
acker's timestamp instead of the sender's timestamp. The PONG frames
which return the sender's timestamp are useful, and are used to calculate
the round-trip-time of the network.
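e.g. (sketch):

    #include <stdint.h>

    /* Sketch: because a PONG echoes the PING's original timestamp,
     * the PING's sender can compute RTT entirely against its own
     * clock, with no cross-peer clock comparison needed. */
    static uint32_t rtt_from_pong(uint32_t echoed_ping_ts, uint32_t now_ts)
    {
        return now_ts - echoed_ping_ts;
    }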
This relates back to the discussion of ACKs above: all full frames must
be acknowledged _either_ by an explicit ACK _or_ by an implicit ack in
another full frame, and the acknowledgement (implicit or explicit)
should be sent within some timeframe (1 RTT, or something we should
determine).
8.11 LAGRQ
A LAGRQ is a lag request. It is sent to determine the lag between
2 IAX endpoints, including the amount of time used to process a
frame through a jitterbuffer (if any). It requires a clock-based
timestamp, and must be answered with a LAGRP, which must echo the
LAGRQ's timestamp. The lag between the 2 peers can be computed on
the peer sending the LAGRQ by comparing the timestamp of the LAGRQ
and the time the LAGRP was received.
I'd say we should really just deprecate LAGRQ. The present
implementation (or, last I looked) tried to send the LAGRQ through the
jitterbuffer on one end, and then the LAGRP through the jitterbuffer on
the other end. This often really broke things, because the LAGRP has
the wrong end's timestamp, and therefore, if the clocks between the two
sides have skewed, it just gave you nonsense results. Using the RR IEs
in PONGs is probably a better way to get the same information.
I think if we marked it as deprecated, and said that compliant
implementations should not send LAGRQ, and should acknowledge and then
ignore them if they are received, nothing would break (because it's just
used for display in "iax2 show channels", and even there, that command
would show zero lag if it never received a LAGRP).
- Protocol-Defined Information Elements:
It would be super convenient if this table also included the datatype
for each IE (i.e. uint8_t, uint16_t, string, etc..). OTOH, just looking
at iax2.h is easy enough :)
| 0x2f | 47 | RR LOSS | Received loss, as in rfc1889 |
It's important to mention that this IE actually contains two integers:
the first byte is a short-term loss percentage, and the low 24 bits are
a cumulative loss count (never mind, I see you get to that later -- cool :)
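In code, unpacking it looks like this (sketch):

    #include <stdint.h>

    /* Sketch: the RR LOSS IE packs two values into 32 bits -- the
     * high octet is a short-term loss percentage, and the low 24
     * bits are a cumulative count of lost frames. */
    static void parse_rr_loss(uint32_t value, uint8_t *loss_pct,
                              uint32_t *loss_count)
    {
        *loss_pct   = (uint8_t)(value >> 24);
        *loss_count = value & 0x00ffffffu;
    }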
- RR DELAY
The purpose of the RR DELAY information element is to
indicate the maximum playout delay for a call, per
rfc1889[3]. The data field is 2 octets long and specifies
the number of milliseconds a frame may be delayed before it
must be discarded.
Actually, RR_DELAY indicates the maximum playout delay that a frame
received by the peer is likely to experience before playout. (I'm not
sure this is in rfc1889; some of the other RRs aren't either. I
added them either because I thought they'd be useful, or because one of
the few people who commented on this stuff on asterisk-dev did.)
RR_DELAY is useful because, when you've received it, you can take
RR_DELAY, add it to the RTT, and get a good upper bound on the delay
between when audio is sent out to the network and when it's rendered at
the other end.
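i.e. (sketch):

    #include <stdint.h>

    /* Sketch: a conservative upper bound on the delay between sending
     * audio and it being rendered at the far end, using the measured
     * RTT and the peer's advertised RR DELAY (both in milliseconds). */
    static uint32_t playout_delay_bound_ms(uint32_t rtt_ms,
                                           uint16_t rr_delay_ms)
    {
        return rtt_ms + (uint32_t)rr_delay_ms;
    }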
| 0x00000002 | GSM Full Rate | 33 byte chunks of 160 samples or |
| | | 65 byte chunks of 320 samples |
Is this actually valid (sending MS-GSM 65 byte stuff)?
| 0x00000010 | G.726 | |
+------------+------------------+----------------------------------+
| 0x00000020 | IMA ADPCM | 1 byte per 2 samples |
G.726 is also 1 byte per 2 samples.
| 0x00000100 | G.729 | 20 bytes chunks of 172 samples |
G.729 is 20 bytes per 160 samples, no?
I'll have more comments later, I suspect, and also perhaps some additions.
-SteveK