[asterisk-dev] Proposal to separate qualify & keep alive
John Todd
jtodd at loligo.com
Wed Jun 28 11:56:22 MST 2006
[top-posting madness continued]
Despite several client-side devices supporting keep-alives
(Linksys/Sipura), this is not typical in my experience, and is
probably not desirable. While I understand the desire to have
client-side issues handled by the client, it seems to me to be a poor
idea to allow end-users to decide how many DoS-look-alike packets
they want to send to the SIP proxy network. It would seem (from the
perspective of the service provider) that this would be a function
best handled by the server. I don't quite buy the argument "Well, we
lock the devices so the clients never have the option to change
anything." since this is broken on another level that I won't go into
here.
In any case, I don't know how the keep-alives would work other than as
an OPTIONS request or something else that generates a "reply", since the
outside NAT port number is unknown and therefore can only be known by
the server. A non-reply message seems to me to only work from
server->NAT->client and not the other way around, and OPTIONS seems
to be fairly "heavy" to just keep NAT translations alive, so this
still makes me prefer a non-SIP message to keep mappings open.
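
To make the server->NAT->client direction concrete, here is a minimal
sketch in Python (for illustration only; this is not Asterisk code) of a
one-way, non-SIP keepalive. The server sends a tiny datagram from its SIP
socket to the outside address it observed for the peer and waits for
nothing. The function name, the four-byte CRLF payload, and the example
address are assumptions.

```python
import socket

# Assumed payload: four bytes of CRLF. Any short datagram that the
# client simply ignores would refresh the NAT mapping equally well.
KEEPALIVE_PAYLOAD = b"\r\n\r\n"


def send_nat_keepalive(sock, peer_addr, payload=KEEPALIVE_PAYLOAD):
    """Send one fire-and-forget datagram to the peer's outside NAT address.

    sock      -- the UDP socket bound to the server's SIP port, so the
                 packet matches the mapping the client opened
    peer_addr -- (ip, port) as seen by the server, e.g. from the last
                 REGISTER; only the server knows this outside port
    Returns the number of bytes sent. No reply is awaited and no
    per-request state is stored.
    """
    return sock.sendto(payload, peer_addr)


def open_sip_socket(bind_ip="0.0.0.0", bind_port=5060):
    """Create the UDP socket the keepalives are sent from (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_ip, bind_port))
    return sock
```

A caller would invoke something like
`send_nat_keepalive(sock, ("192.0.2.10", 40312))` once per keepalive
period for each NATed peer. Unlike OPTIONS, nothing has to be tracked
until a response arrives.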
Olle has helpfully put some comments in the code which signify a
place where someone could contribute some code that would smooth the
OPTIONS requests to SIP entities instead of bursting, but I think
there's still a gap between the methodology used currently and what an
ideal/preferred set of configuration choices might be (see my message
below).
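
One way to picture the smoothing is to give every peer its own offset
inside the qualify interval instead of firing all probes from one shared
timer. The sketch below is Python for illustration only (Asterisk itself
is C); the function names and the even-spacing policy are assumptions,
not what the comments in the code prescribe.

```python
from collections import Counter


def spread_offsets_ms(peers, interval_s):
    """Assign each peer a start offset (in ms) spread evenly over one interval.

    Integer milliseconds are used so the spacing is exact.
    """
    n = len(peers)
    interval_ms = interval_s * 1000
    return {peer: (i * interval_ms) // n for i, peer in enumerate(peers)}


def peak_probes_per_second(offsets_ms):
    """Return the largest number of probes that fall into any one second."""
    if not offsets_ms:
        return 0
    per_second = Counter(offset // 1000 for offset in offsets_ms.values())
    return max(per_second.values())


if __name__ == "__main__":
    peers = [f"peer{i}" for i in range(6000)]
    bursty = {peer: 0 for peer in peers}      # everyone on the same timer
    smooth = spread_offsets_ms(peers, 60)     # one offset per peer
    print(peak_probes_per_second(bursty), peak_probes_per_second(smooth))
```

With 6000 peers and a 60-second interval, the peak drops from 6000
probes in a single second to 100 per second. A random offset per peer
would flatten the spikes as well, at the cost of a less even spread.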
Lastly, if you can reproduce the bug you describe below in the
SVN-TRUNK tree, then you should by all means open a ticket in the
bugtracker, as this is a serious problem (peers becoming unreachable
would be 'serious' in my book). ;-)
PS: Sorry if some of these points have already been brought up; this
mail is being written off-line and will be delivered in a day or so
after final authoring.
JT
At 4:45 PM -0500 6/27/06, John Lange wrote:
>
>On a side note; There seems to be something wrong with qualify=5000. We
>have about 50 clients on a machine most of which have qualify=yes, the
>rest are qualify=5000. The ones set to 5000 simply "disappear" with no
>message logged and nothing mentioned in the console. "sip show peers"
>shows "Unspecified" for the Host and "Unknown" for the status (as if the
>device had never registered) instead of the normal "Unreachable" or
>"lagged".
>
>I'm starting to suspect a bug in qualify. Anyone else have issues like
>this?
>
>Back on topic, just to play devil's advocate with my own suggestion; It
>should be noted that at least some devices have a keep-alive option of
>their own.
>
>So, in our case we seem to be making progress on this problem of
>satellite latency by setting qualify=no, nat=yes and a 5 second
>keep-alive on the client device.
>
>The question then becomes, if most client devices support keep-alive is
>there still a purpose to having it on the server side as well? How many
>client devices support keep-alive? I know Linksys products do but I
>haven't looked into others yet.
>
>I would advocate yes to server-side keep-alive simply because it offers
>the most flexibility. You never know when it might make more sense to do
>it server-side instead of client-side.
>
>John
>
>On Tue, 2006-06-27 at 07:30 +0200, Loic DIDELOT wrote:
>> Hi,
>> I actually like the idea of separating keep-alives from the qualify.
>> Defining the frequency of the packets is very important to adapt the
>> asterisk behaviour according to the customers one has. This would solve
>> many of our problems. Are there more people who need this? Is there a
>> way to get this developed and include it in asterisk 1.4? voipGATE would
>> be interested in cosponsoring this feature but only if it will be
>> included in the 1.4 stable release and if available for IAX as well.
>>
>> Best regards,
>> Loic Didelot.
>>
>>
>> On Mon, 2006-06-26 at 13:06 -0700, John Todd wrote:
>> > At 8:32 PM +0200 6/26/06, Johansson Olle E wrote:
>> > >On 26 Jun 2006, at 19:23, John Lange wrote:
>> > >
>> > >>In the current implementation, qualify sends out a SIP request at the
>> > >>specified interval and if it doesn't receive a reply within that same
>> > >>interval asterisk flags the peer as unreachable.
>> > >>
>> > >>This also acts as a sort of keep-alive for devices behind NAT when
>> > >>combined with the nat=yes parameter. The regular flow of SIP packets
>> > >>keeps the NAT connection alive for the device behind the firewall.
>> > >>
>> > >>The problem is, these are two very different concepts and at times it
>> > >>would be nice if we could separate the two.
>> > >>
>> > >>Specifically; we have some clients with devices behind nat and
>> > >>satellite. Their nat and satellite require a more-or-less constant flow
>> > >>of packets to keep the connection alive. However, due to the quirky
>> > >>nature of satellite combined with long round-trip times, the qualify
>> > >>option needs to be set high (5000ms) or Asterisk won't send calls to
>> > >>the client.
>> > >>
>> > >>In fact we would like to set qualify=no because often the client appears
>> > >>to be very lagged when the satellite perceives the connection to be idle
>> > >>(apparently it queues packets until it has a bunch and sends them in
>> > >>groups) but if you initiate a call the lag drops immediately to an
>> > >>acceptable level (800ms).
>> > >>
>> > >>But if we set qualify=no then the firewall closes the connection and
>> > >>they can't receive any calls.
>> > >>
>> > >>So, the question is; is it reasonable to undertake the implementation of
>> > >>a keep alive for sip clients?
>> > >>
>> > >>Any thoughts on how this should be done? SIP NOTIFY or would something
>> > >>else make more sense?
>> > >
>> > >I don't see a reason for changing method. We should probably find a way
>> > >to override and be able to dial out regardless of the monitoring status.
>> > >That seems like a simple fix.
>> > >
>> > >/O
>> >
>> > I would actually agree that the two functions should be separated. I
>> > find myself often in the same position, where "qualify=" is used
>> > as a NAT mapping tool only, and I don't particularly care
>> > about the actual milliseconds of response time to the request. I
>> > also think we would be well-served to make these timers a bit more
>> > flexible, since right now everyone is in the "same bucket" as far as
>> > timing goes for how frequently OPTIONS requests are sent. I'd like
>> > to be more aggressive for foolish people who have poorly-configured
>> > firewalls that close NAT UDP sessions after 30 (or fewer) seconds,
>> > and currently the only way to do this is to change the code to send
>> > ALL of my OPTIONS requests much more frequently, which eventually
>> > leads to a huge amount of nonsense noise on my network to solve for a
>> > few poorly behaved clients.
>> >
>> > SER sends "bogus" packets fairly frequently as part of its NAT
>> > module, and this seems to work well.
>> >
>> > The current method in Asterisk has a few downsides:
>> >
>> > 1) OPTIONS packets are larger than just simple UDP keepalives (but
>> > not by much)
>> >
>> > 2) OPTIONS requests require stateful storage of status, so if I
>> > have 6000 SIP "peers" each using "qualify=", then Asterisk needs to
>> > set a fairly large amount of memory aside to track each one of
>> > those transmitted OPTIONS statements, and if at any time there are
>> > 10% of those peers which are slow to respond (say, two cycles) then I
>> > have a huge backlog of stateful requests in queue. If a UDP packet
>> > that did not require return receipt was sent just for NAT keepalives,
>> > this would be much lighter weight, and we could move the "heavier"
>> > OPTIONS request interval to a larger time value.
>> >
>> > 3) The current OPTIONS request is bursty, and all of the OPTIONS
>> > are sent in 60 second intervals using the same interval timer. This
>> > is really ugly, with big spikes of data every 60 seconds. This
>> > should probably be distributed so that each entry has its own timer.
>> >
>> >
>> > I propose a different way to do this, with an example out of sip.conf
>> > listed below. I know that this will require the creation of memory
>> > space for each of these timers (and a whole slew of timer-related
>> > issues internally to Asterisk) but it does seem like it would be more
>> > flexible to do it this way and may reduce the amount of processing
>> > for the OPTIONS requests if just lightweight UDP can be sent for NAT
>> > translations. With this method, I could possibly crank up the
>> > OPTIONS qualifiers to something like 5 minutes, but leave the NAT
>> > translation keepalives down at 20 seconds and hopefully see less load
>> > on my Asterisk servers and network with large numbers of REGISTER'ed
>> > hosts. This is all kind of pointless for 20 users, but Asterisk is
>> > no longer being used only for sites with double or triple-digit
>> > numbers of users, and it makes a difference at scale.
>> >
>> >
>> > ; Hypothetical sip.conf settings for "new" qualify/NAT timers
>> > ;
>> > ; Send OPTIONS requests to measure latency (450ms in this ex.)
>> > ; every 120 seconds. The qualifytime timer starts based
>> > ; on the time the last REGISTER was successfully parsed, or
>> > ; if a static IP host, then based on the time the entry was
>> > ; parsed in this file plus a random number of seconds not
>> > ; greater than the value in "qualify=". If "qualify="
>> > ; is non-zero but there is no "qualifytime=", then default
>> > ; of qualifytime is 60 seconds. If "qualifytime=" is
>> > ; non-zero but there is no "qualify=", then qualify is
>> > ; 500 milliseconds.
>> > qualify=450
>> > qualifytime=120
>> > ;
>> > ; Send very minimal, one-way packets to hosts in order
>> > ; to keep NAT translations open. Send once every 20 seconds.
>> > ; No default value.
>> > nat-keepalive=20
>> > ;
>> >
>> >
>> > JT
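
The defaulting rules in the hypothetical sip.conf comments quoted above
can be sketched as follows. This is Python for illustration only, not
Asterisk code. The function names are assumptions, only numeric values
are handled (not "qualify=yes"), and the last rule is read as a 500 ms
default for "qualify=" when only "qualifytime=" is given.

```python
def _positive_int(conf, key):
    """Return the setting as a positive int, or None if absent, empty or zero."""
    raw = conf.get(key)
    if raw is None or raw.strip() == "":
        return None
    value = int(raw)
    return value if value > 0 else None


def resolve_timers(conf):
    """Apply the proposed defaults to one peer's raw settings.

    conf maps option names to strings, e.g. {"qualify": "450"}.
    Returns (qualify_ms, qualifytime_s, nat_keepalive_s); None means off.
    """
    qualify_ms = _positive_int(conf, "qualify")
    qualifytime_s = _positive_int(conf, "qualifytime")
    keepalive_s = _positive_int(conf, "nat-keepalive")  # no default value

    if qualify_ms is not None and qualifytime_s is None:
        qualifytime_s = 60    # "qualify=" alone: probe every 60 seconds
    elif qualifytime_s is not None and qualify_ms is None:
        qualify_ms = 500      # "qualifytime=" alone: assumed 500 ms threshold
    return qualify_ms, qualifytime_s, keepalive_s


if __name__ == "__main__":
    example = {"qualify": "450", "qualifytime": "120", "nat-keepalive": "20"}
    print(resolve_timers(example))
```

The three values stay independent, so a peer can carry
"nat-keepalive=20" with no "qualify=" at all, which is the separation
the thread is asking for.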