[asterisk-dev] RFT: Expanded DNS SRV handling in Asterisk 1.4

Fri Oct 26 01:50:04 CDT 2007

>>>>> "JT" == John Todd <jtodd at loligo.com> writes:

JT> I'm not sure I understand how another company having a failed or
JT> sporadically failing infrastructure is something I can control.

A permanently and deliberately broken setup is different from one
suffering a temporary failure. Asterisk should deal with accidental
failure as gracefully as possible, within the limitation that Kevin P.
Fleming doesn't have 48 hours in a day and he's the one who is writing
the code.

For deliberate brokenness there is 

JT> Isn't one of the major points of multiple SRV records to allow for
JT> redundancy, which is an extension of "improving perceived
JT> functional behavior"? If that is the case, then I'm not sure how
JT> your response holds up to examination.

There is no redundancy if one of two servers is permanently and
deliberately unreachable.

>> You can just ignore the SRV record and define your own IP-based
>> peer.

JT> Of course. No argument to the contrary there if there is a
JT> pre-existing agreement of endpoints. However, the real strength of
JT> SIP is when there is not a pre-existing agreement of endpoints,
JT> which is one of the major reasons SRV records are useful. This
JT> does not address the point of SRV records in several of the major
JT> areas of utility, so I think we can dispense with the concept of
JT> hardcoded IP address endpoint identification in this discussion as
JT> it is not relevant.

If you don't know the peer, you can't write policy for it anyway. You
don't know which SRV records to ignore. And once you have identified
the broken SRV record, you CAN write policy for it -- by hardcoding
the IP.

JT> MX is not real-time. SRV records are real-time. Failing through a
JT> list of MX hosts does not significantly alter the completion of
JT> the communication, while failing through SRV records does - users
JT> will hang up. And in any case, you are incorrect about MX
JT> overrides: there are absolutely mail clients that "remember"
JT> failed MX hosts and will not try to send to them for some
JT> "cooloff" period.

Of course there are mail clients that remember failed MX hosts. If you
want to say that Asterisk should be smart enough to handle that kind
of thing, then absolutely yes. It just should not be something that
the Dial command knows about, and it should not be manually configured
on a per-domain basis.
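
To make the "remember failed hosts" idea concrete, here is a rough
sketch in Python, not Asterisk code. Every name in it (SrvTarget,
CooloffCache, candidates) and the 60-second cooloff are made up for
illustration. It also simplifies SRV ordering: within one priority it
just prefers the higher weight, where RFC 2782 asks for a weighted
random pick.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvTarget:
    """One SRV record, reduced to the fields used here (hypothetical type)."""
    priority: int
    weight: int
    host: str
    port: int


class CooloffCache:
    """Remembers targets that failed recently (hypothetical helper)."""

    def __init__(self, cooloff_seconds=60.0, clock=time.monotonic):
        # The 60 s default is an arbitrary assumption, not an Asterisk value.
        self.cooloff_seconds = cooloff_seconds
        self.clock = clock
        self._failed_at = {}  # (host, port) -> time of last failure

    def mark_failed(self, target):
        self._failed_at[(target.host, target.port)] = self.clock()

    def is_cooling_off(self, target):
        key = (target.host, target.port)
        failed = self._failed_at.get(key)
        if failed is None:
            return False
        if self.clock() - failed >= self.cooloff_seconds:
            # Cooloff expired: forget the failure and allow a retry.
            del self._failed_at[key]
            return False
        return True


def candidates(targets, cache):
    """Return targets in the order they should be tried.

    Lowest priority value first; higher weight first within a priority
    (a simplification of RFC 2782). Targets in cooloff are skipped,
    unless all of them are, in which case the full list is returned so
    the call is still attempted.
    """
    ordered = sorted(targets, key=lambda t: (t.priority, -t.weight))
    usable = [t for t in ordered if not cache.is_cooling_off(t)]
    return usable or ordered
```

The point of the sketch is that the cache sits next to the SRV lookup
and is shared by all calls, so neither the dialplan nor a per-domain
setting has to know that a host was skipped.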

JT> Dialplans are almost always complicated if an administrator wants
JT> to truly capture error conditions in a meaningful way, or if there
JT> are dollars on the line for failure. "Too complicated" is a local
JT> decision, not one to be forced by the authors of tool components.
JT> Building in hidden methods of easing complexity for less rigorous
JT> developers is fine, but don't sacrifice the flexibility of the
JT> tool for those that truly want to have precise control over the
JT> system.

The source for chan_sip.c is available if you need that kind of
precise control. If your improvements make error recovery better, you
can even submit the changes so that everyone benefits.

/Benny