[asterisk-dev] RFT: Expanded DNS SRV handling in Asterisk 1.4
John Todd
jtodd at loligo.com
Thu Oct 25 14:54:01 CDT 2007
> >>>>> "JT" == John Todd <jtodd at loligo.com> writes:
>
>JT> Here's why I ask: I've had first-hand experience with
>JT> prioritized/weighted SRV records that cause serious problems.
>JT> Someone puts "10 10 _sip._udp.inside-proxy.foo.com" as their first
>JT> SRV record for foo.com, and "20 20
>JT> _sip._udp.outside-proxy.foo.com" as their second preference SRV
>JT> record for foo.com. The host "inside-proxy" isn't reachable from
>JT> the Internet. Therefore, every call attempt that goes to their
>JT> domain goes first to a proxy that times out (wait... wait...
>JT> wait...) and then goes to the second one that completes. This
>JT> leads to unacceptable timeouts, and eventually leads to hard-coded
>JT> SRV record data put into a local resolving nameserver (can you say
>JT> "domain hijacking for operational purposes?") to avoid the delay.
>JT> This is Very Bad, and leads to User Anger.
>
>Isn't that a case of "Doctor, it hurts when I..."?
I'm not sure I understand how another company having a failed or
sporadically failing infrastructure is something I can control.
Isn't one of the major points of multiple SRV records to allow for
redundancy, which is an extension of "improving perceived functional
behavior"? If that is the case, then I'm not sure how your response
holds up to examination.
>I bet most SIP calls are between cooperating companies, so it should
>be possible to fix the problem correctly instead of doing workarounds.
That's a rather short-sighted bet. And while most SIP calls are
probably between cooperating companies, the point (again) of SRV
records is to allow communications between endpoints that have no
prior hardcoded relationship. I'm not sure what your argument is
here, but if it is that the function of SRV records isn't really
that important, I suppose we could all go back to IP addresses. If
your system is just using SRV records for redundancy on pre-defined
peers, then perhaps having all of your calls take an additional 15
seconds to complete would be OK while you try to hammer out the
problems with the other endpoint. Myself, I prefer to have the
system automatically route around problems if my system is smart
enough to detect them without exposing the user base to the bad
behaviors. Lastly, SRV records are not just for redundancy between
peers that know each other; their first goal is resource discovery,
and their second goal is redundancy/load sharing. From how I
interpret your arguments, you seem to be
entirely ignoring the first goal of automated resource discovery,
which would make creation of a manual load-spreading routine
impossible or highly impractical.
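
As a rough illustration of why the inside-proxy/outside-proxy example
quoted above hurts, here is a minimal sketch of how a client orders SRV
targets: lowest priority value first, with weighted random choice among
records that share a priority. The SrvRecord type, the order_srv name,
and port 5060 are assumptions made for the example; the weighting is a
simplified version of the RFC 2782 selection and is not Asterisk's code.

```python
import random
from collections import namedtuple

# Hypothetical record type; the fields mirror the SRV RDATA fields.
SrvRecord = namedtuple("SrvRecord", "priority weight port target")


def order_srv(records, rng=None):
    """Return records in the order a client would attempt them."""
    rng = rng or random.Random()
    ordered = []
    # Lower priority values are always tried before higher ones.
    for priority in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == priority]
        # Weighted selection without replacement inside one priority level.
        while group:
            total = sum(r.weight for r in group)
            pick = rng.uniform(0, total)
            running = 0
            for i, rec in enumerate(group):
                running += rec.weight
                if running >= pick:
                    ordered.append(group.pop(i))
                    break
    return ordered


# The scenario from the quoted text (port 5060 is assumed).
records = [
    SrvRecord(20, 20, 5060, "outside-proxy.foo.com"),
    SrvRecord(10, 10, 5060, "inside-proxy.foo.com"),
]
print([r.target for r in order_srv(records)])
# prints ['inside-proxy.foo.com', 'outside-proxy.foo.com']
```

Because the unreachable host has the lower priority value, it is always
attempted first, so every call pays its full timeout before the working
proxy is tried.
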
>JT> I guess I'm saying that SRV record lookups should be able to be
>JT> turned off within Dial (which does exist today, despite my
>JT> approval of SRV lookups being "on" by default) and a function that
>JT> performs SRV lookups should be created so that the local
>JT> administrator can start to create a good/bad list of possible SRV
>JT> response entries for future use. This would not change the way *
>JT> behaves today; it would simply provide an alternative for the more
>JT> sophisticated administrator to control their own fate.
>
>I believe that is complication for no good reason.
I think you disagree with me in the above paragraph, but agree with
me below that a function for SRV lookups would be a good idea.
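
To make the good/bad list idea concrete, here is a minimal sketch of
what an administrator-side routine could do with a list of SRV lookup
results. The usable_targets name, the tuple layout, and port 5060 are
hypothetical, the lookup itself is assumed to have happened elsewhere,
and weight handling is left out to keep the example short.

```python
def usable_targets(srv_results, bad_targets):
    """Order results by priority and drop targets marked as bad.

    srv_results: iterable of (priority, weight, port, target) tuples.
    bad_targets: set of lowercase host names without a trailing dot.
    """
    by_priority = sorted(srv_results, key=lambda rec: rec[0])
    return [
        rec
        for rec in by_priority
        # Normalize the DNS name before comparing against the bad list.
        if rec[3].rstrip(".").lower() not in bad_targets
    ]


results = [
    (20, 20, 5060, "outside-proxy.foo.com."),
    (10, 10, 5060, "inside-proxy.foo.com."),
]
bad = {"inside-proxy.foo.com"}
print(usable_targets(results, bad))
# prints [(20, 20, 5060, 'outside-proxy.foo.com.')]
```

With an empty bad list the routine returns every record in priority
order, so the default behavior is unchanged.
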
>JT> I'm all for SRV automation behind-the-scenes as the default
>JT> behavior. However, I am less happy when there are no alternatives
>JT> to letting an administrator do things in a better way.
>
>You can just ignore the SRV record and define your own IP-based peer.
Of course. No argument to the contrary there if there is a
pre-existing agreement of endpoints. However, the real strength of
SIP is when there is not a pre-existing agreement of endpoints, which
is one of the major reasons SRV records are useful. Defining an
IP-based peer does not address the point of SRV records in several of
the major areas of
utility, so I think we can dispense with the concept of hardcoded IP
address endpoint identification in this discussion as it is not
relevant.
>JT> I'm sure I'm not alone here when I say that I dislike programs
>JT> that think they're smarter than me and won't let me change the
>JT> settings.
>
>I have never heard of a mail server which allows you to ignore certain
>MX records but obey the others. If you want to override MX for a
>certain domain, you create a policy for sending mail to that domain,
>and that ignores all the MX records.
MX is not real-time. SRV records are real-time. Failing through a
list of MX hosts does not significantly alter the completion of the
communication, while failing through SRV records does - users will
hang up. And in any case, you are incorrect about MX overrides:
there are absolutely mail clients that "remember" failed MX hosts and
will not try to send to them for some "cooloff" period. I have a
vague recollection of that method being used at AOL when I was
chatting with one of their mail admins a few years back, and I
certainly know that method is used by spammers (the content is
irrelevant to the technology) since I've watched them crawl through
my bogus MX slow-down traps on one try, and then the next try
(automated) they jump right to the functional MX without trying the
dead ends first. After a few hours of this, they try transmitting to
the dead servers again for another "test" pass.
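
The "cooloff" behavior described above can be sketched as a small
failure cache: a host that failed is skipped until a timer expires,
after which it is eligible for a retest. The CooloffList class, its
method names, and the one-hour default are assumptions made for
illustration; they do not describe any particular mail server or
Asterisk feature.

```python
import time


class CooloffList:
    """Remember failed hosts and skip them for a fixed period."""

    def __init__(self, cooloff_seconds=3600.0, clock=time.monotonic):
        self.cooloff_seconds = cooloff_seconds
        self.clock = clock          # injectable so the timing can be tested
        self._failed_at = {}        # host -> time of the last failure

    def mark_failed(self, host):
        self._failed_at[host] = self.clock()

    def is_usable(self, host):
        failed = self._failed_at.get(host)
        if failed is None:
            return True
        if self.clock() - failed >= self.cooloff_seconds:
            # Cooloff expired: forget the failure and allow a retest.
            del self._failed_at[host]
            return True
        return False
```

A caller would check is_usable() before each attempt and call
mark_failed() when an attempt times out, so a dead target costs one
timeout per cooloff period rather than one per call.
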
>JT> Summary: I'd love to see a function that resolves and returns an
>JT> array-like set of SRV lookup results for a domain. Let the
>JT> administrator write a routine that then runs through the various
>JT> possible destinations.
>
>That on the other hand makes sense. If you keep all SRV priority
>handling out of asterisk itself, you can probably keep the asterisk
>code simple. There is a risk that the dial plan code gets too
>complicated though.
Dialplans are almost always complicated if an administrator wants to
truly capture error conditions in a meaningful way, or if there are
dollars on the line for failure. "Too complicated" is a local
decision, not one to be forced by the authors of tool components.
Building in hidden methods of easing complexity for less rigorous
developers is fine, but don't sacrifice the flexibility of the tool
for those that truly want to have precise control over the system.
JT