[Asterisk-Dev] SIP failover using Asterisk and openais

Wed Sep 28 16:08:50 MST 2005

At 2:44 PM -0700 on 9/28/05, Steven Dake wrote:
>Fellow developers,
>
>I maintain the open source project openais
>http://developer.osdl.org/dev/openais which is an open source version of
>the Service Availability Forum's AIS specification.
>
>This implementation provides checkpointing and application failover.
>I'd like to create an integration between the SIP channel module and
>openais AMF/checkpointing and perhaps have it integrated into the
>asterisk source base as a proof of concept of AIS.
>
>The integration would allow multiple servers to maintain the state of
>all SIP sessions in an active/standby arrangement.  The IP phone
>communicating with the active server would then continue to operate
>and maintain its session in the event the active server failed.
>
>Would someone be kind enough to point me to the data structures or
>functions where the state of a SIP session is recorded?  Is it possible
>just to record SIP's state, or does the rest of the asterisk server that
>loads the sip module contain state about the SIP session?
>
>In SIP, is an IP phone configured to talk to one specific IP address, or
>is there a discovery process to determine the SIP server's IP address?

While I can't answer the parts about the structures within Asterisk, 
I can answer this portion.

There are two methods that SIP devices typically use for proxy 
discovery: SRV records, or crude "primary/secondary" methods.

I'll discuss the crude method first: if a SIP request fails going to 
the specified primary proxy (or the specified "outbound" proxy in 
some cases) then there will be a hard-coded configuration item which 
specifies a "secondary" proxy or outbound proxy.  The primary and 
secondary are coded into the configuration file, meaning that 
typically a reload of the config (reboot, refresh, whatever) is 
required to change the settings.  The timers and methods for how and 
when to use this secondary proxy are vendor-dependent, and vary 
greatly from vendor to vendor.  A much more interesting question is 
"What happens when the primary comes back on-line?" - most device 
vendors haven't really given this much thought, especially in an 
environment with a user community of six digits or more.
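
To make the failback question concrete, here is a rough sketch in 
Python of one way a device could handle it.  This is not any vendor's 
actual behavior: the ProxySelector class, its method names, and the 
retry_primary_after timer are all made up for illustration.  The idea 
is simply that the device uses the secondary while the primary is 
marked failed, and probes the primary again once a timer expires.

```python
import time


class ProxySelector:
    """Hypothetical primary/secondary chooser with timed failback."""

    def __init__(self, primary, secondary, retry_primary_after=300.0,
                 clock=time.monotonic):
        self.primary = primary
        self.secondary = secondary
        # Assumed timer: seconds to wait before probing the primary again.
        self.retry_primary_after = retry_primary_after
        self.clock = clock
        self.primary_failed_at = None  # None means the primary is believed up

    def current(self):
        """Return the proxy the next request should be sent to."""
        if self.primary_failed_at is None:
            return self.primary
        if self.clock() - self.primary_failed_at >= self.retry_primary_after:
            # Timer expired: give the primary another chance.
            return self.primary
        return self.secondary

    def report_failure(self, proxy):
        """Record a failed request; only primary failures change state."""
        if proxy == self.primary:
            self.primary_failed_at = self.clock()

    def report_success(self, proxy):
        """Record a successful request; a primary success clears the failure."""
        if proxy == self.primary:
            self.primary_failed_at = None
```

With a very large user community, a fixed timer like this would send 
everyone back to the primary at roughly the same moment, which is 
part of why the question deserves more thought than it usually gets.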

A better solution is SRV record use.  Some SIP devices are SRV record 
compliant, which allows them (in a well-implemented version of code) 
to cascade through several different SIP proxies or endpoints, with 
varying levels of preference specified in the DNS.  I'll let the 
RFCs speak at length on the topic, but the summary is that during a 
REGISTER, INVITE, etc. the device will do a DNS query for a record 
like _sip._udp.domain.com and try to get back a list of SRV records 
naming candidate hosts which can receive the action.  Then, an A 
record lookup is done on the "best" candidate, and the list is 
iterated until the action succeeds (i.e., no network error, 
application error, etc.).
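
As a rough illustration of that cascade, the Python sketch below 
orders an already-fetched list of SRV records the way RFC 2782 
describes (lowest priority first, weighted random choice within a 
priority) and then walks the list until one candidate accepts the 
request.  It does no DNS lookups itself; the SrvRecord tuple, the 
function names, and the attempt callback are hypothetical names of my 
own, and a real device would also do the A record lookup on each 
target.

```python
import random
from collections import namedtuple

# Hypothetical container for the fields of one SRV answer.
SrvRecord = namedtuple("SrvRecord", "priority weight port target")


def order_srv_records(records, rng=None):
    """Order SRV records per RFC 2782: by priority, then weighted random."""
    rng = rng or random.Random()
    ordered = []
    for priority in sorted({r.priority for r in records}):
        # Weight-0 records are placed first within the priority group.
        group = sorted((r for r in records if r.priority == priority),
                       key=lambda r: r.weight != 0)
        while group:
            total = sum(r.weight for r in group)
            pick = rng.randint(0, total)
            running = 0
            for i, rec in enumerate(group):
                running += rec.weight
                if running >= pick:
                    # First record whose running sum reaches the pick wins.
                    ordered.append(group.pop(i))
                    break
    return ordered


def first_working(ordered, attempt):
    """Try each candidate in order; return (record, result) on success.

    `attempt(host, port)` is a caller-supplied function that raises
    OSError when the candidate cannot be used.
    """
    for rec in ordered:
        try:
            return rec, attempt(rec.target, rec.port)
        except OSError:
            continue  # fall through to the next candidate
    raise ConnectionError("no SRV candidate accepted the request")
```

A device that stops after the first entry of the ordered list gets 
load distribution from the weights but no failover, which is the 
difference between a robust implementation and a minimal one.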

Asterisk sort of supports SRV records, but not in a particularly 
robust way.  There is no cascading through multiple records; it just 
tries the first one.  That of course does not mean that Asterisk 
servers cannot be in lists of SRV replies for SIP client devices...

SRV records and use with SIP:
   http://www.zvon.org/tmRFC/RFC2782/Output/index.html
   http://www.zvon.org/tmRFC/RFC3263/Output/index.html

JT

>Finally, in do_monitor, I notice there is an ast_sched_wait, followed by
>an ast_io_wait.  ast_io_wait appears to dispatch any pending I/O events
>as derived from poll.  I need to plug in here with an ast_io_add to add
>my "healthchecking" for the SIP server.  My question is with
>ast_sched_wait.  Will it time out immediately if there is I/O waiting?
>In other event systems, timers and events are usually integrated.  I'm
>not sure how these two work in asterisk by looking at the code.
>
>Thanks for the help
>
>regards
>-steve
