[Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

Sun Jan 4 15:59:47 MST 2004

> 1. Moving a physical interface (whether a T1, Ethernet or 2-wire PSTN) is
> mostly trivial; however, what "signal" is needed to detect a system failure
> and move the physical connection to a second machine/interface? (If there
> are three systems in a cluster, what signal is needed? If a three-way
> switch is required, does someone want to design, build, and sell it to
> users? Any need to discuss a four-way switch? Should there be a single
> switch that flip-flops all three at the same time (T1, Ethernet, PSTN)?)

Simple idea:  Have a process on each machine pulse a lead state (something
as simple as DTR out a serial port, or a single data line on a parallel
port) out to an external box.  This box is built strictly from discrete
hardware, with a timeout that is retriggered by each pulse.  When the pulse
fails to arrive, the box switches the T1 over to the backup system.
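The pulse-and-timeout box behaves like a retriggerable watchdog timer.  The
Python sketch below models only its logic, in software, so it can be tried
without hardware.  The 2-second timeout and the choice to latch onto the
backup (so the T1 does not flip back by itself when pulses resume) are
assumptions for illustration, not part of the proposal above.

```python
from dataclasses import dataclass


@dataclass
class RetriggerableWatchdog:
    """Software model of the external failover box as a retriggerable timeout.

    Times are plain numbers of seconds supplied by the caller.  The timeout
    value and the latching behaviour are assumptions, not a spec.
    """
    timeout_s: float
    last_pulse: float = 0.0
    on_backup: bool = False

    def pulse(self, now: float) -> None:
        # Each heartbeat from the primary (e.g. a DTR transition) re-arms the timer.
        self.last_pulse = now

    def poll(self, now: float) -> str:
        # No pulse within the timeout: move the T1 to the backup system.
        # Once moved, the box stays on the backup even if pulses resume (latched).
        if not self.on_backup and now - self.last_pulse > self.timeout_s:
            self.on_backup = True
        return "backup" if self.on_backup else "primary"


wd = RetriggerableWatchdog(timeout_s=2.0)
wd.pulse(0.0)
print(wd.poll(1.0))   # primary: last pulse 1.0 s ago
wd.pulse(1.5)
print(wd.poll(3.0))   # primary: last pulse 1.5 s ago
print(wd.poll(4.0))   # backup: 2.5 s without a pulse
wd.pulse(4.5)
print(wd.poll(5.0))   # backup: latched until someone resets it by hand
```

Latching is a design choice: a machine that is flapping would otherwise drag
the T1 back and forth between systems.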

>
> Since protecting calls in progress (under all circumstances and
> configurations) is likely the most expensive and most difficult to achieve,
> we can probably all agree that handling this should be left to some
> future long-range plan. Is that acceptable to everyone?

It's going to be almost impossible to preserve calls in progress.  If you
switch a T1 from one machine to the other, there's going to be a loss of
sync (ISDN D-channels need to come back up, RBS channels need to wink),
and that's going to drop the call.

> 2. In a hot-spare arrangement (single primary, single running secondary),
> what static and/or dynamic information needs to be shared across the
> two systems to maintain the best chance of switching to the secondary
> system in the shortest period of time, and while minimizing the loss of
> business data? (Should this same data be shared across all systems in
> a cluster if the cluster consists of two or more machines?)
>
> 3. If a clustered environment, is clustering based on IP address or MAC
> address?
>    a. If based on an IP address, is a layer-3 box required between * and
>       sip phones? (If so, how many?)

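Yes.  You'll need something like Linux Virtual Server or an F5 load
balancing box to make this happen.  You can play silly games with round-robin
DNS, but it doesn't handle failure well: DNS keeps handing out the address
of a dead server until someone removes the record, and clients cache the
answer.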
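To make the round-robin DNS problem concrete, here is a small Python
comparison.  It is a model, not a real resolver or load balancer: the
addresses are RFC 5737 documentation addresses, and the `healthy` set stands
in for whatever health checks a real balancer would run.

```python
from itertools import cycle
from typing import List, Set


def round_robin_dns(addresses: List[str], n: int) -> List[str]:
    """Hand out addresses in strict rotation, as round-robin DNS does.

    DNS has no idea whether a server is alive, so a dead server keeps
    getting its share of clients.
    """
    rr = cycle(addresses)
    return [next(rr) for _ in range(n)]


def health_checked(addresses: List[str], healthy: Set[str], n: int) -> List[str]:
    """Rotate only over servers that passed their most recent health check,
    roughly what a load balancer with active monitoring does."""
    live = [a for a in addresses if a in healthy]
    if not live:
        raise RuntimeError("no healthy Asterisk servers")
    rr = cycle(live)
    return [next(rr) for _ in range(n)]


servers = ["192.0.2.10", "192.0.2.11"]    # documentation addresses
alive = {"192.0.2.10"}                    # 192.0.2.11 has failed
print(round_robin_dns(servers, 4))        # half the clients go to the dead box
print(health_checked(servers, alive, 4))  # every client goes to 192.0.2.10
```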

>    b. If based on MAC address, what process moves an active * MAC address
>       to another * machine (to maintain connectivity to sip phones)?

Something like Ultra Monkey (http://www.ultramonkey.org)

>    c. Should sessions that rely on a failed machine in a cluster simply
>       be dropped?
>    d. Are there any realistic ways to recover RTP sessions in a clustered
>       environment when a single machine within the cluster fails, and RTP
>       sessions were flowing through it (canreinvite=no)?
>    e. Should a sip phone's arp cache timeout be configurable?

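You shouldn't need to worry about that unless the phone is on the same
physical network segment as the * servers; otherwise the phone only ARPs for
its default gateway, and the gateway's MAC address doesn't change.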

>    f. Which system(s) control the physical switch in #1 above?

A voting system: all systems control it.  It is up to the switch to
decide which one isn't working right.
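One way to read "a voting system" is that every machine votes that it is
alive by sending heartbeats, and the switch alone decides who holds the
lines.  The Python sketch below is that interpretation only; the function
name, the priority ordering and the rule of keeping the current holder while
it is healthy are all assumptions.

```python
from typing import Dict, Optional


def choose_active(last_seen: Dict[str, float], now: float,
                  timeout_s: float, current: Optional[str]) -> Optional[str]:
    """Decide which system should hold the T1/Ethernet/PSTN connections.

    last_seen maps each system to the time of its last heartbeat, listed in
    priority order (dicts keep insertion order in Python 3.7+).  A system
    whose heartbeat is older than timeout_s is treated as failed.
    """
    healthy = [name for name, t in last_seen.items() if now - t <= timeout_s]
    if current in healthy:
        return current                      # don't move lines off a working system
    return healthy[0] if healthy else None  # None: nobody fit to take over


beats = {"ast1": 5.0, "ast2": 9.8, "ast3": 9.9}   # ast1 went quiet at t=5
print(choose_active(beats, now=10.0, timeout_s=2.0, current="ast1"))  # ast2
```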

>    g. Is sharing static/dynamic operational data across some sort of
>       high-availability hsrp channel acceptable, or, should two or more
>       database servers be deployed?

Database server clustering is a fairly solid technology these days.  Deploy
a DB cluster if you want.

> 4. If a firewall/nat box is involved, what are the requirements to detect
>    and handle a failed * machine?
>    a. Are the requirements different for hot-spare vs clustering?
>    b. What if the firewall is an inexpensive device (e.g., Linksys) with
>       minimal configuration options?
>    c. Are the nat requirements within * different for clustering?
>
> 5. Should sip phones be configurable with a primary and secondary proxy?
>    a. If the primary proxy fails, what determines when a sip phone fails
>       over to the secondary proxy?

Usually a simple timeout works for this, but if your clustering/hot-spare
switch works right, the client should never need to change.
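The "simple timeout" failover on the phone side can be sketched in Python as
below.  `send_with_failover` and `fake_send` are hypothetical names;
`fake_send` stands in for the phone's SIP transaction layer and returns None
when nothing comes back within the timeout.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical stand-in for the phone's SIP transaction layer: returns the
# response line, or None if nothing arrived within timeout_s.
SendFn = Callable[[str, float], Optional[str]]


def send_with_failover(proxies: Sequence[str], send: SendFn,
                       timeout_s: float) -> Tuple[str, str]:
    """Try each configured proxy in order, moving to the next one when a
    request goes unanswered for timeout_s seconds."""
    for proxy in proxies:
        reply = send(proxy, timeout_s)
        if reply is not None:
            return proxy, reply
    raise TimeoutError("no proxy answered")


def fake_send(proxy: str, timeout_s: float) -> Optional[str]:
    # Simulate a dead primary: it never answers.
    return None if proxy == "pbx1.example.com" else "SIP/2.0 200 OK"


print(send_with_failover(["pbx1.example.com", "pbx2.example.com"],
                         fake_send, timeout_s=4.0))
# ('pbx2.example.com', 'SIP/2.0 200 OK')
```

For reference, RFC 3261 times out a request with Timer B (INVITE) or Timer F
(other requests) at 64*T1, which is 32 seconds with the default T1 of
500 ms.  That is a long time for a user to wait, which is one reason to want
a shorter, configurable failover timeout.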

>    b. After fail over to the secondary, what determines when the sip phone
>       should switch back to the primary proxy? (Is the primary ready to
>       handle production calls, or is it back ready for a system admin to
>       diagnose the original problem in a non-production manner?)

Auto switch-back is never a good thing.  Once a system is taken out of
service by an automated monitoring system, it should be up to human
intervention to say that it is ready to go back into service.
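The no-auto-switch-back rule is easy to express as a tiny state machine.  In
this Python sketch (the class and method names are hypothetical), health
checks can only ever move a server out of service; `return_to_service()` is
the only way back, and it is meant to be called by a person.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    IN_SERVICE = "in service"
    OUT_OF_SERVICE = "out of service"


class MonitoredNode:
    """A server that automated monitoring can take out of service,
    but only a human can put back."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = State.IN_SERVICE
        self.restored_by: Optional[str] = None

    def health_report(self, healthy: bool) -> None:
        # Automated checks may only move the node out of service.
        # A passing check on an out-of-service node is ignored on purpose:
        # no automatic switch-back.
        if not healthy:
            self.state = State.OUT_OF_SERVICE

    def return_to_service(self, operator: str) -> None:
        # Manual step, run once an admin has diagnosed the original fault.
        self.state = State.IN_SERVICE
        self.restored_by = operator


node = MonitoredNode("ast1")
node.health_report(False)        # monitor sees a failure
node.health_report(True)         # the box comes back up by itself...
print(node.state)                # ...but stays State.OUT_OF_SERVICE
node.return_to_service("admin")
print(node.state)                # State.IN_SERVICE
```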