[Asterisk-Users] Asterisk Redundency

Benjamin Lawetz blawetz at teliphone.ca
Tue Oct 25 10:56:39 MST 2005


Ok, I tried something slightly different.

I modified the existing udp.monitor (or was it the tcp.monitor) of mon so
that it sends a "sniffed" SIP registration packet to the asterisk server.
If I don't receive an answer within a set time, the monitor reports an
error.

It tells you whether the server is at least answering SIP. Mind you, I once
had a server freeze while the monitor kept getting an answer. So it's not
100% fool-proof, but it has saved my *** in the past :-)
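
A rough sketch of what such a check could look like as a standalone shell
script, rather than as a patch to udp.monitor. It is not the modified
monitor itself. It assumes netcat (nc) is installed, that the captured
registration packet has been saved to a file (the path
/etc/mon/sip-register.pkt is made up for illustration), and that any reply
whose first line starts with "SIP/2.0" counts as alive. The function names
are hypothetical as well.

```shell
#!/bin/sh
# Sketch only: replay a saved SIP request over UDP and check for a reply.
# REQUEST_FILE is a hypothetical path; netcat's UDP timeout behaviour
# differs between variants, so try this against your own nc first.

REQUEST_FILE="${REQUEST_FILE:-/etc/mon/sip-register.pkt}"
TIMEOUT="${TIMEOUT:-5}"

# Succeeds if the first line on stdin looks like a SIP status line,
# e.g. "SIP/2.0 200 OK" or "SIP/2.0 401 Unauthorized".
is_sip_reply() {
    IFS= read -r first_line || [ -n "$first_line" ] || return 1
    case "$first_line" in
        "SIP/2.0 "[1-6][0-9][0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Send the saved request to host $1 (port $2, default 5060) and wait up
# to $TIMEOUT seconds for an answer.
probe_sip() {
    nc -u -w "$TIMEOUT" "$1" "${2:-5060}" < "$REQUEST_FILE" 2>/dev/null | is_sip_reply
}

# mon-style wrapper: print the hosts that failed, return non-zero if any did.
check_hosts() {
    failed=""
    for host in "$@"; do
        probe_sip "$host" || failed="$failed $host"
    done
    if [ -n "$failed" ]; then
        echo $failed
        return 1
    fi
    return 0
}
```

Ending the script with something like `check_hosts "$@"; exit $?` should
fit the way mon calls its monitors (hosts as arguments, failed hosts on
stdout, non-zero exit on failure). Like the original, this only shows that
something answers SIP, not that calls can be completed.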

-----Original Message-----
From: asterisk-users-bounces at lists.digium.com
[mailto:asterisk-users-bounces at lists.digium.com] On Behalf Of Adam Moffett
Sent: October 25, 2005 9:51 AM
To: Asterisk Users Mailing List - Non-Commercial Discussion
Subject: Re: [Asterisk-Users] Asterisk Redundency



Benjamin Lawetz wrote:

>>Since I can't do that, what I've settled on is heartbeat + mon.
>>Heartbeat will monitor for a system level failure and switch to the
>>backup machine if necessary; and mon will watch the asterisk (or any
>>other) service and restart it and/or alert me if it fails.
>
>What kind of monitor are you using to monitor asterisk?

Sorry for my slow response.  My asterisk monitor right now is embarrassingly
simple.  All it does is execute "show uptime" and look for output starting
with "System"; see below.  Obviously the method has limitations: 1) it will
only tell me that the daemon is running, not that it's able to carry any
calls; 2) it only works on localhost.

Input on how to test a remote instance of asterisk would be welcome, as well
as a method of making a test call or reliably testing for the ability to
make calls.  My impression is that this would require asterisk to have a
"Dial" command in the CLI, or a Linux SIP client that I could execute from
the shell.  I'm not aware of the existence of either.

Any other simple and reliable methods of testing asterisk's condition would
be welcome.

The alerts, by the way, are pretty simple as well.  See the excerpt from
mon.cf below.  restartasterisk.alert does exactly what it says.
stopeverything.alert shuts down heartbeat, which will cause another node in
the cluster to take over; in fact, that node will start mon, which will then
use the restartasterisk.alert to start up asterisk.  Asterisk only starts on
the backup machine when the primary fails so that config changes replicated
from the primary will take effect.  Total downtime should be < 3 min, which
will let me hit five nines if it only happens once a year ;)
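
As a sanity check on that arithmetic, here is a quick calculation (a sketch,
assuming a 365-day year; the function name is only illustrative). Five nines
of availability leaves roughly 5.26 minutes of downtime a year, so a single
failover of under 3 minutes fits, while two such outages in one year could
exceed it.

```shell
#!/bin/sh
# Minutes of downtime per 365-day year allowed by a given availability.
# allowed_downtime_min is an illustrative helper, not part of mon or heartbeat.
allowed_downtime_min() {
    LC_ALL=C awk -v a="$1" 'BEGIN { printf "%.2f\n", 365 * 24 * 60 * (1 - a) }'
}

allowed_downtime_min 0.99999   # five nines: prints 5.26
allowed_downtime_min 0.9999    # four nines, for comparison: prints 52.56
```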

Config changes are replicated via rsync and ssh every few minutes.  
Voicemails are also copied from primary to backup by rsync.  One thing I
still need to do is make rsync stop attempting to replicate files when the
failover occurs.  That will probably just require another alert below the
"stopeverything.alert".
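
One way the replication and its off switch could fit together, as a sketch
only: a script run from cron every few minutes that skips the rsync when a
flag file exists, with an extra alert creating that flag. The flag path, the
backup host name, and the directories (the usual Asterisk config and
voicemail locations) are all assumptions, not taken from the setup described
here.

```shell
#!/bin/sh
# Sketch only: push config and voicemail to the backup unless a stop flag
# exists. STOP_FLAG and BACKUP_HOST are hypothetical names.

STOP_FLAG="${STOP_FLAG:-/var/run/asterisk-replication.stop}"
BACKUP_HOST="${BACKUP_HOST:-backup.example.com}"
RSYNC="${RSYNC:-rsync}"

# True while no failover has been flagged.
replication_enabled() {
    [ ! -e "$STOP_FLAG" ]
}

# Copy each directory to the same path on the backup over ssh.
replicate_once() {
    if ! replication_enabled; then
        echo "replication disabled: $STOP_FLAG exists" >&2
        return 0
    fi
    for dir in /etc/asterisk /var/spool/asterisk/voicemail; do
        "$RSYNC" -az -e ssh "$dir/" "$BACKUP_HOST:$dir/" || return 1
    done
}
```

A cron entry would call replicate_once at the end of the script. The extra
alert (call it stopreplication.alert, a made-up name) would then only need
to touch the flag file, and removing the file by hand would turn replication
back on once the primary is healthy again.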

The replication of course means that this setup will not protect me from a
bad config change that breaks asterisk, as that change will be replicated
throughout the cluster.  So all significant config changes should be tested
on a standalone box.


[root at phones2 mon]# cat /usr/lib/mon/mon.d/asterisk.monitor
#!/bin/sh
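## Can only check localhost.  Always checks localhost regardless of input.

        # Keep the first 6 bytes of the "show uptime" output; a running
        # daemon answers with text starting with "System".
        SHOW_UPTIME=$(/usr/sbin/asterisk -rx "show uptime" | /bin/cut -b 1-6)
        # Quote the variable and use "=" so the test is valid in plain sh,
        # even when the output is empty.
        if [ "$SHOW_UPTIME" = "System" ]; then
                exit 0
        else
                # Report the failed host to mon.
                echo "localhost"
                exit 1
        fi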


From mon.cf:

watch asterisk
        service asterisk
                description asterisk pbx on localhost
                interval 10s
                monitor asterisk.monitor
                period wd {Sun-Sat}
                        alert mail.alert adam at plexicomm.net
                        alert restartasterisk.alert adam at plexicomm.net
                        alertevery 30s
        service asterisk-failover
                description checking if we need to stop heartbeat
                interval 10s
                monitor asterisk.monitor
                period wd {Sun-Sat}
                        alert stopeverything.alert adam at plexicomm.net
                        alertafter 5 3m
