[Asterisk-Users] Asterisk Redundency

Tue Oct 25 06:51:25 MST 2005

Benjamin Lawetz wrote:

>> Since I can't do that, what I've settled on is heartbeat + mon.
>> Heartbeat will monitor for a system-level failure and switch to the
>> backup machine if necessary; and mon will watch the asterisk (or any
>> other) service and restart it and/or alert me if it fails.
>
> What kind of monitor are you using to monitor asterisk?

Sorry for my slow response.  My asterisk monitor right now is 
embarrassingly simple.  All it does is run "show uptime" through the 
asterisk CLI and look for output starting with "System"; see below.  
Obviously the method has limitations: 1) it only tells me that the 
daemon is running, not that it's able to carry any calls, and 2) it 
only works on localhost.

Input on how to test a remote instance of asterisk would be welcome, as 
well as a method of making a test call or reliably testing for the 
ability to make calls.  My impression is that this would require 
asterisk to have a "Dial" command in the CLI, or a Linux SIP client that 
I could execute from the shell.  I'm not aware of the existence of either.

Any other simple and reliable methods of testing asterisk's condition 
would be welcome.

The alerts, by the way, are pretty simple as well.  See the excerpt from 
mon.cf below.  restartasterisk.alert does exactly what it says.  
stopeverything.alert shuts down heartbeat, which causes another node 
in the cluster to take over; that node then starts mon, which in turn 
uses restartasterisk.alert to start asterisk.  Asterisk only starts on 
the backup machine when the primary fails, so that config changes 
replicated from the primary take effect.  Total downtime should be 
< 3 min, which will let me hit five nines (a budget of about 5.26 
minutes per year) if it only happens once a year ;)
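
As a quick sanity check on the five-nines remark, the yearly downtime budget can be computed in the shell.  This is only an illustrative sketch; the function name downtime_budget_min is made up for this example, and a 365-day year is assumed.

```shell
#!/bin/sh
# Yearly downtime budget, in minutes, for a given availability percentage.
# Assumes a 365-day year; downtime_budget_min is a hypothetical helper name.
downtime_budget_min() {
    LC_ALL=C awk -v pct="$1" \
        'BEGIN { printf "%.2f\n", (100 - pct) / 100 * 365 * 24 * 60 }'
}

# Five nines: roughly five and a quarter minutes per year, so a single
# failover of under 3 minutes still fits within the budget.
downtime_budget_min 99.999
```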

Config changes are replicated via rsync and ssh every few minutes.  
Voicemails are also copied from primary to backup by rsync.  One thing I 
still need to do is make rsync stop attempting to replicate files when 
a failover occurs.  That will probably just require another alert 
below the "stopeverything.alert".
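
One possible way to do that, sketched under assumptions: the extra alert touches a flag file, and the replication job run from cron skips rsync while the flag exists.  The flag path, the backup host name and the voicemail directory below are placeholders for illustration, not values from my setup.

```shell
#!/bin/sh
# Hypothetical flag file and host name; adjust to the real cluster.
FAILOVER_FLAG="${FAILOVER_FLAG:-/var/run/asterisk-failover.flag}"
BACKUP_HOST="${BACKUP_HOST:-backup.example.com}"

# Succeeds only while no failover has been flagged.
replication_allowed() {
    [ ! -e "$FAILOVER_FLAG" ]
}

# Copy config and voicemail from primary to backup over ssh,
# unless a failover alert has created the flag file.
replicate() {
    replication_allowed || { echo "failover flagged, skipping rsync"; return 0; }
    rsync -az -e ssh /etc/asterisk/ "$BACKUP_HOST:/etc/asterisk/" &&
    rsync -az -e ssh /var/spool/asterisk/voicemail/ \
        "$BACKUP_HOST:/var/spool/asterisk/voicemail/"
}
```

The cron job would end with a call to replicate, and the additional alert would only need to run touch on the same flag file.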

The replication, of course, means that this setup will not protect me from 
a bad config change that breaks asterisk, as that change will be 
replicated throughout the cluster.  So all significant config changes 
should be tested on a standalone box.

[root@phones2 mon]# cat /usr/lib/mon/mon.d/asterisk.monitor
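#!/bin/sh
## Can only check localhost.  Always checks localhost regardless of input.

        # A healthy daemon answers "show uptime" with a line starting
        # with "System".  grep -q avoids the bash-only "==" test and the
        # unquoted variable, which errors out when the output is empty.
        if /usr/sbin/asterisk -rx "show uptime" | /bin/grep -q "^System"; then
                exit 0
        else
                # mon uses the first line of stdout as the failure summary
                echo "localhost"
                exit 1
        fi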

From mon.cf:

watch asterisk
        service asterisk
                description asterisk pbx on localhost
                interval 10s
                monitor asterisk.monitor
                period wd {Sun-Sat}
                        alert mail.alert adam at plexicomm.net
                        alert restartasterisk.alert adam at plexicomm.net
                        alertevery 30s
        service asterisk-failover
                description checking if we need to stop heartbeat
                interval 10s
                monitor asterisk.monitor
                period wd {Sun-Sat}
                        alert stopeverything.alert adam at plexicomm.net
                        alertafter 5 3m
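
The two alert scripts named in the config are not shown here.  As a rough sketch only, they could be as small as the functions below.  The init-script paths are assumptions (typical locations, not confirmed for this box), and the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: both commands are assumed init-script paths.
# Override them through the environment if the real paths differ.
RESTART_CMD="${RESTART_CMD:-/etc/init.d/asterisk restart}"
STOP_HA_CMD="${STOP_HA_CMD:-/etc/init.d/heartbeat stop}"

# What restartasterisk.alert might do: restart the local daemon.
restart_asterisk() {
    $RESTART_CMD
}

# What stopeverything.alert might do: stop heartbeat so that the
# standby node takes over the cluster resources.
stop_everything() {
    $STOP_HA_CMD
}
```

Each alert file would hold the relevant function followed by a call to it, ignoring whatever arguments and stdin mon passes along.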