[Asterisk-Users] Asterisk Redundancy
Adam Moffett
adam at plexicomm.net
Tue Oct 25 06:51:25 MST 2005
Benjamin Lawetz wrote:
>>Since I can't do that, what I've settled on is heartbeat + mon.
>>Heartbeat will monitor for a system level failure and switch to the backup
>>machine if necessary; and mon will watch the asterisk (or any
>>other) service and restart it and/or alert me if it fails.
>
>What kind of monitor are you using to monitor asterisk?
Sorry for my slow response. My Asterisk monitor right now is
embarrassingly simple: all it does is run "show uptime" through the
Asterisk CLI and check for output starting with "System" (see below).
Obviously the method has limitations:
1) It really only tells me that the daemon is running, not that it is
able to carry any calls.
2) It only works on localhost.
Input on how to test a remote instance of Asterisk would be welcome, as
would a method of making a test call or otherwise reliably testing the
ability to make calls. My impression is that this would require
Asterisk to have a "Dial" command in the CLI, or a Linux SIP client that
I could run from the shell; I'm not aware of the existence of either.
Any other simple and reliable methods of testing Asterisk's condition
would be welcome.
The alerts, by the way, are pretty simple as well; see the excerpt from
mon.cf below. restartasterisk.alert does exactly what it says.
stopeverything.alert shuts down heartbeat, which causes another node in
the cluster to take over. That node then starts mon, which uses
restartasterisk.alert to start Asterisk. Asterisk is started on the
backup machine only when the primary fails, so that it loads the config
changes replicated from the primary. Total downtime should be under
3 minutes, which will let me hit five nines (99.999% availability) if it
only happens once a year ;)
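The five-nines estimate is easy to sanity-check with a little shell
arithmetic. This is a sketch assuming a 365-day year (525,600 minutes,
leap years ignored) and using awk for the floating-point math; the
function names are hypothetical.

```shell
#!/bin/sh
# Downtime-budget arithmetic for an availability target.
# ASSUMPTION: a 365-day year (525600 minutes); leap years are ignored.
MINUTES_PER_YEAR=525600

# downtime_budget_minutes PERCENT
# Prints the yearly downtime (minutes, 3 decimals) allowed at PERCENT.
downtime_budget_minutes() {
    awk -v pct="$1" -v m="$MINUTES_PER_YEAR" \
        'BEGIN { printf "%.3f\n", m * (100 - pct) / 100 }'
}

# fits_budget OUTAGE_MINUTES OUTAGES_PER_YEAR PERCENT
# Exit status 0 if the total yearly outage time is within the budget.
fits_budget() {
    awk -v o="$1" -v n="$2" -v pct="$3" -v m="$MINUTES_PER_YEAR" \
        'BEGIN { exit !(o * n <= m * (100 - pct) / 100) }'
}
```

For example, `downtime_budget_minutes 99.999` prints 5.256, so a single
outage of under 3 minutes a year fits the five-nines budget, while two
such outages (6 minutes) would not.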
Config changes are replicated via rsync over ssh every few minutes, and
voicemail is also copied from the primary to the backup by rsync. One
thing I still need to do is make rsync stop trying to replicate files
when failover occurs. That will probably just require another alert
below "stopeverything.alert".
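A sketch of that replication job, including the failover switch, might
look like the following. It assumes Asterisk's usual default locations
(/etc/asterisk for config, /var/spool/asterisk/voicemail for voicemail);
the script name, BACKUP_HOST and the flag-file path are hypothetical
placeholders, not part of my current setup.

```shell
#!/bin/sh
# replicate-config.sh (hypothetical name): push Asterisk config and
# voicemail from the primary to the backup, unless failover has happened.
# ASSUMPTIONS: BACKUP_HOST and NO_REPLICATE_FLAG are placeholders; the
# directories are Asterisk's usual defaults.

BACKUP_HOST=${BACKUP_HOST:-backup.example.com}
NO_REPLICATE_FLAG=${NO_REPLICATE_FLAG:-/var/lib/asterisk-no-replicate}
RSYNC=${RSYNC:-rsync}
REPLICATED_DIRS="/etc/asterisk /var/spool/asterisk/voicemail"

# replicate: copy each directory to the same path on the backup over ssh.
replicate() {
    if [ -e "$NO_REPLICATE_FLAG" ]; then
        echo "failover flag $NO_REPLICATE_FLAG present; not replicating" >&2
        return 0
    fi
    for dir in $REPLICATED_DIRS; do
        # -a keeps permissions and times, -z compresses, -e ssh picks the
        # transport; the trailing slashes copy the directory's contents.
        "$RSYNC" -az -e ssh "$dir/" "$BACKUP_HOST:$dir/" || return 1
    done
}

# stop_replication: meant to be run from an extra mon alert on failover.
stop_replication() {
    touch "$NO_REPLICATE_FLAG"
}

case "${1:-}" in
    run)  replicate ;;
    stop) stop_replication ;;
esac
```

Cron on the primary would call it with `run` every few minutes, and the
extra alert below stopeverything.alert would call it with `stop`. The
flag file deliberately lives outside the replicated directories so it is
never copied to the backup; it has to be removed by hand once the
primary is healthy and back in charge.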
The replication of course means that this setup will not protect me from
a bad config change that breaks Asterisk, since that change will be
replicated throughout the cluster. So all significant config changes
should be tested on a standalone box first.
[root at phones2 mon]# cat /usr/lib/mon/mon.d/asterisk.monitor
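#!/bin/sh
## Can only check localhost: always checks localhost regardless of the
## hosts mon passes in.
## A running Asterisk answers "show uptime" with a line starting with
## "System uptime:"; any error output does not, so grep finds no match.
if /usr/sbin/asterisk -rx "show uptime" 2>/dev/null | grep -q '^System'; then
    exit 0
else
    # mon expects the failing host(s) on the first line of output
    echo "localhost"
    exit 1
fi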
From mon.cf:
watch asterisk
    service asterisk
        description asterisk pbx on localhost
        interval 10s
        monitor asterisk.monitor
        period wd {Sun-Sat}
            alert mail.alert adam at plexicomm.net
            alert restartasterisk.alert adam at plexicomm.net
            alertevery 30s
    service asterisk-failover
        description checking if we need to stop heartbeat
        interval 10s
        monitor asterisk.monitor
        period wd {Sun-Sat}
            alert stopeverything.alert adam at plexicomm.net
            alertafter 5 3m
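The mon.cf excerpt names the two alert scripts without showing them.
A minimal sketch of what they might contain follows, assuming Asterisk
and heartbeat are controlled by SysV init scripts in /etc/init.d; the
INIT_DIR variable and the function names are hypothetical, and in
practice each function body would be its own executable file in mon's
alert directory. mon passes the alert details about the failure
(service, group, hosts, time) plus the monitor's output; this sketch
ignores all of that.

```shell
#!/bin/sh
# Hypothetical bodies for restartasterisk.alert and stopeverything.alert.
# ASSUMPTION: services are managed by init scripts under $INIT_DIR.
# The options and input that mon supplies to alerts are ignored here.

INIT_DIR=${INIT_DIR:-/etc/init.d}

# restartasterisk.alert: restart (or start) Asterisk on this node.
restart_asterisk_alert() {
    "$INIT_DIR/asterisk" restart
}

# stopeverything.alert: stop heartbeat so the peer node takes over;
# the peer's mon then starts Asterisk there via restartasterisk.alert.
stop_everything_alert() {
    "$INIT_DIR/heartbeat" stop
}
```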