Monitoring heartbeat with Nagios

Posted by zaoqiden on 2015-11-22 16:22:52

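With heartbeat up and running, the next step is to monitor it. Here is how.
First, a look at a few commands. They are installed together with heartbeat, and our monitoring script relies on them.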

# which cl_status
/usr/bin/cl_status
# cl_status listnodes    # list the nodes in the current heartbeat cluster
192.168.3.1
usvr-211
usvr-210
# cl_status nodestatus usvr-211    # show the status of a node
active
# cl_status nodestatus 192.168.3.1    # show the status of a node
ping

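The idea behind check_heartbeat.sh is to list every node in the cluster and check that each one is in a healthy state. In our test setup the healthy states are ping and active.
  If the number of active+ping nodes is 0, the result is CRITICAL.
  If the number of active+ping nodes is lower than the total number of nodes, the result is WARNING.
  If the number of active+ping nodes equals the total number of nodes, the result is OK.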
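The three rules above can be sketched as a small shell function. This is only an illustration of the mapping, not part of the plugin itself; the name heartbeat_state is made up for this example, and the return values are the usual Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL).

```shell
#!/bin/bash
# heartbeat_state ALIVE TOTAL  (hypothetical helper, for illustration only)
# ALIVE is the number of nodes in state active or ping, TOTAL is the node count.
# Prints the Nagios state name and returns the matching exit code.
heartbeat_state() {
    local alive=$1 total=$2
    if [ "$alive" -eq 0 ]; then
        echo "CRITICAL"
        return 2
    elif [ "$alive" -lt "$total" ]; then
        echo "WARNING"
        return 1
    else
        echo "OK"
        return 0
    fi
}

# All three nodes healthy.
heartbeat_state 3 3
```

With two healthy nodes out of three, heartbeat_state 2 3 prints WARNING and returns 1.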
　　

# cat check_heartbeat.sh
#!/bin/bash
# Author: Emmanuel Bretelle
# Date: 12/03/2010
# Description: Retrieve Linux HA cluster status using cl_status
# Based on http://www.randombugs.com/linux/howto-monitor-linux-heartbeat-snmp.html
#
# Author: Stanila Constantin Adrian
# Date: 20/03/2009
# Description: Check the number of active heartbeats
# http://www.randombugs.com
# Get program path
REVISION=1.3
PROGNAME=$(/bin/basename "$0")
PROGPATH=$(echo "$0" | /bin/sed -e 's,[\\/][^\\/][^\\/]*$,,')
NODE_NAME=$(uname -n)
CL_ST='/usr/bin/cl_status'
# Nagios exit codes
#. $PROGPATH/utils.sh
OK=0
WARNING=1
CRITICAL=2
UNKNOWN=3
usage () {
    echo "\
Nagios plugin to heartbeat.
Usage:
  $PROGNAME
  $PROGNAME [--help | -h]
  $PROGNAME [--version | -v]
Options:
  --help, -h       Print this help information
  --version, -v    Print version of plugin
"
}
# print_revision and support are defined in utils.sh, which is not sourced
# above, so the version is printed directly instead.
print_version () {
    echo "$PROGNAME $REVISION"
}
help () {
    print_version
    echo; usage
}

while test -n "$1"
do
    case "$1" in
        --help | -h)
            help
            exit $OK;;
        --version | -v)
            print_version
            exit $OK;;
        # -H)
        #    shift
        #    HOST=$1;;
        # -C)
        #    shift
        #    COMMUNITY=$1;;
        *)
            echo "Heartbeat UNKNOWN: Wrong command usage"; exit $UNKNOWN;;
    esac
    shift
done
# Stop early if heartbeat itself is not running on this node.
$CL_ST hbstatus > /dev/null
res=$?
if [ $res -ne 0 ]
then
    echo "Heartbeat CRITICAL: Heartbeat is not running on this node"
    exit $CRITICAL
fi
# I counts all nodes, A counts the nodes in a healthy state.
declare -i I=0
declare -i A=0
NODES=$($CL_ST listnodes)
for node in $NODES
do
    status=$($CL_ST nodestatus "$node")
    let I=$I+1
    # The original test was [ $status == "active" ], which counts only active nodes.
    # ping is a healthy state too, so the condition was changed as follows.
    if [ "$status" == "active" ] || [ "$status" == "ping" ]
    then
        let A=$A+1
    fi
done

if [ $A -eq 0 ]
then
    echo "Heartbeat CRITICAL: $A/$I"
    exit $CRITICAL
elif [ $A -ne $I ]
then
    echo "Heartbeat WARNING: $A/$I"
    exit $WARNING
else
    echo "Heartbeat OK: $A/$I"
    exit $OK
fi
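If you want to try the counting logic without a live cluster, something like the following dry run may help. It is only a sketch: cl_status_stub and count_nodes are names invented for this example, and the stub simply replays the sample cl_status output shown earlier in this post.

```shell
#!/bin/bash
# Stand-in for /usr/bin/cl_status (hypothetical, for dry runs only).
# It replays the listnodes/nodestatus sample output from this post.
cl_status_stub() {
    case "$1" in
        listnodes)
            printf '%s\n' 192.168.3.1 usvr-211 usvr-210
            ;;
        nodestatus)
            case "$2" in
                192.168.3.1)       echo ping ;;
                usvr-211|usvr-210) echo active ;;
                *)                 echo dead ;;  # any value other than active/ping
            esac
            ;;
    esac
}

# Same counting loop as in the plugin, with the command name passed in.
# Prints "<healthy nodes>/<all nodes>".
count_nodes() {
    local cl_st=$1 node status total=0 alive=0
    for node in $("$cl_st" listnodes); do
        status=$("$cl_st" nodestatus "$node")
        total=$((total + 1))
        if [ "$status" = "active" ] || [ "$status" = "ping" ]; then
            alive=$((alive + 1))
        fi
    done
    echo "$alive/$total"
}

count_nodes cl_status_stub
```

Editing the stub so that one node reports a different state is a quick way to see the ratio drop, which the plugin would report as WARNING.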

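The Nagios clients are the nodes of our LVS cluster, usvr-210 and usvr-211. The Nagios server collects the monitoring data from them through check_nrpe.
Nagios client
1. Copy the script into the Nagios plugin directory and set its permissions.
  cp check_heartbeat.sh /usr/local/nagios/libexec/

  chmod a+x /usr/local/nagios/libexec/check_heartbeat.sh
  chown nagios:nagios /usr/local/nagios/libexec/check_heartbeat.sh
2. Add the check command to the NRPE configuration file on the client.
  vim /usr/local/nagios/etc/nrpe.cfg

  command[check_heartbeat]=/usr/local/nagios/libexec/check_heartbeat.sh

3. Reload the configuration.
  service xinetd reload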
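Before moving on to the server side, it can be worth confirming that NRPE on each client actually answers for check_heartbeat. Below is a minimal sketch to run on the Nagios server. It assumes check_nrpe is installed under the same libexec directory used in this post; query_nrpe is just a name made up for the example.

```shell
#!/bin/bash
# Assumption: check_nrpe lives in the libexec directory used in this post.
LIBEXEC=${LIBEXEC:-/usr/local/nagios/libexec}

# query_nrpe HOST COMMAND  (hypothetical helper)
# Asks the NRPE daemon on HOST to run COMMAND and passes the result through.
query_nrpe() {
    local host=$1 command=$2
    if [ ! -x "$LIBEXEC/check_nrpe" ]; then
        echo "UNKNOWN: check_nrpe not found in $LIBEXEC"
        return 3
    fi
    "$LIBEXEC/check_nrpe" -H "$host" -c "$command"
}
```

For example, query_nrpe usvr-210 check_heartbeat should return the plugin's status line (of the form Heartbeat OK: A/I) if the client is set up correctly.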
Nagios server
1. Add the monitoring services.

define service {
    use                     local-service
    service_description     heartbeat-lvs-master
    check_command           check_nrpe!check_heartbeat
    service_groups          heartbeat_services
    host_name               usvr-210
    check_interval          5
    notifications_enabled   1
    notification_interval   30
    contact_groups          admins
}
define service {
    use                     local-service
    service_description     heartbeat-lvs-slave
    check_command           check_nrpe!check_heartbeat
    service_groups          heartbeat_services
    host_name               usvr-211
    check_interval          5
    notifications_enabled   1
    notification_interval   30
    contact_groups          admins
}

2. Check and reload the configuration.
  nagioscheck
  service nagios reload
The monitoring result looks like this: [screenshot]

OK, our heartbeat monitoring is now in place.

I used this page as a reference: http://wiki.debuntu.org/wiki/Linux_HA_Heartbeat/Monitoring_with_Nagios. I hope it helps.
