Zabbix in the enterprise: taming floods of nodata alert notifications

a13698822086 posted on 2019-1-18 13:23:30

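I have been studying and using Zabbix for almost a year. It has plenty of features, but my biggest headache is this: whenever the data center network becomes unstable or a proxy server has a problem, a large number of servers raise nodata alerts. I send alerts by mail and have SMS forwarding enabled on the mailbox, so a burst of nodata alerts regularly freezes my phone (a Mi 3). To solve this I tested several approaches:
1. Trigger dependencies. With several Zabbix instances and several proxies this is tedious to configure and hard to change later, so I gave up on it.
2. Alerting through a custom script that analyses and filters the alerts itself. This is what I use now.
Below is a screenshot of the alert received for a nodata problem once method 2 is in place: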
http://s3.运维网.com/wyfs02/M02/24/E7/wKiom1NWF5SjBdSvAAChMSleljE714.jpg
Below is a screenshot of the alert received once the nodata problem has recovered, again with method 2:
http://s3.运维网.com/wyfs02/M00/24/E7/wKiom1NWF9-hAD2fAACn4iZPLhk098.jpg
How to set it up:

I. Server side (the Zabbix web interface)
1. Send alerts through a custom script
http://s3.运维网.com/wyfs02/M02/24/E6/wKioL1NWGOrQJp4FAAGexGf_JCs842.jpg
Choose "Administration" ==> "Media types" ==> "Create media type".
http://s3.运维网.com/wyfs02/M02/24/E7/wKiom1NWGa3hW2klAADLjeq75-U616.jpg
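The "Script name" field holds the file name of the sending script. The directory Zabbix looks in for that script is defined in zabbix_server.conf; see the client-side setup below.
2. Configure the action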
http://s3.运维网.com/wyfs02/M00/24/E7/wKiom1NWGjuBDPY5AAGKTXJf328668.jpg
Choose "Configuration" ==> "Actions" ==> "Create action" and set "Event source" to "Triggers".
http://s3.运维网.com/wyfs02/M02/24/E6/wKioL1NWGmqjn0KdAALWkqgFYXM927.jpg
Then choose "Operations" ==> "Send only to" ==> "E-mail". This "E-mail" is the name given to the media type defined in the previous step.
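With a script media type, older Zabbix releases pass three positional arguments to the script: the recipient, the subject, and the message body (from 3.0 on they have to be listed explicitly as script parameters on the media type). Before plugging in the real script, a throwaway stub like the one below can confirm what the action actually sends. This is only a sketch: `log_alert`, the `ALERT_LOG` variable, and the log path are hypothetical names, not part of the original setup.

```shell
#!/bin/bash
# Hypothetical debugging stub: record the three arguments Zabbix passes
# to an alert script (recipient, subject, message) instead of mailing them.
log_alert() {
    local log_file="$1" to="$2" subject="$3" body="$4"
    printf '%s\tto=%s\tsubject=%s\tbody=%s\n' \
        "$(date '+%Y-%m-%d_%T')" "$to" "$subject" "$body" >> "$log_file"
}

# When called with exactly three arguments (as Zabbix would call it),
# append them to an assumed log file under /tmp.
if [ "$#" -eq 3 ]; then
    log_alert "${ALERT_LOG:-/tmp/zabbix_alert_args.log}" "$1" "$2" "$3"
fi
```

Triggering a test alert and reading the log shows the exact subject text that the matching in the real script has to handle.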
II. Client side (the shell on the host running zabbix_server)
1. Edit zabbix_server.conf
# directory that holds the alert scripts
AlertScriptsPath=/usr/local/zabbix/bin
2. Put the script in /usr/local/zabbix/bin, name it zabbix_send_mail.sh, set its mode to 755, and make the zabbix user and group its owner.
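The installation step can be sketched as a small helper. `install_alert_script` is a hypothetical function, not a Zabbix tool; it assumes GNU coreutils `install`, and it changes ownership only when an owner is passed, because `chown` needs root.

```shell
#!/bin/bash
# Sketch: copy an alert script into the AlertScriptsPath directory with mode 755.
# install_alert_script is a hypothetical helper used only for illustration.
#   $1 = script to install, $2 = target directory, $3 = optional owner (user:group)
install_alert_script() {
    local src="$1" dest_dir="$2" owner="${3:-}"
    install -m 755 "$src" "$dest_dir/" || return 1
    # Changing ownership needs root, so it is only attempted when requested.
    if [ -n "$owner" ]; then
        chown "$owner" "$dest_dir/$(basename "$src")" || return 1
    fi
}

# Typical call on the Zabbix server, as root:
#   install_alert_script zabbix_send_mail.sh /usr/local/zabbix/bin zabbix:zabbix
```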
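# cat /usr/local/zabbix/bin/zabbix_send_mail.sh
#!/bin/bash
# Called by Zabbix as: zabbix_send_mail.sh <recipient> <subject> <message>
. /etc/profile
problem_cmd="^PROBLEM.*system time out"
recovery_cmd="^RECOVERY.*system time out"
#echo "$3"|/bin/mail -s "$2" $1
if echo "$2" | grep -Eq "$problem_cmd"; then
    # nodata problem: queue the mail command instead of sending it
    echo "echo \"$3\"|/bin/mail -s \"$2\" $1" >>/tmp/zabbix_problem_mail.sh
elif echo "$2" | grep -Eq "$recovery_cmd"; then
    # nodata recovery: queue it in a separate file
    echo "echo \"$3\"|/bin/mail -s \"$2\" $1" >>/tmp/zabbix_recovery_mail.sh
else
    # everything else is mailed immediately
    echo "$3" | /bin/mail -s "$2" "$1"
fi
With this script, any alert whose subject contains "system time out" is not mailed but appended, as a ready-to-run mail command, to a file under /tmp. (My "time out" is really nodata: I name my nodata triggers "system time out".)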
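To check how a given subject line would be routed without sending or queueing anything, the matching can be pulled out into a function. A minimal sketch: `classify_subject` and the sample subjects are made up for illustration, and the two patterns are the ones from the script above.

```shell
#!/bin/bash
# Sketch: classify an alert subject with the same patterns as zabbix_send_mail.sh.
# classify_subject is a hypothetical name used only for this illustration.
classify_subject() {
    local subject="$1"
    if printf '%s\n' "$subject" | grep -Eq '^PROBLEM.*system time out'; then
        echo queue_problem
    elif printf '%s\n' "$subject" | grep -Eq '^RECOVERY.*system time out'; then
        echo queue_recovery
    else
        echo send_now
    fi
}

classify_subject 'PROBLEM: web01 system time out'    # prints queue_problem
classify_subject 'RECOVERY: web01 system time out'   # prints queue_recovery
classify_subject 'PROBLEM: web01 disk space is low'  # prints send_now
```

Because both patterns are anchored with `^`, the subject template of the action has to start with the trigger status for the queueing to take effect.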
3. Dispatch the queued nodata alerts
# cat /usr/local/zabbix/bin/cront_send_mail.sh
#!/bin/bash
. /etc/profile
problem_file='/tmp/zabbix_problem_mail.sh'
recovery_file='/tmp/zabbix_recovery_mail.sh'
# Create the queue files if they are missing so the zabbix user can append to them.
if [ ! -e "$problem_file" ]; then
    touch "$problem_file"
    chown zabbix:zabbix "$problem_file"
fi
if [ ! -e "$recovery_file" ]; then
    touch "$recovery_file"
    chown zabbix:zabbix "$recovery_file"
fi
alert_value=15
# Every queued mail command starts with "echo", so this approximates the queue length.
problem_value=$(grep -c echo "$problem_file")
recovery_value=$(grep -c echo "$recovery_file")
time=$(date +%Y-%m-%d_%T)
contact='244979152@qq.com'
if [ "$problem_value" -le "$alert_value" ]; then
    # At most 15 problems: send every queued mail. (The original test was -lt,
    # which left a queue of exactly 15 entries unhandled.)
    /bin/bash "$problem_file"
    rm -f "$problem_file" "$problem_file-$alert_value"
elif [ ! -e "$problem_file-$alert_value" ]; then
    # Flood and no summary sent yet: send one summary mail and leave a marker file.
    echo "Time: $time  timeouts: $problem_value!" | /bin/mail -s "PROBLEM: disaster alert! Mass timeout alerts in the data center!!!" "$contact"
    rm -f "$problem_file"
    touch "$problem_file-$alert_value"
else
    # Flood, but the summary went out on the previous run: drop the queue and the marker.
    rm -f "$problem_file" "$problem_file-$alert_value"
fi
if [ "$recovery_value" -le "$alert_value" ]; then
    /bin/bash "$recovery_file"
    rm -f "$recovery_file" "$recovery_file-$alert_value" "$problem_file-$alert_value"
elif [ ! -e "$recovery_file-$alert_value" ]; then
    echo "Time: $time  timeouts: $recovery_value!" | /bin/mail -s "RECOVERY: disaster alert! Mass timeout alerts in the data center!!!" "$contact"
    rm -f "$recovery_file" "$problem_file-$alert_value"
    touch "$recovery_file-$alert_value"
else
    rm -f "$recovery_file" "$recovery_file-$alert_value" "$problem_file-$alert_value"
fi
Here, once more than 15 "system time out" mails have queued up, a single mail is sent, and only to the address I configured, 244979152@qq.com.
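The branching in the cron script boils down to a three-way decision per queue, which is easier to reason about in isolation. The sketch below assumes that a count equal to the threshold is treated as a small batch; `decide_action` and the action names it prints are hypothetical.

```shell
#!/bin/bash
# Sketch of the three-way decision the cron script makes for one queue.
# decide_action is a hypothetical helper; it only prints the chosen action.
#   $1 = number of queued alerts, $2 = threshold, $3 = path of the marker file
decide_action() {
    local count="$1" threshold="$2" marker="$3"
    if [ "$count" -le "$threshold" ]; then
        echo send_each        # small batch: deliver every queued mail
    elif [ ! -e "$marker" ]; then
        echo send_summary     # first large batch: one summary mail, then create the marker
    else
        echo drop_silently    # summary already sent on the previous run
    fi
}

decide_action 3 15 /tmp/zabbix_problem_mail.sh-15
```

Note that the marker is removed again in the third case, so during a long outage a summary goes out on every other large batch rather than exactly once.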
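4. crontab entry
*/2 * * * * /bin/bash /usr/local/zabbix/bin/cront_send_mail.sh
This meets the following requirements:
1. A flood of nodata alerts produces only a single mail;
2. When the nodata alerts recover, again only a single mail is sent;
3. The setup is simple: no triggers, actions, or templates have to be modified.
In my tests so far, when a proxy goes down and all the hosts behind it raise nodata alerts, only one mail is sent, to the alert mailbox I configured; the other contacts set in the action do not receive anything.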
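One thing this setup leaves open is what happens when Zabbix appends to a queue file while the cron job is counting or deleting it: an alert written in that window can be lost. A possible hardening, which is not part of the setup described above, is to rename the queue first and work on the renamed copy. `snapshot_queue` is a hypothetical helper, and the sketch assumes the queue and its snapshot live on the same filesystem so that `mv` is a plain rename.

```shell
#!/bin/bash
# Sketch: take a private snapshot of a queue file before processing it.
# snapshot_queue is a hypothetical helper. A rename within one filesystem
# swaps the queue out in a single step, and a script that appends after
# the rename simply creates a fresh queue file.
snapshot_queue() {
    local queue="$1" snap
    [ -s "$queue" ] || return 1          # nothing queued
    snap="$queue.$$.processing"
    mv "$queue" "$snap" || return 1
    echo "$snap"                         # caller counts, runs, then deletes this file
}

# Possible use inside the cron script:
#   if snap=$(snapshot_queue /tmp/zabbix_problem_mail.sh); then
#       count=$(grep -c echo "$snap")
#       ...
#       rm -f "$snap"
#   fi
```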
