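Monitoring Linux Hosts with Nagios and NRPE
I. Introduction
1. About NRPE
NRPE is an extension to Nagios that executes plugins on remote Linux/Unix hosts. With the NRPE add-on and the Nagios plugins installed on a remote server, that server can report its local state, such as CPU load, memory usage, and disk usage, to the Nagios monitoring platform. In this article the monitoring side is called the Nagios server, and the remote monitored host is called the Nagios client.
Nagios can monitor remote hosts in several ways, including SNMP, NRPE, SSH, and NSCA. This article covers monitoring remote Linux hosts through NRPE.
NRPE (Nagios Remote Plugin Executor) is a daemon that runs check commands on a remote server. It lets the Nagios server trigger check commands on the remote host through the agent installed there, and it returns the results to the server. Its overhead is much lower than that of SSH-based checks, and because a check needs no system account on the remote host, it also avoids handing remote login credentials to the monitoring server, which the SSH-based approach requires.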
http://s3.运维网.com/wyfs02/M01/4A/C3/wKioL1Qm3NLBZZSnAAHCItyFxXI201.jpg
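2. How NRPE works
NRPE consists of two parts:
check_nrpe plugin: runs on the monitoring host
nrpe daemon: runs on the remote host, acting as the agent on the monitored side
Note: the nrpe daemon depends on the nagios-plugins package; without it the daemon has no checks to run.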
http://s3.运维网.com/wyfs02/M00/4A/C3/wKioL1Qm2krywnxLAACf2VJUtQI655.jpg
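The NRPE workflow in detail
When Nagios needs to check a service or resource on a remote Linux host:
First: Nagios runs the check_nrpe plugin and tells it what to check;
Second: check_nrpe connects to the remote NRPE daemon over SSL;
Then: the NRPE daemon runs the corresponding Nagios plugin to perform the check;
Finally: the NRPE daemon returns the result to check_nrpe, which passes it to Nagios for processing.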
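What travels back along this chain is the plugin's output line together with its exit code, and Nagios derives the service state from that code. The sketch below shows the standard plugin convention (0, 1, 2, 3); the function name nagios_state_name is hypothetical and is used only for illustration.

```shell
# Map a Nagios plugin exit code to its state name.
# Standard plugin convention: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
# Any other value is reported here as UNKNOWN (an assumption of this sketch).
nagios_state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}
```

For example, `nagios_state_name 2` prints CRITICAL, which is the state Nagios shows when a remote check exits with code 2.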
II. Installing nagios-plugins and NRPE on the monitored host
1. Add the nagios user
# useradd -s /sbin/nologin nagios
2. Install nagios-plugins, which NRPE depends on
# yum -y install gcc gcc-c++ make openssl openssl-devel
# tar xf nagios-plugins-2.0.3.tar.gz
# cd nagios-plugins-2.0.3
# ./configure --with-nagios-user=nagios --with-nagios-group=nagios
# make && make install
# Note: to monitor MySQL, also pass --with-mysql to ./configure
3. Install NRPE
# tar xf nrpe-2.15.tar.gz
# cd nrpe-2.15
# ./configure --with-nrpe-user=nagios \
> --with-nrpe-group=nagios \
> --with-nagios-user=nagios \
> --with-nagios-group=nagios \
> --enable-command-args \
> --enable-ssl
# make all
# make install-plugin
# make install-daemon
# make install-daemon-config
4. Configure NRPE
# grep -v '^#' /usr/local/nagios/etc/nrpe.cfg |sed '/^$/d'
log_facility=daemon
pid_file=/var/run/nrpe.pid
server_port=5666 # port NRPE listens on
nrpe_user=nagios
nrpe_group=nagios
allowed_hosts=192.168.0.105 # addresses allowed to connect, normally the Nagios server
dont_blame_nrpe=0
allow_bash_command_substitution=0
debug=0
command_timeout=60
connection_timeout=300
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
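NRPE identifies each check by the name inside a `command[<name>]=<command line>` entry, and that name is what the Nagios server later asks for. As a minimal sketch, the helper below lists the names defined in a given nrpe.cfg; the function name list_nrpe_commands is hypothetical.

```shell
# Print the name of every command[<name>]= entry in an nrpe.cfg file.
# Usage: list_nrpe_commands /usr/local/nagios/etc/nrpe.cfg
list_nrpe_commands() {
    # Keep only lines that start with command[...]= and print the bracketed name.
    sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "$1"
}
```

Run against the configuration above, it should print one name per line (check_users, check_load, and so on), which is a quick way to see which checks the daemon will answer.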
5. Start NRPE
# Start as a standalone daemon
# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
# netstat -tulpn | grep nrpe
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN 22597/nrpe
tcp 0 0 :::5666 :::* LISTEN 22597/nrpe
NRPE can run in one of two modes:
-i # Run as a service under inetd or xinetd
-d # Run as a standalone daemon
You can write an init script for nrpe so that it runs in standalone mode:
# cat /etc/init.d/nrped
#!/bin/bash
# chkconfig: 2345 88 12
# description: NRPE DAEMON
NRPE=/usr/local/nagios/bin/nrpe
NRPECONF=/usr/local/nagios/etc/nrpe.cfg
case "$1" in
start)
echo -n "Starting NRPE daemon..."
$NRPE -c $NRPECONF -d
echo " done."
;;
stop)
echo -n "Stopping NRPE daemon..."
pkill -u nagios nrpe
echo " done."
;;
restart)
$0 stop
sleep 2
$0 start
;;
*)
echo "Usage: $0 start|stop|restart"
;;
esac
exit 0
# chmod +x /etc/init.d/nrped
# chkconfig --add nrped
# chkconfig nrped on
# service nrped start
Starting NRPE daemon... done.
# netstat -tnlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1031/sshd
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 1108/master
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN 22597/nrpe
tcp 0 0 :::22 :::* LISTEN 1031/sshd
tcp 0 0 ::1:25 :::* LISTEN 1108/master
tcp 0 0 :::5666 :::* LISTEN 22597/nrpe
III. Installing NRPE on the monitoring host
1. Install NRPE
# tar xf nrpe-2.15.tar.gz
# cd nrpe-2.15
# ./configure \
> --with-nrpe-user=nagios \
> --with-nrpe-group=nagios \
> --with-nagios-user=nagios \
> --with-nagios-group=nagios \
> --enable-command-args \
> --enable-ssl
# make all
# make install-plugin
# After installation, the check_nrpe plugin appears in libexec under the Nagios installation directory
# cd /usr/local/nagios/libexec/
# ll -d check_nrpe
-rwxrwxr-x. 1 nagios nagios 76769 Sep 28 08:07 check_nrpe
2. check_nrpe usage
# ./check_nrpe -h
NRPE Plugin for Nagios
Copyright (c) 1999-2008 Ethan Galstad (nagios@nagios.org)
Version: 2.15
Last Modified: 09-06-2013
License: GPL v2 with exemptions (-l for more info)
SSL/TLS Available: Anonymous DH Mode, OpenSSL 0.9.6 or higher required
Usage: check_nrpe -H <host> [ -b <bindaddr> ] [-4] [-6] [-n] [-u] [-p <port>] [-t <timeout>] [-c <command>] [-a <arglist...>]
Options:
-n = Do no use SSL
-u = Make socket timeouts return an UNKNOWN state instead of CRITICAL
<host> = The address of the host running the NRPE daemon
<bindaddr> = bind to local address
-4 = user ipv4 only
-6 = user ipv6 only
<port> = The port on which the daemon is running (default=5666)
<timeout> = Number of seconds before connection times out (default=10)
<command> = The name of the command that the remote daemon should run
<arglist> = Optional arguments that should be passed to the command. Multiple
arguments should be separated by a space. If provided, this must be
the last option supplied on the command line.
Note:
This plugin requires that you have the NRPE daemon running on the remote host.
You must also have configured the daemon to associate a specific plugin command
with the <command> option you are specifying here. Upon receipt of the
<command> argument, the NRPE daemon will run the appropriate plugin command and
send the plugin output and return code back to *this* plugin. This allows you
to execute plugins on remote hosts and 'fake' the results to make Nagios think
the plugin is being run locally.
Monitoring a remote Linux host through NRPE is done with the check_nrpe plugin, whose syntax is:
check_nrpe -H <host> [-n] [-u] [-p <port>] [-t <timeout>] [-c <command>] [-a <arglist...>]
# ./check_nrpe -H 192.168.0.81
NRPE v2.15
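Calling check_nrpe without -c only confirms that the daemon answers. To exercise the individual checks by hand before wiring them into Nagios, something like the following sketch may help. It assumes the plugin lives at /usr/local/nagios/libexec/check_nrpe as in this article; the function name run_nrpe_checks and the CHECK_NRPE variable are hypothetical.

```shell
# Path to the plugin; set CHECK_NRPE beforehand if it is installed elsewhere.
CHECK_NRPE="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"

# Run each named NRPE command against a host and print its exit code and output.
# Usage: run_nrpe_checks 192.168.0.81 check_users check_load
run_nrpe_checks() {
    host="$1"
    shift
    for cmd in "$@"; do
        # Capture the plugin output and remember the exit code either way.
        if output=$("$CHECK_NRPE" -H "$host" -c "$cmd"); then
            status=0
        else
            status=$?
        fi
        echo "$cmd: exit=$status output=$output"
    done
}
```

A non-zero exit code for a command points at a problem on the monitored host (or an undefined command name) before Nagios is involved at all.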
3. Define the command
# cd /usr/local/nagios/etc/objects/
# vim commands.cfg
# Append to the end of the file
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H "$HOSTADDRESS$" -c "$ARG1$"
}
4. Define the services
# cp windows.cfg linhost.cfg
# grep -v '^#' linhost.cfg |sed '/^$/d'
define host{
use linux-server
host_name linhost
alias My Linux Server
address 192.168.0.81
}
define service{
use generic-service
host_name linhost
service_description CHECK USER
check_command check_nrpe!check_users
}
define service{
use generic-service
host_name linhost
service_description Load
check_command check_nrpe!check_load
}
define service{
use generic-service
host_name linhost
service_description SDA1
check_command check_nrpe!check_hda1
}
define service{
use generic-service
host_name linhost
service_description Zombie
check_command check_nrpe!check_zombie_procs
}
define service{
use generic-service
host_name linhost
service_description Total procs
check_command check_nrpe!check_total_procs
}
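The key point here: the command names used in the service definitions on the Nagios server must match the names of the check commands defined in NRPE on the monitored host, as shown in the figure below.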
http://s3.运维网.com/wyfs02/M02/4A/EC/wKioL1QnZhGQMN-ZAAM_Km8AoXQ787.jpg
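Because the two files live on different machines, a name mismatch is easy to miss. The following sketch compares them; it assumes a copy of the monitored host's nrpe.cfg is available locally next to the object file, and the function name missing_nrpe_commands is hypothetical.

```shell
# Print every NRPE command name referenced as check_nrpe!<name> in a Nagios
# object file that has no matching command[<name>]= entry in an nrpe.cfg copy.
# Usage: missing_nrpe_commands linhost.cfg nrpe.cfg
missing_nrpe_commands() {
    objects_cfg="$1"
    nrpe_cfg="$2"
    # Extract the name that follows "check_nrpe!" on each check_command line.
    sed -n 's/^[[:space:]]*check_command[[:space:]]*check_nrpe!\([^![:space:]]*\).*/\1/p' "$objects_cfg" |
    sort -u |
    while read -r name; do
        # Report the name only when nrpe.cfg does not define it.
        grep -q "^command\[$name]=" "$nrpe_cfg" || echo "$name"
    done
}
```

Empty output means every service refers to a command the daemon knows; any name printed would make that service fail until it is defined on the monitored host.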
5. Enable the newly defined command and services
# vim /usr/local/nagios/etc/nagios.cfg
# Add this line
cfg_file=/usr/local/nagios/etc/objects/linhost.cfg
6. Check the configuration syntax
# service nagios configtest
Nagios Core 4.0.7
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 06-03-2014
License: GPL
Website: http://www.nagios.org
Reading configuration data...
Read main config file okay...
Read object config files okay...
Running pre-flight check on configuration data...
Checking objects...
Checked 20 services.
Checked 3 hosts.
Checked 2 host groups.
Checked 0 service groups.
Checked 1 contacts.
Checked 1 contact groups.
Checked 26 commands.
Checked 5 time periods.
Checked 0 host escalations.
Checked 0 service escalations.
Checking for circular paths...
Checked 3 hosts
Checked 0 service dependencies
Checked 0 host dependencies
Checked 5 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
Object precache file created:
/usr/local/nagios/var/objects.precache
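The line that matters in this output is "Total Errors: 0". As a small sketch, the helper below reads pre-flight output on standard input and succeeds only when no errors are reported, so it can gate a restart. The function name preflight_ok is hypothetical.

```shell
# Read pre-flight check output on stdin; succeed only if it reports
# "Total Errors:" followed by 0.
preflight_ok() {
    grep -Eq '^Total Errors:[[:space:]]+0[[:space:]]*$'
}
```

One possible use, assuming the init script prints the output shown above: `service nagios configtest | preflight_ok && service nagios restart`.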
7. Restart the nagios service
# service nagios restart
Running configuration check...
Stopping nagios: done.
Starting nagios: done.
8. Open the Nagios web interface
1) First click [Hosts] and check that the monitored host's status is UP
http://s3.运维网.com/wyfs02/M01/4A/C8/wKiom1Qm8WfyoR-sAAF5rctJNZ8410.jpg
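2) Then click [Services] and check that each monitored service's status is OK
Note: the newly added host linhost shows one service in the CRITICAL state with the message "No such file or directory". The fix follows.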
http://s3.运维网.com/wyfs02/M02/4A/C8/wKiom1Qm8WfCZrd-AAb-FyGH-V4273.jpg
Investigating the CRITICAL alert on linhost:
http://s3.运维网.com/wyfs02/M01/4A/CA/wKioL1Qm8jyzbPBcAAEWP2KVKxc476.jpg
### On the monitored host: edit the NRPE configuration and restart NRPE
# vim nrpe.cfg
command[check_sda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda1
# service nrped restart
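The alert came from pointing check_disk at a device that does not exist on this host. Rather than guessing the device name, it can be looked up on the monitored host. This is a minimal sketch using the portable `df -P` output format; the function name device_for_path is hypothetical.

```shell
# Print the device that backs the filesystem containing the given path.
# Usage: device_for_path /
device_for_path() {
    # df -P prints a header line, then one line whose first field is the device.
    df -P "$1" | awk 'NR==2 {print $1}'
}
```

The value it prints for the filesystem you want to watch is what belongs after `-p` in the check_disk line of nrpe.cfg.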
### On the monitoring host: edit linhost.cfg and restart nagios and httpd
# vim linhost.cfg
# Note: this was check_hda1 before; it is now check_sda1
define service{
use generic-service
host_name linhost
service_description SDA1
check_command check_nrpe!check_sda1
}
# service nagios restart
Running configuration check...
Stopping nagios: done.
Starting nagios: done.
# service httpd restart
Stopping httpd: [ OK ]
Starting httpd: [ OK ]
Click [Services] again to refresh the page; the result looks like this:
http://s3.运维网.com/wyfs02/M01/4A/CB/wKioL1Qm9JKADd8SAAa9TVtAuVc417.jpg
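Date: 2014-12-26
Update: an error caused by monitoring the httpd service
While reading logs today I found many identical entries in the nginx error log. At first I suspected a bug in one of the PHP applications, but after discussing it with the developers we ruled that out. A Google search then showed that the cause was the monitoring configuration.
Screenshot of the error log:
http://s3.运维网.com/wyfs02/M00/57/83/wKioL1ScxubQpdanAAWm48I5xOM330.jpg
For the fix, see this post: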
http://forum.joomla.org/viewtopic.php?t=666220
http://s3.运维网.com/wyfs02/M01/57/83/wKioL1ScxyvizCvmAAH2KmpFN28562.jpg