Posted by gaojinguan on 2019-1-12 13:56:19

Nagios: Monitoring Linux/UNIX Hosts with NRPE

1. Introduction to NRPE


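NRPE is an add-on for Nagios that lets it execute plugins on remote Linux/Unix hosts. Once NRPE and the Nagios plugins are installed on a remote server, that server can report its local state, such as CPU load, memory usage and disk usage, to the Nagios monitoring platform. In this article the monitoring host is called the Nagios server, and the monitored remote host is called the Nagios client.
Note: Nagios plugins can also be run on remote Linux/UNIX hosts over SSH, for example with the check_by_ssh plugin. SSH is more secure than NRPE, but it also puts a higher CPU load on both the monitoring server and the monitored host. Once hundreds or thousands of hosts are being monitored this becomes a real problem, which is the main reason many Nagios administrators choose NRPE.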


1.1 How NRPE works
  NRPE consists of two parts:

  
  


- the check_nrpe plugin, which runs on the local monitoring server;
- the NRPE daemon, which runs on the remote Linux/UNIX host, i.e. the monitored host.
When Nagios needs to monitor a service on a remote Linux/UNIX host, NRPE works as follows:


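- Nagios runs the check_nrpe plugin and tells it which service to check;
- check_nrpe connects to the NRPE daemon on the monitored host over SSL;
- the NRPE daemon runs the corresponding Nagios plugin to check the service or resource;
- the NRPE daemon returns the result to check_nrpe, which passes it on to the Nagios process for further processing.
Note: NRPE can only check services and resources if the Nagios plugins are installed on the remote Linux/UNIX host.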
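The last step relies on the standard Nagios plugin convention that the exit code carries the state: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN (the same values listed in the check_nrpe help output later in this article). The bash sketch below only illustrates that mapping; `nagios_state` is a hypothetical helper, not part of NRPE or Nagios.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of NRPE/Nagios): translate a plugin exit
# code into the state name Nagios shows.
# 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN; this sketch also maps any other
# value to UNKNOWN for simplicity.
nagios_state() {
  case "$1" in
    0) echo "OK" ;;
    1) echo "WARNING" ;;
    2) echo "CRITICAL" ;;
    *) echo "UNKNOWN" ;;
  esac
}

# Stand-in for a plugin run: a child shell that exits with status 1.
status=0
sh -c 'exit 1' || status=$?
nagios_state "$status"   # prints WARNING
```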
1.2 NRPE use cases
   1. Direct checks
  
  

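  The most direct use of NRPE is monitoring "local" or "private" resources on the remote host, such as CPU load, memory usage, swap usage, the number of logged-in users, disk usage and process states.
   2. Indirect checks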

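  When the monitoring server cannot connect to a remote server directly, NRPE can also be used to monitor that server's "public" services and resources indirectly. For example, if the remote host running the NRPE daemon and plugins can reach a remote web server but the monitoring host cannot, NRPE can be configured to check the web server on the monitoring host's behalf. In this case the NRPE daemon acts as a monitoring proxy.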
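As an illustration of the proxy case, the sketch below prints the kind of `command[...]` entry that would go into nrpe.cfg on the NRPE host, using the standard check_http plugin. The command name `check_web_proxy`, the host `web.internal.example` and the helper `nrpe_http_entry` are hypothetical; on the Nagios server a service would then call `check_nrpe!check_web_proxy` through the check_nrpe command defined later in this article.

```bash
#!/usr/bin/env bash
# Hypothetical example: print an nrpe.cfg entry that lets the NRPE host
# check a web server the Nagios server cannot reach directly.
LIBEXEC=/usr/local/nagios/libexec   # plugin directory used in this article

nrpe_http_entry() {
  local name=$1 host=$2
  printf 'command[%s]=%s/check_http -H %s\n' "$name" "$LIBEXEC" "$host"
}

# Illustrative names only; replace with your own command name and web server.
nrpe_http_entry check_web_proxy web.internal.example
# prints: command[check_web_proxy]=/usr/local/nagios/libexec/check_http -H web.internal.example
```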

  2. Installing and configuring NRPE

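  Test servers used in this article:
  Monitoring server IP: 172.16.56.131, hostname: monitors
  Monitored host IP: 192.183.3.145, hostname: kk
  2.1 Installing and configuring NRPE on the remote (monitored) host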

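  Starting with version 3.0, NRPE is easier to install on many operating systems; if you run into problems, see https://community.nagios.org/
  1. Add the nagios user
#useradd nagios
2. Download and install the Nagios plugins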
#cd /home/softwares/
#wget http://nagios-plugins.org/download/nagios-plugins-2.1.2.tar.gz
#tar -xzf nagios-plugins-2.1.2.tar.gz   
#cd nagios-plugins-2.1.2
#./configure --with-nagios-user=nagios --with-nagios-group=nagios
Note: to monitor MySQL, also add --with-mysql.
#make
#make install
Change the ownership of the Nagios plugin installation directory:
# chown nagios:nagios /usr/local/nagios
# chown -R nagios:nagios /usr/local/nagios/libexec
3. Install NRPE


NRPE can be downloaded from https://sourceforge.net/projects/nagios/files/nrpe-3.x/; this article uses nrpe-3.0.1.tar.gz.
#cd ..
#tar zxf nrpe-3.0.1.tar.gz   
#cd nrpe-3.0.1
#yum -y install openssl openssl-devel
#./configure --with-nagios-user=nagios --with-nagios-group=nagios
#make all
Then install the NRPE plugin, daemon and daemon configuration:
#make install-plugin
#make install-daemon
#make install-daemon-config
This last step fails because of a bug in this version of NRPE; see https://github.com/NagiosEnterprises/nrpe/issues/50.
Workaround:
#make install-config
If port 5666 needs to be opened, run the following commands (the firewall is disabled in this example):
# iptables -I RH-Firewall-1-INPUT -p tcp -m tcp --dport 5666 -j ACCEPT
# service iptables save
4. Configure the NRPE commands
#vim /usr/local/nagios/etc/nrpe.cfg
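  Set allowed_hosts=192.183.3.145,172.16.56.131 so that the Nagios server is allowed to connect. NRPE checks the connecting address against this list, so also keep 127.0.0.1 in it if you run the check_nrpe -H localhost tests below.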
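A quick way to reason about allowed_hosts is a plain membership test on the connecting address. The bash sketch below (hypothetical helper `ip_allowed`) only does exact string matching on a comma-separated list; real NRPE may also accept host names and network ranges, which this sketch does not handle.

```bash
#!/usr/bin/env bash
# Hypothetical helper: exact-match an IP against a comma-separated
# allowed_hosts value. Host names and network ranges are NOT handled.
ip_allowed() {
  local ip=$1 list=$2 entry
  local IFS=','                       # split the list on commas only
  for entry in $list; do
    entry=${entry// /}                # drop stray spaces around commas
    if [[ "$entry" == "$ip" ]]; then
      return 0
    fi
  done
  return 1
}

ALLOWED="192.183.3.145,172.16.56.131"   # value from nrpe.cfg in this article
for ip in 172.16.56.131 127.0.0.1; do
  if ip_allowed "$ip" "$ALLOWED"; then
    echo "$ip: allowed"
  else
    echo "$ip: rejected"
  fi
done
```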
  
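  Test the following check commands on the command line, adjust them to your own monitoring needs, and add the corresponding definitions to nrpe.cfg: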
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_users
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_load
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_sda1
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_total_procs
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_zombie_procs
View the resulting configuration:
#grep -v '^#' /usr/local/nagios/etc/nrpe.cfg |sed '/^$/d'
log_facility=daemon
debug=0
pid_file=/usr/local/nagios/var/nrpe.pid
server_port=5666
nrpe_user=nagios
nrpe_group=nagios
allowed_hosts=192.183.3.145,172.16.56.131
dont_blame_nrpe=0
allow_bash_command_substitution=0
command_timeout=60
connection_timeout=300
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_sda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
5. Start NRPE
# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
#netstat -tulpn | grep nrpe
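netstat shows the daemon listening on the client; from the Nagios server it is also worth confirming that port 5666 is reachable through any firewall. A minimal sketch using bash's built-in `/dev/tcp` redirection, assuming bash and the coreutils `timeout` command are available; `port_open` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if host:port accepts a TCP connection.
# Uses bash's /dev/tcp redirection; `timeout` bounds hanging connects
# (e.g. when a firewall silently drops packets).
port_open() {
  local host=$1 port=$2
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

if port_open 127.0.0.1 5666; then
  echo "port 5666 is accepting connections"
else
  echo "port 5666 is closed, filtered or unreachable"
fi
```

On the monitoring server the call would be `port_open 192.183.3.145 5666`.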
NRPE has two run modes, which give two ways of managing the NRPE service:
-i      # Run as a service under inetd or xinetd   
-d      # Run as a standalone daemon
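In standalone (-d) mode NRPE records its process ID in the pid_file set in nrpe.cfg (/usr/local/nagios/var/nrpe.pid above). A hedged sketch of a status check built on that file, which could back a `status` action in an init script; `pid_running` is a hypothetical helper, and a stale PID file whose number has been reused by another process would fool it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the PID stored in a PID file belongs to a
# live process. `kill -0` sends no signal; it only succeeds if the process
# exists and we are permitted to signal it (run as root or as nagios).
PID_FILE=/usr/local/nagios/var/nrpe.pid   # pid_file from nrpe.cfg above

pid_running() {
  local file=$1 pid
  [[ -r "$file" ]] || return 1              # no PID file: treat as stopped
  read -r pid < "$file"
  [[ "$pid" =~ ^[0-9]+$ ]] || return 1      # ignore empty or garbage contents
  kill -0 "$pid" 2>/dev/null
}

if pid_running "$PID_FILE"; then
  echo "nrpe is running (pid $(cat "$PID_FILE"))"
else
  echo "nrpe is not running"
fi
```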
You can write an init script so that NRPE runs as a standalone daemon:
#vi /etc/init.d/nrped
#!/bin/bash
# chkconfig: 2345 88 12
# description: NRPE DAEMON
NRPE=/usr/local/nagios/bin/nrpe          # NRPE binary
NRPECONF=/usr/local/nagios/etc/nrpe.cfg  # NRPE configuration file
case "$1" in
    start)
        echo -n "Starting NRPE daemon..."
        "$NRPE" -c "$NRPECONF" -d        # -d: run as a standalone daemon
        echo " done."
        ;;
    stop)
        echo -n "Stopping NRPE daemon..."
        pkill -u nagios nrpe             # stop nrpe processes owned by user nagios
        echo " done."
        ;;
    restart)
        "$0" stop
        sleep 2
        "$0" start
        ;;
    *)
        echo "Usage: $0 start|stop|restart"
        exit 1
        ;;
esac
exit 0
#chmod +x /etc/init.d/nrped
#chkconfig --add nrped   
#chkconfig nrped on
#service nrped start   
Starting NRPE daemon... done.
2.2 Installing and configuring NRPE on the monitoring server
1. Install the dependencies
# yum -y install openssl openssl-devel
Otherwise compiling NRPE fails with an error; the cause is the missing openssl-devel package.
2. Download and install NRPE
# cd /home/nagios/
# wget http://prdownloads.sourceforge.net/sourceforge/nagios/nrpe-3.0.1.tar.gz
# tar xzvf nrpe-3.0.1.tar.gz   
# cd nrpe-3.0.1
# ./configure --with-nagios-user=nagios --with-nagios-group=nagios
# make all
# make install-plugin
After installation, the check_nrpe plugin appears in the libexec directory of the Nagios installation:

# ll /usr/local/nagios/libexec/check_nrpe
-rwxrwxr-x 1 nagios nagios 125293 Jan 17 23:47 /usr/local/nagios/libexec/check_nrpe
3. Testing NRPE
  For details on the check_nrpe options, see:
# ./check_nrpe -h
NRPE Plugin for Nagios
Copyright (c) 1999-2008 Ethan Galstad (nagios@nagios.org)
Version: 3.0.1
Last Modified: 09-08-2016
License: GPL v2 with exemptions (-l for more info)
SSL/TLS Available: OpenSSL 0.9.6 or higher required
Usage: check_nrpe -H[-2] [-4] [-6] [-n] [-u] [-V] [-l] [-d ]
       [-P ] [-S ][-L ] [-C ]
       [-K ] [-A ] [-s ] [-b ]
       [-f ] [-p ] [-t :]
       [-c ] [-a ]
Options:
      = The address of the host running the NRPE daemon
-2         = Only use Version 2 packets, not Version 3
-4         = bind to ipv4 only
-6         = bind to ipv6 only
-n         = Do no use SSL
-u         = (DEPRECATED) Make timeouts return UNKNOWN instead of CRITICAL
-V         = Show version
-l         = Show license
       = Anonymous Diffie Hellman use:
                0 = Don't use Anonymous Diffie Hellman
                  (This will be the default in a future release.)
                1 = Allow Anonymous Diffie Hellman (default)
                2 = Force Anonymous Diffie Hellman
      = Specify non-default payload size for NSClient++
   = The SSL/TLS version to use. Can be any one of: SSLv2 (only),
                SSLv2+ (or above), SSLv3 (only), SSLv3+ (or above),
                TLSv1 (only), TLSv1+ (or above DEFAULT), TLSv1.1 (only),
                TLSv1.1+ (or above), TLSv1.2 (only), TLSv1.2+ (or above)
= The list of SSL ciphers to use (currently defaults
                to "ALL:!MD5:@STRENGTH". WILL change in a future release.)
= The client certificate to use for PKI
         = The private key to use with the client certificate
   = The CA certificate to use for PKI
   = SSL Logging Options
    = bind to local address
    = configuration file to use
       = The port on which the daemon is running (default=5666)
    = The name of the command that the remote daemon should run
    = Optional arguments that should be passed to the command,
                separated by a space.If provided, this must be the last
                option supplied on the command line.
NEW TIMEOUT SYNTAX
-t :
   = Number of seconds before connection times out (default=10)
   = Check state to exit with in the event of a timeout (default=CRITICAL)
    Timeout state must be a valid state name (case-insensitive) or integer:
    (OK, WARNING, CRITICAL, UNKNOWN) or integer (0-3)
Note:
This plugin requires that you have the NRPE daemon running on the remote host.
You must also have configured the daemon to associate a specific plugin command
with the option you are specifying here.Upon receipt of the
argument, the NRPE daemon will run the appropriate plugin command and
send the plugin output and return code back to *this* plugin.This allows you
to execute plugins on remote hosts and 'fake' the results to make Nagios think
the plugin is being run locally.
Remote Linux hosts are monitored through NRPE with the check_nrpe plugin, which has the following syntax:
check_nrpe -H <host> [-n] [-u] [-p <port>] [-t <timeout>] [-c <command>] [-a <arglist...>]
# ./check_nrpe -H 192.183.3.145 -p 5666
NRPE v3.0.1
1. Create the command definition
# cd /usr/local/nagios/etc/objects/
# vim commands.cfg
define command{
      command_name    check_nrpe
      command_line    $USER1$/check_nrpe -H "$HOSTADDRESS$" -c "$ARG1$"
}
2. Create the host and service definitions
# vim linuxserver.cfg   
#############################################################
#create a new template for linux boxes
#############################################
define host{
name linux-box ; Name of this template
use generic-host ; Inherit default values
check_period 24x7
check_interval 5
retry_interval 1
max_check_attempts 10
check_command check-host-alive
notification_period 24x7
notification_interval 30
notification_options d,r
contact_groups admins
register 0 ; DONT REGISTER THIS - ITS A TEMPLATE
}
########################################################
#define a new host for the remote Linux/Unix box
#that references the newly created linux-box host template
########################################################
define host{
use linux-box ; Inherit default values from a template
host_name remotehost ; The name we're giving to this server
alias centos6_kk ; A longer name for the server
address 192.183.3.145 ; IP address of the server
}
######################################################################################
#The following service will monitor the CPU load on the remote host.
# The "check_load" argument that is passed to the check_nrpe command
# definition tells the NRPE daemon to run the "check_load" command as defined in the nrpe.cfg file
######################################################################################
define service{
use generic-service
host_name remotehost
service_description CPU Load
check_command check_nrpe!check_load
}
##############################################################################################
#The following service will monitor the number of currently logged in users on the remote host
############################################################################################
define service{
use generic-service
host_name remotehost
service_description Current Users
check_command check_nrpe!check_users
}
#############################################################################################
#The following service will monitor the free drive space on /dev/sda1 on the remote host.
#
#Note: /dev/sda1 here was taken from the output of df on the monitored host; do not blindly copy /dev/hda1 from the official documentation
############################################################################################
define service{
use generic-service
host_name remotehost
service_description /dev/sda1 Free Space
check_command check_nrpe!check_sda1
}
##############################################################################################
#The following service will monitor the total number of processes on the remote host.
##############################################################################################
define service{
use generic-service
host_name remotehost
service_description Total Processes
check_command check_nrpe!check_total_procs
}
###########################################################################################
#The following service will monitor the number of zombie processes on the remote host.
###########################################################################################
define service{
use generic-service
host_name remotehost
service_description Zombie Processes
check_command check_nrpe!check_zombie_procs
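}
Note: the command names used by the services defined on the monitoring (Nagios) server must match the commands defined in NRPE on the monitored host.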
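To see how the pieces connect: in `check_command check_nrpe!check_load`, the text before the first `!` selects the check_nrpe command definition from commands.cfg, and each following `!`-separated field becomes `$ARG1$`, `$ARG2$`, and so on. The bash sketch below imitates that substitution for the remotehost definition above. It is a hypothetical illustration (`expand_check` is not Nagios code), and it assumes `$USER1$` is /usr/local/nagios/libexec as in a default resource.cfg.

```bash
#!/usr/bin/env bash
# Hypothetical illustration (not Nagios code) of macro expansion for
# "check_nrpe!check_load": the part before the first "!" names the command
# definition; each later "!"-separated field becomes $ARG1$, $ARG2$, ...
USER1=/usr/local/nagios/libexec   # assumed value of $USER1$ (resource.cfg)
HOSTADDRESS=192.183.3.145         # "address" of host remotehost
TEMPLATE='$USER1$/check_nrpe -H "$HOSTADDRESS$" -c "$ARG1$"'

expand_check() {
  local spec=$1 line=$TEMPLATE i=1 arg token
  local -a fields
  IFS='!' read -r -a fields <<< "$spec"
  # fields[0] is the command name; this sketch assumes it is check_nrpe
  # and uses that definition's command_line (TEMPLATE) directly.
  for arg in "${fields[@]:1}"; do
    token="\$ARG${i}\$"
    line=${line//"$token"/"$arg"}
    i=$((i + 1))
  done
  line=${line//'$USER1$'/"$USER1"}
  line=${line//'$HOSTADDRESS$'/"$HOSTADDRESS"}
  echo "$line"
}

expand_check 'check_nrpe!check_load'
# prints: /usr/local/nagios/libexec/check_nrpe -H "192.183.3.145" -c "check_load"
```

The resulting `-c check_load` is the name NRPE looks up among the `command[...]` entries in nrpe.cfg.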
3. Enable the new commands and services
# vim /usr/local/nagios/etc/nagios.cfg
Add the line:
cfg_file=/usr/local/nagios/etc/objects/linuxserver.cfg
Check the configuration syntax:
# service nagios configtest
or
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Nagios Core 4.2.0
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 08-01-2016
License: GPL
Website: https://www.nagios.org
Reading configuration data...
   Read main config file okay...
   Read object config files okay...
Running pre-flight check on configuration data...
Checking objects...
    Checked 15 services.
Warning: Host 'kk' has no default contacts or contactgroups defined!
    Checked 2 hosts.
    Checked 1 host groups.
    Checked 1 service groups.
    Checked 1 contacts.
    Checked 1 contact groups.
    Checked 26 commands.
    Checked 5 time periods.
    Checked 0 host escalations.
    Checked 0 service escalations.
Checking for circular paths...
    Checked 2 hosts
    Checked 0 service dependencies
    Checked 0 host dependencies
    Checked 5 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...
Total Warnings: 1
Total Errors:   0
Things look okay - No serious problems were detected during the pre-flight check
Restart Nagios:
#service nagios restart
Running configuration check...
Stopping nagios: done.
Starting nagios: done.
Log in to the Nagios web interface to check that the new checks have taken effect.
This completes the basic installation and configuration of NRPE.
4. Customizing the NRPE configuration


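  To monitor more services on the remote Linux/UNIX host, you need to:

- add a new command definition to nrpe.cfg on the remote host;
- add a new service definition to the Nagios configuration on the monitoring server.
  For example, to add monitoring of swap usage:
  1. Configuration on the monitored remote host
  In this example the goal is a "critical" alert when free swap drops below 10% and a "warning" alert when it drops below 20%: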
#/usr/local/nagios/libexec/check_swap -w 20% -c 10%
SWAP OK - 59% free (2251 MB out of 3823 MB) |swap=2251MB;764;382;0;3823
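The performance data in this output (`swap=2251MB;764;382;0;3823`) shows the percentage thresholds converted into MB: 20% and 10% of the 3823 MB total. The bash sketch below reproduces that arithmetic and the resulting state; `swap_state` is a hypothetical helper, and the truncating integer division it uses is an assumption that happens to match the 764 and 382 reported here.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the check_swap thresholds used above:
# WARNING below warn_pct% free, CRITICAL below crit_pct% free.
# Thresholds are converted to MB with truncating integer division;
# the free percentage is rounded to the nearest integer.
swap_state() {
  local free_mb=$1 total_mb=$2 warn_pct=$3 crit_pct=$4
  local warn_mb=$(( total_mb * warn_pct / 100 ))
  local crit_mb=$(( total_mb * crit_pct / 100 ))
  local pct=$(( (free_mb * 100 + total_mb / 2) / total_mb ))
  local state=OK
  if (( free_mb < crit_mb )); then
    state=CRITICAL
  elif (( free_mb < warn_mb )); then
    state=WARNING
  fi
  echo "$state - ${pct}% free (warning below ${warn_mb}MB, critical below ${crit_mb}MB)"
}

# Values from the check_swap output above: 2251 MB free of 3823 MB.
swap_state 2251 3823 20 10
# prints: OK - 59% free (warning below 764MB, critical below 382MB)
```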
  Add the command to nrpe.cfg:
#vi /usr/local/nagios/etc/nrpe.cfg
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
  Restart the NRPE daemon:
#service nrped restart
Stopping NRPE daemon... done.
Starting NRPE daemon... done.
2. Configuration on the monitoring server
# vim /usr/local/nagios/etc/objects/linuxserver.cfg
define service{
use generic-service
host_name remotehost
service_description Swap Usage
check_command check_nrpe!check_swap
}
Verify the configuration:
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Nagios Core 4.2.0
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 08-01-2016
License: GPL
Website: https://www.nagios.org
Reading configuration data...
   Read main config file okay...
   Read object config files okay...
Running pre-flight check on configuration data...
Checking objects...
    Checked 16 services.
Warning: Host 'kk' has no default contacts or contactgroups defined!
    Checked 2 hosts.
    Checked 1 host groups.
    Checked 1 service groups.
    Checked 1 contacts.
    Checked 1 contact groups.
    Checked 26 commands.
    Checked 5 time periods.
    Checked 0 host escalations.
    Checked 0 service escalations.
Checking for circular paths...
    Checked 2 hosts
    Checked 0 service dependencies
    Checked 0 host dependencies
    Checked 5 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...
Total Warnings: 1
Total Errors:   0
Things look okay - No serious problems were detected during the pre-flight check
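`nagios -v` only validates the Nagios side; it never sees nrpe.cfg on the client, so a name used in `check_nrpe!NAME` but missing from nrpe.cfg only shows up later as a failing check. A hypothetical bash check for that mismatch (`missing_nrpe_commands` is not an existing tool), assuming copies of both files are available on one machine and nrpe.cfg uses the `command[NAME]=...` syntax:

```bash
#!/usr/bin/env bash
# Hypothetical consistency check: print NRPE command names that services
# request as "check_nrpe!NAME" but that nrpe.cfg does not define as
# "command[NAME]=...". Commented-out nrpe.cfg lines are ignored.
missing_nrpe_commands() {
  local services_cfg=$1 nrpe_cfg=$2
  comm -23 \
    <(grep -oE 'check_nrpe![A-Za-z0-9_]+' "$services_cfg" | cut -d'!' -f2 | sort -u) \
    <(sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "$nrpe_cfg" | sort -u)
}

# Demo with throw-away files: check_swap is requested but not defined.
tmp=$(mktemp -d)
cat > "$tmp/linuxserver.cfg" <<'EOF'
check_command check_nrpe!check_load
check_command check_nrpe!check_swap
EOF
cat > "$tmp/nrpe.cfg" <<'EOF'
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
EOF
missing_nrpe_commands "$tmp/linuxserver.cfg" "$tmp/nrpe.cfg"   # prints check_swap
rm -rf "$tmp"
```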

  Restart Nagios:
# service nagios restart
Running configuration check...
Stopping nagios: done.
Starting nagios: done.
  Refresh the Nagios web page.
  Success!
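Note: the theory part of this article is based on the official NRPE 3.0 documentation, and the practical part partly follows http://467754239.blog.运维网.com/4878013/1558897/. Comments and corrections are welcome!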


