Nagios Monitoring: Installation and Configuration

Posted by sdoghds88888 on 2019-1-15 06:32:45

　　

# Based on Mr. Sui's document, with some extra explanation to make it friendlier.
OS version: CentOS 5.5 x64 (everything is built from source, so the steps are the same on 32-bit systems and on Red Hat)
Nagios versions:
nagios-3.2.0
nagios-plugins-1.4.14
nrpe-2.12
# yum install gd-devel openssl-devel gcc -y
# yum install httpd mysql mysql-server php php-mysql -y
# Compile and install the main program, nagios
[root@server1 nagios]#
# tar fvxz nagios.tar.gz
# ./configure --prefix=/usr/local/nagios
# useradd nagios          (add the nagios user)
# make all
# make install
The install targets and what each one does:
make install
- This installs the main program, CGIs, and HTML files
make install-init                (install the init script)
- This installs the init script in /etc/rc.d/init.d
make install-commandmode         (set up the external command file directory)
- This installs and configures permissions on the
directory for holding the external command file

make install-config              (install the sample config files)
- This installs *SAMPLE* config files in /usr/local/nagios/etc
You'll have to modify these sample files before you can
use Nagios. Read the HTML documentation for more info
on doing this. Pay particular attention to the docs on
object configuration files, as they determine what/how
things get monitored!
make install-webconf             (install the web config file)
- This installs the Apache config file for the Nagios
web interface
　　

Create the web user
# htpasswd -c /usr/local/nagios/etc/htpasswd.users nagios          (create the web login user; -c creates the file if it is missing and overwrites it if it already exists)
New password:
Re-type new password:
Adding password for user nagios
　　

Grant the nagios user permission to view information:
# vim /usr/local/nagios/etc/cgi.cfg
Append ,nagios after every occurrence of nagiosadmin
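
The manual edit can also be scripted. The following is only a sketch: the function name grant_cgi_access is made up for this example, and it assumes every authorized_for_* line in cgi.cfg ends in nagiosadmin, as in the sample file.

```shell
#!/bin/bash
# Hypothetical helper (not part of Nagios): append ",<user>" to every
# authorized_for_* line of a cgi.cfg-style file that ends in "nagiosadmin".
grant_cgi_access() {
    local cfg="$1" user="$2"
    # A .bak copy is kept. Only lines that still end in "nagiosadmin" match,
    # so running the function twice does not add the user twice.
    sed -i.bak "/^authorized_for_/ s/nagiosadmin\$/nagiosadmin,${user}/" "$cfg"
}

# Example, using the path from the text:
# grant_cgi_access /usr/local/nagios/etc/cgi.cfg nagios
```

Check the result with grep nagiosadmin on the file before restarting Apache.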
　　

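SELinux and iptables must either be turned off or configured to allow access; otherwise the web interface is unreachable and Nagios cannot use some resources.
getenforce                   (show the SELinux mode)
setenforce 0                 (switch SELinux to permissive mode until the next reboot)
iptables -L                  (list the current firewall rules)
iptables -F                  (flush all firewall rules)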
　　

Restart Apache and Nagios. Nagios can now be opened in the browser; it just has no check plugins yet and no nrpe for communication.
service httpd restart
chkconfig --add nagios                (register nagios as a system service)
service nagios restart
　　

Why does the local host show as DOWN?
　　

Monitoring console --------------- main program
plugins
-------------- monitored hosts
　　

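# pwd
/usr/local/nagios/libexec (this directory holds the executable plugins from nagios-plugins, plus the plugin installed by the nrpe build)
# ls
#
The plugin directory is completely empty!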
　　

Install the plugins
tar zxf nagios-plugins-1.4.14.tar.gz
cd nagios-plugins-1.4.14
# ./configure --prefix=/usr/local/nagios/
Optional flags: --with-gnutls --with-openssl --enable-extra-opts --enable-perl-modules
make && make install
　　

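Install nrpe. The monitoring server and the monitored hosts communicate through nrpe, so it has to be installed on both.
tar fvxz nrpe*.tar.gz
./configure --prefix=/usr/local/nagios
useradd nagios
make
make install-daemon
make install-daemon-config
make install-xinetd
make install-plugin    (do not forget this one: it builds the check_nrpe plugin)
###
### At this point the Nagios server side is installed. The LAMP stack can simply be installed with yum; mysql is optional, but php-mysql is needed, otherwise PHP and Apache have to be wired together by hand.
### Then install the Nagios core
### Install nrpe (the monitoring server and the monitored hosts talk through nrpe over an SSL-encrypted channel)
### Install nagios-plugins (provides the check scripts; after installation the default configuration monitors the local machine)
### The local machine now shows up as monitored in the web interface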
　　

How to monitor more!
　　

# pwd
/usr/local/nagios/etc
# vim nagios.cfg
Edit the main configuration file:
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
(cfg_file=/usr/local/nagios/etc/objects/100.cfg  Note: to monitor a new host, add a line like this here, create the file, and define the host and its services in it. Copying localhost.cfg to 100.cfg is an easy starting point.)
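
Registering a new object file can be scripted so that the line is never added twice. This is a rough sketch; the helper name add_cfg_file is invented here and is not a Nagios tool.

```shell
#!/bin/bash
# Hypothetical helper: register an object file in nagios.cfg exactly once.
add_cfg_file() {
    local main_cfg="$1" object_cfg="$2"
    # -x matches the whole line, -F treats the path as a literal string
    if ! grep -qxF "cfg_file=${object_cfg}" "$main_cfg"; then
        echo "cfg_file=${object_cfg}" >> "$main_cfg"
    fi
}

# Example, following the text:
# cd /usr/local/nagios/etc
# cp objects/localhost.cfg objects/100.cfg    # then edit host_name and address in 100.cfg
# add_cfg_file nagios.cfg /usr/local/nagios/etc/objects/100.cfg
```

Because the copy still contains the localhost definitions, it has to be edited before Nagios is restarted, or the configuration check will complain about duplicate objects.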
　　

The cfg_file lines above are how those configuration files get loaded.
# pwd
/usr/local/nagios/etc/objects
　　

Time periods: timeperiods.cfg
define timeperiod{
    timeperiod_name 24x7
    alias           24 Hours A Day, 7 Days A Week
    sunday          00:00-24:00
    monday          00:00-24:00
    tuesday         00:00-24:00
    wednesday       00:00-24:00
    thursday        00:00-24:00
    friday          00:00-24:00
    saturday        00:00-24:00
}
This defines a monitoring time period named 24x7 that covers all 24 hours of every day.
　　

Plugins: commands.cfg
define command{
    command_name check-host-alive
    command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
}
　　

(What to monitor: the host we just added in 100.cfg)
What to monitor: localhost.cfg. On the Nagios web page, clicking Hosts lists the hosts defined here.
define host {
    host_name               wqk-centos          ; name of the monitored host, preferably without spaces
    alias                   test
    address                 192.168.18.100      ; IP address of the monitored host
    check_command           check-host-alive    ; the check command, defined in commands.cfg; tests whether the host is alive
    notification_options    d,u,r               ; notify on d (down), u (unreachable), r (recovery)
    check_interval          1
    max_check_attempts      2                   ; number of check attempts after a failure
    notification_interval   10                  ; re-notify every 10 time units (10 minutes with the default interval_length of 60)
    notification_period     24x7                ; when notifications may be sent; 24x7 comes from timeperiods.cfg
    check_period            24x7                ; when checks run; 24x7 comes from timeperiods.cfg
    contact_groups          admins              ; contact group; admins is defined in contacts.cfg
}
　　

More hosts can be defined simply by copying and editing this block. Here we add another machine,
host name: test-201, IP: 192.168.0.201
define host {
    host_name               test-201
    alias                   ubuntu-201
    address                 192.168.0.201
    check_command           check-host-alive
    notification_options    d,u,r
    check_interval          1
    max_check_attempts      2
    contact_groups          admins
    notification_interval   10
    notification_period     24x7
}
Just as contacts can be combined into contact groups, hosts can be combined into host groups:
define hostgroup{
    hostgroup_name  linux-servers
    alias           Linux Servers
    members         nagios,apache,test-201,wqk-centos   ; member hosts, comma-separated; each must already be defined as a host
}
　　

Contacts: contacts.cfg

define contact {
    contact_name                    kyo         ; contact name; no spaces here
    alias                           sys admin
    host_notification_period        24x7
    host_notification_options       d,u,r
    service_notification_period     24x7
    service_notification_options    w,u,c,r
    service_notification_commands   notify-service-by-email
    host_notification_commands      notify-host-by-email
    email                           <mobile number>@139.com     ; CentOS installs sendmail by default, and it is used automatically to send the mail
    # Alerts are delivered through the 139 mobile mailbox!
    pager                           1391119xxxx
}
　　

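Several contacts can then be combined into a contact group:

define contactgroup{
    contactgroup_name   admins                  ; contact group name; again, no spaces
    alias               Nagios Administrators
    members             nagiosadmin,kyo         ; group members, taken from the contacts defined above in contacts.cfg; separate multiple contacts with commas
}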
　　

　　

Check the configuration for errors
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
chkconfig --add nagios      (register nagios as a system service so it can be started with the service command)
chkconfig nagios on
service nagios restart
　　

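Now for the most important part. Nagios is mainly used to monitor all kinds of information about a host: local resources, externally offered services, and so on. In Nagios each of these is defined as an item (Nagios calls them services; to keep them apart from the services a host provides, I use the word "item" here). Each monitored item is implemented through a command defined in commands.cfg.

For example, suppose one item monitors whether the web service on a machine is working. What do we need? Three things matter most: first, which machine to monitor; second, which command performs the check; and third, which contact to notify when something goes wrong.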
　　

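Define the service in localhost.cfg. On the Nagios web page, clicking Services lists the services defined for each host.
define service {
    host_name               fudong      ; the monitored host; must already be defined as a host
    service_description     apache      ; description of this item; shown on the web page
    check_period            24x7        ; when checks run; defined in timeperiods.cfg
    normal_check_interval   2           ; interval between regular checks
    retry_check_interval    1
    max_check_attempts      5           ; number of check attempts
    notification_period     24x7        ; when notifications may be sent
    notification_options    w,u,c,r     ; notify contacts on w (warning), u (unknown), c (critical), r (recovery)
    contact_groups          sagroup     ; contact group; defined in contacts.cfg
    check_command           check_http  ; the command to run; defined in commands.cfg
}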
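
Since every service block is built from the same few elements (host, check command, contact group, plus a description), writing them can be automated. A minimal sketch follows; make_service is a made-up name, and the fixed intervals are simply the values used in this post.

```shell
#!/bin/bash
# Hypothetical generator: print a service definition built from a host,
# a description, a check command and a contact group.
make_service() {
    local host="$1" desc="$2" command="$3" contacts="$4"
    cat <<EOF
define service {
    host_name               ${host}
    service_description     ${desc}
    check_period            24x7
    normal_check_interval   2
    retry_check_interval    1
    max_check_attempts      5
    notification_period     24x7
    notification_options    w,u,c,r
    contact_groups          ${contacts}
    check_command           ${command}
}
EOF
}

# Example: append an apache check for host fudong to an object file
# make_service fudong apache check_http sagroup >> /usr/local/nagios/etc/objects/localhost.cfg
```

Run nagios -v on nagios.cfg afterwards; the host, command and contact group named in the generated block still have to exist.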
　　

　　

　　

　　

Plugin return codes
# echo $?
2
# /usr/local/nagios/libexec/check_http -I 192.168.18.50
HTTP OK HTTP/1.1 200 OK - 43306 bytes in 0.026 seconds |time=0.026288s;;;0.000000 size=43306B;;;0
# echo $?
0
# /usr/local/nagios/libexec/check_http -I 192.168.18.50 -u /a.html -s hello
HTTP WARNING: HTTP/1.1 404 Not Found
# echo $?
1

0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
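
A plugin only has to print one line of status text and exit with one of these four codes. Here is a minimal skeleton as an illustration; the function name check_value and its arguments are invented for this sketch and are not part of nagios-plugins.

```shell
#!/bin/bash
# Minimal plugin skeleton: compare a whole number against warning and
# critical thresholds and report through the Nagios exit-code convention.
check_value() {
    local value="$1" warn="$2" crit="$3"
    # Anything that is not a plain non-negative integer is reported as UNKNOWN
    case "$value" in
        ''|*[!0-9]*) echo "UNKNOWN - not a number: '$value'"; return 3 ;;
    esac
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - value is $value"; return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - value is $value"; return 1
    fi
    echo "OK - value is $value"; return 0
}

# Example: logged-in users, warning at 5, critical at 10
# check_value "$(who | wc -l | tr -d ' ')" 5 10; exit $?
```

In a real plugin file the last line would be uncommented, so that the script's own exit status is the code Nagios sees.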
Write your own plugins!
　　

　　

A custom command
define command {
    command_name check_url
    command_line $USER1$/check_http -I $HOSTADDRESS$ -u $ARG1$ -s $ARG2$
}
　　

　　

Using the newly defined command

define service {
    host_name               fudong
    service_description     apache
    check_period            24x7
    normal_check_interval   2
    retry_check_interval    1
    max_check_attempts      5
    notification_period     24x7
    notification_options    w,u,c,r
    #    check_command check_http
    check_command           check_url!/index.html!hello
}
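########################################################################
check_mysql
vim /usr/local/nagios/libexec/check_mysql
#!/bin/bash
# check_mysql: report whether the MySQL server at the given IP accepts a connection
IP=$1

mysql -u test -h "$IP" -p123 -e 'show databases;' &> /dev/null

if [ $? -eq 0 ]; then
    echo "mysql OK!"
    exit 0
else
    echo "mysql err!"
    exit 2
fi

chmod +x /usr/local/nagios/libexec/check_mysql    (the script must be executable, otherwise Nagios cannot run it)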
　　

vim /usr/local/nagios/etc/objects/commands.cfg
define command{
    command_name check_mysql
    command_line $USER1$/check_mysql $ARG1$
}
　　

vim /usr/local/nagios/etc/objects/localhost.cfg
define service {
    host_name               mail.vfast.com
    service_description     mysql
    check_period            24x7
    normal_check_interval   2
    retry_check_interval    1
    max_check_attempts      2
    notification_period     24x7
    notification_options    w,u,c,r
    check_command           check_mysql!192.168.18.69
}

service nagios restart
　　

　　

###################################################################################

yum install expect -y

define command{
    command_name notify-host-by-sms
    command_line /usr/local/nagios/libexec/nagios-mail "$(/usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n")"smtp.163.comY29vbHdhbmdjaG9uZ0AxNjMuY29tUVE4MTBXQU5HODIwMCFAcoolwangchong@163.com$CONTACTEMAIL$"** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **"
}
If host entries keep appearing and disappearing, run killall nagios and then start Nagios again!
　　

******************************************************************************
Monitoring system information on remote hosts
　　

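On the monitored host
Install nrpe. The monitoring server and the monitored hosts communicate through nrpe, so it has to be installed on both.
yum -y install xinetd
tar fvxz nrpe*.tar.gz
./configure --prefix=/usr/local/nagios
useradd nagios
make
make install-daemon
make install-daemon-config
make install-xinetd
make install-plugin    (do not forget this one: it builds the check_nrpe plugin)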
　　

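Install the Nagios plugins on the monitored host

tar zxvf nagios-plugins-1.4.15.tar.gz
cd nagios-plugins-1.4.15
./configure --with-nagios-user=nagios --with-nagios-group=nagios
make
make install

Check the directory and its files:

ll /usr/local/nagios/libexec

It should contain a long list of check_* programs; if it does, the installation worked.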
　　

Enable the nrpe service on the monitored host
vim /etc/xinetd.d/nrpe
# default: on
# description: NRPE (Nagios Remote Plugin Executor)
service nrpe
{
    flags           = REUSE
    socket_type     = stream
    port            = 5666
    wait            = no
    user            = nagios
    group           = nagios
    server          = /usr/local/nagios/bin/nrpe
    server_args     = -c /usr/local/nagios/etc/nrpe.cfg --inetd
    log_on_failure  += USERID
    disable         = no
    # IP of the monitoring server, so that it is allowed to connect
    only_from       = 192.168.18.254
}
　　

vim /etc/services
nrpe 5666/tcp

vim /usr/local/nagios/etc/nrpe.cfg
allowed_hosts=127.0.0.1,192.168.0.1,192.168.0.100
(192.168.0.1 is this host's own IP; 192.168.0.100 is the monitoring server's IP)

service xinetd restart
Start the nrpe daemon
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
# /usr/local/nagios/libexec/check_nrpe -H 192.168.18.188
NRPE v2.12
　　

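# Remember to turn off the firewall!
Note 1:
The nrpe process will not start
On some servers the nrpe start command runs without errors, yet no process comes up.

ps -ef | grep nrpe

###### No nrpe process is listed.
1. The most likely cause is that xinetd is running on the server. Stopping xinetd fixes it.
2. nrpe is usually started with -d, which runs it as a standalone daemon. If xinetd is also running, nrpe cannot start, since xinetd is then already listening on the NRPE port.
3. If nrpe is run in -i (inetd) mode instead, the xinetd daemon must be running.

/etc/init.d/xinetd stop
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

Check again: the process is now running.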
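Note 2:
Error: CHECK_NRPE: Error - Could not complete SSL handshake
If the Nagios server is on an internal network and the monitored server is on the public Internet, first find out the public IP that the Nagios server's address is translated to, then add that IP to the configuration on the monitored host.
Example, on the monitored host:
tail -f /var/log/messages
Aug 19 16:59:25 test nrpe: Host X:X:X:X is not allowed to talk to us!
X:X:X:X is the public address that the internal address is translated to.

Add this IP on the monitored host:
vim /etc/xinetd.d/nrpe
only_from    = X:X:X:X
vim /usr/local/nagios/etc/nrpe.cfg
allowed_hosts=127.0.0.1,X:X:X:X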
　　

Set up the plugin commands on the monitored host

vim nrpe.cfg
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
　　

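These entries define the command names and the plugin that is run when each command is received.

If the flow is still unclear:
The Nagios server runs libexec/check_nrpe -c <command>, which sends the command name to the monitored host.
The monitored host receives the command, looks up the matching command entry in nrpe.cfg, runs the corresponding local plugin, and returns the result to Nagios on the monitoring server.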
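
To make the lookup step concrete, the sketch below mimics it in shell. It is only an illustration of the idea, not NRPE's actual code; the function name lookup_nrpe_command is invented, and it assumes entries of the form command[name]=command line, as in the stock nrpe.cfg.

```shell
#!/bin/bash
# Illustration only: find the plugin command line registered under a given
# command name in an nrpe.cfg-style file.
lookup_nrpe_command() {
    local cfg="$1" name="$2"
    # Print everything after "command[<name>]=" on the first line that starts
    # with it; the exit status is 1 when no such entry exists.
    awk -v key="command[${name}]=" '
        index($0, key) == 1 { print substr($0, length(key) + 1); found = 1; exit }
        END { exit !found }
    ' "$cfg"
}

# Example: show what the monitored host would run for check_users
# lookup_nrpe_command /usr/local/nagios/etc/nrpe.cfg check_users
```

This is handy when check_nrpe reports that a command is not defined: the same name passed after -c must appear inside the square brackets on the monitored host.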
　　

Define a service to test it
define host {
    host_name               zcg
    alias                   nrpe-server
    address                 192.168.18.188
    check_command           check-host-alive
    notification_options    d,u,r
    check_interval          1
    max_check_attempts      2
    contact_groups          admins
    notification_interval   10
    notification_period     24x7
}

define service {
    host_name               zcg
    service_description     nrpe
    check_period            24x7
    normal_check_interval   2
    retry_check_interval    1
    max_check_attempts      5
    notification_period     24x7
    notification_options    w,u,c,r
    check_command           check_nrpe!check_users
    # check_nrpe used here must be defined in commands.cfg
}
Do not forget to define the host zcg first!
　　

Define the command
define command {
    command_name check_nrpe
    command_line /usr/local/nagios/libexec/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

Restart the nagios service!
　　

　　

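1. Send alerts through the Fetion robot; using the 139 mobile mailbox is the friendlier, recommended method
2. Automatically add hosts to Nagios monitoring
3. Write your own Nagios plugins
*******************************
Starting nagios
Method:
chkconfig --add nagios      (add the startup entry)
chkconfig nagios on         (start at boot)
service nagios restart      (start now)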
　　

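******************************
Turning off the firewall
iptables -L             (show the firewall rules)
service iptables save   (save the current rules)
getenforce              (show the SELinux mode)
setenforce 0            (switch SELinux to permissive mode)
service iptables save
iptables -L             (show the firewall rules)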
　　

Apache download: http://www.apache.org/dist/httpd/
================================

Download locations for the related packages
nagios-.tar.gz
http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.2.0.tar.gz
nagios-plugins-.tar.gz
http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.14.tar.gz
nrpe-2.12.tar.gz
http://prdownloads.sourceforge.net/sourceforge/nagios/nrpe-2.12.tar.gz
NSClient++-Win32-.msi
http://nchc.dl.sourceforge.net/project/nscplus/nscplus/NSClient%2B%2B%200.3.6/NSClient%2B%2B--Win32.msi
I. Install and configure the Fetion robot (see "Installing and testing the Fetion robot on RHEL5": http://hi.baidu.com/turnipland/blog/item/9ddf96ef6f8a471dfdfa3cd3.html)
Pay attention to the ownership and permissions of every file and directory under the Fetion robot's installation directory. Nagios invokes Fetion as the nagios system user to send SMS notifications, so the owner of each file must be changed to nagios and the group to nagios as well. Otherwise, once Nagios is running and tries to send an SMS notification, the system reports an error like this: Warning: Attempting to execute the command "/usr/local/fetion/sendsms.sh ""14:14:08":msg.baihe.com-Java(10.103.47.53) is CRITICAL."" resulted in a return code of 126. Make sure the script or binary you are trying to execute actually exists...
=======================================
Fixing the Nagios notify-by-email error
Debug command:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Errors:
Error: Service notification command 'notify-by-email' specified for contact 'zhuzhu' is not defined anywhere!
Error: Host notification command 'host-notify-by-email' specified for contact 'zhuzhu' is not defined anywhere!
These two errors mean that the two commands are not defined in commands.cfg.
　　

Add the following to commands.cfg:
# 'host-notify-by-email' command definition
define command{
    command_name host-notify-by-email
    command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
}

# 'notify-by-email' command definition
define command{
    command_name notify-by-email
    command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
}
(Note: everything after command_line is a single line, and the quotes must be plain ASCII double quotes; take care when copying and pasting.)
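====================================
Background:
Right after monitoring of the HTTP service was added, Nagios reported an error!
The error:
HTTP WARNING: HTTP/1.1 403 Forbidden - 5240 bytes in 0.001 second response time |time=0.001260s;;;0.000000 size=5240B;;;0
The cause: when Nagios checks HTTP it requests the index.html file under /var/www/html/; if the file does not exist, the check reports this error. Creating the file is enough!
# touch /var/www/html/index.html
# service httpd restart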
========================================
While compiling nrpe, configure prints:
checking for SSL headers... configure: error: Cannot find ssl headers
The cause is that the openssl-devel package is missing:
yum -y install openssl-devel
=================
CHECK_NRPE: Error - Could not complete SSL handshake.
1. Check that the openssl and openssl-devel packages are installed.
2. yum -y install xinetd
3. Check that /usr/local/nagios/etc/nrpe.cfg is configured correctly, and watch out for spaces: allowed_hosts=127.0.0.1, 192.168.0.1 is wrong; it must be allowed_hosts=127.0.0.1,192.168.0.1 with no space after the comma. 192.168.0.1 is the IP of this machine (the monitored host).
4. Remove the /etc/xinetd.d/nrpe file.
5. Restart the xinetd service: /etc/init.d/xinetd restart
6. Restart nrpe: /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
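
The stray space from point 3 is easy to miss by eye, so a quick check can help. The function below is a hypothetical sketch, not part of NRPE; the name check_allowed_hosts is invented, and it only looks at the first allowed_hosts line in the file.

```shell
#!/bin/bash
# Hypothetical sanity check: flag whitespace inside the allowed_hosts line.
check_allowed_hosts() {
    local cfg="$1" line
    # -m1 stops at the first match; a missing line is reported with status 2
    line=$(grep -m1 -E '^allowed_hosts=' "$cfg") || { echo "no allowed_hosts line found"; return 2; }
    case "$line" in
        *[[:space:]]*) echo "whitespace found in: $line"; return 1 ;;
    esac
    echo "looks fine: $line"
}

# Example:
# check_allowed_hosts /usr/local/nagios/etc/nrpe.cfg
```

A clean result only rules out the spacing problem; the IP of the monitoring server still has to be in the list.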

　　
