542505989 posted on 2015-11-23 10:21:47

Cloud Monitoring: Nagios Installation Steps

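Preface
I have recently been studying cloud monitoring tools. I previously wrote up the installation steps for Ganglia; this time I am recording the installation steps for Nagios.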

This article does not explain the underlying principles; if you want to understand them, please refer to other material.

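Goal of this article: even if you have never used Nagios before, you should be able to build your own Nagios monitoring cluster by following the steps below.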
@Author  duangr 
@Website http://my.oschina.net/duangr/blog/183160
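1. Introduction to Nagios
Nagios is an open-source monitoring system that runs on Linux/Unix platforms and can be used to monitor system status and network information. Nagios can monitor specified local or remote hosts and services, and it provides alerting: when a system or service becomes abnormal, it sends an e-mail or SMS alert so that the site's operations staff are notified immediately, and when the state recovers it sends an e-mail or SMS notification that things are back to normal.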


2. Environment

Host Name   IP              OS           Arch
duangr-1    192.168.56.10   CentOS 6.4   x86_64
duangr-2    192.168.56.11   CentOS 6.4   x86_64
duangr-3    192.168.56.12   CentOS 6.4   x86_64

3. Deployment Plan

Item                                Value
Monitoring service master node      duangr-1
Monitored slave nodes               duangr-2, duangr-3

The Nagios master node needs:

- nagios
- nagios-plugin
- nrpe
- php
- apache

The Nagios slave nodes need:

- nagios-plugin
- nrpe

Installation paths

Item                    Value
nagios install path     /usr/local/nagios
php install path        /usr/local/php
apache install path     /usr/local/apache2
4. Obtaining the Source Code

- nagios-4.0.2.tar.gz
- nagios-plugins-1.5.tar.gz
- nrpe-2.15.tar.gz
- httpd-2.2.23.tar.gz
- php-5.4.10.tar.gz

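5. Prerequisites

5.1 Check the host environment (all nodes)

Check that gcc, glibc-common, gd-devel and openssl-devel are installed:

    # rpm -q gcc
    gcc-4.4.7-3.el6.x86_64
    # rpm -q glibc-common
    glibc-common-2.14.1-6.x86_64
    # rpm -q gd-devel
    package gd-devel is not installed
    # rpm -q openssl-devel
    openssl-devel-1.0.0-27.el6.x86_64

Install anything that is missing first. The packages can be downloaded from mirror sites such as:

- http://rpm.pbone.net/
- http://mirrors.163.com/centos/6.4/os/x86_64/Packages/
- http://mirrors.sohu.com/centos/6.4/os/x86_64/Packages/

After installing, check again:

    # rpm -q gcc
    gcc-4.4.7-3.el6.x86_64
    # rpm -q glibc-common
    glibc-common-2.14.1-6.x86_64
    # rpm -q gd-devel
    gd-devel-2.0.35-11.el6.x86_64
    # rpm -q openssl-devel
    openssl-devel-1.0.0-27.el6.x86_64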
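With several nodes to check, scanning the rpm output by eye gets tedious. The filter below is a minimal sketch, assuming bash and awk; missing_pkgs is a hypothetical helper name. It relies only on the "package NAME is not installed" message that rpm -q prints, as seen in the output above.

```bash
#!/usr/bin/env bash
# Sketch: print the names of packages that `rpm -q` reports as missing.
# Reads rpm -q output on stdin; rpm prints "package NAME is not installed"
# for each package it cannot find, and the package NAME is field 2.
missing_pkgs() {
  awk '/^package .* is not installed$/ { print $2 }'
}

# Typical use on a CentOS 6 node (left commented so the sketch runs anywhere):
# rpm -q gcc glibc-common gd-devel openssl-devel | missing_pkgs
```

An empty result means every prerequisite is present. On a node with a configured yum repository, the printed names could be passed to yum install instead of downloading packages by hand.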
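6. Compile and Install

6.1 Create the nagios user (all nodes)

    # useradd nagios
    # passwd nagios      (choose your own password)

6.2 Install the Nagios core (master node)

    # tar -zxf nagios-4.0.2.tar.gz
    # cd nagios-4.0.2
    # ./configure --prefix=/usr/local/nagios
    # make all
    # make install
    # make install-init
    # make install-commandmode
    # make install-config

Register Nagios as a system service and enable it in runlevels 3 and 5:

    # chkconfig --add nagios
    # chkconfig --level 35 nagios on
    # chkconfig --list nagios
    nagios          0:off  1:off  2:off  3:on  4:off  5:on  6:off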
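To confirm the chkconfig step in a script rather than by reading the table, the line printed by chkconfig --list can be parsed. This is a sketch assuming bash and awk; runlevels_on is a hypothetical helper. It accepts both the English spelling ("3:on") and the Chinese-locale spelling ("3:启用") that chkconfig prints on a Chinese system.

```bash
#!/usr/bin/env bash
# Sketch: print the runlevels in which a service is enabled, given
# `chkconfig --list SERVICE` output on stdin. Field 1 is the service name;
# every following field has the form RUNLEVEL:STATE.
runlevels_on() {
  awk '{
    out = ""
    for (i = 2; i <= NF; i++) {
      split($i, kv, ":")
      # "on" in an English locale, "启用" in a Chinese locale
      if (kv[2] == "on" || kv[2] == "启用") out = out (out == "" ? "" : " ") kv[1]
    }
    print out
  }'
}

# Usage on the master node:
# chkconfig --list nagios | runlevels_on
```

After chkconfig --level 35 nagios on, this should print 3 5.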
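6.3 Install the Nagios plugins (all nodes)

    # tar -zxf nagios-plugins-1.5.tar.gz
    # cd nagios-plugins-1.5
    # ./configure --prefix=/usr/local/nagios
    # make && make install

If MySQL-related compile errors appear, it is because MySQL's default installation path has been changed. Point --with-mysql at your MySQL installation and rebuild:

    # ./configure --prefix=/usr/local/nagios --with-mysql=<MySQL install directory>
    # make && make install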
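6.4 Install NRPE (all nodes)

    # tar -zxf nrpe-2.15.tar.gz
    # cd nrpe-2.15
    # ./configure --prefix=/usr/local/nagios --enable-command-args
    # make all
    # make install-plugin
    # make install-daemon && make install-daemon-config && make install-xinetd

The --enable-command-args option is required for the argument-passing checks (check_nrpe ... -a) used later in this guide.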
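6.4.1 Configuration on the monitored nodes

On a monitored node, NRPE must run as a daemon (launched through xinetd):

1. Edit /etc/xinetd.d/nrpe to allow connections from the Nagios master node:

    only_from       = 127.0.0.1 192.168.56.10

2. Append the following to the end of /etc/services:

    nrpe            5666/tcp                # NRPE

3. Edit the NRPE configuration file:

    # vi /usr/local/nagios/etc/nrpe.cfg

4. Restart xinetd:

    # service xinetd restart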
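Appending to /etc/services by hand risks duplicate entries if the step is repeated on a node. A minimal idempotent sketch, assuming bash and grep; ensure_nrpe_service is a hypothetical helper, and the file path is a parameter so it can be tried on a copy first. 5666/tcp is NRPE's default port.

```bash
#!/usr/bin/env bash
# Sketch: add the NRPE service entry to a services file only if no line
# already starts with the service name "nrpe".
ensure_nrpe_service() {
  local services_file=${1:-/etc/services}
  # Match "nrpe" as the first field (leading whitespace allowed)
  if grep -Eq '^[[:space:]]*nrpe[[:space:]]' "$services_file"; then
    echo "nrpe entry already present in $services_file"
  else
    printf 'nrpe            5666/tcp                # NRPE\n' >> "$services_file"
    echo "nrpe entry added to $services_file"
  fi
}

# Usage (as root on each monitored node):
# ensure_nrpe_service /etc/services
```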
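5. Verify that NRPE is listening:

    # /usr/local/nagios/libexec/check_nrpe -H localhost

Then, from the master node, check that each monitored node answers:

    # /usr/local/nagios/libexec/check_nrpe -H 192.168.56.11
    # /usr/local/nagios/libexec/check_nrpe -H 192.168.56.12

When the connection works, each call prints the NRPE version (for example NRPE v2.15).

6.5 Install Apache (master node)

    # tar -zxf httpd-2.2.23.tar.gz
    # cd httpd-2.2.23
    # ./configure --prefix=/usr/local/apache2
    # make && make install

6.6 Install PHP (master node)

    # cd /export/home/tools/soft/php
    # tar -zxf php-5.4.10.tar.gz
    # cd php-5.4.10
    # ./configure --prefix=/usr/local/php --with-apxs2=/usr/local/apache2/bin/apxs
    # make && make install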
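6.7 Publish the PHP web interface through Apache

    # vi /usr/local/apache2/conf/httpd.conf

Make the following changes:

    Listen 80

    # inside the <IfModule mime_module> section:
        AddType application/x-httpd-php .php
    ....
    ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin"
    <Directory "/usr/local/nagios/sbin">
         AuthType Basic
         Options ExecCGI
         AllowOverride None
         Order allow,deny
         Allow from all
         AuthName "nagios Access"
         AuthUserFile /usr/local/nagios/etc/htpasswd
         Require valid-user
    </Directory>

    Alias /nagios "/usr/local/nagios/share"
    <Directory "/usr/local/nagios/share">
         AuthType Basic
         Options None
         AllowOverride None
         Order allow,deny
         Allow from all
         AuthName "nagios Access"
         AuthUserFile /usr/local/nagios/etc/htpasswd
         Require valid-user
    </Directory>

Create the web login account admin:

    # /usr/local/apache2/bin/htpasswd -c /usr/local/nagios/etc/htpasswd admin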
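Before starting Apache it can help to confirm that the edited httpd.conf really contains the directives from this step. The function below is a sketch assuming bash and grep -E; check_httpd_conf is a hypothetical helper that only looks for the lines shown above and does not replace Apache's own syntax check (/usr/local/apache2/bin/apachectl -t).

```bash
#!/usr/bin/env bash
# Sketch: report which Nagios/PHP-related directives are missing from an
# httpd.conf. Returns 0 when all are present, 1 otherwise.
check_httpd_conf() {
  local conf=${1:-/usr/local/apache2/conf/httpd.conf}
  local missing=0 pattern
  for pattern in \
      '^Listen [0-9]+' \
      'AddType application/x-httpd-php \.php' \
      'ScriptAlias /nagios/cgi-bin' \
      'AuthUserFile /usr/local/nagios/etc/htpasswd' \
      'Require valid-user'; do
    if ! grep -Eq "$pattern" "$conf"; then
      echo "missing: $pattern"
      missing=1
    fi
  done
  return "$missing"
}

# Usage on the master node:
# check_httpd_conf && /usr/local/apache2/bin/apachectl -t
```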
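Start Apache:

    # /usr/local/apache2/bin/apachectl start

7. Nagios Configuration

7.1 NRPE command configuration

Switch to the nagios user:

    # su - nagios

Define the argument-taking check commands in /usr/local/nagios/etc/nrpe.cfg:

    command[check_users]=/usr/local/nagios/libexec/check_users -w $ARG1$ -c $ARG2$
    command[check_load]=/usr/local/nagios/libexec/check_load -w $ARG1$ -c $ARG2$
    command[check_disk]=/usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
    command[check_procs]=/usr/local/nagios/libexec/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
    command[check_procs_args]=/usr/local/nagios/libexec/check_procs $ARG1$
    command[check_swap]=/usr/local/nagios/libexec/check_swap -w $ARG1$ -c $ARG2$

NRPE only substitutes the $ARGn$ macros when it was built with --enable-command-args and dont_blame_nrpe=1 is set in nrpe.cfg.

Restart xinetd so that the change takes effect:

    # service xinetd restart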
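The name inside command[...] is what check_nrpe must pass with -c, so a typo on either side produces an unknown-command error. A sketch for listing the names NRPE will answer to, assuming bash and sed; nrpe_commands is a hypothetical helper. Commented-out entries (lines starting with #) are skipped.

```bash
#!/usr/bin/env bash
# Sketch: print every command name defined as command[NAME]=... in nrpe.cfg.
nrpe_commands() {
  local cfg=${1:-/usr/local/nagios/etc/nrpe.cfg}
  # Capture the text between "command[" and "]"; lines beginning with "#"
  # do not match because only whitespace may precede "command[".
  sed -n 's/^[[:space:]]*command\[\([^]]*\)][[:space:]]*=.*/\1/p' "$cfg"
}

# Usage:
# nrpe_commands /usr/local/nagios/etc/nrpe.cfg
```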
7.1.3 Verify the configuration
Check that the monitoring commands are configured correctly:
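The check_load test passes two comma-separated triplets, 15,10,5 and 30,25,20: warning and critical thresholds for the 1-, 5- and 15-minute load averages. The function below is a sketch that mimics how such triplets are evaluated (any average above its critical value gives CRITICAL, otherwise any above its warning value gives WARNING). It is an illustration, not the plugin's source; load_state is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch of check_load-style threshold evaluation.
# Arguments: "l1,l5,l15" "w1,w5,w15" "c1,c5,c15"; prints OK, WARNING or CRITICAL.
load_state() {
  awk -v l="$1" -v w="$2" -v c="$3" 'BEGIN {
    n = split(l, L, ","); split(w, W, ","); split(c, C, ",")
    state = 0
    for (i = 1; i <= n; i++) {
      if (L[i] + 0 > C[i] + 0) { state = 2 }                       # critical wins
      else if (L[i] + 0 > W[i] + 0 && state < 1) { state = 1 }     # never downgrade
    }
    split("OK WARNING CRITICAL", name, " ")
    print name[state + 1]
  }'
}

# Usage with the thresholds from the check below:
# load_state 16,9,4 15,10,5 30,25,20
```

The states correspond to the plugin exit codes 0 (OK), 1 (WARNING) and 2 (CRITICAL) that check_nrpe relays back to Nagios.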

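    # /usr/local/nagios/libexec/check_nrpe -H localhost -c check_load -a 15,10,5 30,25,20
    # /usr/local/nagios/libexec/check_nrpe -H localhost -c check_procs -a 200 400 RSZDT

7.2 Nagios master configuration

7.2.1 Web interface permissions

Edit /usr/local/nagios/etc/cgi.cfg and grant the admin user access:

    default_user_name=admin
    authorized_for_system_information=nagiosadmin,admin
    authorized_for_configuration_information=nagiosadmin,admin
    authorized_for_system_commands=nagiosadmin,admin
    authorized_for_all_services=nagiosadmin,admin
    authorized_for_all_hosts=nagiosadmin,admin
    authorized_for_all_service_commands=nagiosadmin,admin
    authorized_for_all_host_commands=nagiosadmin,admin

7.2.2 Host configuration directory

In /usr/local/nagios/etc/nagios.cfg, comment out the sample localhost configuration and load the servers directory instead:

    #cfg_file=/usr/local/nagios/etc/objects/localhost.cfg      (commented out)
    cfg_dir=/usr/local/nagios/etc/servers

Create the directory:

    # cd /usr/local/nagios/etc
    # mkdir servers
    # cd servers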
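7.2.3 Define the monitored host group

Declare a host group and add all three hosts from the environment section to it:

    # vi /usr/local/nagios/etc/servers/group.cfg

This is a new file with the following content:

    define hostgroup{
       hostgroup_name      duangr-server
       alias               duangr-server
       members             duangr-1,duangr-2,duangr-3
       }

7.2.4 Define the monitored hosts

7.2.4.1 Local host monitoring configuration

Define the local host duangr-1 and its services in a new .cfg file in the servers directory (for example duangr-1.cfg). The thresholds are example values:

    define host{
       use                          linux-server
       host_name                    duangr-1
       alias                        duangr-1
       address                      192.168.56.10
       }

    define service{
       use                             local-service
       host_name                       duangr-1
       service_description             Host Alive
       check_command                   check-host-alive
       }

    define service{
       use                             local-service
       host_name                       duangr-1
       service_description             Users
       check_command                   check_local_users!20!50
       }

    define service{
       use                             local-service
       host_name                       duangr-1
       service_description             CPU
       check_command                   check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
       }

    define service{
       use                             local-service
       host_name                       duangr-1
       service_description             Disk Root
       check_command                   check_local_disk!20%!10%!/
       }

    define service{
       use                             local-service
       host_name                       duangr-1
       service_description             Disk Home
       check_command                   check_local_disk!20%!10%!/home
       }

    define service{
       use                             local-service
       host_name                       duangr-1
       service_description             Zombie Procs
       check_command                   check_local_procs!5!10!Z
       }

    define service{
       use                             local-service
       host_name                       duangr-1
       service_description             Total Procs
       check_command                   check_local_procs!250!400!RSZDT
       }

    define service{
       use                             local-service
       host_name                       duangr-1
       service_description             Swap Usage
       check_command                   check_local_swap!20%!10%
       }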
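Note that because this host is also the one running the Nagios master service, it can be monitored with the check_local_* commands.
This file already contains the commonly used monitoring items.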
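The service blocks above differ only in host, description and check command, so they can be generated instead of copied by hand. A minimal sketch assuming bash; gen_service is a hypothetical helper, and it hard-codes the local-service template used throughout this guide.

```bash
#!/usr/bin/env bash
# Sketch: print one Nagios service definition.
# Arguments: host_name, service_description, check_command.
gen_service() {
  local host=$1 desc=$2 check=$3
  cat <<EOF
define service{
       use                             local-service
       host_name                       $host
       service_description             $desc
       check_command                   $check
       }
EOF
}

# Usage (HOST_CFG is a placeholder for the host's .cfg file in the servers directory):
# gen_service duangr-1 "Disk Root" 'check_local_disk!20%!10%!/' >> "$HOST_CFG"
```

Quote the check command in single quotes when calling it, so that % and ! reach the file unchanged.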
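7.2.4.2 Remote host monitoring configuration
Next, define the remote hosts duangr-2 and duangr-3.

Before defining monitoring for the remote hosts, the check_nrpe command must be defined first.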
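A check_command such as check_nrpe_args!check_procs_args!"-c1:1 -Ccrond" is split on "!": the first field names the command definition and the remaining fields fill $ARG1$, $ARG2$ and so on. $USER1$ comes from resource.cfg, where it points to /usr/local/nagios/libexec by default. The function below is a simplified sketch of that substitution, handling only $USER1$, $HOSTADDRESS$ and $ARGn$; expand_check is a hypothetical name, not part of Nagios.

```bash
#!/usr/bin/env bash
# Sketch of Nagios-style macro expansion for a check_command.
# Arguments: command_line template, host address, check_command string.
expand_check() {
  local template=$1 host_address=$2 check=$3
  local -a fields
  IFS='!' read -r -a fields <<< "$check"
  local line=$template i
  line=${line//'$USER1$'/'/usr/local/nagios/libexec'}
  line=${line//'$HOSTADDRESS$'/"$host_address"}
  # Highest index first so that $ARG10$ would not be clobbered by $ARG1$
  for (( i = ${#fields[@]} - 1; i >= 1; i-- )); do
    line=${line//"\$ARG$i\$"/"${fields[i]}"}
  done
  printf '%s\n' "$line"
}

# Usage:
# expand_check '$USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$ -a $ARG2$' \
#   192.168.56.11 'check_nrpe_args!check_procs_args!"-c1:1 -Ccrond"'
```

The quotes around "-c1:1 -Ccrond" survive into the expanded command line; the shell that runs check_nrpe removes them, so NRPE receives the two options as a single argument that replaces $ARG1$ in its check_procs_args command.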

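    # vi /usr/local/nagios/etc/objects/commands.cfg

Add the following at the end of the file:

    # 'check_nrpe' command definition
    define command{
           command_name    check_nrpe
           command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$
           }
    # 'check_nrpe_args' command definition
    define command{
           command_name    check_nrpe_args
           command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$ -a $ARG2$
           }

Then define duangr-2 in a new .cfg file in the servers directory (for example duangr-2.cfg); duangr-3 is configured the same way. The thresholds are example values:

    define host{
           use                     linux-server
           host_name               duangr-2
           alias                   duangr-2
           address                 192.168.56.11
           }

    define service{
           use                             local-service
           host_name                       duangr-2
           service_description             Host Alive
           check_command                   check-host-alive
           }
    define service{
           use                             local-service
           host_name                       duangr-2
           service_description             Users
           check_command                   check_nrpe_args!check_users!20 50
           }
    define service{
           use                             local-service
           host_name                       duangr-2
           service_description             CPU
           check_command                   check_nrpe_args!check_load!15,10,5 30,25,20
           }
    define service{
           use                             local-service
           host_name                       duangr-2
           service_description             Disk Root
           check_command                   check_nrpe_args!check_disk!20% 10% /
           }
    define service{
           use                             local-service
           host_name                       duangr-2
           service_description             Disk /export/home
           check_command                   check_nrpe_args!check_disk!20% 10% /export/home
           }
    define service{
           use                             local-service
           host_name                       duangr-2
           service_description             Procs Zombie
           check_command                   check_nrpe_args!check_procs!5 10 Z
           }
    define service{
           use                             local-service
           host_name                       duangr-2
           service_description             Procs Total
           check_command                   check_nrpe_args!check_procs!200 400 RSZDT
           }
    define service{
           use                             local-service
           host_name                       duangr-2
           service_description             Swap Usage
           check_command                   check_nrpe_args!check_swap!20% 10%
           }

    ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
    ;; process monitoring
    ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
    ;; monitor the crond process
    define service{
           use                             local-service
           host_name                       duangr-2
           service_description             PS: crond
           check_command                   check_nrpe_args!check_procs_args!"-c1:1 -Ccrond"
           }
    ;; monitor the ZooKeeper process
    define service{
           use                             local-service
           host_name                       duangr-2
           service_description             PS: QuorumPeerMain
           check_command                   check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aQuorumPeerMain"
           }
    ;; monitor the Storm worker-node (supervisor) process
    define service{
           use                             local-service
           host_name                       duangr-2
           service_description             PS: supervisor
           check_command                   check_nrpe_args!check_procs_args!"-c1:1 -Cjava -adaemon.supervisor"
           }
    ;; monitor the Storm master (nimbus) process
    define service{
           use                             local-service
           host_name                       duangr-2
           service_description             PS: nimbus
           check_command                   check_nrpe_args!check_procs_args!"-c1:1 -Cjava -adaemon.nimbus"
           }
    ;; monitor the MetaQ (metamorphosis) server process
    define service{
           use                             local-service
           host_name                       duangr-2
           service_description             PS: metamorphosis
           check_command                   check_nrpe_args!check_procs_args!"-c1:1 -Cjava -ametamorphosis-server-w"
           }
    ;; monitor the Redis process
    define service{
           use                             local-service
           host_name                       duangr-2
           service_description             PS: redis-server
           check_command                   check_nrpe_args!check_procs_args!"-c1:1 -Credis-server"
           }
    ;; monitor the Hadoop master NameNode process
    define service{
           use                             local-service
           host_name                       duangr-2
           service_description             PS: NameNode
           check_command                   check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.namenode.NameNode"
           }
    ;; monitor the Hadoop master SecondaryNameNode process
    define service{
           use                             local-service
           host_name                       duangr-2
           service_description             PS: SecondaryNameNode
           check_command                   check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.namenode.SecondaryNameNode"
           }
    ;; monitor the Hadoop master ResourceManager process
    define service{
           use                             local-service
           host_name                       duangr-2
           service_description             PS: ResourceManager
           check_command                   check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.resourcemanager.ResourceManager"
           }
    ;; monitor the Hadoop slave DataNode process
    define service{
           use                             local-service
           host_name                       duangr-2
           service_description             PS: DataNode
           check_command                   check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.datanode.DataNode"
           }
    ;; monitor the Hadoop slave NodeManager process
    define service{
           use                             local-service
           host_name                       duangr-2
           service_description             PS: NodeManager
           check_command                   check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.nodemanager.NodeManager"
           }

7.2.4.3 Configure the contact

    # vi /usr/local/nagios/etc/objects/contacts.cfg

    define contact{
           contact_name                    nagiosadmin
           use                             generic-contact         ; Inherit default values from generic-contact template (defined above)
           alias                           Nagios Admin
           email                           yourname@domain.com
           }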
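Besides configuring the recipient of the monitoring e-mails, also make sure that:

- this host can reach the mail server;
- sendmail on this host can send mail through an external SMTP service.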
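The first requirement, network reachability of the mail server, can be tested without sending any mail. This sketch assumes bash (for its /dev/tcp redirection) and coreutils timeout; smtp_reachable is a hypothetical helper, and smtp.example.com in the usage line is a placeholder for your own SMTP server. Port 25 is the standard SMTP port. It does not check the sendmail configuration itself.

```bash
#!/usr/bin/env bash
# Sketch: check that a TCP connection to host:port can be opened within 5 s.
smtp_reachable() {
  local host=$1 port=${2:-25}
  # bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT
  if timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "$host:$port reachable"
  else
    echo "$host:$port NOT reachable" >&2
    return 1
  fi
}

# Usage:
# smtp_reachable smtp.example.com 25
```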
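7.2.4.4 Verify the configuration

    # /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Start Nagios as a daemon:

    # /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

Since nagios is already registered as a system service, you can also run:

    # service nagios start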
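Running the pre-flight check and the start as two separate commands makes it easy to start Nagios on a broken configuration. A small wrapper can chain them. This is a sketch assuming bash; verify_and_start is a hypothetical name, and the default paths are the install prefix used in this guide. It uses only the -v (verify) and -d (daemon) options shown above.

```bash
#!/usr/bin/env bash
# Sketch: start Nagios as a daemon only if `nagios -v` accepts the configuration.
verify_and_start() {
  local nagios_bin=${1:-/usr/local/nagios/bin/nagios}
  local cfg=${2:-/usr/local/nagios/etc/nagios.cfg}
  if "$nagios_bin" -v "$cfg"; then
    "$nagios_bin" -d "$cfg"
  else
    echo "configuration check failed; nagios not started" >&2
    return 1
  fi
}

# Usage on the master node:
# verify_and_start
```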