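The Nagios Monitoring Trilogy: Installing and Configuring Nagios (1)
My company recently needed to bring a monitoring system online, with many monitors to deploy across environments and devices that mostly differ from one another. I therefore wrote this installation guide so that our operations staff can deploy the monitoring by following it. My system is Red Hat 5.4, with iptables and SELinux disabled.
1. Install yum (if yum is already present on the machine, skip this step and continue with step 2)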
[*]# wget http://packages.sw.be/rpmforge-release/rpmforge-release-0.5.1-1.el5.rf.i386.rpm
[*][root@localhost yum.repos.d]# wget http://dag.wieers.com/rpm/packages/RPM-GPG-KEY.dag.txt
[*]# rpm -Uvh rpmforge-release-0.5.1-1.el5.rf.i386.rpm
[*][root@localhost yum.repos.d]# rpm --import RPM-GPG-KEY.dag.txt
[*]# yum install yum-fastestmirror yum-presto
2. Install Apache (skip this step if it is already installed by default; otherwise install it with yum)
[*]# yum -y install httpd
Installing Nagios also requires some basic supporting packages:
[*]# yum -y install gd gd-devel glibc glibc-common gcc
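3. Configure Apache to support Nagios
(1) Create the nagios user
[*]# useradd nagios
[*]# /usr/sbin/groupadd nagcmd                 # add the nagcmd group, used to submit external commands from the web page
[*]# /usr/sbin/usermod -a -G nagcmd nagios     # add the nagios user to the nagcmd group
[*]# /usr/sbin/usermod -a -G nagcmd apache     # add the apache user to the nagcmd group
[*]# /usr/sbin/usermod -a -G apache nagios     # add the nagios user to the apache group
[*]# /usr/sbin/usermod -a -G nagios apache     # add the apache user to the nagios group
(2) Change the user and group that Apache runs as. The default is daemon, and it needs to be changed to nagios so that Apache has permission to access the Nagios installation directory and execute the related CGI commands, such as shutting down Nagios from the browser interface or stopping alert notifications for a faulty object. (This step can be skipped: when I deployed Nagios I did not change Apache's user and group, and no problems occurred.)
(3) Add the Nagios access directories (Nagios is installed under /usr/local/nagios) and enable HTTP user authentication. Append the following to the end of httpd.conf: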
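To confirm that the group changes above took effect, something like the following could be run afterwards. It is only a sketch: the helper name in_group is my own, and it relies on nothing more than the standard id, tr and grep commands.

```shell
#!/bin/sh
# Succeeds when user $1 is a member of group $2 (primary or supplementary).
# in_group is a hypothetical helper name, not part of Nagios.
in_group() {
    id -Gn "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF "$2"
}

# Report the four memberships that the usermod commands are meant to create.
for pair in nagios:nagcmd apache:nagcmd nagios:apache apache:nagios; do
    user=${pair%%:*}
    group=${pair##*:}
    if in_group "$user" "$group"; then
        echo "ok:      $user is in $group"
    else
        echo "missing: $user is not in $group"
    fi
done
```

Any line that prints "missing" points at a usermod command that should be run again.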
[*]ScriptAlias /nagios/cgi-bin /usr/local/nagios/sbin
[*]<Directory "/usr/local/nagios/sbin">
[*]Options ExecCGI
[*]AllowOverride None
[*]Order allow,deny
[*]Allow from all
[*]AuthName "Nagios Access"
[*]AuthType Basic
[*]AuthUserFile /usr/local/nagios/etc/htpasswd
[*]Require valid-user
[*]</Directory>
[*]Alias /nagios /usr/local/nagios/share
[*]<Directory "/usr/local/nagios/share">
[*]Options None
[*]AllowOverride None
[*]Order allow,deny
[*]Allow from all
[*]AuthName "Nagios Access"
[*]AuthType Basic
[*]AuthUserFile /usr/local/nagios/etc/htpasswd
[*]Require valid-user
[*]</Directory>
4. Install Nagios
[*]# tar zxvf nagios-3.3.1.tar.gz
(Change into the directory extracted from the tarball before running configure.)
[*]# ./configure --prefix=/usr/local/nagios --with-command-group=nagcmd
[*]# make all
[*]# make install
[*]# make install-init
[*]# make install-config
[*]# make install-commandmode
[*]# make install-webconf
5. Install the Nagios plugins (nagios-plugins)
[*]# cd /tmp
[*]# tar zxvf nagios-plugins-1.4.15.tar.gz
[*]# cd nagios-plugins-1.4.15
[*]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios
[*]# make
[*]# make install
6. Configure Nagios
[*]# cd /usr/local/
[*]# chown -R nagios:nagios nagios/
[*]# chown -R nagios:nagios nagios/*
[*]# cd nagios/etc/
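[*]# vim nagios.cfg
Edit nagios.cfg so that it contains the entries below. The comments are kept on their own lines starting with #, because nagios.cfg does not treat a trailing # on a setting line as a comment:
[*]# host definitions
[*]cfg_file=/usr/local/nagios/etc/hosts.cfg
[*]# host group definitions
[*]cfg_file=/usr/local/nagios/etc/hostgroups.cfg
[*]# contact definitions
[*]cfg_file=/usr/local/nagios/etc/contacts.cfg
[*]# contact group definitions
[*]cfg_file=/usr/local/nagios/etc/contactgroups.cfg
[*]# service definitions
[*]cfg_file=/usr/local/nagios/etc/services.cfg
[*]# time period definitions
[*]cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
[*]# command definitions
[*]cfg_file=/usr/local/nagios/etc/objects/commands.cfg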
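Since several of these cfg_file entries point at files that do not exist until you create them in the following steps, it can help to list the ones that are still missing. The sketch below assumes one cfg_file=path entry per line with nothing after the path; the function name missing_cfg_files and the demo file names are made up for illustration.

```shell
#!/bin/sh
# List every cfg_file= path in a nagios.cfg-style file that does not exist on disk.
# Prints one missing path per line; prints nothing when all files are present.
# missing_cfg_files is a hypothetical helper name.
missing_cfg_files() {
    # Keep only lines that start with cfg_file=, then strip the key.
    sed -n 's/^cfg_file=//p' "$1" | while read -r cfg_path; do
        [ -f "$cfg_path" ] || printf '%s\n' "$cfg_path"
    done
}

# Demonstration against a throwaway directory, so no real config is touched.
demo_dir=$(mktemp -d)
touch "$demo_dir/hosts.cfg"
printf 'cfg_file=%s/hosts.cfg\ncfg_file=%s/services.cfg\n' "$demo_dir" "$demo_dir" > "$demo_dir/nagios.cfg"
missing_cfg_files "$demo_dir/nagios.cfg"   # reports only the services.cfg path
```

On a real installation the argument would be /usr/local/nagios/etc/nagios.cfg.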
Edit the cgi.cfg configuration file as follows:
[*]# vim cgi.cfg
[*]# separate multiple users with commas
[*]authorized_for_system_information=nagios
[*]authorized_for_configuration_information=nagios
[*]authorized_for_system_commands=nagios
[*]authorized_for_all_services=nagios
[*]authorized_for_all_hosts=nagios
[*]authorized_for_all_service_commands=nagios
[*]authorized_for_all_host_commands=nagios
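The user "nagios" specified here can control Nagios from the browser, for example to shut down or restart the service.
[*]# sed -i 's/nagiosadmin/nagios/g' cgi.cfg    # alternatively, make the same change with this command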
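If you would rather keep a copy of the original cgi.cfg before replacing the user name, the same substitution can be wrapped as below. This is a sketch only: rename_cgi_user is a name I made up, the demo runs on a temporary file, and it assumes the user names contain no "/" characters.

```shell
#!/bin/sh
# Replace a user name throughout a cgi.cfg-style file, keeping a .bak copy.
# $1 = file, $2 = old user, $3 = new user (names must not contain "/").
# rename_cgi_user is a hypothetical helper name.
rename_cgi_user() {
    cp "$1" "$1.bak" &&
    sed "s/$2/$3/g" "$1.bak" > "$1"
}

# Demonstration on a temporary file with two sample settings.
demo_cfg=$(mktemp)
printf 'authorized_for_all_hosts=nagiosadmin\nauthorized_for_all_services=nagiosadmin\n' > "$demo_cfg"
rename_cgi_user "$demo_cfg" nagiosadmin nagios
grep '^authorized_for_' "$demo_cfg"
```

The untouched original stays next to the file with a .bak suffix, which makes it easy to compare or roll back.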
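(1) Configure the host file hosts.cfg (inline comments in object files start with a semicolon)
[*]define host{
[*]host_name web1 ; host name is web1, as shown by the hostname command
[*]alias Nagios Server ; host alias
[*]address 192.168.10.223 ; IP address of the host
[*]check_command check-host-alive ; command used for the check, it must exist in the command definition file (it is defined by default)
[*]check_interval 5 ; interval between checks (minutes by default)
[*]retry_interval 1 ; interval between retries after a failed check
[*]max_check_attempts 5 ; maximum number of check attempts
[*]check_period 24x7 ; time period during which checks run
[*]process_perf_data 0
[*]retain_nonstatus_information 0
[*]contact_groups admin ; contact group that receives the email alerts
[*]notification_interval 30 ; interval between notifications
[*]notification_period 24x7 ; time period during which notifications are sent
[*]notification_options d,u,r ; states that trigger a notification. For hosts: d = down, u = unreachable, r = recovery. For services the letters are w = warning, u = unknown, c = critical, r = recovery, and w,c,r is usually enough in production
[*]}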
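With many hosts to add, typing each definition by hand gets tedious. One possible approach, sketched here under my own assumptions, is to print the definitions from a short inventory. The function name print_host and the second host (web2, 192.168.10.224) are hypothetical; the directive values simply copy the example above.

```shell
#!/bin/sh
# Print one Nagios host definition; $1 = host name, $2 = IP address.
# print_host is a hypothetical helper; the values mirror the hand-written example.
print_host() {
    cat <<EOF
define host{
    host_name               $1
    alias                   $1
    address                 $2
    check_command           check-host-alive
    max_check_attempts      5
    check_period            24x7
    contact_groups          admin
    notification_interval   30
    notification_period     24x7
    notification_options    d,u,r
}
EOF
}

# Hypothetical inventory: one "name address" pair per line.
printf '%s\n' 'web1 192.168.10.223' 'web2 192.168.10.224' |
while read -r name addr; do
    print_host "$name" "$addr"
done
```

Redirecting the output to a file such as hosts.cfg and then running the pre-flight check from step 8 would show whether the generated definitions are accepted.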
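(2) Configure the host group file hostgroups.cfg
[*]define hostgroup {
[*]hostgroup_name Nagios-Example ; name of the host group
[*]alias Nagios Example ; alias of the host group
[*]members web1 ; members of the group, each must match a host_name in hosts.cfg or Nagios reports an error
[*]}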
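(3) Configure the contact file contacts.cfg
[*]define contact{
[*]contact_name nagiosadmin ; contact name
[*]alias Nagios Admin ; contact alias
[*]service_notification_period 24x7 ; service notifications may be sent at any time
[*]host_notification_period 24x7 ; host notifications may be sent at any time
[*]service_notification_options w,u,c,r ; service states that trigger a notification
[*]host_notification_options d,u,r ; host states that trigger a notification
[*]service_notification_commands notify-service-by-email ; send service alerts by email
[*]host_notification_commands notify-host-by-email ; send host alerts by email
[*]email denglei@ctfo.com ; mailbox that receives the alerts
[*]}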
(4) Configure the contact group file contactgroups.cfg
[*]define contactgroup{
[*]contactgroup_name admin ; name of the contact group
[*]alias Nagios Administrators ; alias of the contact group
[*]members nagiosadmin ; members of the group, must match contact_name in contacts.cfg
[*]}
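(5) Configure the service file services.cfg
[*]define service {
[*]host_name web1 ; must match host_name in hosts.cfg
[*]service_description check-host-alive ; service description
[*]check_period 24x7 ; time period during which checks run
[*]max_check_attempts 4 ; maximum number of check attempts
[*]normal_check_interval 3 ; interval between checks
[*]retry_check_interval 2 ; interval between retries
[*]contact_groups admin ; contact group notified when a problem occurs
[*]notification_interval 10 ; interval between notifications
[*]notification_period 24x7 ; time period during which notifications are sent
[*]notification_options w,u,c,r ; states that trigger a notification
[*]check_command check-host-alive ; command used for the check
[*]}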
[*]define service {
[*]host_name web1
[*]service_description PING
[*]check_period 24x7
[*]max_check_attempts 4
[*]normal_check_interval 3
[*]retry_check_interval 2
[*]contact_groups admin
[*]notification_interval 10
[*]notification_period 24x7
[*]notification_options w,u,c,r
[*]check_command check_ping!100.0,20%!500.0,60%
[*]}
[*]define service {
[*]host_name web1
[*]service_description Root Partition
[*]check_period 24x7
[*]max_check_attempts 4
[*]normal_check_interval 3
[*]retry_check_interval 2
[*]contact_groups admin
[*]notification_interval 10
[*]notification_period 24x7
[*]notification_options w,u,c,r
[*]check_command check_local_disk!20%!10%!/
[*]}
[*]define service {
[*]host_name web1
[*]service_description Current Users
[*]check_period 24x7
[*]max_check_attempts 4
[*]normal_check_interval 3
[*]retry_check_interval 2
[*]contact_groups admin
[*]notification_interval 10
[*]notification_period 24x7
[*]notification_options w,u,c,r
[*]check_command check_local_users!20!50
[*]}
[*]define service {
[*]host_name web1
[*]service_description Total Processes
[*]check_period 24x7
[*]max_check_attempts 4
[*]normal_check_interval 3
[*]retry_check_interval 2
[*]contact_groups admin
[*]notification_interval 10
[*]notification_period 24x7
[*]notification_options w,u,c,r
[*]check_command check_local_procs!250!400!RSZDT
[*]}
[*]define service {
[*]host_name web1
[*]service_description Current Load
[*]check_period 24x7
[*]max_check_attempts 4
[*]normal_check_interval 3
[*]retry_check_interval 2
[*]contact_groups admin
[*]notification_interval 10
[*]notification_period 24x7
[*]notification_options w,u,c,r
[*]check_command check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
[*]}
[*]define service {
[*]host_name web1
[*]service_description Swap Usage
[*]check_period 24x7
[*]max_check_attempts 4
[*]normal_check_interval 3
[*]retry_check_interval 2
[*]contact_groups admin
[*]notification_interval 10
[*]notification_period 24x7
[*]notification_options w,u,c,r
[*]check_command check_local_swap!20!10
[*]}
[*]define service {
[*]host_name web1
[*]service_description SSH
[*]check_period 24x7
[*]max_check_attempts 4
[*]normal_check_interval 3
[*]retry_check_interval 2
[*]contact_groups admin
[*]notification_interval 10
[*]notification_period 24x7
[*]notifications_enabled 0
[*]notification_options w,u,c,r
[*]check_command check_ssh
[*]}
[*]define service {
[*]host_name web1
[*]service_description HTTP
[*]check_period 24x7
[*]max_check_attempts 4
[*]normal_check_interval 3
[*]retry_check_interval 2
[*]contact_groups admin
[*]notification_interval 10
[*]notification_period 24x7
[*]notifications_enabled 0
[*]notification_options w,u,c,r
[*]check_command check_http
[*]}
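The service definitions above differ only in their description and check command, so they could also be printed from a short list instead of being copied by hand. This is a sketch of that idea, not how I produced the file: print_service is a made-up helper, and the list format with "|" as separator is my own choice.

```shell
#!/bin/sh
# Print one service definition; $1 = host, $2 = description, $3 = check command.
# print_service is a hypothetical helper; the fixed values copy the examples above.
print_service() {
    cat <<EOF
define service {
    host_name               $1
    service_description     $2
    check_period            24x7
    max_check_attempts      4
    normal_check_interval   3
    retry_check_interval    2
    contact_groups          admin
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,c,r
    check_command           $3
}
EOF
}

# Description and command are separated by "|" because a description may
# contain spaces and a command contains "!".
while IFS='|' read -r desc cmd; do
    print_service web1 "$desc" "$cmd"
done <<'LIST'
PING|check_ping!100.0,20%!500.0,60%
Root Partition|check_local_disk!20%!10%!/
Current Users|check_local_users!20!50
LIST
```

Services that need an extra directive, such as notifications_enabled 0 on the SSH and HTTP checks, would still have to be edited afterwards.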
7. Install NRPE
[*]# cd /tmp/
[*]# tar zxvf nrpe-2.12.tar.gz
[*]# cd nrpe-2.12
[*]# ./configure --prefix=/usr/local/nrpe
[*]# make
[*]# make install
Copy the files:
[*]# cp /usr/local/nrpe/libexec/check_nrpe /usr/local/nagios/libexec
[*]# cp /usr/local/nagios/libexec/check_disk /usr/local/nrpe/libexec
[*]# cp /usr/local/nagios/libexec/check_load /usr/local/nrpe/libexec
[*]# cp /usr/local/nagios/libexec/check_ping /usr/local/nrpe/libexec
[*]# cp /usr/local/nagios/libexec/check_procs /usr/local/nrpe/libexec
Configure NRPE:
[*]# mkdir /usr/local/nrpe/etc
[*]# cp sample-config/nrpe.cfg /usr/local/nrpe/etc/
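Edit nrpe.cfg. On the monitoring server it can be left unchanged; on a client (a monitored host), change the following line:
[*]allowed_hosts=127.0.0.1
and add the IP address of the monitoring server to allowed_hosts.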
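On many clients this edit is easier to script. The sketch below appends an address to the comma-separated allowed_hosts list only when it is not already there. The function name allow_nrpe_host and the server address 192.168.10.1 are assumptions for illustration, and the demo works on a temporary file rather than the real nrpe.cfg.

```shell
#!/bin/sh
# Append an address to the allowed_hosts line of an nrpe.cfg-style file,
# unless it is already listed. $1 = file, $2 = address to allow.
# Assumes a single allowed_hosts= line with comma-separated values and no spaces.
# allow_nrpe_host is a hypothetical helper name.
allow_nrpe_host() {
    current=$(sed -n 's/^allowed_hosts=//p' "$1")
    case ",$current," in
        *",$2,"*) return 0 ;;   # already listed, nothing to do
    esac
    sed "s/^allowed_hosts=.*/&,$2/" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Demonstration on a temporary file; 192.168.10.1 is a made-up server address.
demo_nrpe=$(mktemp)
echo 'allowed_hosts=127.0.0.1' > "$demo_nrpe"
allow_nrpe_host "$demo_nrpe" 192.168.10.1
cat "$demo_nrpe"
```

Running the function a second time with the same address leaves the file unchanged, so it is safe to repeat.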
[*]# /usr/local/nrpe/bin/nrpe -c /usr/local/nrpe/etc/nrpe.cfg -d
[*]# ps -ef|grep nrpe
[*]nagios 4465 1 0 21:02 ? 00:00:00 /usr/local/nrpe/bin/nrpe -c /usr/local/nrpe/etc/nrpe.cfg -d
[*]root 4467 12877 0 21:02 pts/2 00:00:00 grep nrpe
[*]# lsof -i:5666
[*]COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
[*]nrpe 4465 nagios 4u IPv4 81685 TCP *:5666 (LISTEN)
Change the owner and group of the Nagios and NRPE files:
[*]# chown -R nagios:nagios /usr/local/nagios/*
[*]# chown -R nagios:nagios /usr/local/nrpe/*
8. Start Nagios
First check whether the Nagios configuration has any problems:
[*]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
[*]
[*]Nagios Core 3.3.1
[*]Copyright (c) 2009-2011 Nagios Core Development Team and Community Contributors
[*]Copyright (c) 1999-2009 Ethan Galstad
[*]Last Modified: 07-25-2011
[*]License: GPL
[*]
[*]Website: http://www.nagios.org
[*]Reading configuration data...
[*] Read main config file okay...
[*]Processing object config file '/usr/local/nagios/etc/objects/commands.cfg'...
[*]Processing object config file '/usr/local/nagios/etc/objects/timeperiods.cfg'...
[*]Processing object config file '/usr/local/nagios/etc/hosts.cfg'...
[*]Processing object config file '/usr/local/nagios/etc/hostgroups.cfg'...
[*]Processing object config file '/usr/local/nagios/etc/contacts.cfg'...
[*]Processing object config file '/usr/local/nagios/etc/contactgroups.cfg'...
[*]Processing object config file '/usr/local/nagios/etc/services.cfg'...
[*] Read object config files okay...
[*]Running pre-flight check on configuration data...
[*]
[*]Checking services...
[*] Checked 9 services.
[*]Checking hosts...
[*] Checked 1 hosts.
[*]Checking host groups...
[*] Checked 1 host groups.
[*]Checking service groups...
[*] Checked 0 service groups.
[*]Checking contacts...
[*] Checked 2 contacts.
[*]Checking contact groups...
[*] Checked 1 contact groups.
[*]Checking service escalations...
[*] Checked 0 service escalations.
[*]Checking service dependencies...
[*] Checked 0 service dependencies.
[*]Checking host escalations...
[*] Checked 0 host escalations.
[*]Checking host dependencies...
[*] Checked 0 host dependencies.
[*]Checking commands...
[*] Checked 24 commands.
[*]Checking time periods...
[*] Checked 5 time periods.
[*]Checking for circular paths between hosts...
[*]Checking for circular host and service dependencies...
[*]Checking global event handlers...
[*]Checking obsessive compulsive processor commands...
[*]Checking misc settings...
[*]Total Warnings: 0
[*]Total Errors: 0
[*]Things look okay - No serious problems were detected during the pre-flight check
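When this check is run from a script, the two summary lines are the part that matters. A small filter such as the one below could read the pre-flight output and succeed only when no errors are reported. It is a sketch: preflight_ok is a name I made up, and it only looks at the "Total Warnings" and "Total Errors" lines shown above.

```shell
#!/bin/sh
# Read Nagios pre-flight output on stdin, print the totals, and succeed only
# when a "Total Errors:" line is present and reports zero.
# preflight_ok is a hypothetical helper name.
preflight_ok() {
    awk '
        /^Total Warnings:/ { warnings = $3 }
        /^Total Errors:/   { errors = $3; seen = 1 }
        END {
            printf "warnings=%d errors=%d\n", warnings, errors
            exit ((seen && errors == 0) ? 0 : 1)
        }'
}

# Demonstration with the two summary lines from a clean run.
printf 'Total Warnings: 0\nTotal Errors: 0\n' | preflight_ok

# Intended use on a real system (not executed here):
#   /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | preflight_ok && service nagios restart
```

Warnings are reported but do not fail the check, which matches treating only errors as blocking.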
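If there are no problems, start Nagios:
[*]# chkconfig --add nagios     # register nagios as a service
[*]# chkconfig nagios on        # start the service automatically at boot
[*]# service nagios start       # start nagios
Create the web authentication user:
[*]# htpasswd -c /usr/local/nagios/etc/htpasswd nagios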
[*]New password:
[*]Re-type new password:
[*]Adding password for user nagios
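Start NRPE at boot:
[*]# echo "/usr/local/nrpe/bin/nrpe -c /usr/local/nrpe/etc/nrpe.cfg -d" >>/etc/rc.local
Start sendmail so that alert emails can be delivered:
[*]# service sendmail start
After this, stopping the httpd service should produce an alert email. If you run into a problem you cannot solve, you can contact me, or go straight to my next article, "Why Nagios cannot send alert emails", at http://dl528888.blog.运维网.com/2382721/763079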