Test environment:
Monitoring host: Nagios + Nagios plugins + NRPE + web front end, 192.168.1.210
Monitored host: Nagios plugins + NRPE, 192.168.1.211
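1. First, install NRPE on the monitoring host. On its own, Nagios can only check what is visible from outside a machine, for example whether the FTP port is open, whether the SSH port is open, or what the ping round-trip time is. To monitor local information on a Linux host, such as disk usage or system load, Nagios on the monitoring host has to call NRPE on the monitored host. NRPE on the monitored host collects the data and returns it to Nagios on the monitoring host.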
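The round trip described above can be tried by hand once NRPE is running on both machines. The following is only a minimal sketch: status_name and remote_check are hypothetical helper names, and the check_nrpe path assumes the default /usr/local/nagios prefix. The exit codes 0 to 3 are the standard Nagios plugin return codes; anything else is treated as UNKNOWN here.

```shell
#!/bin/sh
# Assumed default install location of the check_nrpe plugin; override if yours differs.
CHECK_NRPE=${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}

# status_name CODE: map a plugin exit code to the Nagios state name.
status_name() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# remote_check HOST COMMAND: ask the NRPE daemon on HOST to run COMMAND
# (COMMAND must be defined in that host's nrpe.cfg), then report the state.
remote_check() {
    "$CHECK_NRPE" -H "$1" -c "$2"
    rc=$?
    echo "state: $(status_name "$rc")"
    return "$rc"
}
```

For example, remote_check 192.168.1.211 check_load asks the monitored host for its load, assuming the check_load command from the sample nrpe.cfg is still defined there.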
Install NRPE on the monitoring host:
tar zxvf nrpe-2.12.tar.gz
cd nrpe-2.12
./configure
make all
make install-plugin
make install-daemon
make install-daemon-config
Install the Nagios plugins and NRPE on the monitored host:
tar zxvf nagios-plugins-1.4.15.tar.gz
cd nagios-plugins-1.4.15
./configure --with-nagios-user=nagios --with-nagios-group=nagios
make
make install
Only these two items in this file have been changed for now.
Next comes nagios.cfg:
# Monitoring commands that Nagios can call
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
# Contact definitions
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
# Time period definitions
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
# Template definitions
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
# Newly added: host files dropped into this directory are loaded without listing them here one by one
cfg_dir=/usr/local/nagios/etc/services
# Newly added: host group definitions
cfg_file=/usr/local/nagios/etc/objects/hostgroups.cfg
# Monitoring of the local machine
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
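Because cfg_dir tells Nagios to load every file ending in .cfg from that directory, the directory has to exist before Nagios is restarted. A rough sketch of preparing it and then running the built-in configuration check: NAGIOS_HOME, prepare_cfg_dir and verify_config are hypothetical names, and the paths assume the default /usr/local/nagios prefix.

```shell
#!/bin/sh
# Illustrative variable; the default matches the paths used in nagios.cfg.
NAGIOS_HOME=${NAGIOS_HOME:-/usr/local/nagios}

# prepare_cfg_dir DIR: create DIR if needed and list the *.cfg files
# that Nagios would load from it.
prepare_cfg_dir() {
    mkdir -p "$1" || return 1
    find "$1" -name '*.cfg' -type f
}

# verify_config: run the Nagios pre-flight check (-v) on the main config file.
verify_config() {
    "$NAGIOS_HOME/bin/nagios" -v "$NAGIOS_HOME/etc/nagios.cfg"
}
```

Run prepare_cfg_dir "$NAGIOS_HOME/etc/services" as a user who can write there, then verify_config before restarting Nagios.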
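3. Now we can start adding the monitored host.
As configured above, all host files go in the services directory, so create a new host file in that directory:
vi 192.168.1.211.cfg
With the following content: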
define host{
use linux-server
host_name 192.168.1.211
alias 192.168.1.211
address 192.168.1.211
}
define service{
use generic-service
host_name 192.168.1.211
service_description check_ping
check_command check_ping!100.0,20%!200.0,50%
max_check_attempts 5
normal_check_interval 1
}
define service{
use generic-service
host_name 192.168.1.211
service_description check_ftp
check_command check_ftp!21
max_check_attempts 5
normal_check_interval 1
}
define service{
use generic-service
host_name 192.168.1.211
service_description check_ssh
check_command check_ssh
max_check_attempts 5
normal_check_interval 1
}
define service{
use generic-service
host_name 192.168.1.211
service_description check_http
check_command check_http
max_check_attempts 5
normal_check_interval 1
}
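A check_command value such as check_ping!100.0,20%!200.0,50% is split on "!": the first field is the command name and the remaining fields become $ARG1$, $ARG2$ and so on. The stock commands.cfg defines check_ping as $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5, so the expansion can be sketched as below. expand_check_ping is a hypothetical helper and not part of Nagios, and the plugin directory is assumed to be /usr/local/nagios/libexec.

```shell
#!/bin/sh
# expand_check_ping CHECK_COMMAND HOST_ADDRESS
# Print the plugin command line Nagios would build for a check_ping service.
expand_check_ping() {
    check_command=$1    # e.g. 'check_ping!100.0,20%!200.0,50%'
    host_address=$2
    # Split on '!': field 1 is the command name, the rest are the arguments.
    old_ifs=$IFS
    IFS='!'
    set -- $check_command
    IFS=$old_ifs
    # Now $1 = command name, $2 = $ARG1$ (warning), $3 = $ARG2$ (critical).
    printf '%s\n' "/usr/local/nagios/libexec/$1 -H $host_address -w $2 -c $3 -p 5"
}

expand_check_ping 'check_ping!100.0,20%!200.0,50%' 192.168.1.211
```

Here -w 100.0,20% raises a warning when the average round trip exceeds 100 ms or packet loss reaches 20%, and -c 200.0,50% sets the critical thresholds in the same way.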
The definitions above monitor ping, the FTP service, the SSH service and the HTTP service. Let me take one part as an example:
define host{
use linux-server
host_name 192.168.1.211
alias 192.168.1.211
address 192.168.1.211
}
define service{
use generic-service
host_name 192.168.1.211
service_description check_ping
check_command check_ping!100.0,20%!200.0,50%
max_check_attempts 5
normal_check_interval 1
}
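In this excerpt, define host first declares the monitored host, and the template it uses is linux-server. Where is that template defined? In the template file listed in nagios.cfg earlier, templates.cfg. Opening templates.cfg and looking up the linux-server template shows this: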
define host{
name linux-server ; The name of this host template
use generic-host ; This template inherits other values from the generic-host template
check_period 24x7 ; By default, Linux hosts are checked round the clock
check_interval 5 ; Actively check the host every 5 minutes
retry_interval 1 ; Schedule host check retries at 1 minute intervals
max_check_attempts 10 ; Check each Linux host 10 times (max)
check_command check-host-alive ; Default command to check Linux hosts
notification_period workhours ; Linux admins hate to be woken up, so we only notify during the day
; Note that the notification_period variable is being overridden from
; the value that is inherited from the generic-host template!
notification_interval 120 ; Resend notifications every 2 hours
notification_options d,u,r ; Only send notifications for specific host states
contact_groups admins ; Notifications get sent to the admins by default
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}
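Looking up a template by hand gets tedious when one template inherits from another through use. A small sketch that prints only the define block whose name directive matches: show_template is a hypothetical helper, and it assumes each block starts with a define line and ends with a closing brace on its own line, as in the stock templates.cfg.

```shell
#!/bin/sh
# show_template FILE NAME: print the define block in FILE whose
# "name" directive equals NAME.
show_template() {
    awk -v want="$2" '
        # A new block starts: reset the buffer and the match flag.
        $1 == "define" { block = ""; inblock = 1; found = 0 }
        # Collect every line of the current block and look for the name directive.
        inblock {
            block = block $0 "\n"
            if ($1 == "name" && $2 == want) found = 1
        }
        # Closing brace: print the buffered block only if it matched.
        inblock && substr($1, 1, 1) == "}" {
            if (found) printf "%s", block
            inblock = 0
        }
    ' "$1"
}
```

For example, show_template /usr/local/nagios/etc/objects/templates.cfg linux-server prints the block above, and calling it again with generic-host shows the parent template it inherits from.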