Setting Up and Configuring a Nagios Monitoring System
Note: this document only outlines how to install and configure a Nagios monitoring system. It does not cover any of the underlying theory; if that is what you are after, please consult other documents. My own understanding of Nagios is not very deep yet. The plan is to get it running first, look at the results, and analyze afterwards.

My working Nagios environment:

    OS: Red Hat 4.8 (64-bit), with all packages selected when the operating system was installed
    Software: nagios-3.2.1, nagios-plugins-1.4.15, nrpe-2.12 (your versions may differ from mine)
    Monitoring server: 192.168.5.58 (all packages installed with the OS; Apache is the one shipped with the system)
    Monitored host: 192.168.3.64 (an arbitrary server picked from my environment diagram)

Explanation: the monitoring server needs nagios, nagios-plugins and nrpe (nrpe is what monitors CPU load, process count and disk usage). If you only want to monitor things like ping or port 80 on the monitoring server itself, you do not need nrpe. Likewise, checking whether a monitored host is alive or whether its port 80 is open does not require nrpe. To monitor CPU load, process count and disk usage on a monitored host, nrpe must be installed on that host.

Download the following software on the monitoring server:

    wget http://prodownloads.sourceforge.net/sourceforge/nagios/nagios-3.2.1.tar.gz
    wget http://prodownloads.sourceforge.net/sourceforge/nagios/nagios-plugins-1.4.15.tar.gz
    wget http://prodownloads.sourceforge.net/sourceforge/nagios/nrpe-2.12.tar.gz

Steps on the monitoring server:

Before installing nagios, create a system user named nagios:

    groupadd nagios
    useradd -g nagios nagios
    mkdir /usr/local/nagios
    chown -R nagios:nagios /usr/local/nagios

Install nagios (unpack, compile, install):

    tar -zxvf nagios-3.2.1.tar.gz
    cd nagios-3.2.1
    ./configure --prefix=/usr/local/nagios
    make all
    make install
    make install-init
    make install-commandmode
    make install-config

Install the nagios plugins (unpack, compile, install):

    tar -zxvf nagios-plugins-1.4.15.tar.gz
    cd nagios-plugins-1.4.15
    ./configure --prefix=/usr/local/nagios
    make
    make install

Edit the Apache configuration file and append the following at the end:

    ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin"
    <Directory "/usr/local/nagios/sbin">
        Options ExecCGI
        AllowOverride None
        Order allow,deny
        Allow from all
        AuthName "Nagios Access"
        AuthType Basic
        AuthUserFile /usr/local/nagios/etc/htpasswd.users
        Require valid-user
    </Directory>
    Alias /nagios "/usr/local/nagios/share"
    <Directory "/usr/local/nagios/share">
        Options None
        AllowOverride None
        Order allow,deny
        Allow from all
        AuthName "Nagios Access"
        AuthType Basic
        AuthUserFile /usr/local/nagios/etc/htpasswd.users
        Require valid-user
    </Directory>

Uncomment the following line and change it as shown.

Before:

    #ServerName new.host.name:80

After:

    ServerName 192.168.5.58:80

Add a user for authentication (user: admin, password: abc#123). The file name must match the AuthUserFile path above:

    /usr/bin/htpasswd -c /usr/local/nagios/etc/htpasswd.users admin
    New password:              (enter the password)
    Re-type new password:      (enter the password again)
    Adding password for user admin

That completes the nagios installation. You can now start the Apache and nagios services and look at the home page; if it loads, the basic test has passed. Installation is very simple. Configuration is the more tedious part.

    /etc/init.d/httpd start
    /etc/init.d/nagios start

Quick test: open http://192.168.5.58/nagios and log in with the user name admin and its password.

Note that Apache may print a warning on startup. I do not fully understand why, but the web pages work without any problem. As the message itself says, Apache considers the two new directives to overlap earlier ones:

    Starting httpd: The ScriptAlias directive in /etc/httpd/conf/httpd.conf at line 1024 will probably never match because it overlaps an earlier ScriptAlias.
    The Alias directive in /etc/httpd/conf/httpd.conf at line 1035 will probably never match because it overlaps an earlier Alias.

If the web interface looks fine, start configuring nagios. For now I define the monitoring server as a group with itself as the only member, and monitor only whether it is alive and whether port 80 is open.

The main nagios configuration file is nagios.cfg. It references the other configuration files, such as the check commands, the monitored services, the contacts and the contact groups.

Open the main configuration file:

    vi /usr/local/nagios/etc/nagios.cfg

Find the following section and edit it to match:

    # You can specify individual object config files as shown below:
    # command definitions
    cfg_file=/usr/local/nagios/etc/objects/commands.cfg
    # contacts
    cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
    # monitoring time periods
    cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
    # templates
    cfg_file=/usr/local/nagios/etc/objects/templates.cfg
    #cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
    # monitored hosts
    cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
    # monitored host groups
    cfg_file=/usr/local/nagios/etc/objects/hostgroups.cfg
    # monitored services
    cfg_file=/usr/local/nagios/etc/objects/services.cfg
    # contact groups
    cfg_file=/usr/local/nagios/etc/objects/contactgroup.cfg

    # Definitions for monitoring the local (Linux) host
    #cfg_file=/usr/local/nagios/etc/objects/localhost.cfg

There are two more parameters worth knowing in this file, only one of which needs to be changed:

    check_external_commands=1

This allows operations such as restarting nagios or disabling host/service checks from the web interface. It is already 1 by default, so no change is needed.

    command_check_interval=-1

This is the interval at which nagios checks for external commands (-1 means as often as possible). Set it to suit your own situation. I check every 5 seconds, so I changed it to:

    command_check_interval=5s

Next, edit the CGI control file cgi.cfg. One parameter to be aware of:

    use_authentication=1

This enables authentication for the CGI scripts. Then find the following lines:

    authorized_for_system_information=nagiosadmin
    authorized_for_configuration_information=nagiosadmin
    authorized_for_system_commands=nagiosadmin
    authorized_for_all_services=nagiosadmin
    authorized_for_all_hosts=nagiosadmin
    authorized_for_all_service_commands=nagiosadmin
    authorized_for_all_host_commands=nagiosadmin

Append the user admin (the user created with htpasswd above) to every one of them:

    authorized_for_system_information=nagiosadmin,admin
    authorized_for_configuration_information=nagiosadmin,admin
    authorized_for_system_commands=nagiosadmin,admin
    authorized_for_all_services=nagiosadmin,admin
    authorized_for_all_hosts=nagiosadmin,admin
    authorized_for_all_service_commands=nagiosadmin,admin
    authorized_for_all_host_commands=nagiosadmin,admin

Now create the configuration files referenced from nagios.cfg. Some of them already exist, so I back those up first. Nagios configuration is actually quite flexible: as long as the file locations match and the format is correct, it works.

    cd /usr/local/nagios/etc/objects
    mv timeperiods.cfg timeperiods.cfg.bak

Define the monitoring time period. Create the file with the following content:

    vi timeperiods.cfg

    define timeperiod{
        timeperiod_name 24x7
        alias           24 Hours A Day, 7 Days A Week
        sunday          00:00-24:00
        monday          00:00-24:00
        tuesday         00:00-24:00
        wednesday       00:00-24:00
        thursday        00:00-24:00
        friday          00:00-24:00
        saturday        00:00-24:00
    }

This defines a time period named 24x7 that covers all 24 hours of every day. Note that the character in 24x7 is a lowercase letter x, not an asterisk.

    mv contacts.cfg contacts.cfg.bak

Define the contact. Create the file with the following content:

    vi contacts.cfg

    define contact{
        contact_name                    admin        ; contact name
        alias                           sys admin    ; alias
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,u,r
        service_notification_commands   notify-service-by-email
        host_notification_commands      notify-host-by-email
        email                           vfast_zengzz@yahoo.cn
        pager                           13601298217
    }

For services, w is warning, u is unknown, c is critical and r is recovery. For hosts, d is down, u is unreachable and r is recovery.

Define the contact group. Create the file with the following content:

    vi contactgroup.cfg

    define contactgroup{
        contactgroup_name   sagroup
        alias               System Administrator
        members             admin
    }

Define the monitored host. Create the file with the following content:

    vi hosts.cfg

    define host{
        host_name               192.168.5.58
        alias                   192.168.5.58
        address                 127.0.0.1
        check_command           check-host-alive
        max_check_attempts      5
        check_period            24x7
        contact_groups          sagroup
        notification_period     24x7
        notification_options    d,u,r
    }

Define the host group. Create the file with the following content:

    vi hostgroups.cfg

    define hostgroup{
        hostgroup_name  nagios-server
        alias           nagios-server
        members         192.168.5.58
    }

The most important file defines the monitored services. Create it with the following content:

    vi services.cfg

    define service{
        host_name               192.168.5.58
        service_description     check-host-alive
        check_command           check-host-alive    ; host liveness
        max_check_attempts      5
        normal_check_interval   3
        retry_check_interval    2
        check_period            24x7
        notification_interval   10
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          sagroup
    }

    define service{
        host_name               192.168.5.58
        service_description     check-http
        check_command           check_http          ; port 80
        max_check_attempts      5
        normal_check_interval   3
        retry_check_interval    2
        check_period            24x7
        notification_interval   10
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          sagroup
    }

That is essentially all that is needed for the monitoring server to monitor its own liveness and port 80. Now check the configuration for errors:

    /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

If there are no errors, the output ends with:

    Total Warnings: 0
    Total Errors:   0

Restart the nagios service:

    /etc/init.d/nagios restart

After logging in to the web interface, click the Host Groups link. My own page showed 5 OK entries because I had already added monitoring for CPU load, process count and disk usage. If nothing is misconfigured, yours should show 2 entries in PENDING state, which should turn into 2 OK entries after a few minutes.

Next, configure the monitoring server to monitor its own CPU load, process count and disk usage, and to monitor all items on the monitored host 192.168.3.64. As mentioned above, these three metrics require nrpe, so nrpe must be installed both on the monitoring server and on 3.64. For the nrpe installation, see:

    http://blog.chinaunix.net/uid-23916356-id-3062081.html

Remember that the monitoring server already has nagios-plugins, so it only needs nrpe (installation steps omitted). The monitored host needs both nagios-plugins and nrpe (installation steps omitted). The command names used below (check_load, check_total_procs, check_df) must be defined in nrpe.cfg on each host that nrpe runs on.

Because the check_nrpe command is needed, add the following to commands.cfg:

    define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
    }

Most of the configuration files already exist at this point and only need to be edited. I want to put the newly added monitored host into a group of its own, so add the following to hostgroups.cfg:

    define hostgroup{
        hostgroup_name  ceshi-hadoop
        alias           ceshi-hadoop
        members         192.168.3.64    ; group members; separate multiple members with commas
    }

Add the new monitored host to hosts.cfg:

    define host{
        host_name               192.168.3.64
        alias                   192.168.3.64
        address                 192.168.3.64
        check_command           check-host-alive
        max_check_attempts      5
        check_period            24x7
        contact_groups          sagroup
        notification_period     24x7
        notification_options    d,u,r
    }

New monitored items mean the services file has to change as well. Finally, add the following to services.cfg:

    define service{
        host_name               192.168.5.58
        service_description     check-local-load
        check_command           check_nrpe!check_load           ; CPU load
        max_check_attempts      5
        normal_check_interval   3
        retry_check_interval    2
        check_period            24x7
        notification_interval   10
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          sagroup
    }

    define service{
        host_name               192.168.5.58
        service_description     check-local-procs
        check_command           check_nrpe!check_total_procs    ; process count
        max_check_attempts      5
        normal_check_interval   3
        retry_check_interval    2
        check_period            24x7
        notification_interval   10
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          sagroup
    }

    define service{
        host_name               192.168.5.58
        service_description     check-local-disk
        check_command           check_nrpe!check_df             ; disk usage
        max_check_attempts      5
        normal_check_interval   3
        retry_check_interval    2
        check_period            24x7
        notification_interval   10
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          sagroup
    }

    define service{
        host_name               192.168.3.64
        service_description     check-host-alive
        check_command           check-host-alive                ; host liveness
        max_check_attempts      5
        normal_check_interval   3
        retry_check_interval    2
        check_period            24x7
        notification_interval   10
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          sagroup
    }

    define service{
        host_name               192.168.3.64
        service_description     check-local-disk
        check_command           check_nrpe!check_df             ; disk usage
        max_check_attempts      5
        normal_check_interval   3
        retry_check_interval    2
        check_period            24x7
        notification_interval   10
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          sagroup
    }

    define service{
        host_name               192.168.3.64
        service_description     check-http
        check_command           check_http                      ; port 80
        max_check_attempts      5
        normal_check_interval   3
        retry_check_interval    2
        check_period            24x7
        notification_interval   10
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          sagroup
    }

    define service{
        host_name               192.168.3.64
        service_description     check-tcp-1099
        check_command           check_tcp!1099                  ; port 1099
        max_check_attempts      5
        normal_check_interval   3
        retry_check_interval    2
        check_period            24x7
        notification_interval   10
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          sagroup
    }

    define service{
        host_name               192.168.3.64
        service_description     check-tcp-2222
        check_command           check_tcp!2222                  ; port 2222
        max_check_attempts      5
        normal_check_interval   3
        retry_check_interval    2
        check_period            24x7
        notification_interval   10
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          sagroup
    }

    define service{
        host_name               192.168.3.64
        service_description     check-tcp-60030
        check_command           check_tcp!60030                 ; port 60030
        max_check_attempts      5
        normal_check_interval   3
        retry_check_interval    2
        check_period            24x7
        notification_interval   10
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          sagroup
    }

    define service{
        host_name               192.168.3.64
        service_description     check-tcp-50010
        check_command           check_tcp!50010                 ; port 50010
        max_check_attempts      5
        normal_check_interval   3
        retry_check_interval    2
        check_period            24x7
        notification_interval   10
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          sagroup
    }

    define service{
        host_name               192.168.3.64
        service_description     check-local-load
        check_command           check_nrpe!check_load           ; CPU load
        max_check_attempts      5
        normal_check_interval   3
        retry_check_interval    2
        check_period            24x7
        notification_interval   10
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          sagroup
    }

    define service{
        host_name               192.168.3.64
        service_description     check-total-procs
        check_command           check_nrpe!check_total_procs    ; process count
        max_check_attempts      5
        normal_check_interval   3
        retry_check_interval    2
        check_period            24x7
        notification_interval   10
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          sagroup
    }

That completes the configuration. Check the configuration files before restarting nagios:

    /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

If there are no problems, restart nagios. After the restart, clicking Host Groups should show 2 groups with one machine in each: 5 items monitored on the monitoring server and 9 items monitored on 3.64.
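The service definitions in services.cfg differ only in host, description and check command, so they can be generated instead of copied by hand. The following is a minimal sketch of that idea for the TCP port checks. gen_tcp_service is a hypothetical helper name of my own, not part of Nagios, and the field values are simply copied from the definitions above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Nagios): print one service definition
# for a TCP port check, using the same field values as the hand-written
# entries in services.cfg.
gen_tcp_service() {
    host="$1"
    port="$2"
    cat <<EOF
define service{
    host_name               $host
    service_description     check-tcp-$port
    check_command           check_tcp!$port
    max_check_attempts      5
    normal_check_interval   3
    retry_check_interval    2
    check_period            24x7
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,c,r
    contact_groups          sagroup
}
EOF
}

# Example: print definitions for the four TCP ports monitored on 192.168.3.64.
for port in 1099 2222 60030 50010; do
    gen_tcp_service 192.168.3.64 "$port"
done
```

The output could be redirected into a scratch file, reviewed, and then appended to services.cfg; running the nagios -v check afterwards is still advisable.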
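Configuration summary:

If you have a new machine to monitor, you need to edit three configuration files: the hosts file, the host groups file and the services file. If a machine is already monitored and you only want to watch one more port or similar, editing the services file is enough. In short, without the nagios plugins and related software deployed on the monitored host, you can only monitor things such as whether it is alive and whether its ports are open. CPU load, process count, disk usage and similar metrics cannot be monitored.

One last thing: I am a beginner too, so if you find any problems please contact me on QQ: 316189480.

Setting up and testing email alerts:

You need a valid email address. There are plenty of choices, for example Google, Yahoo, 163, 126 or 139 mail. Pick any one for testing. Here I use the sendmail that ships with the monitoring server. Make sure sendmail is running:

    # /etc/init.d/sendmail status
    sendmail (pid 5291 5282) is running...

Run the following command:

    echo "test" | mail your-email-address

After a few minutes, confirm that the message with the body "test" has arrived in your mailbox. Then edit contacts.cfg so that it reads:

    define contact{
        contact_name                    zengzhunzhun
        alias                           zengzhunzhun
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,u,r
        service_notification_commands   notify-service-by-email
        host_notification_commands      notify-host-by-email
        email                           vfast_zengzz@yahoo.cn,zengzhunzhun@gmail.com,zengzhun@126.com,13601298217@139.com
        pager                           13601298217
    }

Edit contactgroup.cfg so that it reads:

    define contactgroup{
        contactgroup_name   sagroup
        alias               System Administrator
        members             zengzhunzhun
    }

Then restart the nagios service:

    /etc/init.d/nagios restart

Your mailbox should now start receiving alerts. There may be some delay, because the local sendmail server used here is not a properly registered mail server; this is only a simple test. If nothing arrives, check /var/log/maillog to see whether the mail was sent. If you see an error like the one below, edit nagios.cfg and change notification_timeout=30 to 120 (the unit is seconds), then restart nagios. A production environment would also need a proper mail server, but that is a separate topic.

    Warning: Contact 'admin' service notification command '/usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: PROBLEM\nHost: 192.168.2.161\nState: UP\nAddress: 192.168.2.161\nInfo: PING OK - Packet loss = 0%, RTA = 5.50 ms\n\nDate/Time: Thu Aug 30 16:44:46 CST 2012\n" | /bin/mail -s "** PROBLEM Host Alert: 192.168.2.161 is UP **" vfast_zengzz@yahoo.cn' timed out after 30 seconds

Setting up SMS alerts:

China Mobile's 139 mailbox can forward mail as SMS: when the mailbox receives a message, a text message is also sent to the phone bound to it. So all you need to do is set the email address in contacts.cfg to your 139 mailbox address. In my tests the SMS notifications were delivered.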
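The notification_timeout change described in the email alert section can also be scripted. The following is a minimal sketch under stated assumptions: set_notification_timeout is a hypothetical helper name, not a Nagios tool, and the demo runs against a scratch file rather than the real /usr/local/nagios/etc/nagios.cfg.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios): set notification_timeout in a
# nagios.cfg-style file. The result is written to a temporary file first, so
# a failed sed does not truncate the original. Note that mv replaces the
# file, so ownership and permissions may need to be restored afterwards.
set_notification_timeout() {
    cfg="$1"
    secs="$2"
    sed "s/^notification_timeout=.*/notification_timeout=$secs/" "$cfg" > "$cfg.tmp" \
        && mv "$cfg.tmp" "$cfg"
}

# Demo on a scratch file instead of the live configuration.
demo=$(mktemp)
printf 'log_file=/tmp/nagios.log\nnotification_timeout=30\n' > "$demo"
set_notification_timeout "$demo" 120
grep '^notification_timeout=' "$demo"    # prints notification_timeout=120
rm -f "$demo"
```

On a real system, nagios still has to be restarted after the change, as described in the email alert section.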