nagios

Date: 2017-01-12 16:15:34

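Server-side operations

The default yum repositories on CentOS 6 contain no Nagios RPM packages, but the packages are available from the EPEL repository. EPEL was already installed during the Cacti setup, so it is not installed again here.

1. Install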

[root@wy ~]# yum install -y nagios nagios-plugins nagios-plugins-all nrpe nagios-plugins-nrpe

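Notes:

Nagios also needs a LAMP environment, except that it does not need MySQL.

2. Set the user name and password for logging in to the Nagios web interface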

[root@wy ~]# htpasswd -c /etc/nagios/passwd nagiosadmin

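Notes:

There is no password by default; htpasswd -c creates the user name and password for the Nagios web interface. The -c option creates the password file and overwrites an existing one, so leave it out when adding more users later.

3. Check the configuration file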

[root@wy ~]# nagios -v /etc/nagios/nagios.cfg

Total Warnings: 0

Total Errors: 0
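The check above can be scripted so that a restart only happens when validation is clean. The following is a minimal sketch, assuming the summary lines look exactly like the two shown above; the function name config_is_clean is hypothetical and not part of Nagios.

```shell
#!/bin/sh
# Hypothetical helper: reads `nagios -v` output on stdin and succeeds
# only if the summary line reports "Total Errors: 0".
config_is_clean() {
    # Print the last field of the "Total Errors:" line; empty if it is missing.
    errors=$(awk '/^Total Errors:/ {print $NF}')
    [ "$errors" = "0" ]
}

# Demonstration using the summary lines shown above.
if printf 'Total Warnings: 0\nTotal Errors: 0\n' | config_is_clean; then
    echo "configuration OK"
fi

# On a real server (assumption: same paths as in this article):
# nagios -v /etc/nagios/nagios.cfg | config_is_clean && service nagios restart
```

Gating the restart on the validation result avoids stopping a working Nagios with a broken configuration.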

4. Start the services

[root@wy ~]# service httpd restart; service nagios start

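Notes:

Installing Nagios places a nagios.conf file in the httpd configuration directory, so httpd must be restarted to load it; after that, start the nagios service.

5. Access the web interface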

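Notes:

Hosts: the machines being monitored

Services: the items monitored on each machine

Current Load: system load

Current Users

HTTP

PING

Root Partition: the root partition

SSH

Swap Usage

Total Processes

Status colors: green (OK), gray (pending), red (critical), yellow (warning)

The advantage of Nagios is that it shows only a status result for each check.

Client-side operations

1. Install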

[root@y2 ~]# yum install -y epel-release

[root@y2 ~]# yum install -y nagios-plugins nagios-plugins-all nrpe nagios-plugins-nrpe

2. Edit the configuration file

[root@y2 ~]# vim /etc/nagios/nrpe.cfg

allowed_hosts=192.168.219.129

dont_blame_nrpe=1

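Notes:

allowed_hosts: the IP of the Nagios server (the machines allowed to connect to this host); separate multiple IPs with commas

dont_blame_nrpe: whether the server may pass arguments to NRPE commands; 1 allows it, 0 forbids it

The configuration file is nrpe.cfg.

NRPE is the core component: it lets the server communicate with the client and run commands on it.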
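To make the comma-separated form of allowed_hosts concrete, here is a small sketch of the matching idea. It is not how the NRPE daemon is implemented; the helper name ip_allowed is hypothetical, the second list entry is an example value, and only exact IP strings are compared.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the IP ($2) appears in a comma-separated
# allowed_hosts value ($1). Exact string match only.
ip_allowed() {
    # Remove any spaces so "a, b" and "a,b" behave the same.
    list=$(printf '%s' "$1" | tr -d ' ')
    case ",$list," in
        *",$2,"*) return 0 ;;
    esac
    return 1
}

allowed_hosts="192.168.219.129,127.0.0.1"   # second entry is an example value
if ip_allowed "$allowed_hosts" "192.168.219.129"; then
    echo "192.168.219.129 may connect"
fi
```

Wrapping both the list and the candidate in commas prevents a partial match such as 192.168.219.12 against 192.168.219.129.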

3. Start the service

[root@y2 ~]# /etc/init.d/nrpe start

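4. Add the host to the monitoring center (done manually on the server)

Nagios is configured by hand: each monitored host and service has to be added to its configuration.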

[root@wy ~]# cd /etc/nagios/conf.d/

[root@wy conf.d]# vim 192.168.219.128.cfg

define host{
    use         linux-server
    host_name   192.168.219.128
    alias       219.128
    address     192.168.219.128
}

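Notes:

; starts a comment

host_name: any name you like, as long as the service definitions refer to the same name

alias: an alias for the host

address: the IP of the monitored host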
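When several hosts need the same kind of definition, the block can be generated rather than typed. A minimal sketch, assuming the linux-server template used above; the function name make_host_cfg is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints a host definition in the same shape as above.
# $1 = host_name, $2 = alias, $3 = address
make_host_cfg() {
    cat <<EOF
define host{
    use         linux-server
    host_name   $1
    alias       $2
    address     $3
}
EOF
}

make_host_cfg 192.168.219.128 219.128 192.168.219.128

# To write it into conf.d (assumption: same directory as in this article):
# make_host_cfg 192.168.219.128 219.128 192.168.219.128 > /etc/nagios/conf.d/192.168.219.128.cfg
```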

The specific monitored items

define service{
    use                     local-service
    host_name               192.168.219.128
    service_description     check_ping
    check_command           check_ping!100.0,20%!200.0,50%
    max_check_attempts      5
    normal_check_interval   1
}

define service{
    use                     local-service
    host_name               192.168.219.128
    service_description     check_ssh
    check_command           check_ssh
    max_check_attempts      5
    normal_check_interval   1
    notification_interval   60
}

define service{
    use                     local-service
    host_name               192.168.219.128
    service_description     check_http
    check_command           check_http
    max_check_attempts      5
    normal_check_interval   1
}

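Notes:

The ping, http, and ssh checks probe the target directly from the server (for example, the HTTP and SSH ports) without logging in to the client, so they do not rely on NRPE.

use: the template the service inherits from; local-service and generic-service are both templates of this kind

service_description: a description or label for the service

max_check_attempts: the maximum number of attempts (after a failed check, Nagios tests several more times)

normal_check_interval: the interval between checks, in minutes; the default is 3 minutes

notification_interval: how long Nagios waits before notifying the contacts again when a service has failed and the fault is still unresolved, in minutes. If one notification per event is enough, set this option to 0.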
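How max_check_attempts affects timing can be estimated with simple arithmetic. This is a rough sketch, assuming that after the first failed check Nagios re-checks at the service's retry interval until max_check_attempts is reached. The retry interval is a separate directive that does not appear in the definitions above, so the value 1 used here is an assumed example, and the function name is hypothetical.

```shell
#!/bin/sh
# Rough estimate of the minutes between the first failed check and the point
# where the failure has been seen max_check_attempts times in a row.
# $1 = max_check_attempts, $2 = retry interval in minutes (assumed value)
minutes_until_confirmed() {
    # The first failure counts as attempt 1, so (attempts - 1) retries remain.
    echo $(( ($1 - 1) * $2 ))
}

minutes_until_confirmed 5 1
```

With max_check_attempts 5 and an assumed retry interval of 1 minute, a failure is confirmed about 4 minutes after it is first seen.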

5. Check the configuration after editing it

[root@wy conf.d]# nagios -v /etc/nagios/nagios.cfg

Total Warnings: 0

Total Errors: 0

6. Restart the service

[root@wy conf.d]# /etc/init.d/nagios restart

7. View the results

Notes:

The first check reports a refused connection, which means no service is listening on port 80 on the client. Start nginx to fix this:

[root@y2 ~]# /etc/init.d/nginx start

Check again after a while.

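Background:

The first kind of check, such as ping or telnet, can probe a machine's ports without logging in to it.

The second kind needs data from inside the machine: load, disk capacity, and memory usage can only be read on the monitored host itself. This is where the NRPE service comes in.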

Adding NRPE communication:

Server-side operations

1. Add the check_nrpe command definition

[root@wy conf.d]# vim /etc/nagios/objects/commands.cfg

define command{
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

Notes:

check_nrpe, as defined here, is the command the server uses to communicate with NRPE on the client.
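It can help to see what the command_line template turns into for one service. The sketch below is only an illustration of the macro substitution, not Nagios code: it assumes $USER1$ points at /usr/lib64/nagios/plugins (the plugin path used on the client in this article) and that the text after the first "!" in check_command becomes $ARG1$. The function name is hypothetical.

```shell
#!/bin/sh
# Illustrative expansion of "$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$".
# $1 = check_command value, $2 = host address, $3 = plugin directory ($USER1$)
expand_check_nrpe() {
    # Drop the command name and the first "!", then keep only the first argument.
    arg1=${1#*!}
    arg1=${arg1%%!*}
    printf '%s/check_nrpe -H %s -c %s\n' "$3" "$2" "$arg1"
}

expand_check_nrpe 'check_nrpe!check_load' 192.168.219.128 /usr/lib64/nagios/plugins
```

Running the printed command by hand on the server is a quick way to confirm that NRPE on the client answers before adding the service to Nagios.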

2. Add the monitored items

[root@wy conf.d]# vim 192.168.219.128.cfg

define service{
    use                     generic-service
    host_name               192.168.219.128
    service_description     check_load
    check_command           check_nrpe!check_load
    max_check_attempts      5
    normal_check_interval   1
}

define service{
    use                     generic-service
    host_name               192.168.219.128
    service_description     check_disk_sda1
    check_command           check_nrpe!check_sda1
    max_check_attempts      5
    normal_check_interval   1
}

3. Check the configuration file

[root@wy conf.d]# nagios -v /etc/nagios/nagios.cfg

Total Warnings: 0

Total Errors: 0

4. Restart the nagios service

[root@wy conf.d]# /etc/init.d/nagios restart

Client-side operations

1. Define check_load and check_sda1

[root@y2 ~]# vim /etc/nagios/nrpe.cfg

command[check_load]=/usr/lib64/nagios/plugins/check_load -w 15,10,5 -c 30,25,20

command[check_sda1]=/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /dev/sda1

Notes:

/usr/lib64/nagios/plugins/check_load and /usr/lib64/nagios/plugins/check_disk are both executable plugin binaries.

You can run them directly to see their output:

[root@y2 ~]# /usr/lib64/nagios/plugins/check_load -w 15,10,5 -c 30,25,20

OK - load average: 0.01, 0.01, 0.00|load1=0.010;15.000;30.000;0; load5=0.010;10.000;25.000;0; load15=0.000;5.000;20.000;0;

[root@y2 ~]# /usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /dev/sda1

DISK OK - free space: /boot 425 MB (92% inode=99%);| /boot=33MB;387;435;0;484
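The -w and -c options set the warning and critical thresholds, and a plugin reports its result through its exit code (0 for OK, 1 for WARNING, 2 for CRITICAL). The sketch below illustrates that idea for a single load value. It is not the check_load implementation, whose exact comparison may differ, and the function name load_state is hypothetical.

```shell
#!/bin/sh
# Illustrative only: classify one load value against warning and critical
# thresholds and return the matching plugin-style exit code.
# $1 = load value, $2 = warning threshold, $3 = critical threshold
load_state() {
    awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
        if (v >= c)      { print "CRITICAL"; exit 2 }
        else if (v >= w) { print "WARNING";  exit 1 }
        else             { print "OK";       exit 0 }
    }'
}

# 1-minute load 0.01 against -w 15 and -c 30, as in the output above.
load_state 0.01 15 30
```

In -w 15,10,5 -c 30,25,20 the three values apply to the 1, 5, and 15 minute load averages respectively, which matches the load1, load5, and load15 fields in the output above.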

2. Restart the nrpe service

[root@y2 ~]# /etc/init.d/nrpe restart

Web interface

View the newly added monitored items

View the log

[root@wy conf.d]# cat /var/log/nagios/nagios.log

References:

http://www.cnblogs.com/kaituorensheng/p/4682565.html

Configuring alerts:

Server-side operations

1. Edit contacts.cfg

[root@wy conf.d]# vim /etc/nagios/objects/contacts.cfg

define contact{
    contact_name    123
    use             generic-contact
    alias           aaa
    email           1305198953@qq.com
}

define contact{
    contact_name    456
    use             generic-contact
    alias           bbb
    email           1305198953@qq.com
}

define contactgroup{
    contactgroup_name   common
    alias               common
    members             123,456
}

2. Add the contact group to each service that should send alerts, for example check_load

[root@wy conf.d]# vim 192.168.219.128.cfg

define service{
    use                     generic-service
    host_name               192.168.219.128
    service_description     check_load
    check_command           check_nrpe!check_load
    max_check_attempts      5
    normal_check_interval   1
    notifications_enabled   1
    notification_period     24x7
    notification_options    c,r
    contact_groups          common
}

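Notes:

notifications_enabled: whether notifications are enabled; 1 enables them, 0 disables them

notification_period: the time period during which notifications are sent

notification_options: w,u,c,r are service states: w is warning, u is unknown, c is critical, and r is recovery. Hosts have a similar set, d,u,r: d is down, u is unreachable, and r is recovery to OK; these go in the host definition.

contact_groups common: without this line, Nagios does not know whom to email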
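The effect of notification_options c,r can be illustrated with a short sketch that maps each service state to its letter and checks whether that letter is listed. This only mirrors the description above and is not Nagios code; the function name would_notify is hypothetical.

```shell
#!/bin/sh
# Illustrative only: succeeds if a service state is covered by a
# notification_options value such as "c,r".
# $1 = notification_options, $2 = state (WARNING, UNKNOWN, CRITICAL, RECOVERY)
would_notify() {
    case "$2" in
        WARNING)  letter=w ;;
        UNKNOWN)  letter=u ;;
        CRITICAL) letter=c ;;
        RECOVERY) letter=r ;;
        *) return 1 ;;
    esac
    # Wrap the list in commas so each letter is matched as a whole entry.
    case ",$1," in
        *",$letter,"*) return 0 ;;
    esac
    return 1
}

for state in WARNING CRITICAL RECOVERY; do
    if would_notify "c,r" "$state"; then
        echo "$state: notify"
    else
        echo "$state: no notification"
    fi
done
```

With c,r as configured above, a WARNING on check_load produces no email; add w to the list if warnings should alert as well.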

3. Check the syntax

[root@wy conf.d]# nagios -v /etc/nagios/nagios.cfg

4. Restart the service

[root@wy conf.d]# /etc/init.d/nagios restart

5. Install sendmail

[root@wy conf.d]# yum install -y sendmail

6. Start the mail service

[root@wy conf.d]# /etc/init.d/sendmail start

This article comes from the "linux" blog; please contact the author before reposting.

Original: http://warm51fun.blog.51cto.com/3884274/1891369
