keepalived in Detail

23饿1 · Posted on 2017-1-19 08:58:48

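What is Keepalived? As the name suggests, it keeps things alive. In networking terms that means keeping a service online, which is what we call high availability or hot standby. Its purpose is to prevent a single point of failure (a failure at one point that makes the whole system architecture unavailable). Any discussion of keepalived has to mention the VRRP protocol, since it is the foundation on which keepalived is built, so let's look at VRRP first.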

VRRP terminology:

VRRP virtual router (VRRP router):

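Advantages of VRRP:
Redundancy: several routers can act as the default gateway for LAN clients, which greatly reduces the chance of the default gateway becoming a single point of failure;
Load sharing: traffic from LAN clients can be shared among multiple routers;
Multiple VRRP groups: up to 255 VRRP groups can be configured on one physical router interface;
Multiple IP addresses: several IP addresses can be configured on the same physical interface through interface aliases, so one physical interface can serve multiple subnets;
Preemption: when the master fails, a higher-priority backup is allowed to become master;
Advertisement protocol: VRRP advertisements are sent to 224.0.0.18, the multicast address assigned by IANA;
VRRP tracking: a router's VRRP priority is changed according to interface state, so that the best VRRP router becomes master;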

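IP Address Owner: if a VRRP device uses the virtual router's IP address as a real interface address, that device is called the IP address owner. If the IP address owner is available, it normally becomes the Master.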

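keepalived is software that works much like layer 3, 4 & 5 switching, that is, what we usually call layer-3, layer-4 and layer-5 switching.
Keepalived monitors the state of the web servers. If a web server crashes or malfunctions, Keepalived detects it and removes the faulty server from the pool; once the server works normally again, Keepalived automatically adds it back to the pool. All of this happens automatically without manual intervention. The only manual work is repairing the failed web server.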

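Layer 3: when working at Layer 3, Keepalived periodically sends an ICMP packet (what the familiar ping program does) to every server in the pool. If a server's IP address does not respond, Keepalived reports that server as failed and removes it from the pool. A typical case is a server that was shut down improperly. The Layer 3 method uses reachability of the server's IP address as the criterion for whether the server is working. This article uses this method.
Layer 4: once Layer 3 is understood, Layer 4 is easy. Layer 4 decides whether a server is working from the state of a TCP port. For example, a web server usually listens on port 80; if Keepalived finds that port 80 is not open, it removes the server from the pool.
Layer 5: Layer 5 works at the application level. It is somewhat more complex than Layer 3 and Layer 4 and uses more network bandwidth. Keepalived checks whether the server program is running correctly according to the user's settings; if the result does not match those settings, Keepalived removes the server from the pool.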
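
The three check levels can be pictured as one loop with a pluggable probe. The sketch below is only an illustration in shell, not how keepalived implements its health checkers. The names probe_pool, l3_probe, l4_probe and l5_probe are made up for this example; l3_probe assumes a Linux ping that accepts -c and -W, l4_probe assumes an nc that supports -z and -w, and l5_probe assumes curl is installed.

```shell
# probe_pool PROBE SERVER...
# Runs "PROBE SERVER" for each server and prints only the servers whose
# probe succeeded, i.e. the servers that stay in the pool.
probe_pool() {
    probe=$1
    shift
    for server in "$@"; do
        if "$probe" "$server" >/dev/null 2>&1; then
            echo "$server"
        fi
    done
}

# Layer 3: a single ICMP echo request with a 1 second timeout.
l3_probe() { ping -c 1 -W 1 "$1"; }

# Layer 4: is TCP port 80 accepting connections?
l4_probe() { nc -z -w 3 "$1" 80; }

# Layer 5: does an HTTP GET of / succeed (-f turns HTTP errors into failures)?
l5_probe() { curl -fs -o /dev/null "http://$1/"; }

# Hypothetical usage (sample addresses):
# probe_pool l4_probe 192.168.200.5 192.168.200.6
```

Each probe is stricter and more expensive than the one before it, which matches the bandwidth remark in the Layer 5 description above.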

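After startup, keepalived runs three processes:
Parent process: memory management, child process management, and so on
Child process: the VRRP child process
Child process: the healthchecker child process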

The keepalived configuration file in detail

global_defs {
    notification_email {       # recipients of the email keepalived sends on a failover, one per line
        sysadmin@fire.loc
    }
    notification_email_from Alexandre.Cassen@firewall.loc  # sender address
    smtp_server localhost      # SMTP server address
    smtp_connect_timeout 30    # SMTP connection timeout
    router_id LVS_DEVEL        # an identifier for the machine running keepalived
}
vrrp_sync_group VG_1 {         # instances that monitor several network segments
    group {
        inside_network         # instance name
        outside_network
    }
    notify_master /path/xx.sh       # script run on transition to master
    notify_backup /path/xx.sh       # script run on transition to backup
    notify_fault "path/xx.sh VG_1"  # script run on a fault
    notify /path/xx.sh
    smtp_alert                      # send email notifications using the addresses and SMTP server from global_defs
}
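vrrp_instance inside_network {
    state BACKUP               # which node is master and which is backup; if nopreempt is set this value
                               # has no effect and master/backup is decided by priority
    interface eth0             # network interface the instance binds to
    dont_track_primary         # ignore VRRP interface faults (not set by default)
    track_interface {          # extra interfaces to monitor; a problem on any of them triggers a failover
        eth0
        eth1
    }
    mcast_src_ip <IPADDR>      # source address for multicast packets; defaults to the primary IP of the bound interface
    garp_master_delay <INTEGER>  # delay before sending gratuitous ARP after the transition to master
    virtual_router_id 50       # VRID (virtual router ID)
    priority 99                # priority; the larger the number the higher the priority, and the highest priority is elected master
    advert_int 1               # advertisement interval, default 1 second
    nopreempt                  # do not preempt. Note: this can only be set on a host whose state is BACKUP,
                               # and that host must have a higher priority than the other one
    preempt_delay <INTEGER>    # preemption delay in seconds; the man page example uses 300 (5 minutes), the default is 0
    debug                      # debug level
    authentication {           # authentication settings
        auth_type PASS         # authentication type
        auth_pass 111111       # authentication password
    }
    virtual_ipaddress {        # the VIP
        192.168.202.200
    }
}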
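virtual_server 192.168.202.200 23 {
    delay_loop 6               # health-check interval
    lb_algo rr                 # LVS scheduling algorithm: rr|wrr|lc|wlc|lblc|sh|dh
    lb_kind DR                 # LVS forwarding method: NAT|DR|TUN
    persistence_timeout 5      # session persistence time
    protocol TCP               # protocol in use
    persistence_granularity <NETMASK>  # LVS session persistence granularity
    virtualhost <string>       # virtual host to check on the web server (the Host: header)
    sorry_server <IPADDR> <port>  # fallback server, used once all real servers have failed
    real_server 192.168.200.5 23 {
        weight 1               # default is 1; 0 disables the server
        inhibit_on_failure     # when the health check fails, set the weight to 0 instead of removing the server from ipvs
        notify_up <string> | <quoted-string>    # script run when the server is detected as up
        notify_down <string> | <quoted-string>  # script run when the server is detected as down

        TCP_CHECK {
            connect_timeout 3      # connection timeout
            nb_get_retry 3         # number of retries
            delay_before_retry 3   # delay between retries
            connect_port 23        # port used for the health check
            bindto <ip>
        }
        HTTP_GET | SSL_GET {
            url {                  # URL to check; several may be specified
                path /
                digest <string>    # digest expected from the check
                status_code 200    # status code expected from the check
            }
            connect_port <port>
            bindto <IPADDR>
            connect_timeout 5
            nb_get_retry 3
            delay_before_retry 2
        }

        SMTP_CHECK {
            host {
                connect_ip <IP ADDRESS>
                connect_port <port>    # port 25 is checked by default
                bindto <IP ADDRESS>
            }
            connect_timeout 5
            retry 3
            delay_before_retry 2
            helo_name <string> | <quoted-string>  # argument of the SMTP HELO command, optional
        }
        MISC_CHECK {
            misc_path <string> | <quoted-string>  # path of the external script
            misc_timeout           # script execution timeout
            misc_dynamic           # if set, the exit status adjusts the server weight dynamically:
                                   #   0: healthy, weight unchanged
                                   #   1: check failed, weight set to 0
                                   #   2-255: healthy, weight set to (exit status - 2)
        }
    }
}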
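
The misc_dynamic rule is easier to follow as a small function. The sketch below only restates the mapping described in the comment on misc_dynamic above; the function name misc_dynamic_weight is made up for this example and is not part of keepalived.

```shell
# misc_dynamic_weight CURRENT_WEIGHT EXIT_STATUS
# Prints the weight a real server would end up with after a MISC_CHECK
# script returned EXIT_STATUS, following the rule described above.
misc_dynamic_weight() {
    current=$1
    status=$2
    case "$status" in
        0) echo "$current" ;;          # healthy, weight left unchanged
        1) echo 0 ;;                   # check failed, weight becomes 0
        *) echo $((status - 2)) ;;     # healthy, weight = exit status - 2
    esac
}

# Example: a script that exits with status 12 asks for weight 10.
misc_dynamic_weight 1 12
```

A check script can use this to report load: the busier the server, the smaller the exit status it returns, as long as it stays at 2 or above.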

Example keepalived configuration file for making haproxy highly available:
========================================================================================
! Configuration File for keepalived

global_defs {
notification_email {
      linuxedu@foxmail.com
      mageedu@126.com
}
notification_email_from kanotify@magedu.com
smtp_connect_timeout 3
smtp_server 127.0.0.1
router_id LVS_DEVEL
}

vrrp_script chk_haproxy {
script "killall -0 haproxy"
interval 1
weight 2
}

vrrp_script chk_mantaince_down {
script "[[ -f /etc/keepalived/down ]] && exit 1 || exit 0"
interval 1
weight -2
}

vrrp_instance VI_1 {
interface eth0
state MASTER  # BACKUP for slave routers
priority 101  # 100 for BACKUP
virtual_router_id 51
garp_master_delay 1

authentication {
      auth_type PASS
      auth_pass password
}
track_interface {
   eth0
}
virtual_ipaddress {
      172.16.100.1/16 dev eth0 label eth0:0
}
track_script {
      chk_haproxy
      chk_mantaince_down
}

notify_master "/etc/keepalived/notify.sh master"
notify_backup "/etc/keepalived/notify.sh backup"
notify_fault "/etc/keepalived/notify.sh fault"
}
========================================================================================

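Notes:
1. The state above is the initial state of the current node. In a two-node master/slave model, one node usually defaults to MASTER and the other to BACKUP.
2. priority is the current node's priority within the current virtual router; the master's priority should be higher than the slave's;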
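
The haproxy example above tracks two vrrp_script blocks with weights 2 and -2, so the priority that takes part in the election is not always the configured one. A minimal sketch of the arithmetic, assuming keepalived's usual rule that a positive weight is added while the script succeeds and a negative weight is added (lowering the priority) while the script fails. The function name effective_priority and the WEIGHT:STATUS argument format are made up for this example.

```shell
# effective_priority BASE [WEIGHT:STATUS ...]
# BASE is the configured priority; each WEIGHT:STATUS pair is a tracked
# script's weight and its last exit status (0 = success).
effective_priority() {
    prio=$1
    shift
    for pair in "$@"; do
        weight=${pair%%:*}
        status=${pair##*:}
        if [ "$weight" -gt 0 ] && [ "$status" -eq 0 ]; then
            prio=$((prio + weight))      # positive weight: bonus while the check passes
        elif [ "$weight" -lt 0 ] && [ "$status" -ne 0 ]; then
            prio=$((prio + weight))      # negative weight: penalty while the check fails
        fi
    done
    echo "$prio"
}

# MASTER (priority 101) with haproxy down, no maintenance flag:
effective_priority 101 2:1 -2:0
# BACKUP (priority 100) with haproxy running, no maintenance flag:
effective_priority 100 2:0 -2:0
```

With these numbers the backup ends up above the master once haproxy dies on the master, which is what moves the VIP. It also shows why the gap between the two configured priorities has to be smaller than the weights.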

1. How do you send notifications on state transitions?

notify_master ""
notify_backup
notify_fault

These can be defined in

vrrp_sync_group {

}

and also in

vrrp_instance {

}

The man page (man keepalived.conf) shows two ways to define notification scripts.
Method 1:
# to MASTER transition
notify_master /path/to_master.sh
# to BACKUP transition
notify_backup /path/to_backup.sh
# FAULT transition
notify_fault "/path/fault.sh VG_1"

Method 2:
#arguments
# $1 ="GROUP"|"INSTANCE"
# $2 = name of group or instance
# $3 = target state of transition
# ("MASTER"|"BACKUP"|"FAULT")
notify /path/notify.sh
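
With the second method a single script receives all three arguments and decides what to do. A minimal sketch of such a handler follows; the function name handle_transition and the messages it prints are made up for this example, and a real script would start or stop services instead of only printing.

```shell
# handle_transition TYPE NAME STATE
# TYPE is "GROUP" or "INSTANCE", NAME is the group or instance name,
# STATE is the target state of the transition.
handle_transition() {
    type=$1
    name=$2
    state=$3
    case "$state" in
        MASTER) echo "$type $name -> MASTER: start services" ;;
        BACKUP) echo "$type $name -> BACKUP: stop services" ;;
        FAULT)  echo "$type $name -> FAULT: stop services" ;;
        *)      echo "unknown state: $state" >&2
                return 1 ;;
    esac
}

# In /path/notify.sh the last line would pass the arguments through:
# handle_transition "$@"
```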

MASTER:
#!/bin/bash
#
vip=172.16.100.100
contact='root@localhost'
thisip=$(ifconfig eth0 | awk '/inet addr:/{print $2}' | awk -F: '{print $2}')

notify() {
mailbody="vrrp transition, $vip floated to $thisip."
subject="$thisip is to be $vip master"
# quote the subject so that mail -s receives it as a single argument
echo "$mailbody" | mail -s "$subject" "$contact"
}

notify

2. How do you configure ipvs?
virtual server
real server
      health check

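3. How do you make a particular service highly available?
First: provide a script that monitors the service
Second: track the service in the VRRP instance

1. Monitor the service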
   vrrp_script {

   }

2. Track the service in the VRRP instance
   track_script {

   }

nginx
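
What goes inside vrrp_script is any command that exits 0 while the service is healthy. The haproxy example earlier uses killall -0; the sketch below does the same kind of check through a pid file. The function name pidfile_alive and the path /var/run/nginx.pid are assumptions made for this example.

```shell
# pidfile_alive PIDFILE
# Succeeds (exit 0) only when PIDFILE is readable and names a process
# that still exists. kill -0 sends no signal; it only tests whether the
# process can be signalled.
pidfile_alive() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Hypothetical check script for nginx, to be named in vrrp_script:
# pidfile_alive /var/run/nginx.pid
```

A pid-file check avoids matching unrelated processes that happen to share the name, at the cost of trusting that the pid file is kept up to date.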

4. How do you implement a master/master model based on multiple virtual routers?
First: simply define two vrrp_instance blocks
Second: configure two A records in DNS
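
In a master/master setup each node is MASTER for one virtual router and BACKUP for the other. The sketch below prints the two stanzas for the first node; the second node would swap the states and priorities. The function name emit_instance, the instance name VI_2, the VRID 52 and the second VIP 172.16.100.2/16 are assumptions for illustration, while VI_1, VRID 51, eth0 and 172.16.100.1/16 come from the haproxy example above. Authentication and tracking are left out to keep it short.

```shell
# emit_instance NAME STATE VRID PRIORITY VIP
# Prints a minimal vrrp_instance stanza for keepalived.conf.
emit_instance() {
    cat <<EOF
vrrp_instance $1 {
    state $2
    interface eth0
    virtual_router_id $3
    priority $4
    virtual_ipaddress {
        $5
    }
}
EOF
}

# First node: master for VI_1, backup for VI_2.
emit_instance VI_1 MASTER 51 101 172.16.100.1/16
emit_instance VI_2 BACKUP 52 100 172.16.100.2/16
```

The two DNS A records then point at the two VIPs, so both nodes carry traffic while either one can take over both addresses.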

Below is a simple example of a notify.sh script:
#!/bin/bash
# Author: MageEdu <linuxedu@foxmail.com>
# description: An example of notify script
#

vip=172.16.100.1
contact='root@localhost'

notify() {
mailsubject="`hostname` to be $1: $vip floating"
mailbody="`date '+%F %H:%M:%S'`: vrrp transition, `hostname` changed to be $1"
echo "$mailbody" | mail -s "$mailsubject" "$contact"
}

case "$1" in
master)
      notify master
      /etc/rc.d/init.d/haproxy start
      exit 0
;;
backup)
      notify backup
      /etc/rc.d/init.d/haproxy stop
      exit 0
;;
fault)
      notify fault
      /etc/rc.d/init.d/haproxy stop
      exit 0
;;
*)
      echo "Usage: $(basename "$0") {master|backup|fault}"
      exit 1
;;
esac
========================================================================================
The keepalived configuration above still has problems: whether the monitored service is stopped or restarted, the problem remains!

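Kill the keepalived process directly from the monitoring script, so that the VIP moves to the backup node.
For example, a monitoring script for the haproxy service: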
#!/bin/bash
# If haproxy is not running, try to start it once; if it still is not
# running, stop keepalived so that the VIP moves to the backup node.
if [[ $(ps -C haproxy --no-header | wc -l) -eq 0 ]]; then
    echo "haproxy not running, attempting to start it."
    /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg
    sleep 3
    if [[ $(ps -C haproxy --no-header | wc -l) -eq 0 ]]; then
        /etc/init.d/keepalived stop
        echo "haproxy failed to start, keepalived stopped"
    else
        echo "haproxy started successfully"
    fi
fi
========================================================================================

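Case study:

When the master goes down and the backup takes over, the original master should not become master again when it comes back up.

Otherwise, if the recovered master takes over again and then something goes wrong, the service has switched twice, which is not good for the website.

The solution:

Set state to BACKUP on both nodes, and set the nopreempt parameter on the node with the higher priority.

The keepalived.conf man page explains this: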

# VRRP will normally preempt a lower priority
# machine when a higher priority machine comes
# online. "nopreempt" allows the lower priority
# machine to maintain the master role, even when
# a higher priority machine comes back online.
# NOTE: For this to work, the initial state of this
# entry must be BACKUP.
nopreempt
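
The effect of nopreempt on a node that comes back online can be written as a single condition. This is only a sketch of the rule quoted above; the function name would_preempt and its 0/1 flag argument are made up for this example.

```shell
# would_preempt MY_PRIORITY MASTER_PRIORITY NOPREEMPT
# Succeeds (exit 0) when a node that has just come online would take the
# master role away from the current master. NOPREEMPT is 1 if the node
# has nopreempt configured, 0 otherwise.
would_preempt() {
    my_prio=$1
    master_prio=$2
    nopreempt=$3
    [ "$nopreempt" -eq 0 ] && [ "$my_prio" -gt "$master_prio" ]
}

# The recovered node has priority 101, the acting master has 100:
if would_preempt 101 100 1; then
    echo "takes over again"
else
    echo "stays backup"
fi
```

With nopreempt set the recovered node stays backup, so the service switches only once per failure.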
