Nginx High Availability with Heartbeat (style 1.x)

Posted by 踏雪寻梅 on 2016-12-28 06:54:21

I. Preparation

1. System: two CentOS 5.4 virtual machines
2. Hostnames: HA1, HA2
3. IP addresses: HA1 eth0: 192.168.2.10, eth1: 192.168.10.1
   HA2 eth0: 192.168.2.20, eth1: 192.168.10.2
4. VIP: 192.168.2.100 (the IP address that moves on failover)

II. Installation

1. Build and install Nginx
tar xzvf pcre-7.9.tar.gz
cd pcre-7.9
./configure
make
make install
cd ..

tar xzvf nginx-0.7.63.tar.gz
cd nginx-0.7.63
./configure --user=nobody --group=nobody --prefix=/usr/local/nginx --with-http_stub_status_module --with-http_ssl_module
make
make install

The detailed Nginx configuration is omitted here.

2. Build and install Heartbeat

tar xzvf libnet-1.1.2.1.tar.gz
cd libnet
./configure
make
make install
cd ..

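Create the user and group

Heartbeat needs the haclient group and the hacluster user. Do the same on both nodes, and make sure the haclient GID and the hacluster UID are identical on both.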

groupadd -g 500 haclient

useradd -u 500 -g haclient hacluster
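
Because the IDs have to match on both nodes, it can help to print them on each node and compare. The following is a minimal sketch; the helper name ha_ids and its optional file arguments are illustrative assumptions, not part of Heartbeat.

```shell
# Print the uid:gid of the hacluster user and the gid of the haclient group.
# $1: passwd-format file (default /etc/passwd)
# $2: group-format file  (default /etc/group)
ha_ids() {
    awk -F: '$1 == "hacluster" { print "hacluster uid:gid = " $3 ":" $4 }' "${1:-/etc/passwd}"
    awk -F: '$1 == "haclient"  { print "haclient gid = " $3 }' "${2:-/etc/group}"
}
```

Running ha_ids without arguments on HA1 and on HA2 should print the same two lines on both; with the commands above, both IDs are 500.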

tar jxvf STABLE-2.1.4.tar.bz2
cd Heartbeat-STABLE-2-1-STABLE-2.1.4/
./ConfigureMe configure
make
make install
# Copy the configuration files into place
cp doc/ha.cf /etc/ha.d/
cp doc/haresources /etc/ha.d/
cp doc/authkeys /etc/ha.d/
cd /etc/ha.d/ # change to the /etc/ha.d/ directory

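III. Configuring Heartbeat

All configuration is done in the /etc/ha.d/ directory:
1. vi authkeys # node authentication method; the first method, crc, is used here
auth 1
1 crc
# Set the permissions of authkeys to 600
chmod 600 authkeys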
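
The crc method only detects corrupted packets and provides no authentication, so it is only suitable for a trusted link such as a dedicated crossover cable. On a shared network, the sha1 method with a shared secret is the usual alternative. The following is a minimal sketch of generating such a file; the helper name write_authkeys is an assumption made for illustration.

```shell
# Write an authkeys file that uses sha1 with a random shared secret.
# $1: output path (/etc/ha.d/authkeys in this setup)
write_authkeys() {
    (
        umask 077   # a newly created file starts out private
        key=$(dd if=/dev/urandom bs=512 count=1 2>/dev/null | md5sum | cut -d' ' -f1)
        printf 'auth 1\n1 sha1 %s\n' "$key" > "$1"
        chmod 600 "$1"
    )
}
```

The file would be generated once, for example with write_authkeys /etc/ha.d/authkeys, and then copied unchanged to the other node, since both nodes must share the same key.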

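2. Edit /etc/ha.d/ha.cf:
# cat ha.cf | sed '/^#/d'
# Enable the HA debug log; turn it off once debugging is finished
debugfile /var/log/ha-debug
# Enable the HA log
logfile /var/log/ha-log
# syslog facility to log to
logfacility local0
# Interval between heartbeats, in seconds
keepalive 2
# Declare the peer dead after this many seconds without a heartbeat
deadtime 30
# Log a warning after this many seconds without a heartbeat
warntime 10
# Grace period after heartbeat starts (for example after a reboot) during which a node is not declared dead, in seconds
initdead 120
# The default port is UDP 694; I changed it to 695. If someone else on the LAN is also running Heartbeat over broadcast, it is best to use a different port,
# otherwise authentication may fail
udpport 695
# Use unicast; on HA2 change this to: ucast eth1 192.168.10.1
ucast eth1 192.168.10.2
# Whether resources switch back to the primary node once it recovers
auto_failback on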
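# Enable the watchdog
# A watchdog can be implemented as a hardware circuit or as a software timer; it reboots the system automatically when the system fails.
# Under the Linux kernel, it works as follows: once the watchdog is started (that is, once the /dev/watchdog device is opened),
# if nothing is written to /dev/watchdog within the configured interval,
# the hardware watchdog circuit or the software timer reboots the system.
watchdog /dev/watchdog
# Node list, primary node first (do not swap them); each name must match that node's uname -n output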
node HA1
node HA2
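
The timing directives above depend on one another: as a rule of thumb, keepalive should be smaller than warntime, warntime smaller than deadtime, and the sample ha.cf shipped with Heartbeat recommends an initdead of at least twice deadtime. A minimal sketch of a check for this follows. It assumes plain values in seconds with no "ms" suffix, and check_ha_timing is a hypothetical helper name.

```shell
# Check the timing directives in an ha.cf file.
# Missing directives count as 0 and are therefore reported.
# $1: path to ha.cf
check_ha_timing() {
    awk '
        $1 == "keepalive" { k = $2 + 0 }
        $1 == "warntime"  { w = $2 + 0 }
        $1 == "deadtime"  { d = $2 + 0 }
        $1 == "initdead"  { i = $2 + 0 }
        END {
            bad = 0
            if (k >= w)    { print "keepalive should be smaller than warntime"; bad = 1 }
            if (w >= d)    { print "warntime should be smaller than deadtime"; bad = 1 }
            if (i < 2 * d) { print "initdead should be at least twice deadtime"; bad = 1 }
            exit bad
        }
    ' "$1"
}
```

With the values used in this article (2, 10, 30, 120), check_ha_timing /etc/ha.d/ha.cf prints nothing and returns 0.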

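3. # cat haresources

# Each line describes one resource group. Resources in a group are started from left to right and stopped from right to left.
# Resources within a group are separated by spaces; different resource groups are independent of one another.
# The first column of a resource group is one of the nodes listed in ha.cf, and should be the node intended to act as the primary.
# Each resource is a script, placed either in /etc/init.d or in /etc/ha.d/resource.d.
# These scripts must support the start and stop arguments.
# Arguments to a script are separated by ::.
# Primary node   VIP   resource name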
HA1 192.168.2.100 nginxd
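
In the line above, the bare IP address is shorthand for the IPaddr resource script with the address as its argument, which is why the startup log shows /etc/ha.d/resource.d/IPaddr 192.168.2.100 start. The sketch below only illustrates how an entry maps to a script name and arguments. resource_cmd is a hypothetical helper, not a Heartbeat tool, and the longer form with netmask and interface is given as I recall it from the comments in the stock haresources file.

```shell
# Show the script name and arguments for one haresources entry.
resource_cmd() {
    spec=$1
    case $spec in
        [0-9]*.*) spec="IPaddr::$spec" ;;   # bare IP address: shorthand for IPaddr::<address>
    esac
    echo "$spec" | sed 's/::/ /g'           # "::" separates the script from its arguments
}

resource_cmd 192.168.2.100                  # -> IPaddr 192.168.2.100
resource_cmd nginxd                         # -> nginxd
resource_cmd IPaddr::192.168.2.100/24/eth0  # -> IPaddr 192.168.2.100/24/eth0
```

Heartbeat appends start or stop to the resulting command line when it acquires or releases the resource group.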

4. Write the nginxd resource script and place it in /etc/rc.d/init.d/ and /etc/ha.d/resource.d/

#!/bin/bash

# source function library
. /etc/rc.d/init.d/functions

# Source networking configuration.
. /etc/sysconfig/network

# Check that networking is up.
[ "${NETWORKING}" = "no" ] && exit 0

RETVAL=0
prog="nginx"

nginxDir=/usr/local/nginx
nginxd=$nginxDir/sbin/nginx
nginxConf=$nginxDir/conf/nginx.conf
nginxPid=$nginxDir/nginx.pid   # must match the pid directive in nginx.conf

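nginx_check()
{
# Return 1 if nginx is already running; otherwise clear a stale PID file and return 0.
if [[ -e $nginxPid ]]; then
   # Test the PID recorded in the file; grepping ps for "nginx" would also match this script (nginxd).
   if kill -0 "$(cat "$nginxPid")" 2> /dev/null; then
         echo "$prog already running..."
         return 1
   else
         rm -f "$nginxPid" &> /dev/null
   fi
fi
return 0
}

start()
{
nginx_check
if (( $? != 0 )); then
   # Already running: return success so that the caller does not see a failed start.
   return 0
else
   echo -n $"Starting $prog:"
   daemon $nginxd -c $nginxConf
   RETVAL=$?
   echo
   [ $RETVAL = 0 ] && touch /var/lock/subsys/nginx
   return $RETVAL
fi
}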

stop()
{
echo -n $"Stopping $prog:"
killproc $nginxd
RETVAL=$?
echo
[ $RETVAL = 0 ] && rm -f /var/lock/subsys/nginx $nginxPid
}

reload()
{
echo -n $"Reloading $prog:"
killproc $nginxd -HUP
RETVAL=$?
echo
}

case "$1" in
   start)
            start
            ;;
   stop)
            stop
            ;;
   restart)
            stop
            start
            ;;
   reload)
            reload
            ;;
   status)
            status $prog
            RETVAL=$?
            ;;
   *)
            echo $"Usage: $0 {start|stop|restart|reload|status}"
            RETVAL=1
esac
exit $RETVAL

5. Set up hosts
# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1    vpc localhost.localdomain localhost
::1    localhost6.localdomain6 localhost6
192.168.10.1 HA1
192.168.10.2 HA2

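Note: perform sections II and III (installing and configuring Heartbeat) on both HA1 and HA2.

6. Start heartbeat
Note: keep the clocks of the primary and backup servers synchronized; if they differ too much, heartbeat may malfunction.

service heartbeat restart
Check the startup messages in the heartbeat log (the log is very helpful for troubleshooting)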
tail -100 /var/log/ha-log
heartbeat: 2009/11/07_19:41:27 info: Configuration validated. Starting heartbeat 2.1.4
heartbeat: 2009/11/07_19:41:27 info: heartbeat: version 2.1.4
heartbeat: 2009/11/07_19:41:27 info: Heartbeat generation: 1257517561
heartbeat: 2009/11/07_19:41:27 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
heartbeat: 2009/11/07_19:41:27 info: glib: ucast: bound send socket to device: eth1
heartbeat: 2009/11/07_19:41:27 info: glib: ucast: bound receive socket to device: eth1
heartbeat: 2009/11/07_19:41:27 info: glib: ucast: started on port 695 interface eth1 to 192.168.10.2
heartbeat: 2009/11/07_19:41:27 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat: 2009/11/07_19:41:27 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat: 2009/11/07_19:41:27 notice: Using watchdog device: /dev/watchdog
heartbeat: 2009/11/07_19:41:27 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat: 2009/11/07_19:41:27 info: Local status now set to: 'up'
heartbeat: 2009/11/07_19:41:29 info: Link ha2:eth1 up.
heartbeat: 2009/11/07_19:41:29 info: Status update for node ha2: status up
harc: 2009/11/07_19:41:29 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2009/11/07_19:41:30 info: Comm_now_up(): updating status to active
heartbeat: 2009/11/07_19:41:30 info: Local status now set to: 'active'
heartbeat: 2009/11/07_19:41:30 info: Status update for node ha2: status active
harc: 2009/11/07_19:41:30 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2009/11/07_19:41:45 info: local resource transition completed.
heartbeat: 2009/11/07_19:41:45 info: Initial resource acquisition complete (T_RESOURCES(us))
IPaddr: 2009/11/07_19:41:45 INFO:Resource is stopped
heartbeat: 2009/11/07_19:41:45 info: Local Resource acquisition completed.
harc: 2009/11/07_19:41:45 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp: 2009/11/07_19:41:45 received ip-request-resp 192.168.2.100 OK yes
ResourceManager: 2009/11/07_19:41:45 info: Acquiring resource group: ha1 192.168.2.100 nginxd
IPaddr: 2009/11/07_19:41:45 INFO:Resource is stopped
ResourceManager: 2009/11/07_19:41:45 info: Running /etc/ha.d/resource.d/IPaddr 192.168.2.100 start
IPaddr: 2009/11/07_19:41:46 INFO: Using calculated nic for 192.168.2.100: eth0
IPaddr: 2009/11/07_19:41:46 INFO: Using calculated netmask for 192.168.2.100: 255.255.255.0
IPaddr: 2009/11/07_19:41:46 INFO: eval ifconfig eth0:0 192.168.2.100 netmask 255.255.255.0 broadcast 192.168.2.255
IPaddr: 2009/11/07_19:41:46 INFO:Success
heartbeat: 2009/11/07_19:41:46 info: remote resource transition completed.

Check the network interface configuration; the VIP is now configured on HA1.
eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:35:6F:D0
          inet addr:192.168.2.100  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:67 Base address:0x2000
Confirm that nginx has started.

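If you see log output like the following, someone on the same network segment may be running a broadcast heartbeat on UDP port 694; switching to a different port may solve the problem.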

heartbeat: 2009/11/07_00:18:53 info: Configuration validated. Starting heartbeat 2.1.4
heartbeat: 2009/11/07_00:18:53 info: heartbeat: version 2.1.4
heartbeat: 2009/11/07_00:18:53 info: Heartbeat generation: 1257517538
heartbeat: 2009/11/07_00:18:53 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
heartbeat: 2009/11/07_00:18:53 info: glib: UDP Broadcast heartbeat closed on port 694 interface eth1 - Status: 1
heartbeat: 2009/11/07_00:18:53 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat: 2009/11/07_00:18:53 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat: 2009/11/07_00:18:53 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat: 2009/11/07_00:18:53 info: Local status now set to: 'up'
heartbeat: 2009/11/07_00:18:55 ERROR: process_status_message: bad node in message
heartbeat: 2009/11/07_00:18:55 ERROR: MSG: Dumping message with 12 fields
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 ERROR: MSG : [(1)srcuuid=0x9696e70(36 27)]
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 info: Link ha1:eth1 up.
heartbeat: 2009/11/07_00:18:56 ERROR: process_status_message: bad node in message
heartbeat: 2009/11/07_00:18:56 ERROR: MSG: Dumping message with 12 fields
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
heartbeat: 2009/11/07_00:18:56 ERROR: MSG : [(1)srcuuid=0x9696dc8(36 27)]
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
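
To tell quickly whether a node is affected, the log can be scanned for this error. The following is a minimal sketch, assuming the log path configured earlier; bad_node_count is a hypothetical helper name.

```shell
# Count "bad node in message" errors in a heartbeat log.
# $1: log file (/var/log/ha-log in this setup)
bad_node_count() {
    grep -c 'process_status_message: bad node in message' "$1"
}

# To see which hosts are sending heartbeats on the default port, something
# like the following can be run as root on the heartbeat interface:
#   tcpdump -n -i eth1 udp port 694
```

A count that keeps growing after a restart suggests that foreign heartbeat packets are still arriving on the configured port.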

IV. Testing
1. Check that manual switchover works
On HA1, run /usr/share/heartbeat/hb_standby and check whether the VIP moves to HA2.
Check the heartbeat log:
tail -100 /var/log/ha-log
heartbeat: 2009/11/07_19:44:33 info: ha1 wants to go standby
heartbeat: 2009/11/07_19:44:33 info: standby: ha2 can take our all resources
heartbeat: 2009/11/07_19:44:33 info: give up all HA resources (standby).
ResourceManager: 2009/11/07_19:44:34 info: Releasing resource group: ha1 192.168.2.100 nginxd
ResourceManager: 2009/11/07_19:44:34 info: Running /etc/ha.d/resource.d/nginxd stop
ResourceManager: 2009/11/07_19:44:34 info: Running /etc/ha.d/resource.d/IPaddr 192.168.2.100 stop
IPaddr: 2009/11/07_19:44:34 INFO: ifconfig eth0:0 down
IPaddr: 2009/11/07_19:44:34 INFO:Success
heartbeat: 2009/11/07_19:44:34 info: all HA resource release completed (standby).
heartbeat: 2009/11/07_19:44:34 info: Local standby process completed .
heartbeat: 2009/11/07_19:44:36 WARN: 1 lost packet(s) for
heartbeat: 2009/11/07_19:44:36 info: remote resource transition completed.
heartbeat: 2009/11/07_19:44:36 info: No pkts missing from ha2!
heartbeat: 2009/11/07_19:44:36 info: Other node completed standby takeover of all resources.
On HA2, verify that the VIP is now configured and that nginx has started.
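
This check can be scripted so that it gives the same answer on either node. The following is a minimal sketch; has_vip is a hypothetical helper that reads interface output from standard input.

```shell
# Succeed if the given address appears as a whole word in the input.
# $1: address to look for, e.g. the VIP
has_vip() {
    grep -qwF "$1"
}

# Example use on a node:
#   ifconfig | has_vip 192.168.2.100 && echo "VIP is on this node"
```

The whole-word match keeps 192.168.2.10 from matching a line that contains 192.168.2.100.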

2. Cut the heartbeat link between the primary and backup nodes and check whether the VIP moves
Bring down eth1 on HA1, then check the heartbeat log on HA2
# tail -100 /var/log/ha-log
heartbeat: 2009/11/07_19:59:36 WARN: node ha1: is dead
heartbeat: 2009/11/07_19:59:36 WARN: No STONITH device configured.
heartbeat: 2009/11/07_19:59:36 WARN: Shared disks are not protected.
heartbeat: 2009/11/07_19:59:36 info: Resources being acquired from ha1.
heartbeat: 2009/11/07_19:59:36 info: Link ha1:eth1 dead.
harc: 2009/11/07_19:59:36 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2009/11/07_19:59:36 info: No local resources to acquire.
mach_down: 2009/11/07_19:59:36 info: Taking over resource group 192.168.2.100
ResourceManager: 2009/11/07_19:59:36 info: Acquiring resource group: ha1 192.168.2.100 nginxd
IPaddr: 2009/11/07_19:59:37 INFO:Resource is stopped
ResourceManager: 2009/11/07_19:59:37 info: Running /etc/ha.d/resource.d/IPaddr 192.168.2.100 start
IPaddr: 2009/11/07_19:59:37 INFO: Using calculated nic for 192.168.2.100: eth0
IPaddr: 2009/11/07_19:59:37 INFO: Using calculated netmask for 192.168.2.100: 255.255.255.0
IPaddr: 2009/11/07_19:59:37 INFO: eval ifconfig eth0:0 192.168.2.100 netmask 255.255.255.0 broadcast 192.168.2.255
IPaddr: 2009/11/07_19:59:37 INFO:Success
ResourceManager: 2009/11/07_19:59:37 info: Running /etc/ha.d/resource.d/nginxd start
mach_down: 2009/11/07_19:59:38 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down: 2009/11/07_19:59:38 info: mach_down takeover complete for node ha1.
heartbeat: 2009/11/07_19:59:38 info: mach_down takeover complete.
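The resources moved from HA1 to HA2.

Bring eth1 on HA1 back up, and the resources move automatically from HA2 back to HA1.

3. Stop HA1, or stop heartbeat on HA1, and check whether the VIP moves to HA2
The resources moved from HA1 to HA2.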

V. HA Management

Start/stop heartbeat:
service heartbeat start/stop

Check heartbeat status:
# service heartbeat status
heartbeat OK is running on ha2 …

Manual switchover (move the local resources to the remote host):
# /usr/share/heartbeat/hb_standby
2009/11/07_20:11:03 Going standby .

Manual takeover (take the resources over to the local host):
# /usr/share/heartbeat/hb_takeover

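Summary: with the configuration above, when one node goes down the other node takes over the resources. However, when nginx itself goes down there is no automatic failover; to get that, heartbeat must be configured in style 2.x. See "Nginx High Availability with Heartbeat (style 2.x)".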
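
Within style 1.x, a workaround sometimes used is a small monitor on the active node that stops heartbeat when nginx disappears, so that the peer takes over. This is not part of the setup described in this article. The sketch below shows only the decision logic; failover_if_down is a hypothetical helper, and the two commands in the usage comment are assumptions about a typical CentOS install.

```shell
# Run the release command when the health check fails.
# $1: command that succeeds while nginx is alive
# $2: command that gives up the resources
failover_if_down() {
    if sh -c "$1" > /dev/null 2>&1; then
        return 0            # nginx is fine, nothing to do
    fi
    sh -c "$2"
}

# Intended use on the active node, for example from cron:
#   failover_if_down "pidof nginx" "service heartbeat stop"
```

After such a failover, heartbeat stays stopped on that node until it is started again by hand, so nginx should be repaired first.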
