踏雪寻梅 posted on 2016-12-28 06:54:21

Nginx High Availability with Heartbeat (style 1.x)

I. Preparation

1. System: two CentOS 5.4 virtual machines
2. Hostnames: HA1, HA2
3. IP addresses: HA1   eth0: 192.168.2.10   eth1: 192.168.10.1
                 HA2   eth0: 192.168.2.20   eth1: 192.168.10.2
4. VIP: 192.168.2.100   (the IP that moves between nodes on failover)

II. Installation

1. Build and install Nginx from source
tar xzvf pcre-7.9.tar.gz
cd pcre-7.9
./configure
make
make install
cd ..

tar xzvf nginx-0.7.63.tar.gz
cd nginx-0.7.63
./configure --user=nobody --group=nobody --prefix=/usr/local/nginx --with-http_stub_status_module --with-http_ssl_module
make
make install

The Nginx configuration itself is omitted here.

2. Build and install Heartbeat from source

tar xzvf libnet-1.1.2.1.tar.gz
cd libnet
./configure
make
make install
cd ..

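Create the user and group

Heartbeat requires the haclient group and the hacluster user. Run the same commands on both nodes, and make sure the haclient GID and the hacluster UID are identical on both.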

groupadd -g 500 haclient

useradd -u 500 -g haclient hacluster

tar jxvf STABLE-2.1.4.tar.bz2
cd Heartbeat-STABLE-2-1-STABLE-2.1.4/
./ConfigureMe configure
make
make install
# Copy the sample configuration files into place
cp doc/ha.cf /etc/ha.d/
cp doc/haresources /etc/ha.d/
cp doc/authkeys /etc/ha.d/
cd !$   # jump to /etc/ha.d/ (!$ expands to the last argument of the previous command)

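III. Configuring Heartbeat

Do the configuration in the /etc/ha.d/ directory:
1. vi authkeys   # node authentication method; the first one, crc, is used here
auth 1
1 crc
# Set the permissions of authkeys to 600
chmod 600 authkeys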
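
crc only detects corrupted packets and does not authenticate the sender, so on a network that is not fully trusted a keyed method such as sha1 is the usual choice. The following is a minimal sketch of generating such a file, assuming od and /dev/urandom are available; the function name write_authkeys is made up for illustration and is not part of Heartbeat.

```shell
# write_authkeys FILE: hypothetical helper, not shipped with Heartbeat.
# Writes an authkeys file that uses sha1 with a random shared key.
write_authkeys() {
    file=$1
    # 16 random bytes rendered as 32 hex characters
    key=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
    old_umask=$(umask)
    umask 077                      # never create the file world-readable
    printf 'auth 1\n1 sha1 %s\n' "$key" > "$file"
    umask "$old_umask"
    chmod 600 "$file"              # Heartbeat expects mode 600 on authkeys
}

# Example: write_authkeys /etc/ha.d/authkeys
```

The generated file has to be copied to the other node unchanged, since both nodes must share the same key.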

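2. Edit /etc/ha.d/ha.cf:
# cat ha.cf | sed '/^#/d'
# Enable the HA debug log; turning it off once debugging is finished is recommended
debugfile /var/log/ha-debug
# Enable the HA log
logfile    /var/log/ha-log
# syslog facility to log to
logfacility    local0
# Interval between heartbeats, in seconds
keepalive 2
# How long without a heartbeat before the peer is declared dead, in seconds
deadtime 30
# How long without a heartbeat before a warning is logged, in seconds
warntime 10
# Grace period reserved for services to come up after a restart; no node is declared dead during this time
initdead 120
# The default port is UDP 694; it is changed to 695 here. If someone else on the LAN also runs Heartbeat in broadcast mode, changing the port is advisable,
# otherwise authentication may fail
udpport    695
# Use unicast; on HA2 change this line to: ucast    eth1 192.168.10.1
ucast    eth1 192.168.10.2
# Whether resources switch back once the primary node recovers
auto_failback on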
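# Configure the watchdog
# A watchdog can be implemented as a hardware circuit or as a software timer, and can reboot the system automatically when it hangs.
# In the Linux kernel, the watchdog basically works as follows: once the watchdog is started (that is, once the /dev/watchdog device has been opened),
# if nothing is written to /dev/watchdog within a set interval,
# the hardware watchdog circuit or the software timer reboots the system.
watchdog /dev/watchdog
# Node list; the primary node comes first, do not swap the order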
node    HA1
node    HA2
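
The four timing directives above depend on each other: a warning should fire before a node is declared dead, and the startup grace period should be comfortably longer than deadtime. Below is a rough sanity check, assuming all four values are plain seconds (no ms suffix). The function name check_ha_timing is hypothetical, and the "initdead at least twice deadtime" rule of thumb is taken from the comments in Heartbeat's sample ha.cf.

```shell
# check_ha_timing FILE: hypothetical helper that reads keepalive, warntime,
# deadtime and initdead from an ha.cf-style file and reports odd combinations.
# Exits 0 when nothing looks wrong, 1 otherwise.
check_ha_timing() {
    awk '
        $1 == "keepalive" { keepalive = $2 + 0 }
        $1 == "warntime"  { warntime  = $2 + 0 }
        $1 == "deadtime"  { deadtime  = $2 + 0 }
        $1 == "initdead"  { initdead  = $2 + 0 }
        END {
            bad = 0
            if (keepalive >= warntime) {
                print "warntime should be larger than keepalive"
                bad = 1
            }
            if (warntime >= deadtime) {
                print "deadtime should be larger than warntime"
                bad = 1
            }
            if (initdead < 2 * deadtime) {
                print "initdead should be at least twice deadtime"
                bad = 1
            }
            exit bad
        }
    ' "$1"
}

# Example: check_ha_timing /etc/ha.d/ha.cf && echo "timing values look consistent"
```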

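3. # cat haresources

# Each line defines one resource group. Resources in a group are started from left to right and stopped from right to left.
# Resources within a group are separated by spaces; different resource groups are independent of each other.
# The first column of a resource group is one of the nodes listed in ha.cf, and it should be the node intended to act as primary.
# Each resource is a script, which can live in /etc/init.d or in /etc/ha.d/resource.d.
# These scripts must support the start and stop arguments.
# Arguments to a script are separated with ::.
# primary node   VIP      resource name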
HA1    192.168.2.100    nginxd
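
To make the ordering and the :: syntax concrete, here is a small sketch that expands one haresources line into the calls it implies. It is an illustration only, not how Heartbeat itself parses the file; the function name show_resource_order is hypothetical. It relies on the fact that a bare IP address is shorthand for the IPaddr resource script, which matches the "Running /etc/ha.d/resource.d/IPaddr 192.168.2.100 start" lines in the logs further down.

```shell
# show_resource_order LINE: hypothetical helper that prints, for one
# haresources line, the script calls implied on start and on stop.
show_resource_order() {
    set -- $1            # split the line on whitespace: node, then resources
    node=$1; shift
    start_list=""; stop_list=""
    for res in "$@"; do
        case $res in
            # a bare IP address is shorthand for IPaddr::<address>
            [0-9]*.[0-9]*.[0-9]*.[0-9]*) res="IPaddr::$res" ;;
        esac
        # "script::arg1::arg2" becomes "script arg1 arg2"
        cmd=$(printf '%s' "$res" | sed 's/::/ /g')
        # start: append (left to right); stop: prepend (right to left)
        start_list="${start_list}${cmd} start
"
        stop_list="${cmd} stop
${stop_list}"
    done
    printf 'node: %s\n' "$node"
    printf 'start order:\n%s' "$start_list"
    printf 'stop order:\n%s' "$stop_list"
}

# Example: show_resource_order "HA1 192.168.2.100 nginxd"
```

For the line used in this article, the VIP is brought up before nginxd is started, and nginxd is stopped before the VIP is released.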

4. Write the nginxd resource script and place it in both /etc/rc.d/init.d/ and /etc/ha.d/resource.d/

#!/bin/bash
# (bash is required: the script uses [[ ]] and (( )))

# source function library
. /etc/rc.d/init.d/functions

# Source networking configuration.
. /etc/sysconfig/network

# Check that networking is up.
[ "${NETWORKING}" = "no" ] && exit 0

RETVAL=0
prog="nginx"

nginxDir=/usr/local/nginx
nginxd=$nginxDir/sbin/nginx
nginxConf=$nginxDir/conf/nginx.conf
nginxPid=$nginxDir/nginx.pid

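# Return 1 if nginx is already running; otherwise remove a stale PID file and return 0.
nginx_check()
{
    if [[ -e $nginxPid ]]; then
      # Test the PID recorded in the file instead of grepping ps output,
      # which would also match this script (nginxd) itself.
      if kill -0 "$(cat "$nginxPid")" 2> /dev/null; then
            echo "$prog already running..."
            return 1
      else
            rm -f "$nginxPid"
      fi
    fi
}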

start()
{
    nginx_check
    if (( $? != 0 )); then
      true
    else
      echo -n $"Starting $prog:"
      daemon $nginxd -c $nginxConf
      RETVAL=$?
      echo
      [ $RETVAL = 0 ] && touch /var/lock/subsys/nginx
      return $RETVAL
    fi
}

stop()
{
    echo -n $"Stopping $prog:"
    killproc $nginxd
    RETVAL=$?
    echo
    [ $RETVAL = 0 ] && rm -f /var/lock/subsys/nginx $nginxPid
}

reload()
{
    echo -n $"Reloading $prog:"
    killproc $nginxd -HUP
    RETVAL=$?
    echo
}

case "$1" in
      start)
                start
                ;;
      stop)
                stop
                ;;
      restart)
                stop
                start
                ;;
      reload)
                reload
                ;;
      status)
                status $prog
                RETVAL=$?
                ;;
      *)
                echo $"Usage: $0 {start|stop|restart|reload|status}"
                RETVAL=1
esac
exit $RETVAL
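
Heartbeat can only run the resource script if it is executable in the directory it looks in. A minimal sketch of installing the script into both locations, assuming the script has been saved as nginxd in the current directory; the function name install_resource_script is hypothetical.

```shell
# install_resource_script SRC DIR...: hypothetical helper that copies a
# resource script into each directory and marks it executable (mode 755).
install_resource_script() {
    src=$1; shift
    for dir in "$@"; do
        install -m 755 "$src" "$dir/" || return 1
    done
}

# Example (as root, on both nodes):
# install_resource_script nginxd /etc/rc.d/init.d /etc/ha.d/resource.d
```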

5. Set up /etc/hosts
# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1      vpc localhost.localdomain localhost
::1      localhost6.localdomain6 localhost6
192.168.10.1    HA1
192.168.10.2    HA2
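
A quick way to catch a missing entry is to compare the node lines in ha.cf with the names in the hosts file. This is a sketch only; the function name missing_hosts is hypothetical, and it assumes the hosts file has no trailing comments that happen to contain a node name.

```shell
# missing_hosts HA_CF HOSTS: hypothetical helper that prints every node named
# in HA_CF that has no entry in HOSTS; exits 1 if any node is missing.
missing_hosts() {
    awk '
        # first file (ha.cf): remember every name on a "node" line
        NR == FNR {
            if ($1 == "node")
                for (i = 2; i <= NF; i++) nodes[$i] = 1
            next
        }
        # second file (hosts): skip comments, tick off every host name seen
        /^[ \t]*#/ { next }
        {
            for (i = 2; i <= NF; i++) delete nodes[$i]
        }
        END {
            bad = 0
            for (n in nodes) {
                print n
                bad = 1
            }
            exit bad
        }
    ' "$1" "$2"
}

# Example: missing_hosts /etc/ha.d/ha.cf /etc/hosts || echo "add the nodes listed above"
```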

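Note: carry out the installation and Heartbeat configuration steps (the second and third sections) on both HA1 and HA2.

6. Start Heartbeat
Note: keep the clocks of the primary and backup servers synchronized; if they differ too much, Heartbeat may malfunction.

service heartbeat restart
Check the Heartbeat startup messages in the log (the log is very helpful for troubleshooting)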
tail -100 /var/log/ha-log
heartbeat: 2009/11/07_19:41:27 info: Configuration validated. Starting heartbeat 2.1.4
heartbeat: 2009/11/07_19:41:27 info: heartbeat: version 2.1.4
heartbeat: 2009/11/07_19:41:27 info: Heartbeat generation: 1257517561
heartbeat: 2009/11/07_19:41:27 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
heartbeat: 2009/11/07_19:41:27 info: glib: ucast: bound send socket to device: eth1
heartbeat: 2009/11/07_19:41:27 info: glib: ucast: bound receive socket to device: eth1
heartbeat: 2009/11/07_19:41:27 info: glib: ucast: started on port 695 interface eth1 to 192.168.10.2
heartbeat: 2009/11/07_19:41:27 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat: 2009/11/07_19:41:27 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat: 2009/11/07_19:41:27 notice: Using watchdog device: /dev/watchdog
heartbeat: 2009/11/07_19:41:27 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat: 2009/11/07_19:41:27 info: Local status now set to: 'up'
heartbeat: 2009/11/07_19:41:29 info: Link ha2:eth1 up.
heartbeat: 2009/11/07_19:41:29 info: Status update for node ha2: status up
harc:    2009/11/07_19:41:29 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2009/11/07_19:41:30 info: Comm_now_up(): updating status to active
heartbeat: 2009/11/07_19:41:30 info: Local status now set to: 'active'
heartbeat: 2009/11/07_19:41:30 info: Status update for node ha2: status active
harc:    2009/11/07_19:41:30 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2009/11/07_19:41:45 info: local resource transition completed.
heartbeat: 2009/11/07_19:41:45 info: Initial resource acquisition complete (T_RESOURCES(us))
IPaddr:    2009/11/07_19:41:45 INFO:Resource is stopped
heartbeat: 2009/11/07_19:41:45 info: Local Resource acquisition completed.
harc:    2009/11/07_19:41:45 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp:    2009/11/07_19:41:45 received ip-request-resp 192.168.2.100 OK yes
ResourceManager:    2009/11/07_19:41:45 info: Acquiring resource group: ha1 192.168.2.100 nginxd
IPaddr:    2009/11/07_19:41:45 INFO:Resource is stopped
ResourceManager:    2009/11/07_19:41:45 info: Running /etc/ha.d/resource.d/IPaddr 192.168.2.100 start
IPaddr:    2009/11/07_19:41:46 INFO: Using calculated nic for 192.168.2.100: eth0
IPaddr:    2009/11/07_19:41:46 INFO: Using calculated netmask for 192.168.2.100: 255.255.255.0
IPaddr:    2009/11/07_19:41:46 INFO: eval ifconfig eth0:0 192.168.2.100 netmask 255.255.255.0 broadcast 192.168.2.255
IPaddr:    2009/11/07_19:41:46 INFO:Success
heartbeat: 2009/11/07_19:41:46 info: remote resource transition completed.
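
When the log is long, it can help to look only at the warnings and errors first. A small sketch, assuming the log format shown above, where the severity appears as " WARN: " or " ERROR: "; the function name ha_log_problems is hypothetical.

```shell
# ha_log_problems [FILE]: print only the WARN and ERROR lines of a Heartbeat
# log (defaults to /var/log/ha-log). Exits 1 when there are none.
ha_log_problems() {
    grep -E ' (WARN|ERROR): ' "${1:-/var/log/ha-log}"
}

# Example: ha_log_problems | tail -20
```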

Check the network interfaces: the VIP is now configured on HA1.
eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:35:6F:D0
inet addr:192.168.2.100  Bcast:192.168.2.255  Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
Interrupt:67 Base address:0x2000
Nginx has been started as well.
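
The same check can be scripted so it is easy to repeat on either node after each test. A sketch, assuming the interface listing is piped in on standard input (either ifconfig or "ip addr" output); the function name has_vip is hypothetical.

```shell
# has_vip VIP: succeed if VIP appears as an IPv4 address in the interface
# listing read from standard input (ifconfig or "ip addr" output).
has_vip() {
    # escape the dots so they match literally
    pattern=$(printf '%s' "$1" | sed 's/\./\\./g')
    # the address must be followed by "/", a space, or end of line, so that
    # 192.168.2.10 does not match 192.168.2.100
    grep -Eq "inet (addr:)?${pattern}([/ ]|\$)"
}

# Example: /sbin/ifconfig | has_vip 192.168.2.100 && echo "this node holds the VIP"
```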

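If you see log output like the following, someone on the same network segment may be running Heartbeat in broadcast mode on UDP port 694; switching to a different port may solve the problem.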

heartbeat: 2009/11/07_00:18:53 info: Configuration validated. Starting heartbeat 2.1.4
heartbeat: 2009/11/07_00:18:53 info: heartbeat: version 2.1.4
heartbeat: 2009/11/07_00:18:53 info: Heartbeat generation: 1257517538
heartbeat: 2009/11/07_00:18:53 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
heartbeat: 2009/11/07_00:18:53 info: glib: UDP Broadcast heartbeat closed on port 694 interface eth1 - Status: 1
heartbeat: 2009/11/07_00:18:53 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat: 2009/11/07_00:18:53 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat: 2009/11/07_00:18:53 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat: 2009/11/07_00:18:53 info: Local status now set to: 'up'
heartbeat: 2009/11/07_00:18:55 ERROR: process_status_message: bad node in message
heartbeat: 2009/11/07_00:18:55 ERROR: MSG: Dumping message with 12 fields
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 ERROR: MSG : [(1)srcuuid=0x9696e70(36 27)]
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 info: Link ha1:eth1 up.
heartbeat: 2009/11/07_00:18:56 ERROR: process_status_message: bad node in message
heartbeat: 2009/11/07_00:18:56 ERROR: MSG: Dumping message with 12 fields
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
heartbeat: 2009/11/07_00:18:56 ERROR: MSG : [(1)srcuuid=0x9696dc8(36 27)]
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :

IV. Testing
1. Check that a manual switchover works
Run /usr/share/heartbeat/hb_standby on HA1 and see whether the VIP moves to HA2
Check the Heartbeat log
tail -100 /var/log/ha-log
heartbeat: 2009/11/07_19:44:33 info: ha1 wants to go standby
heartbeat: 2009/11/07_19:44:33 info: standby: ha2 can take our all resources
heartbeat: 2009/11/07_19:44:33 info: give up all HA resources (standby).
ResourceManager:    2009/11/07_19:44:34 info: Releasing resource group: ha1 192.168.2.100 nginxd
ResourceManager:    2009/11/07_19:44:34 info: Running /etc/ha.d/resource.d/nginxd stop
ResourceManager:    2009/11/07_19:44:34 info: Running /etc/ha.d/resource.d/IPaddr 192.168.2.100 stop
IPaddr:    2009/11/07_19:44:34 INFO: ifconfig eth0:0 down
IPaddr:    2009/11/07_19:44:34 INFO:Success
heartbeat: 2009/11/07_19:44:34 info: all HA resource release completed (standby).
heartbeat: 2009/11/07_19:44:34 info: Local standby process completed .
heartbeat: 2009/11/07_19:44:36 WARN: 1 lost packet(s) for
heartbeat: 2009/11/07_19:44:36 info: remote resource transition completed.
heartbeat: 2009/11/07_19:44:36 info: No pkts missing from ha2!
heartbeat: 2009/11/07_19:44:36 info: Other node completed standby takeover of all resources.
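On HA2, the VIP is now configured and nginx has been started.

2. Cut the heartbeat link between the primary and backup nodes and see whether the VIP moves
Bring down eth1 on HA1, then check the Heartbeat log on HA2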
# tail -100 /var/log/ha-log
heartbeat: 2009/11/07_19:59:36 WARN: node ha1: is dead
heartbeat: 2009/11/07_19:59:36 WARN: No STONITH device configured.
heartbeat: 2009/11/07_19:59:36 WARN: Shared disks are not protected.
heartbeat: 2009/11/07_19:59:36 info: Resources being acquired from ha1.
heartbeat: 2009/11/07_19:59:36 info: Link ha1:eth1 dead.
harc:    2009/11/07_19:59:36 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2009/11/07_19:59:36 info: No local resources to acquire.
mach_down:    2009/11/07_19:59:36 info: Taking over resource group 192.168.2.100
ResourceManager:    2009/11/07_19:59:36 info: Acquiring resource group: ha1 192.168.2.100 nginxd
IPaddr:    2009/11/07_19:59:37 INFO:Resource is stopped
ResourceManager:    2009/11/07_19:59:37 info: Running /etc/ha.d/resource.d/IPaddr 192.168.2.100 start
IPaddr:    2009/11/07_19:59:37 INFO: Using calculated nic for 192.168.2.100: eth0
IPaddr:    2009/11/07_19:59:37 INFO: Using calculated netmask for 192.168.2.100: 255.255.255.0
IPaddr:    2009/11/07_19:59:37 INFO: eval ifconfig eth0:0 192.168.2.100 netmask 255.255.255.0 broadcast 192.168.2.255
IPaddr:    2009/11/07_19:59:37 INFO:Success
ResourceManager:    2009/11/07_19:59:37 info: Running /etc/ha.d/resource.d/nginxd start
mach_down:    2009/11/07_19:59:38 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down:    2009/11/07_19:59:38 info: mach_down takeover complete for node ha1.
heartbeat: 2009/11/07_19:59:38 info: mach_down takeover complete.
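The resources moved from HA1 to HA2.

Bring eth1 on HA1 back up, and the resources automatically move from HA2 back to HA1.

3. Shut down HA1, or stop Heartbeat on HA1, and see whether the VIP moves to HA2
The resources moved from HA1 to HA2.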
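
While running these tests, it is useful to watch the service from a third machine to see how long the VIP is actually unreachable. A minimal sketch, assuming curl is installed on that machine; the function name watch_service is hypothetical, and the probe command is passed in so any other check can be substituted.

```shell
# watch_service TRIES INTERVAL CMD...: hypothetical helper that runs CMD
# repeatedly and prints a timestamped up/DOWN line for each attempt.
watch_service() {
    tries=$1; interval=$2; shift 2
    i=0
    while [ "$i" -lt "$tries" ]; do
        if "$@" > /dev/null 2>&1; then
            echo "$(date +%T) up"
        else
            echo "$(date +%T) DOWN"
        fi
        i=$((i + 1))
        sleep "$interval"
    done
}

# Example: probe the VIP once a second for a minute while triggering a failover
# watch_service 60 1 curl -s -o /dev/null --max-time 1 http://192.168.2.100/
```

The number of consecutive DOWN lines gives a rough measure of the failover time, which should be on the order of deadtime when a node dies outright.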

V. HA Management

Start/stop Heartbeat:
service heartbeat start/stop

Check Heartbeat status:
# service heartbeat status
heartbeat OK is running on ha2 …

Manual switchover (hand the local resources over to the remote host):
# /usr/share/heartbeat/hb_standby
2009/11/07_20:11:03 Going standby .

Manual takeover (take the resources over to the local host):
# /usr/share/heartbeat/hb_takeover

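Summary: with the configuration above, when one node goes down the other node takes over its resources. However, if nginx itself goes down, no automatic failover takes place; to achieve that, Heartbeat must be configured in style 2.x. See "Nginx High Availability with Heartbeat (style 2.x)".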