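Nginx High Availability with Heartbeat (style 1.x)
I. Preparation
1. System: two CentOS 5.4 virtual machines
2. Hostnames: HA1, HA2
3. IP addresses: HA1 eth0: 192.168.2.10 eth1: 192.168.10.1
HA2 eth0: 192.168.2.20 eth1: 192.168.10.2
4. VIP: 192.168.2.100 (the IP address that moves on failover)
II. Installation
1. Compile and install Nginx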
tar xzvf pcre-7.9.tar.gz
cd pcre-7.9
./configure
make
make install
cd ..
tar xzvf nginx-0.7.63.tar.gz
cd nginx-0.7.63
./configure --user=nobody --group=nobody --prefix=/usr/local/nginx --with-http_stub_status_module --with-http_ssl_module
make
make install
The detailed Nginx configuration is omitted here.
2. Compile and install Heartbeat
tar xzvf libnet-1.1.2.1.tar.gz
cd libnet
./configure
make
make install
cd ..
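Create the user and group
Heartbeat needs the haclient group and the hacluster user. Run the same commands on both nodes and make sure the haclient and hacluster IDs are identical on both.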
groupadd -g 500 haclient
useradd -u 500 -g haclient hacluster
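Since both nodes have to agree on the numeric IDs, it may help to print them on each node and compare. The following is only a sketch: `account_ids` is a hypothetical helper (not part of Heartbeat) that extracts `UID:GID` for a user from passwd-format input.

```shell
# account_ids USER: read passwd-format lines on stdin and print "UID:GID" for USER.
# Hypothetical helper for illustration; not shipped with Heartbeat.
account_ids() {
    awk -F: -v user="$1" '$1 == user { print $3 ":" $4 }'
}

# Usage on each node (the two outputs should be identical):
#   getent passwd hacluster | account_ids hacluster
```

With the groupadd/useradd commands above, both nodes would be expected to report the same pair of IDs.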
tar jxvf STABLE-2.1.4.tar.bz2
cd Heartbeat-STABLE-2-1-STABLE-2.1.4/
./ConfigureMe configure
make
make install
# Copy the configuration files to their directory
cp doc/ha.cf /etc/ha.d/
cp doc/haresources /etc/ha.d/
cp doc/authkeys /etc/ha.d/
cd /etc/ha.d/ # change to the /etc/ha.d/ directory
III. Configuring Heartbeat
Do the configuration in the /etc/ha.d/ directory:
1. vi authkeys # node authentication method; the first option, crc, is used here
auth 1
1 crc
# Set the permissions of authkeys to 600
chmod 600 authkeys
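The crc method only detects corrupted packets and provides no real authentication, so on a shared network one may prefer sha1 with a secret key. The sketch below shows what such an authkeys file could look like; `make_authkeys` is a hypothetical helper, and the way the key is generated in the usage comment is just one possibility.

```shell
# make_authkeys KEY: print an authkeys file that uses method 1 = sha1 with KEY.
# Hypothetical helper for illustration.
make_authkeys() {
    printf 'auth 1\n1 sha1 %s\n' "$1"
}

# Usage (run once, then copy the same file to both nodes):
#   key=$(head -c 512 /dev/urandom | md5sum | cut -d' ' -f1)
#   make_authkeys "$key" > /etc/ha.d/authkeys
#   chmod 600 /etc/ha.d/authkeys
```

The file has to be identical on HA1 and HA2, otherwise the nodes will reject each other's packets.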
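2. Edit /etc/ha.d/ha.cf:
# cat ha.cf | sed '/^#/d'
# Enable the HA debug log; it is best to turn this off once debugging is finished
debugfile /var/log/ha-debug
# Enable the HA log
logfile /var/log/ha-log
# syslog facility used for logging
logfacility local0
# Interval between heartbeats, in seconds
keepalive 2
# How long heartbeats may be missing before the peer is declared dead, in seconds
deadtime 30
# How long heartbeats may be missing before a warning is logged, in seconds
warntime 10
# Time reserved for services to come up after a restart; no heartbeat checks are made during this period
initdead 120
# The default port is UDP 694; I changed it to 695. If someone else on the LAN is running Heartbeat in broadcast mode, you had better pick a different port,
# otherwise authentication may fail
udpport 695
# Use unicast; on HA2 change this to: ucast eth1 192.168.10.1
ucast eth1 192.168.10.2
# Whether resources move back to the primary node once it recovers
auto_failback on
# Configure the watchdog
# A watchdog can be implemented as a hardware circuit or as a software timer; it reboots the system automatically when the system fails.
# Under the Linux kernel, the watchdog works as follows: once the watchdog is started (that is, once the /dev/watchdog device is opened),
# if nothing is written to /dev/watchdog within a set interval,
# the hardware watchdog circuit or the software timer reboots the system.
watchdog /dev/watchdog
# Node list, primary node first; do not reverse the order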
node HA1
node HA2
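The timing directives above depend on each other: keepalive should be shorter than warntime, warntime shorter than deadtime, and the usual recommendation is an initdead of at least twice deadtime. A small sanity check along these lines might look as follows; `hacf_value` and `check_hacf_timing` are hypothetical helpers, and the check assumes plain integer seconds.

```shell
# hacf_value FILE KEY: print the first value configured for KEY in FILE.
hacf_value() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# check_hacf_timing FILE: succeed only if keepalive < warntime < deadtime
# and initdead >= 2 * deadtime. Assumes all four values are integer seconds.
check_hacf_timing() {
    ka=$(hacf_value "$1" keepalive)
    wt=$(hacf_value "$1" warntime)
    dt=$(hacf_value "$1" deadtime)
    idt=$(hacf_value "$1" initdead)
    [ "$ka" -lt "$wt" ] && [ "$wt" -lt "$dt" ] && [ "$idt" -ge $((dt * 2)) ]
}

# Usage:
#   check_hacf_timing /etc/ha.d/ha.cf || echo "timing values look inconsistent"
```

The values used in this article (2, 10, 30, 120) pass the check.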
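3. # cat haresources
# Each line defines one resource group. Resources in a group are started from left to right and stopped from right to left.
# Resources within a group are separated by spaces; different resource groups are independent of each other.
# The first column of a resource group is one of the nodes listed in ha.cf, and should be the node intended to act as primary.
# Each resource is a script, which can live in /etc/init.d or in /etc/ha.d/resource.d.
# These scripts must support the start and stop arguments.
# Arguments to a script are separated by ::.
# Primary node  VIP  resource name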
HA1 192.168.2.100 nginxd
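To make the `::` syntax more concrete, the sketch below prints the command line that a haresources entry roughly corresponds to. `resource_cmd` is a hypothetical helper written for illustration only; it assumes the script lives in /etc/ha.d/resource.d and that a bare IP address is shorthand for `IPaddr::<ip>`, which matches the log output shown later in this article.

```shell
# resource_cmd SPEC ACTION: print the command line a haresources entry maps to.
# Illustration only; a bare IP address is treated as shorthand for IPaddr::<ip>.
resource_cmd() {
    spec=$1
    case $spec in
        [0-9]*.[0-9]*.[0-9]*.[0-9]*) spec="IPaddr::$spec" ;;
    esac
    script=${spec%%::*}
    case $spec in
        *::*) args=$(printf '%s' "${spec#*::}" | sed 's/::/ /g') ;;
        *)    args= ;;
    esac
    echo "/etc/ha.d/resource.d/$script${args:+ $args} $2"
}

# Usage:
#   resource_cmd 192.168.2.100 start
#   resource_cmd IPaddr::192.168.2.100/24/eth0 start
#   resource_cmd nginxd stop
```

The second form, with netmask bits and interface spelled out, can be useful when the automatically calculated NIC or netmask is not the one wanted.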
4. Write the nginxd resource script and place it in both /etc/rc.d/init.d/ and /etc/ha.d/resource.d/
#!/bin/bash
# nginxd: init-style resource script for nginx, called by heartbeat with start/stop.
# bash is required because the script uses [[ ]] and &>.
# Source function library (provides daemon, killproc and status).
. /etc/rc.d/init.d/functions
# Source networking configuration.
. /etc/sysconfig/network
# Check that networking is up.
[ "${NETWORKING}" = "no" ] && exit 0
RETVAL=0
prog="nginx"
nginxDir=/usr/local/nginx
nginxd=$nginxDir/sbin/nginx
nginxConf=$nginxDir/conf/nginx.conf
# Must match the pid directive in nginx.conf.
nginxPid=$nginxDir/nginx.pid
# Exit if nginx is already running; otherwise remove a stale PID file.
nginx_check()
{
    if [[ -e $nginxPid ]]; then
        # Test the recorded PID itself: grepping "ps aux" for "nginx"
        # would also match this script, which is named nginxd.
        if kill -0 "$(cat "$nginxPid")" 2> /dev/null; then
            echo "$prog already running..."
            exit 1
        else
            rm -f "$nginxPid" &> /dev/null
        fi
    fi
}
start()
{
    # nginx_check exits the script if nginx is already running.
    nginx_check
    echo -n $"Starting $prog:"
    daemon $nginxd -c $nginxConf
    RETVAL=$?
    echo
    [ $RETVAL = 0 ] && touch /var/lock/subsys/nginx
    return $RETVAL
}
stop()
{
    echo -n $"Stopping $prog:"
    killproc $nginxd
    RETVAL=$?
    echo
    [ $RETVAL = 0 ] && rm -f /var/lock/subsys/nginx $nginxPid
}
reload()
{
    # nginx re-reads its configuration on SIGHUP.
    echo -n $"Reloading $prog:"
    killproc $nginxd -HUP
    RETVAL=$?
    echo
}
case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        stop
        start
        ;;
    reload)
        reload
        ;;
    status)
        status $prog
        RETVAL=$?
        ;;
    *)
        echo $"Usage: $0 {start|stop|restart|reload|status}"
        RETVAL=1
        ;;
esac
exit $RETVAL
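Before handing the script to heartbeat, it can be worth running it by hand and looking at the exit codes, since heartbeat relies on start and stop working. A minimal smoke test could look like the sketch below; `check_resource_script` is a hypothetical helper, not a Heartbeat tool.

```shell
# check_resource_script SCRIPT: run "SCRIPT start" then "SCRIPT stop",
# print both exit codes, and succeed only if both were 0.
check_resource_script() {
    start_rc=0
    "$1" start > /dev/null 2>&1 || start_rc=$?
    stop_rc=0
    "$1" stop > /dev/null 2>&1 || stop_rc=$?
    echo "start=$start_rc stop=$stop_rc"
    [ "$start_rc" -eq 0 ] && [ "$stop_rc" -eq 0 ]
}

# Usage (as root, while heartbeat is stopped, because this really starts and stops nginx):
#   check_resource_script /etc/ha.d/resource.d/nginxd
```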
5. Configure /etc/hosts
# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 vpc localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.10.1 HA1
192.168.10.2 HA2
Note: carry out sections II and III (installing and configuring Heartbeat) on both HA1 and HA2.
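Heartbeat identifies a node by the name that `uname -n` reports, so each machine's hostname has to appear in a node line of ha.cf. The sketch below checks this; `node_in_hacf` is a hypothetical helper, and the case-insensitive comparison is an assumption based on the logs in this article, which show the nodes as ha1 and ha2.

```shell
# node_in_hacf NAME FILE: succeed if FILE has a "node" directive listing NAME.
# The comparison ignores case (an assumption, see the lead-in).
node_in_hacf() {
    want=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
    awk -v want="$want" '
        $1 == "node" { for (i = 2; i <= NF; i++) if (tolower($i) == want) found = 1 }
        END { exit !found }
    ' "$2"
}

# Usage on each node:
#   node_in_hacf "$(uname -n)" /etc/ha.d/ha.cf || echo "hostname is not listed in ha.cf"
```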
6. Start heartbeat
Note: keep the clocks of the primary and backup servers synchronized; if they differ too much, heartbeat may malfunction.
service heartbeat restart
Check the heartbeat startup messages in the log (the log is very helpful for troubleshooting)
tail -100 /var/log/ha-log
heartbeat: 2009/11/07_19:41:27 info: Configuration validated. Starting heartbeat 2.1.4
heartbeat: 2009/11/07_19:41:27 info: heartbeat: version 2.1.4
heartbeat: 2009/11/07_19:41:27 info: Heartbeat generation: 1257517561
heartbeat: 2009/11/07_19:41:27 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
heartbeat: 2009/11/07_19:41:27 info: glib: ucast: bound send socket to device: eth1
heartbeat: 2009/11/07_19:41:27 info: glib: ucast: bound receive socket to device: eth1
heartbeat: 2009/11/07_19:41:27 info: glib: ucast: started on port 695 interface eth1 to 192.168.10.2
heartbeat: 2009/11/07_19:41:27 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat: 2009/11/07_19:41:27 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat: 2009/11/07_19:41:27 notice: Using watchdog device: /dev/watchdog
heartbeat: 2009/11/07_19:41:27 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat: 2009/11/07_19:41:27 info: Local status now set to: 'up'
heartbeat: 2009/11/07_19:41:29 info: Link ha2:eth1 up.
heartbeat: 2009/11/07_19:41:29 info: Status update for node ha2: status up
harc: 2009/11/07_19:41:29 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2009/11/07_19:41:30 info: Comm_now_up(): updating status to active
heartbeat: 2009/11/07_19:41:30 info: Local status now set to: 'active'
heartbeat: 2009/11/07_19:41:30 info: Status update for node ha2: status active
harc: 2009/11/07_19:41:30 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2009/11/07_19:41:45 info: local resource transition completed.
heartbeat: 2009/11/07_19:41:45 info: Initial resource acquisition complete (T_RESOURCES(us))
IPaddr: 2009/11/07_19:41:45 INFO:Resource is stopped
heartbeat: 2009/11/07_19:41:45 info: Local Resource acquisition completed.
harc: 2009/11/07_19:41:45 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp: 2009/11/07_19:41:45 received ip-request-resp 192.168.2.100 OK yes
ResourceManager: 2009/11/07_19:41:45 info: Acquiring resource group: ha1 192.168.2.100 nginxd
IPaddr: 2009/11/07_19:41:45 INFO:Resource is stopped
ResourceManager: 2009/11/07_19:41:45 info: Running /etc/ha.d/resource.d/IPaddr 192.168.2.100 start
IPaddr: 2009/11/07_19:41:46 INFO: Using calculated nic for 192.168.2.100: eth0
IPaddr: 2009/11/07_19:41:46 INFO: Using calculated netmask for 192.168.2.100: 255.255.255.0
IPaddr: 2009/11/07_19:41:46 INFO: eval ifconfig eth0:0 192.168.2.100 netmask 255.255.255.0 broadcast 192.168.2.255
IPaddr: 2009/11/07_19:41:46 INFO:Success
heartbeat: 2009/11/07_19:41:46 info: remote resource transition completed.
Check the network interfaces: the VIP is now configured on HA1.
eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:35:6F:D0
          inet addr:192.168.2.100  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:67 Base address:0x2000
Check that nginx has been started.
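These two checks can also be scripted. The sketch below tests whether an address appears in the output of `ip -4 -o addr show`; `has_vip` is a hypothetical helper that reads that output on stdin, so it can be tried on saved output as well.

```shell
# has_vip ADDR: read "ip -4 -o addr show" style output on stdin and
# succeed if ADDR is configured on some interface.
has_vip() {
    grep -qF "inet $1/"
}

# Usage on the node that should hold the resources:
#   ip -4 -o addr show | has_vip 192.168.2.100 && echo "VIP is here"
#   curl -sI http://192.168.2.100/ | head -1    # nginx should answer on the VIP
```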
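If you see log output like the following, someone on the same network segment may be running heartbeat in broadcast mode on UDP port 694; switching to a different port may solve the problem.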
heartbeat: 2009/11/07_00:18:53 info: Configuration validated. Starting heartbeat 2.1.4
heartbeat: 2009/11/07_00:18:53 info: heartbeat: version 2.1.4
heartbeat: 2009/11/07_00:18:53 info: Heartbeat generation: 1257517538
heartbeat: 2009/11/07_00:18:53 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
heartbeat: 2009/11/07_00:18:53 info: glib: UDP Broadcast heartbeat closed on port 694 interface eth1 - Status: 1
heartbeat: 2009/11/07_00:18:53 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat: 2009/11/07_00:18:53 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat: 2009/11/07_00:18:53 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat: 2009/11/07_00:18:53 info: Local status now set to: 'up'
heartbeat: 2009/11/07_00:18:55 ERROR: process_status_message: bad node in message
heartbeat: 2009/11/07_00:18:55 ERROR: MSG: Dumping message with 12 fields
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 ERROR: MSG : [(1)srcuuid=0x9696e70(36 27)]
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 ERROR: MSG :
heartbeat: 2009/11/07_00:18:55 info: Link ha1:eth1 up.
heartbeat: 2009/11/07_00:18:56 ERROR: process_status_message: bad node in message
heartbeat: 2009/11/07_00:18:56 ERROR: MSG: Dumping message with 12 fields
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
heartbeat: 2009/11/07_00:18:56 ERROR: MSG : [(1)srcuuid=0x9696dc8(36 27)]
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
heartbeat: 2009/11/07_00:18:56 ERROR: MSG :
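To see quickly whether a node is affected by this, one can count the characteristic error line in the log. A minimal sketch, with `count_bad_node` as a hypothetical helper:

```shell
# count_bad_node FILE: print how many "bad node in message" errors FILE contains.
# Note: like grep -c, it returns a non-zero status when the count is 0.
count_bad_node() {
    grep -c 'process_status_message: bad node in message' "$1"
}

# Usage:
#   count_bad_node /var/log/ha-log
```

A count that keeps growing after the port change would suggest the stray packets have a different cause.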
IV. Testing
1. Check that a manual switchover works
Run /usr/share/heartbeat/hb_standby on HA1 and check whether the VIP moves to HA2
Check the heartbeat log
tail -100 /var/log/ha-log
heartbeat: 2009/11/07_19:44:33 info: ha1 wants to go standby
heartbeat: 2009/11/07_19:44:33 info: standby: ha2 can take our all resources
heartbeat: 2009/11/07_19:44:33 info: give up all HA resources (standby).
ResourceManager: 2009/11/07_19:44:34 info: Releasing resource group: ha1 192.168.2.100 nginxd
ResourceManager: 2009/11/07_19:44:34 info: Running /etc/ha.d/resource.d/nginxd stop
ResourceManager: 2009/11/07_19:44:34 info: Running /etc/ha.d/resource.d/IPaddr 192.168.2.100 stop
IPaddr: 2009/11/07_19:44:34 INFO: ifconfig eth0:0 down
IPaddr: 2009/11/07_19:44:34 INFO:Success
heartbeat: 2009/11/07_19:44:34 info: all HA resource release completed (standby).
heartbeat: 2009/11/07_19:44:34 info: Local standby process completed .
heartbeat: 2009/11/07_19:44:36 WARN: 1 lost packet(s) for
heartbeat: 2009/11/07_19:44:36 info: remote resource transition completed.
heartbeat: 2009/11/07_19:44:36 info: No pkts missing from ha2!
heartbeat: 2009/11/07_19:44:36 info: Other node completed standby takeover of all resources.
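On HA2, the VIP is now configured and nginx has been started.
2. Cut the heartbeat link between the primary and backup nodes and check whether the VIP moves
Bring down the eth1 interface on HA1, then check the heartbeat log on HA2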
# tail -100 /var/log/ha-log
heartbeat: 2009/11/07_19:59:36 WARN: node ha1: is dead
heartbeat: 2009/11/07_19:59:36 WARN: No STONITH device configured.
heartbeat: 2009/11/07_19:59:36 WARN: Shared disks are not protected.
heartbeat: 2009/11/07_19:59:36 info: Resources being acquired from ha1.
heartbeat: 2009/11/07_19:59:36 info: Link ha1:eth1 dead.
harc: 2009/11/07_19:59:36 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2009/11/07_19:59:36 info: No local resources to acquire.
mach_down: 2009/11/07_19:59:36 info: Taking over resource group 192.168.2.100
ResourceManager: 2009/11/07_19:59:36 info: Acquiring resource group: ha1 192.168.2.100 nginxd
IPaddr: 2009/11/07_19:59:37 INFO:Resource is stopped
ResourceManager: 2009/11/07_19:59:37 info: Running /etc/ha.d/resource.d/IPaddr 192.168.2.100 start
IPaddr: 2009/11/07_19:59:37 INFO: Using calculated nic for 192.168.2.100: eth0
IPaddr: 2009/11/07_19:59:37 INFO: Using calculated netmask for 192.168.2.100: 255.255.255.0
IPaddr: 2009/11/07_19:59:37 INFO: eval ifconfig eth0:0 192.168.2.100 netmask 255.255.255.0 broadcast 192.168.2.255
IPaddr: 2009/11/07_19:59:37 INFO:Success
ResourceManager: 2009/11/07_19:59:37 info: Running /etc/ha.d/resource.d/nginxd start
mach_down: 2009/11/07_19:59:38 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down: 2009/11/07_19:59:38 info: mach_down takeover complete for node ha1.
heartbeat: 2009/11/07_19:59:38 info: mach_down takeover complete.
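The resources moved from HA1 to HA2.
Bring HA1's eth1 interface back up, and the resources move back from HA2 to HA1 automatically.
3. Shut down HA1, or stop heartbeat on HA1, and check whether the VIP moves to HA2
The resources moved from HA1 to HA2.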
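While running these tests it can be useful to probe the VIP from a third machine and watch how long the service is unreachable. The loop below is only a sketch: `watch_service` is a hypothetical helper that runs an arbitrary check command repeatedly and prints UP or DOWN with a timestamp.

```shell
# watch_service COUNT INTERVAL CMD...: run CMD COUNT times, INTERVAL seconds
# apart, printing a timestamp and UP or DOWN for each attempt.
watch_service() {
    count=$1
    interval=$2
    shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        if "$@" > /dev/null 2>&1; then state=UP; else state=DOWN; fi
        echo "$(date '+%H:%M:%S') $state"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
    done
    return 0
}

# Usage from a client machine, one probe per second for a minute:
#   watch_service 60 1 curl -s -m 1 -o /dev/null http://192.168.2.100/
```

The number of consecutive DOWN lines gives a rough idea of the failover time; with deadtime 30, a hard failure of HA1 would be expected to take at least that long to detect.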
V. HA Management
Start/stop heartbeat:
service heartbeat start/stop
Check the heartbeat status:
# service heartbeat status
heartbeat OK is running on ha2 …
Manual switchover (move the local resources to the remote host):
# /usr/share/heartbeat/hb_standby
2009/11/07_20:11:03 Going standby .
Manual takeover (take the resources over to the local host):
# /usr/share/heartbeat/hb_takeover
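Summary: with the configuration above, when one node goes down the other node takes over its resources. However, if nginx itself goes down, no automatic failover takes place; to achieve that you must configure heartbeat style 2.x. See "Nginx High Availability with Heartbeat (style 2.x)".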
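Short of moving to style 2.x, a stop-gap sometimes used with style 1.x is an external check that hands the resources over when nginx stops answering. The sketch below is not part of the setup described in this article: `guard_once` is a hypothetical helper, and the check and failover commands in the usage comment are assumptions that would need adapting and testing.

```shell
# guard_once CHECK_CMD FAIL_CMD: run CHECK_CMD through sh; if it fails,
# report that and run FAIL_CMD. Hypothetical helper for illustration.
guard_once() {
    if sh -c "$1" > /dev/null 2>&1; then
        echo "check ok"
    else
        echo "check failed, running: $2"
        sh -c "$2"
    fi
}

# Usage, for example from cron on both nodes. The check succeeds right away
# when this node does not hold the VIP, so the standby node is left alone:
#   guard_once 'ip -4 -o addr show | grep -qF "inet 192.168.2.100/" || exit 0; curl -s -m 2 -o /dev/null http://192.168.2.100/' \
#              '/usr/share/heartbeat/hb_standby'
```

A check like this can misfire in the short window after the VIP has been configured but before nginx has started, so it is at best a workaround.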