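High-availability cluster test for the LVS Director, based on heartbeat v2 and heartbeat-ldirectord
Objectives:
1. Test high availability of the Director cluster
2. Observe how heartbeat-ldirectord checks the health of the back-end Real Servers
Environment: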
Redhat 5.8
VIP 192.168.0.2
Real Server:
RS1 192.168.0.5
RS2 192.168.0.6
Director:
node1 192.168.0.11
node2 192.168.0.12
Required RPM packages:
heartbeat-2.1.4-9.el5.i386.rpm
heartbeat-stonith-2.1.4-10.el5.i386.rpm
heartbeat-gui-2.1.4-9.el5.i386.rpm
libnet-1.1.4-3.el5.i386.rpm
heartbeat-ldirectord-2.1.4-9.el5.i386.rpm
perl-MailTools-1.77-1.el5.noarch.rpm
heartbeat-pils-2.1.4-10.el5.i386.rpm
Also have the installation disc available to serve as the yum repository.
The procedure is as follows:
I. Configure the Real Servers
1. Synchronize the time on both Real Servers
# hwclock -s
2. Install Apache
# yum -y install httpd
Provide a web page on each Real Server (commands without a host in the prompt run on RS1):
# echo "Real Server 1" > /var/www/html/index.html
[root@RS2 ~]# echo "Real Server 2" > /var/www/html/index.html
# vi /etc/httpd/conf/httpd.conf
 Change: ServerName RS1.yue.com

[root@RS2 ~]# vi /etc/httpd/conf/httpd.conf
 Change: ServerName RS2.yue.com
# /etc/init.d/httpd start    (on both Real Servers)
3. Set the relevant kernel parameters on RS1
# echo 1 > /proc/sys/net/ipv4/conf/eth0/arp_ignore
# echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
# echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
# echo 2 > /proc/sys/net/ipv4/conf/eth0/arp_announce
# ifconfig lo:0 192.168.0.2 broadcast 192.168.0.255 netmask 255.255.255.255 up
# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:7E:8B:C6
          inet addr:192.168.0.5  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2719 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3628 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:200533 (195.8 KiB)  TX bytes:644821 (629.7 KiB)
          Interrupt:67 Base address:0x2000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:71 errors:0 dropped:0 overruns:0 frame:0
          TX packets:71 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:5699 (5.5 KiB)  TX bytes:5699 (5.5 KiB)

lo:0      Link encap:Local Loopback
          inet addr:192.168.0.2  Mask:255.255.255.255
          UP LOOPBACK RUNNING  MTU:16436  Metric:1

# elinks -dump http://192.168.0.5
Real Server 1
# elinks -dump http://192.168.0.2
Real Server 1
# route add -host 192.168.0.2 dev lo:0
# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.0.2     0.0.0.0         255.255.255.255 UH    0      0        0 lo
192.168.0.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth0
0.0.0.0         192.168.0.1     0.0.0.0         UG    0      0        0 eth0
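Values written to /proc with echo take effect immediately but do not survive a reboot. The following is a minimal sketch of one way to persist the four ARP settings through /etc/sysctl.conf. The function name arp_sysctl_lines is made up for illustration and is not part of the original procedure; the lo:0 address and the host route would still have to be restored separately at boot.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): print the sysctl.conf
# lines that correspond to the four /proc settings used in this step.
arp_sysctl_lines() {
    iface="$1"
    for scope in all "$iface"; do
        echo "net.ipv4.conf.$scope.arp_ignore = 1"
        echo "net.ipv4.conf.$scope.arp_announce = 2"
    done
}

# Preview the lines for eth0. As root they could be made permanent with:
#   arp_sysctl_lines eth0 >> /etc/sysctl.conf && sysctl -p
arp_sysctl_lines eth0
```

Printing the lines first, rather than writing the file directly, makes it easy to review them before changing the system configuration.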
http://blog.运维网.com/attachment/201208/112906971.jpg
Enable the service at boot:
# chkconfig --add httpd
# chkconfig httpd on
# chkconfig --list httpd
httpd           0:off   1:off   2:on    3:on    4:on    5:on    6:off
4. Apply the same settings on RS2
http://blog.运维网.com/attachment/201208/120054688.jpg
II. Configure the Directors
Each node's hostname must match the output of "uname -n".
1. Configure node1 first
Synchronize the time:
# hwclock -s
Hostname resolution:
# vi /etc/hosts    (add the following lines)
192.168.0.11 node1.yue.com node1
192.168.0.12 node2.yue.com node2
Hostname:
# hostname node1.yue.com
# vi /etc/sysconfig/network

NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=node1.yue.com
IP address:
# vi /etc/sysconfig/network-scripts/ifcfg-eth0

# Advanced Micro Devices 79c970
DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
HWADDR=00:0c:29:e7:3d:5c
IPADDR=192.168.0.11
GATEWAY=192.168.0.1
NETMASK=255.255.255.0
Mutual SSH trust between the two nodes:
# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):     (press Enter to accept the default path)
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):     (leave the passphrase empty: just press Enter)
Enter same passphrase again:     (press Enter again)
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
0f:c8:62:6b:2e:68:4c:8b:ce:0f:25:52:23:93:c7:0a root@node1.yue.com
# ssh-copy-id -i .ssh/id_rsa.pub root@node2.yue.com    (copy the public key to node2; by default it is stored under .ssh in root's home directory)
15
The authenticity of host 'node2.yue.com (192.168.0.12)' can't be established.
RSA key fingerprint is 9d:d9:14:94:81:c2:7b:d5:7b:af:2c:64:58:8f:e3:49.
Are you sure you want to continue connecting (yes/no)? yes    (confirm the connection)
root@node2.yue.com's password:     (enter node2's root password)
Now try logging into the machine, with "ssh 'root@node2.yue.com'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.
Test it:
# ssh node2 'ifconfig'    (run a command on node2 remotely)
The authenticity of host 'node2 (192.168.0.12)' can't be established.
RSA key fingerprint is 9d:d9:14:94:81:c2:7b:d5:7b:af:2c:64:58:8f:e3:49.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node2' (RSA) to the list of known hosts.
eth0      Link encap:Ethernet  HWaddr 00:0C:29:D9:75:DF
          inet addr:192.168.0.12  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:629 errors:0 dropped:0 overruns:0 frame:0
          TX packets:528 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:60497 (59.0 KiB)  TX bytes:56572 (55.2 KiB)
          Interrupt:67 Base address:0x2000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:10 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:692 (692.0 b)  TX bytes:692 (692.0 b)
2. Apply the corresponding configuration on node2
3. Install the required packages:
# ls
heartbeat-2.1.4-9.el5.i386.rpm
heartbeat-stonith-2.1.4-10.el5.i386.rpm
heartbeat-gui-2.1.4-9.el5.i386.rpm
libnet-1.1.4-3.el5.i386.rpm
heartbeat-ldirectord-2.1.4-9.el5.i386.rpm
perl-MailTools-1.77-1.el5.noarch.rpm
heartbeat-pils-2.1.4-10.el5.i386.rpm

# yum --nogpgcheck localinstall *.rpm

# chkconfig --list ldirectord
ldirectord      0:off   1:off   2:off   3:on    4:off   5:on    6:off
# chkconfig ldirectord off    (ldirectord will be started by heartbeat as a cluster resource, so it must not start at boot)

[root@node2 tmp]# yum --nogpgcheck localinstall *.rpm
[root@node2 tmp]# chkconfig ldirectord off
[root@node2 tmp]# chkconfig --list ldirectord
ldirectord      0:off   1:off   2:off   3:off   4:off   5:off   6:off
Configuration files:
# cp /usr/share/doc/heartbeat-ldirectord-2.1.4/ldirectord.cf /etc/ha.d/
# cd /usr/share/doc/heartbeat-2.1.4/
# cp ha.cf authkeys haresources /etc/ha.d/
# cd /etc/ha.d/
# chmod 600 authkeys    (required: heartbeat reports an error at startup if the permissions are wrong)
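(1)
# vi /etc/ha.d/ha.cf
logfile /var/log/ha-log
logfacility local0
keepalive 2    # interval between heartbeats, in seconds
deadtime 30    # declare a node dead after this many seconds without a heartbeat
warntime 10    # log a late-heartbeat warning after this many seconds
initdead 120    # dead time used at first startup; should be at least twice deadtime
udpport 694
bcast eth0    # Linux: send heartbeats by broadcast on eth0
auto_failback on    # whether resources move back automatically once the original node recovers
node node1.yue.com    # node list; names must match the definitions in /etc/hosts
node node2.yue.com
ping 192.168.0.1    # a ping target (usually the nearest gateway) used to check this node's own connectivity, which helps decide whether the peer is really dead
compression bz2    # compress the messages that are sent
compression_threshold 2    # minimum message size, in KB, before compression is applied

Also add this line: crm respawn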
(2)
# dd if=/dev/urandom count=1 bs=512 | md5sum
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.00025866 seconds, 2.0 MB/s
4faf8724bc49da78b21fc04ceb7b5bc3  -
# vi /etc/ha.d/authkeys
#auth 1
#1 crc
#2 sha1 HI!
#3 md5 Hello!

# "auth N" selects which of the numbered methods is used; here, number 1
auth 1
# method 1 is sha1, followed by the shared key
1 sha1 1daea09a52368d9fe65a37163d4ae3ea
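The key generation and the file edit above can be combined into one step. The sketch below is only an illustration under stated assumptions: the function name write_authkeys is hypothetical, and it writes to a scratch file so the result can be inspected before it is copied to /etc/ha.d/authkeys on both Directors.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): write an authkeys file
# containing a freshly generated random key, readable by its owner only.
write_authkeys() {
    target="$1"
    # Same key generation as above: hash 512 random bytes, keep only the digest
    key=$(dd if=/dev/urandom count=1 bs=512 2>/dev/null | md5sum | awk '{print $1}')
    # Create the file with restrictive permissions from the start
    ( umask 077; printf 'auth 1\n1 sha1 %s\n' "$key" > "$target" )
    # Also covers the case where the file already existed with wider permissions
    chmod 600 "$target"
}

# Example: generate into a scratch file and show the result
sample=$(mktemp)
write_authkeys "$sample"
cat "$sample"
```

Both Directors must end up with the same file, so it should be generated once and then copied, not generated separately on each node.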
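(3)
# vi /etc/ha.d/ldirectord.cf
# the virtual service: VIP and port
virtual=192.168.0.2:80
    # "gate" selects direct routing (the LVS DR model)
    real=192.168.0.5:80 gate
    real=192.168.0.6:80 gate
    # if both Real Servers are down, this host answers clients with a notice page instead
    fallback=127.0.0.1:80 gate
    # protocol used to check the back-end Real Servers
    service=http
    # page requested by the check
    request="test.html"
    # content expected in the response
    receive="Real Server OK"
    scheduler=rr
    # persistence (disabled here)
    #persistent=600
    # netmask used for persistence granularity
    netmask=255.255.255.255
    protocol=tcp
    # check method: negotiate, i.e. send the request and compare the reply
    checktype=negotiate
    checkport=80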
Provide the health-check page (on RS1, then on RS2):
# vi /var/www/html/test.html
 Real Server OK

[root@RS2 ~]# vi /var/www/html/test.html
 Real Server OK
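To make the negotiate check concrete, the sketch below does by hand roughly what the configuration asks ldirectord to do: request test.html from a Real Server and look for the expected text in the reply. It is an illustration only, not ldirectord's actual code; the function names are made up, and curl is assumed to be installed on the Director.

```shell
#!/bin/sh
# Hypothetical helpers (not part of ldirectord) imitating a negotiate check.

# Succeeds if the response body ($1) matches the expected pattern ($2).
body_matches() {
    printf '%s\n' "$1" | grep -q -- "$2"
}

# Fetch test.html from a Real Server ($1 = host:port) and check the reply.
check_real_server() {
    body=$(curl -s --max-time 5 "http://$1/test.html") || return 1
    body_matches "$body" "Real Server OK"
}

# Usage on a Director:
#   check_real_server 192.168.0.5:80 && echo "RS1 healthy" || echo "RS1 failed"
```

Running the check manually before starting the cluster is a quick way to confirm that test.html is reachable and contains exactly the text given in receive.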
Copy the configuration files to node2:
# scp authkeys ha.cf haresources ldirectord.cf node2:/etc/ha.d/
authkeys                 100%  692     0.7KB/s   00:00
ha.cf                    100%   10KB  10.4KB/s   00:00
haresources              100% 5905     5.8KB/s   00:00
ldirectord.cf            100% 7689     7.5KB/s   00:00
Start heartbeat:
Start in order: heartbeat must be started on node1 first, and then started on node2 remotely from node1. Likewise, heartbeat on node2 must be stopped remotely from node1.
# /etc/init.d/heartbeat start    (start on node1 first)
Starting High-Availability services: [  OK  ]
# ssh node2 '/etc/init.d/heartbeat start'    (start heartbeat on node2 remotely from node1)
Starting High-Availability services:
Check the state of the cluster nodes:
# crm_mon -1    (show the current cluster status once and exit)

============
Last updated: Sun Aug  5 09:12:40 2012
Current DC: node1.yue.com (5d29dca8-514e-441d-8619-8c395db8cb70)
2 Nodes configured.
0 Resources configured.
============

Node: node1.yue.com (5d29dca8-514e-441d-8619-8c395db8cb70): online
Node: node2.yue.com (8f2d3cdb-f19e-493d-85be-f5214f7615ff): online
# netstat -tnlp    (check that port 5560 is listening)
tcp        0      0 0.0.0.0:5560      0.0.0.0:*      LISTEN      31039/mgmtd
# crmadmin --status node1.yue.com    (check the status)
Status of crmd@node1.yue.com: S_IDLE (ok)    (this node is the DC)
# crmadmin --status node2.yue.com
Status of crmd@node2.yue.com: S_NOT_DC (ok)
# tail -1 /etc/passwd    (the hacluster user needs a password)
hacluster:x:101:157:heartbeat user:/var/lib/heartbeat/cores/hacluster:/sbin/nologin

# passwd hacluster
Changing password for user hacluster.
New UNIX password:     (enter a password)
BAD PASSWORD: it is based on a dictionary word
Retype new UNIX password:     (enter it again)
passwd: all authentication tokens updated successfully.
Configure the resources:
web_ip
web_ldirectord
# hb_gui &
[1] 3535
http://blog.运维网.com/attachment/201208/132154513.jpg
http://blog.运维网.com/attachment/201208/132840867.jpg
http://blog.运维网.com/attachment/201208/133255333.jpg
Create a resource group: web_server
http://blog.运维网.com/attachment/201208/133702661.jpg
http://blog.运维网.com/attachment/201208/133419557.jpg
Create resources in the group:
http://blog.运维网.com/attachment/201208/133901111.jpg
Create the resource web_ip
http://blog.运维网.com/attachment/201208/134215712.jpg
Create the resource web_ldirectord
http://blog.运维网.com/attachment/201208/134433430.jpg
Start the resources:
http://blog.运维网.com/attachment/201208/134745875.jpg
http://blog.运维网.com/attachment/201208/135054401.jpg
# crm_mon -1

============
Last updated: Sun Aug  5 10:03:17 2012
Current DC: node2.yue.com (8f2d3cdb-f19e-493d-85be-f5214f7615ff)
2 Nodes configured.
1 Resources configured.
============

Node: node1.yue.com (5d29dca8-514e-441d-8619-8c395db8cb70): online
Node: node2.yue.com (8f2d3cdb-f19e-493d-85be-f5214f7615ff): online

Resource Group: Web_server
    web_ldirectord (ocf::heartbeat:ldirectord): Started node1.yue.com
    web_ip (ocf::heartbeat:IPaddr2): Started node1.yue.com
# ip addr show
1: lo: mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:0c:29:e7:3d:5c brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.11/24 brd 192.168.0.255 scope global eth0
    inet 192.168.0.2/32 brd 192.168.0.255 scope global eth0
# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.0.2:80 rr
  -> 192.168.0.5:80               Route   1      0          3
  -> 192.168.0.6:80               Route   1      0          2
Now test with a browser: open http://192.168.0.2 and check that the page loads and that requests are balanced across the two Real Servers.
Put node1 into Standby and check whether the resources move to node2:
http://blog.运维网.com/attachment/201208/143919610.jpg
# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.0.2:80 rr
  -> 192.168.0.5:80               Route   1      0          0
  -> 192.168.0.6:80               Route   1      0          0
(the virtual service is now active on this node)
# ip addr show
1: lo: mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:0c:29:d9:75:df brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.12/24 brd 192.168.0.255 scope global eth0
    inet 192.168.0.2/32 brd 192.168.0.255 scope global eth0    (the VIP is now active here)
# crm_mon -1

============
Last updated: Sun Aug  5 10:34:17 2012
Current DC: node2.yue.com (8f2d3cdb-f19e-493d-85be-f5214f7615ff)
2 Nodes configured.
1 Resources configured.
============

Node: node1.yue.com (5d29dca8-514e-441d-8619-8c395db8cb70): standby
Node: node2.yue.com (8f2d3cdb-f19e-493d-85be-f5214f7615ff): online

Resource Group: Web_server
    web_ldirectord (ocf::heartbeat:ldirectord): Started node2.yue.com
    web_ip (ocf::heartbeat:IPaddr2): Started node2.yue.com
http://blog.运维网.com/attachment/201208/143002949.jpg
http://blog.运维网.com/attachment/201208/143044507.jpg
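III. Observe how heartbeat-ldirectord checks the health of the back-end Real Servers
Stop one of the Real Servers, then open http://192.168.0.2 in a browser.
Refresh the page repeatedly to see whether heartbeat-ldirectord detects the state of the back-end Real Servers.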
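Refreshing a browser works, but repeated requests from a shell make the change easier to see. The following is a sketch under stated assumptions: curl is available on the client, and the names poll_vip and tally_responses are made up for illustration. With both Real Servers up and the rr scheduler, the two pages should appear about equally often; after one Real Server is stopped, only the other should appear once ldirectord has taken the failed one out of rotation.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original post).

# Request the front page of the VIP ($1) a number of times ($2),
# printing one response per line.
poll_vip() {
    vip="$1"; count="$2"; i=0
    while [ "$i" -lt "$count" ]; do
        curl -s --max-time 3 "http://$vip/" || echo "no response"
        i=$((i + 1))
    done
}

# Count how often each distinct response line occurred.
tally_responses() {
    sort | uniq -c | awk '{ n = $1; $1 = ""; sub(/^ /, ""); print $0 ": " n }'
}

# Usage from a client machine:
#   poll_vip 192.168.0.2 20 | tally_responses
```

Running the same command before and after stopping httpd on one Real Server gives two tallies that can be compared directly.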