High-Availability Cluster: Heartbeat Installation and Configuration (Part 1)
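I. HA (High Availability) concepts
FailOver: moving resources to another node when the node running them fails. HA resources include the IP address, the service, and STONITH.
FailBack: moving resources back to the original node once it recovers.
Failover domain: the set of nodes a resource is allowed to fail over to.
Resource stickiness: how strongly a resource prefers to stay on a particular node.
Messaging Layer: the cluster messaging layer; it only carries messages between nodes and does not perform any later computation or comparison on them.
CRM: Cluster Resource Manager; collects the state of every resource in the cluster and, from those states and the resources themselves, computes which node each resource should run on.
DC: Designated Coordinator, the node that coordinates cluster transactions.
PE: Policy Engine, a subcomponent of the CRM.
TE: Transition Engine, which directs the execution of the PE's decisions.
LRM: Local Resource Manager, which carries out the operations on each node.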
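Resource constraints (Constraint)
Colocation constraint (colocation)
whether resources may run on the same node
score:
positive: the resources may run together
negative: the resources must not run together
Location constraint (location), with a score
positive: the resource prefers this node
negative: the resource prefers to move away from this node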
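To make the scoring idea concrete, here is a deliberately simplified sketch: for a single resource it picks the node with the highest location score. It is an illustration under loud assumptions, not how the CRM's policy engine actually decides; real placement also weighs stickiness, colocation and infinite scores. The function name `pick_node` and the `node:score` input format are hypothetical.

```shell
#!/usr/bin/env bash
# Simplified illustration only (hypothetical helper): choose the node with the
# highest location score for one resource. Ignores stickiness, colocation, etc.
# Input: "node:score" pairs, e.g. snn.abc.com:100 datanode4.abc.com:50
pick_node() {
    local best_node="" best_score="" pair node score
    for pair in "$@"; do
        node=${pair%%:*}     # text before the first colon
        score=${pair##*:}    # text after the last colon (may be negative)
        # Keep the first node seen, then any node with a strictly higher score.
        if [[ -z $best_score ]] || (( score > best_score )); then
            best_node=$node
            best_score=$score
        fi
    done
    [[ -n $best_node ]] && printf '%s\n' "$best_node"
}
```

A negative score here only lowers a node's preference; the node can still win if every alternative scores lower, matching the "prefers to move away" wording above.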
Order constraint (order)
defines the order in which resources are started or stopped
Example with the resources vip and ipvs:
ipvs-->vip
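A small sketch of what an order constraint implies, assuming ipvs depends on the VIP (so the VIP must be up first): resources start in the listed order and stop in the reverse order. `start_order` and `stop_order` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: resources are given in dependency order (vip ipvs).
start_order() {
    printf '%s\n' "$@"          # start: first dependency first
}

stop_order() {
    local i
    for (( i = $#; i >= 1; i-- )); do
        printf '%s\n' "${!i}"   # stop: walk the positional parameters backwards
    done
}
```

For example, `stop_order vip ipvs` prints ipvs before vip, so the dependent resource is shut down before the one it relies on.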
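Resource fencing
Node level: STONITH
Resource level:
for example, an FC SAN switch can deny a node access at the storage-resource level
STONITH
split-brain: occurs when cluster nodes can no longer reliably obtain each other's status information
One consequence: the nodes contend for the shared storage
Quorum disk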
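The usual majority rule shows why a two-node cluster needs a tie-breaker such as a quorum disk. Assuming one vote per member, a partition keeps quorum only if it holds more than half of all votes. `has_quorum` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Majority quorum (hypothetical helper): a partition holding $1 of $2 total
# votes has quorum only if votes >= floor(total / 2) + 1.
has_quorum() {
    local votes=$1 total=$2
    (( votes >= total / 2 + 1 ))
}
```

When two nodes split 1/1, `has_quorum 1 2` fails on both sides, so neither side can safely take over. One extra vote from a quorum disk breaks the tie.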
II. Example setup
snn: 192.168.1.5
datanode4: 192.168.1.6
VIP: 192.168.1.7
Server name       | OS                 | CPU arch | Kernel                | IP address  | Role
snn.abc.com       | CentOS release 6.5 | x86_64   | 2.6.32-431.el6.x86_64 | 192.168.1.5 | master
datanode4.abc.com | CentOS release 6.5 | x86_64   | 2.6.32-431.el6.x86_64 | 192.168.1.6 | slave
The packages we need are available in EPEL:
heartbeat - Heartbeat subsystem for High-Availability Linux (the core package)
heartbeat-devel - Heartbeat development package
heartbeat-gui - Provides a gui interface to manage heartbeat clusters (graphical management interface)
heartbeat-ldirectord - Monitor daemon for maintaining high availability resources; provides high availability for ipvs by generating ipvs rules automatically and health-checking the back-end real servers
heartbeat-pils - Provides a general plugin and interface loading library
heartbeat-stonith - Provides an interface to Shoot The Other Node In The Head
III. Preliminary configuration
1. Hostname resolution
# cat /etc/hosts
192.168.1.5 snn.abc.com snn
192.168.1.6 datanode4.abc.com datanode4
# hostname
snn.abc.com
# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=snn.abc.com
2. Passwordless SSH between the two nodes
On snn:
# ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.1.6
Test it:
# ssh 192.168.1.6 'ifconfig'
On datanode4:
# ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.1.5
# ssh 192.168.1.5 'ifconfig'
3. Time synchronization
# crontab -e
*/2 * * * * /usr/sbin/ntpdate time.nist.gov &> /dev/null
# scp /var/spool/cron/root datanode4:/var/spool/cron/
IV. Installing heartbeat
1. Install the dependencies
# yum install perl-TimeDate PyXML libnet net-snmp-libs -y
2. Only these four packages need to be installed
(1) First attempt:
# rpm -ivh heartbeat-2.1.4-12.el6.x86_64.rpm heartbeat-gui-2.1.4-12.el6.x86_64.rpm heartbeat-pils-2.1.4-12.el6.x86_64.rpm heartbeat-stonith-2.1.4-12.el6.x86_64.rpm
error: Failed dependencies:
libnet.so.1()(64bit) is needed by heartbeat-2.1.4-12.el6.x86_64
pygtk2-libglade is needed by heartbeat-gui-2.1.4-12.el6.x86_64
(2) Resolve the dependencies: download and install EPEL
# wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-6.noarch.rpm
# rpm -ivh epel-release-latest-6.noarch.rpm
(3) Install the libnet dependency
# yum install libnet
(4) Install again
# rpm -ivh heartbeat-2.1.4-12.el6.x86_64.rpm heartbeat-gui-2.1.4-12.el6.x86_64.rpm heartbeat-pils-2.1.4-12.el6.x86_64.rpm heartbeat-stonith-2.1.4-12.el6.x86_64.rpm
Preparing... ###########################################
1:heartbeat-pils ########################################### [ 25%]
2:heartbeat-stonith ########################################### [ 50%]
3:heartbeat ########################################### [ 75%]
4:heartbeat-gui ########################################### [100%]
3. Copy the packages to the .6 node (datanode4)
[root@snn heartbeat]# scp epel-release-latest-6.noarch.rpm heartbeat-2.1.4-12.el6.x86_64.rpm heartbeat-gui-2.1.4-12.el6.x86_64.rpm heartbeat-pils-2.1.4-12.el6.x86_64.rpm heartbeat-stonith-2.1.4-12.el6.x86_64.rpm datanode4:/root/heartbeat/
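V. Configuration
1. The three configuration files do not exist by default
# ls /etc/ha.d/
harc  rc.d  README.config  resource.d  shellfuncs
(1) authkeys: the key file, which must have mode 600
(2) ha.cf: the configuration of the heartbeat service itself
(3) haresources: the resource management configuration
2. Copy the sample files (working directory: /etc/ha.d)
# cp /usr/share/doc/heartbeat-2.1.4/{authkeys,haresources,ha.cf} ./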
3. Set authkeys to mode 600
# chmod 600 authkeys
4. Generate a random key
# dd if=/dev/random count=1 bs=512 | md5sum
0+1 records in
0+1 records out
29 bytes (29 B) copied, 8.0656e-05 s, 360 kB/s
71cc2b8ff1bd825fce13ceaea932501d  -
# vim authkeys
auth 1
1 md5 71cc2b8ff1bd825fce13ceaea932501d
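Key generation and the 600 permission can be done in one step. This sketch makes a few assumptions. The target directory is passed as an argument (on a real node, /etc/ha.d). It swaps /dev/random for /dev/urandom, which does not block when the entropy pool is low. `make_authkeys` is a hypothetical name, and it overwrites any existing authkeys file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write "$1/authkeys" with a random md5 key, mode 600.
make_authkeys() {
    local dir=$1 key
    # md5sum prints "<hash>  -"; keep only the hash field.
    key=$(dd if=/dev/urandom bs=512 count=1 2>/dev/null | md5sum | cut -d' ' -f1)
    # Create the file with a restrictive umask so it is never world-readable.
    ( umask 077
      printf 'auth 1\n1 md5 %s\n' "$key" > "$dir/authkeys" )
    # Also fix the mode if the file already existed with looser permissions.
    chmod 600 "$dir/authkeys"
}
```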
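5. The core configuration file ha.cf
Main directives in ha.cf:
debugfile: debug log file
logfile: log file
logfacility: syslog facility used for logging
keepalive: interval between heartbeats
deadtime: how long a node may go without sending heartbeats before it is declared dead and its resources are taken over
warntime: how long a heartbeat may be late before a warning is logged
initdead: when heartbeat starts, how long to wait for the other nodes before declaring them dead
udpport: UDP port used for heartbeats
bcast: send heartbeats by broadcast
mcast: send heartbeats by multicast; the group must be a multicast address, e.g. 225.0.0.1
ucast: send heartbeats by unicast to a single peer
auto_failback: whether resources automatically move back to the primary node when it returns
stonith: STONITH device configuration (e.g. baytech)
ping: ping node used as an arbitration (quorum) device
node: cluster node names; these must be host names, not IP addresses
ping_group: a group of ping nodes treated as a single arbitration device
debug: debug level
compression: compression algorithm for transmitted messages
compression_threshold: message size above which compression is applied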
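A minimal ha.cf built from the directives above can be generated with a small function. The values are illustrative assumptions similar to those in the commented sample ha.cf, not requirements: 2 s keepalive, 10 s warntime, 30 s deadtime, 120 s initdead, and UDP port 694. `make_hacf` is a hypothetical helper, and logging goes to syslog (logfacility local0) so messages still appear in /var/log/messages.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Arguments: output file, heartbeat interface, node names.
make_hacf() {
    local out=$1 iface=$2 n
    shift 2
    {
        echo "logfacility local0"   # log through syslog
        echo "keepalive 2"          # send a heartbeat every 2 seconds
        echo "warntime 10"          # warn if a heartbeat is 10 s late
        echo "deadtime 30"          # declare a peer dead after 30 s of silence
        echo "initdead 120"         # allow extra time when the cluster first starts
        echo "udpport 694"          # UDP port for bcast/ucast heartbeats
        echo "bcast $iface"         # broadcast heartbeats on this interface
        echo "auto_failback on"     # move resources back when the primary returns
        for n in "$@"; do
            echo "node $n"          # one line per cluster node (uname -n)
        done
    } > "$out"
}
```

For this cluster the call would be `make_hacf /etc/ha.d/ha.cf eth0 snn.abc.com datanode4.abc.com`.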
# vim ha.cf
bcast eth0 # Linux
node snn.abc.com
node datanode4.abc.com
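Heartbeat matches the `node` entries against each machine's `uname -n`, which is why host names rather than IP addresses are required. Here is a quick check, assuming the ha.cf path is passed in. `node_listed` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the host name ($2, default `uname -n`)
# appears on any "node" line of the ha.cf file given as $1.
node_listed() {
    local hacf=$1 name=${2:-$(uname -n)}
    awk -v n="$name" '
        $1 == "node" { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$hacf"
}
```

Running `node_listed /etc/ha.d/ha.cf` on each node before starting heartbeat catches a mismatched host name early.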
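6. Install httpd on both hosts, and give each host a test page showing its own name (use datanode4.abc.com on datanode4)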
# yum install httpd
# echo "snn.abc.com" >> /var/www/html/index.html
After verifying that httpd works, stop it and disable it at boot; heartbeat will start it as a cluster resource:
# service httpd stop
# chkconfig httpd off
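7. Define the haresources file
Each line starts with the primary (preferred) node, followed by its resources, for example:
node1.magedu.com VIP httpd
The resource.d directory holds the resource agents (RA).
Heartbeat looks for a resource script in /etc/ha.d/resource.d first, then in /etc/rc.d/init.d/.
VIP format:
IP/netmask/interface/broadcast address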
# vim haresources
snn.abc.com IPaddr::192.168.1.7/24/eth0 httpd
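Heartbeat runs each resource as `<script> <args> start`, with the arguments separated by `::` in haresources. The startup log later in this article shows `/etc/ha.d/resource.d/IPaddr 192.168.1.7/24/eth0 start`. The sketch below splits one haresources line the same way. `parse_haresources_line` is a hypothetical helper, and it handles only a single line without backslash continuations.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the preferred node, then each resource's script
# name and its arguments (the "::" separators turned into spaces).
parse_haresources_line() {
    local res script args
    local -a fields
    read -ra fields <<< "$1"           # split the line on whitespace
    printf 'node=%s\n' "${fields[0]}"  # first field: preferred node
    for res in "${fields[@]:1}"; do
        script=${res%%::*}             # text before the first "::"
        if [[ $res == *::* ]]; then
            args=${res#*::}            # everything after the first "::"
            args=${args//::/ }         # remaining "::" become argument breaks
        else
            args=""
        fi
        printf 'resource=%s args=%s\n' "$script" "$args"
    done
}
```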
8. Every node needs these files; scp -p preserves their original attributes (such as the 600 mode of authkeys)
# scp -p authkeys ha.cf haresources datanode4:/etc/ha.d/
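Before copying, it can help to check that all three files exist and that authkeys still has mode 600. Here is a sketch. `check_ha_files` is a hypothetical helper, and it assumes GNU `stat`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify authkeys, ha.cf and haresources exist in the
# directory ($1, default /etc/ha.d) and that authkeys has mode 600.
check_ha_files() {
    local dir=${1:-/etc/ha.d} f
    for f in authkeys ha.cf haresources; do
        if [[ ! -f $dir/$f ]]; then
            echo "missing: $dir/$f" >&2
            return 1
        fi
    done
    if [[ $(stat -c %a "$dir/authkeys") != 600 ]]; then
        echo "authkeys must be mode 600" >&2
        return 1
    fi
}
```

For example, `check_ha_files /etc/ha.d && scp -p authkeys ha.cf haresources datanode4:/etc/ha.d/` only copies the files once they pass the check.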
VI. Starting the service
# service heartbeat start
# ssh datanode4 'service heartbeat start'
# tail -f /var/log/messages
Jun 13 17:28:55 snn heartbeat: : info: Link 192.168.1.1:192.168.1.1 up.
Jun 13 17:28:55 snn heartbeat: : info: Status update for node 192.168.1.1: status ping
Jun 13 17:28:55 snn heartbeat: : info: Link snn.abc.com:eth0 up. // both nodes' links are up
Jun 13 17:29:02 snn heartbeat: : info: Link datanode4.abc.com:eth0 up.
Jun 13 17:29:02 snn heartbeat: : info: Status update for node datanode4.abc.com: status up // status check
Jun 13 17:29:02 snn harc: info: Running /etc/ha.d/rc.d/status status
Jun 13 17:29:03 snn heartbeat: : info: Comm_now_up(): updating status to active
Jun 13 17:29:03 snn heartbeat: : info: Local status now set to: 'active'
Jun 13 17:29:03 snn heartbeat: : info: Status update for node datanode4.abc.com: status active
Jun 13 17:29:03 snn harc: info: Running /etc/ha.d/rc.d/status status
Jun 13 17:29:13 snn heartbeat: : info: remote resource transition completed.
Jun 13 17:29:13 snn heartbeat: : info: remote resource transition completed.
Jun 13 17:29:13 snn heartbeat: : info: Initial resource acquisition complete (T_RESOURCES(us))
Jun 13 17:29:14 snn IPaddr: INFO:Resource is stopped
Jun 13 17:29:14 snn heartbeat: : info: Local Resource acquisition completed.
Jun 13 17:29:14 snn harc: info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
Jun 13 17:29:14 snn ip-request-resp: received ip-request-resp IPaddr::192.168.1.7/24/eth0 OK yes
Jun 13 17:29:14 snn ResourceManager: info: Acquiring resource group: snn.abc.com IPaddr::192.168.1.7/24/eth0 httpd
Jun 13 17:29:14 snn IPaddr: INFO:Resource is stopped
Jun 13 17:29:14 snn ResourceManager: info: Running /etc/ha.d/resource.d/IPaddr 192.168.1.7/24/eth0 start // starting the IP resource
Jun 13 17:29:14 snn IPaddr: INFO: Using calculated netmask for 192.168.1.7: 255.255.255.0
Jun 13 17:29:14 snn IPaddr: INFO: eval ifconfig eth0:0 192.168.1.7 netmask 255.255.255.0 broadcast 192.168.1.255
Jun 13 17:29:14 snn IPaddr: INFO:Success
Jun 13 17:29:14 snn ResourceManager: info: Running /etc/init.d/httpd start // httpd
# netstat -tlunp | grep 80
tcp 0 0 :::80 :::* LISTEN 3464/httpd
# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:B1:89:48
          inet addr:192.168.1.5  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:feb1:8948/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:35659 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10024 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:4539049 (4.3 MiB)  TX bytes:2100109 (2.0 MiB)

eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:B1:89:48
          inet addr:192.168.1.7  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
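The VIP check can be scripted by scanning `ifconfig` output for the address as a whole field, so 192.168.1.7 does not also match 192.168.1.70. This sketch also accepts the `inet 192.168.1.7/24` form printed by `ip addr`. `has_vip` is a hypothetical helper that reads the command output on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the address in $1 appears on an "inet" line
# of the ifconfig (inet addr:IP) or ip addr (inet IP/prefix) output on stdin.
has_vip() {
    awk -v ip="$1" '
        $1 == "inet" && ($2 == ("addr:" ip) || index($2, ip "/") == 1) { found = 1 }
        END { exit !found }'
}
```

Typical use: `ifconfig | has_vip 192.168.1.7 && echo "VIP is on this node"`.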
VII. Simulating a failover with a script
# sh /usr/lib64/heartbeat/hb_standby
2015/06/13_17:42:27 Going standby .
# tail -f /var/log/messages
Jun 13 17:42:28 snn ResourceManager: info: Running /etc/init.d/httpd stop
Jun 13 17:42:28 snn ResourceManager: info: Running /etc/ha.d/resource.d/IPaddr 192.168.1.7/24/eth0 stop
Jun 13 17:42:29 snn IPaddr: INFO: ifconfig eth0:0 down
Jun 13 17:42:29 snn IPaddr: INFO:Success
Jun 13 17:42:29 snn heartbeat: : info: all HA resource release completed (standby).
Jun 13 17:42:29 snn heartbeat: : info: Local standby process completed .
Jun 13 17:42:30 snn heartbeat: : WARN: 1 lost packet(s) for
Jun 13 17:42:30 snn heartbeat: : info: remote resource transition completed.
Jun 13 17:42:30 snn heartbeat: : info: No pkts missing from datanode4.abc.com!
Jun 13 17:42:30 snn heartbeat: : info: Other node completed standby takeover of all resources. // the other node has taken over all resources
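To drive the standby test from a script, one option is to poll the log for the takeover message shown above. This sketch assumes syslog writes to /var/log/messages as in this setup. `wait_for_takeover` and its default 30-second timeout are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll the log file ($1) once per second, for up to
# $2 seconds (default 30), until the standby-takeover message appears.
wait_for_takeover() {
    local log=$1 timeout=${2:-30} i
    for (( i = 0; i < timeout; i++ )); do
        if grep -q 'Other node completed standby takeover' "$log"; then
            return 0
        fi
        sleep 1
    done
    return 1
}
```

For example: `sh /usr/lib64/heartbeat/hb_standby && wait_for_takeover /var/log/messages`.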
Check on the .6 host (datanode4):
# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:E1:2F:66
          inet addr:192.168.1.6  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fee1:2f66/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:37277 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3812 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:5065186 (4.8 MiB)  TX bytes:648956 (633.7 KiB)

eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:E1:2F:66
          inet addr:192.168.1.7  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
# netstat -ltunp | grep 80
tcp 0 0 :::80 :::* LISTEN 2782/httpd
VIII. Serving the web content from an NFS mount
1. Use another host, 192.168.1.4 (datanode.abc.com, datanode), as the NFS server
# mkdir -p /web/htdocs
2. Export the shared directory
# vim /etc/exports
/web/htdocs 192.168.1.0/24(ro)
3. Start the NFS service
# service nfs start
Starting NFS services: [  OK  ]
Shutting down NFS quotas: [  OK  ]
Starting NFS mountd: [  OK  ]
Starting NFS daemon: [  OK  ]
Starting RPC idmapd: [  OK  ]
# showmount -e 192.168.1.4
Export list for 192.168.1.4:
/web/htdocs 192.168.1.0/24
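The network named in /etc/exports must cover the cluster nodes (192.168.1.5 and 192.168.1.6), or their NFS mounts are refused. Here is a sketch that checks an address against a CIDR network using bash arithmetic. `ip_to_int` and `ip_in_cidr` are hypothetical helpers and accept only dotted-quad IPv4 addresses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Hypothetical helper: succeed if address $1 lies inside network $2 (a.b.c.d/bits).
ip_in_cidr() {
    local ip=$1 net=${2%/*} bits=${2#*/} mask
    # Build the netmask; bash arithmetic is 64-bit, so trim back to 32 bits.
    mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    (( ($(ip_to_int "$ip") & mask) == ($(ip_to_int "$net") & mask) ))
}
```

For example, `ip_in_cidr 192.168.1.5 192.168.1.0/24` succeeds, while `ip_in_cidr 192.168.1.5 192.168.0.0/24` fails.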
4. On snn (192.168.1.5), stop heartbeat on both nodes first, then edit the resource configuration file
# ssh datanode4 '/etc/init.d/heartbeat stop'
Stopping High-Availability services:
Done.
# service heartbeat stop
Stopping High-Availability services:
Done.
# vim haresources
# mount -t nfs 192.168.1.4:/web/htdocs /mnt
# mount -l | grep mnt
192.168.1.4:/web/htdocs on /mnt type nfs (rw,vers=4,addr=192.168.1.4,clientaddr=192.168.1.5)
# cat /mnt/index.html
datanode.abc.com
The NFS share mounts correctly; unmount it again:
# umount /mnt
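IX. Having the resource manager on snn mount the file system
The order of the resources is critical:
configure the IP first, then the file system, then the service.
The file system must always come before the service.
# vim /etc/ha.d/haresources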
snn.abc.com IPaddr::192.168.1.7/24/eth0 Filesystem::192.168.1.4:/web/htdocs::/var/www/html::nfs httpd
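Heartbeat acquires the resources on a haresources line from left to right and releases them from right to left. The standby log above shows httpd being stopped before IPaddr. That is why Filesystem must precede httpd. Here is a sketch that checks one resource appears before another on a line. `comes_before` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if resource script $2 appears before $3 in the
# haresources line $1 (first field is the node, the rest are resources).
comes_before() {
    local first=$2 second=$3 i=0 pos_first=-1 pos_second=-1 res name
    local -a fields
    read -ra fields <<< "$1"
    for res in "${fields[@]:1}"; do
        name=${res%%::*}    # script name without its "::" arguments
        if [[ $name == "$first" && $pos_first -lt 0 ]]; then
            pos_first=$i
        fi
        if [[ $name == "$second" && $pos_second -lt 0 ]]; then
            pos_second=$i
        fi
        i=$(( i + 1 ))
    done
    (( pos_first >= 0 && pos_second >= 0 && pos_first < pos_second ))
}
```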
# scp /etc/ha.d/haresources datanode4:/etc/ha.d/haresources
X. Start heartbeat and check the log
// This produces an error; the cause is explained in Part 2 of this heartbeat series.
Attachment: http://down.运维网.com/data/2365804