filts posted on 2019-1-7 09:42:39

High-Availability Cluster: heartbeat Installation and Configuration (Part 1)

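  I. HA (high availability) concepts
  Failover: moving resources to another node when a node fails. HA resources include IP addresses, services, and STONITH devices.
  Failback: moving resources back to their original node once it recovers.
  Failover domain: the set of nodes a resource may fail over to.
  Resource stickiness: which node a resource prefers to run on.
  Messaging Layer: the cluster messaging layer. It only transports messages; it does not compute on or compare them afterwards.
  CRM: Cluster Resource Manager. It collects the state of every resource in the cluster and, from that state and the resource's own properties, computes which node each resource should run on.
  DC: Designated Coordinator, the node that coordinates cluster transactions.
  PE: Policy Engine, a sub-function of the CRM.
  TE: Transition Engine; it directs how the decisions are carried out.
  LRM: Local Resource Manager; it performs the actual execution on each node.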
  

  

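  Resource constraints (Constraint)
  Colocation constraint (colocation):
  whether resources can run on the same node
  score:
  positive: the resources can run together
  negative: the resources cannot run together
  Location constraint (location), score:
  positive: the resource prefers this node
  negative: the resource tends to move away from this node
  Order constraint (order):
  defines the order in which resources start or stop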
  vip, ipvs
  ipvs-->vip
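
To make the score semantics concrete, here is a toy sketch (not the CRM's actual algorithm) that treats location scores as plain numbers and picks the node with the highest one. The node names come from the example later in this post; the scores 100 and -50 and the helper name pick_node are made up for illustration.

```shell
# pick_node: read "node score" pairs on stdin and print the node
# with the highest score (toy model of a location constraint).
pick_node() {
    sort -k2,2nr | awk 'NR == 1 { print $1 }'
}

# Hypothetical scores: snn is preferred, datanode4 is discouraged.
printf '%s\n' 'snn.abc.com 100' 'datanode4.abc.com -50' | pick_node
```

A real policy engine also weighs stickiness, colocation, and ordering, so this only shows the direction in which a positive or negative score pushes a resource.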
  

  Resource isolation (fencing)
  Node level: STONITH
  Resource level:
  for example, an FC SAN switch can deny a node's access at the storage-resource level
  

  

  STONITH
  

  

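  split-brain: occurs when cluster nodes cannot reliably obtain the state of the other nodes
  One consequence: the nodes contend for the shared storage
  Quorum disk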
  

  

  II. Example

  snn
  192.168.1.5
  

  datanode4
  192.168.1.6
  

  vip 192.168.1.7
  

Server name         OS                   CPU arch   Kernel                   IP address    Role
snn.abc.com         CentOS release 6.5   x86_64     2.6.32-431.el6.x86_64    192.168.1.5   master
datanode4.abc.com   CentOS release 6.5   x86_64     2.6.32-431.el6.x86_64    192.168.1.6   slave

  The packages we need are available in EPEL:
  heartbeat - Heartbeat subsystem for High-Availability Linux (core package)
  heartbeat-devel - Heartbeat development package
  heartbeat-gui - Provides a gui interface to manage heartbeat clusters
  heartbeat-ldirectord - Monitor daemon for maintaining high availability resources; it generates rules for a highly available ipvs automatically and health-checks the back-end real servers
  heartbeat-pils - Provides a general plugin and interface loading library
  heartbeat-stonith - Provides an interface to Shoot The Other Node In The Head
  

  III. Preliminary configuration
  1. Hostname resolution
  # cat /etc/hosts
  192.168.1.5    snn.abc.com    snn
  192.168.1.6    datanode4.abc.com    datanode4
  

  # hostname
  snn.abc.com
  

  # cat /etc/sysconfig/network
  NETWORKING=yes
  HOSTNAME=snn.abc.com
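
heartbeat refers to nodes by name rather than by IP address, so every node name has to resolve on every node. The sketch below looks names up in a hosts-format file; the helper name lookup_host and the scratch file are illustrative assumptions, and the two entries are the ones shown above.

```shell
# lookup_host FILE NAME: print the IP of the first line in FILE that lists NAME.
lookup_host() {
    awk -v name="$2" '
        /^[[:space:]]*#/ { next }    # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$1"
}

# Scratch copy of the hosts entries used in this post.
hosts_file=$(mktemp)
cat > "$hosts_file" <<'EOF'
192.168.1.5    snn.abc.com    snn
192.168.1.6    datanode4.abc.com    datanode4
EOF

for n in snn.abc.com datanode4.abc.com; do
    ip=$(lookup_host "$hosts_file" "$n")
    echo "$n -> ${ip:-UNRESOLVED}"
done
```

On a real node the same function can be pointed at /etc/hosts, and the name printed by `uname -n` should be one of the names that resolves.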
  

  2. Mutual SSH trust between the two hosts

  snn
  # ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
  # ssh-copy-id -i .ssh/id_rsa.pub root@192.168.1.6
  Run a quick test:
  # ssh 192.168.1.6 'ifconfig'
  

  datanode4

  # ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
  # ssh-copy-id -i .ssh/id_rsa.pub root@192.168.1.5
  # ssh 192.168.1.5 'ifconfig'
  

  3. Time synchronization
  # crontab -e
  

  */2 * * * * /usr/sbin/ntpdate time.nist.gov &> /dev/null
  

  

  # scp /var/spool/cron/root datanode4:/var/spool/cron/
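
Once both nodes run the cron job, it can be worth confirming that their clocks really agree. A minimal sketch, assuming a tolerance of 2 seconds (an arbitrary choice) and a hypothetical helper name skew_ok:

```shell
# skew_ok LOCAL REMOTE MAX: succeed if two epoch timestamps
# differ by at most MAX seconds.
skew_ok() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    [ "$diff" -le "$3" ]
}

# Demo with fixed timestamps one second apart.
if skew_ok 1434187735 1434187736 2; then
    echo "clocks agree"
else
    echo "clocks differ"
fi
```

Against the real nodes this would be called as `skew_ok "$(date +%s)" "$(ssh datanode4 date +%s)" 2` from snn.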
  

  IV. Install heartbeat

  1. Install the dependency packages
  # yum install perl-TimeDate PyXML libnet net-snmp-libs -y
  2. Only these four packages need to be installed
  (1) First attempt
  # rpm -ivh heartbeat-2.1.4-12.el6.x86_64.rpm heartbeat-gui-2.1.4-12.el6.x86_64.rpm heartbeat-pils-2.1.4-12.el6.x86_64.rpm heartbeat-stonith-2.1.4-12.el6.x86_64.rpm
  error: Failed dependencies:
  libnet.so.1()(64bit) is needed by heartbeat-2.1.4-12.el6.x86_64
  pygtk2-libglade is needed by heartbeat-gui-2.1.4-12.el6.x86_64
  (2) Resolve the dependencies
  Download and install EPEL:

  # wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-6.noarch.rpm
  # rpm -ivh epel-release-latest-6.noarch.rpm
  (3) Install the libnet dependency

  # yum install libnet
  (4) Install again
  # rpm -ivh heartbeat-2.1.4-12.el6.x86_64.rpm heartbeat-gui-2.1.4-12.el6.x86_64.rpm heartbeat-pils-2.1.4-12.el6.x86_64.rpm heartbeat-stonith-2.1.4-12.el6.x86_64.rpm
  Preparing...                ###########################################
  1:heartbeat-pils         ########################################### [ 25%]
  2:heartbeat-stonith      ########################################### [ 50%]
  3:heartbeat            ########################################### [ 75%]
  4:heartbeat-gui          ###########################################
  

  3. Copy the packages to the .6 node with scp
  [root@snn heartbeat]# scp epel-release-latest-6.noarch.rpm heartbeat-2.1.4-12.el6.x86_64.rpm heartbeat-gui-2.1.4-12.el6.x86_64.rpm heartbeat-pils-2.1.4-12.el6.x86_64.rpm heartbeat-stonith-2.1.4-12.el6.x86_64.rpm datanode4:/root/heartbeat/
  

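  V. Configuration
  1. The three configuration files do not exist by default

  # ls /etc/ha.d/
  harc  rc.d  README.config  resource.d  shellfuncs
  (1) authkeys: the key file; its permissions must be 600

  (2) ha.cf: configuration of the heartbeat service itself
  (3) haresources: the resource management configuration file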
  

  2. Copy the sample files (run from /etc/ha.d)

  # cp /usr/share/doc/heartbeat-2.1.4/{authkeys,haresources,ha.cf} ./
  

  3. Set the permissions of authkeys to 600
  # chmod 600 authkeys
  

  4. Generate a random key
  # dd if=/dev/random count=1 bs=512 | md5sum
  0+1 records in
  0+1 records out
  29 bytes (29 B) copied, 8.0656e-05 s, 360 kB/s
  71cc2b8ff1bd825fce13ceaea932501d  -
  

  # vim authkeys
  auth 1
  1 md5 71cc2b8ff1bd825fce13ceaea932501d
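
The key generation and file layout above can be scripted in one step. This is a minimal sketch under a few assumptions: the helper name make_authkeys is made up, it reads /dev/urandom instead of /dev/random so that it never blocks, and the demo writes into a scratch directory rather than /etc/ha.d.

```shell
# make_authkeys DIR: write DIR/authkeys with a single md5 key, mode 600.
make_authkeys() {
    key=$(dd if=/dev/urandom bs=512 count=1 2>/dev/null | md5sum | awk '{ print $1 }')
    # Create the file under a restrictive umask so it is never world-readable.
    ( umask 077; printf 'auth 1\n1 md5 %s\n' "$key" > "$1/authkeys" )
    chmod 600 "$1/authkeys"
}

scratch=$(mktemp -d)    # stands in for /etc/ha.d in this demo
make_authkeys "$scratch"
ls -l "$scratch/authkeys"
```

Both nodes must end up with the same authkeys file, so generate it once and copy it rather than running this on each node.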
  

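  5. The core configuration file, ha.cf
  ha.cf
  debugfile              file for debug messages
  logfile                log file
  logfacility            syslog facility
  keepalive              interval between heartbeat messages
  deadtime               how long a node may stay silent before it is declared dead and its resources are taken over
  warntime               how long before a late heartbeat triggers a warning
  initdead               how long to wait after heartbeat starts before probing the other nodes
  udpport                UDP port
  bcast                  broadcast
  mcast                  multicast (address in 224.0.0.0/4, e.g. 225.0.0.1)
  ucast                  unicast
  auto_failback          whether resources move back automatically
  stonith bay
  ping                   ping node, used as an arbitration device
  node                   node name; an IP address cannot be used
  ping_group             ping group
  debug                  debug level
  compression            compression algorithm for transmission
  compression_threshold  size threshold for compression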
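
For orientation, here is a sketch of how those directives might fit together for this two-node example. The timing values, the syslog facility, and the port are assumptions for illustration, not the author's settings; the node names, the broadcast interface, and the ping address 192.168.1.1 are the ones that appear elsewhere in this post. The snippet writes to a scratch directory so it can be run safely.

```shell
conf_dir=$(mktemp -d)    # scratch directory; on a real node this would be /etc/ha.d

cat > "$conf_dir/ha.cf" <<'EOF'
# Assumed values below; tune them for your own network.
logfacility    local0
keepalive      2
deadtime       30
warntime       10
initdead       120
udpport        694
bcast          eth0
auto_failback  on
node           snn.abc.com
node           datanode4.abc.com
ping           192.168.1.1
EOF

# Quick sanity check: a two-node cluster should list exactly two nodes.
grep -c '^node' "$conf_dir/ha.cf"
```

Keeping initdead well above deadtime gives slow-booting network interfaces time to come up before a peer is declared dead.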
  


  

  # vim ha.cf
  bcast   eth0            # Linux
  node    snn.abc.com
  node    datanode4.abc.com
  

  6. Install httpd on both hosts
  # yum install httpd
  # echo "snn.abc.com" >> /var/www/html/index.html
  

  After verifying that it works, stop the service and disable it at boot; heartbeat will start it as a cluster resource:
  # service httpd stop
  # chkconfig httpd off

  

  7. Define the haresources file
  Name the primary node first:

  node1.magedu.com VIP httpd
  

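  The resource.d directory holds the resource agents (RAs).
  heartbeat looks for an agent in resource.d first, then in /etc/rc.d/init.d/.
  

  VIP
  ip/netmask/interface/broadcast address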
  

  # vim haresources
  snn.abc.com IPaddr::192.168.1.7/24/eth0 httpd
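
To see how such a line is read, the sketch below splits it into the preferred node and one agent invocation per resource, with "::" separating the agent name from its arguments. It is an illustration of the format only, not heartbeat's own parser, and the helper name expand_haresources is made up.

```shell
# expand_haresources LINE: print the preferred node, then one
# "agent args..." line per resource in start order.
expand_haresources() {
    set -- $1                        # split the line on whitespace
    echo "preferred node: $1"
    shift
    for res in "$@"; do
        # "::" separates the agent name from its arguments
        echo "start: $(printf '%s' "$res" | sed 's/::/ /g')"
    done
}

expand_haresources 'snn.abc.com IPaddr::192.168.1.7/24/eth0 httpd'
```

The second output line corresponds to the `/etc/ha.d/resource.d/IPaddr 192.168.1.7/24/eth0 start` call that shows up in the log once heartbeat is running.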
  

  8. Every node needs these files; scp -p preserves the original attributes
  # scp -p authkeys ha.cf haresources datanode4:/etc/ha.d/
  

  VI. Start the service
  # service heartbeat start
  # ssh datanode4 'service heartbeat start'
  

  # tail -f /var/log/messages
  Jun 13 17:28:55 snn heartbeat: : info: Link 192.168.1.1:192.168.1.1 up.
  Jun 13 17:28:55 snn heartbeat: : info: Status update for node 192.168.1.1: status ping
  Jun 13 17:28:55 snn heartbeat: : info: Link snn.abc.com:eth0 up.    // both nodes come up
  Jun 13 17:29:02 snn heartbeat: : info: Link datanode4.abc.com:eth0 up.
  Jun 13 17:29:02 snn heartbeat: : info: Status update for node datanode4.abc.com: status up    // status check
  Jun 13 17:29:02 snn harc: info: Running /etc/ha.d/rc.d/status status
  Jun 13 17:29:03 snn heartbeat: : info: Comm_now_up(): updating status to active
  Jun 13 17:29:03 snn heartbeat: : info: Local status now set to: 'active'
  Jun 13 17:29:03 snn heartbeat: : info: Status update for node datanode4.abc.com: status active
  Jun 13 17:29:03 snn harc: info: Running /etc/ha.d/rc.d/status status
  Jun 13 17:29:13 snn heartbeat: : info: remote resource transition completed.
  Jun 13 17:29:13 snn heartbeat: : info: remote resource transition completed.
  Jun 13 17:29:13 snn heartbeat: : info: Initial resource acquisition complete (T_RESOURCES(us))
  Jun 13 17:29:14 snn IPaddr: INFO:Resource is stopped
  Jun 13 17:29:14 snn heartbeat: : info: Local Resource acquisition completed.
  Jun 13 17:29:14 snn harc: info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
  Jun 13 17:29:14 snn ip-request-resp: received ip-request-resp IPaddr::192.168.1.7/24/eth0 OK yes
  Jun 13 17:29:14 snn ResourceManager: info: Acquiring resource group: snn.abc.com IPaddr::192.168.1.7/24/eth0 httpd
  Jun 13 17:29:14 snn IPaddr: INFO:Resource is stopped
  Jun 13 17:29:14 snn ResourceManager: info: Running /etc/ha.d/resource.d/IPaddr 192.168.1.7/24/eth0 start    // the IP resource is started
  Jun 13 17:29:14 snn IPaddr: INFO: Using calculated netmask for 192.168.1.7: 255.255.255.0
  Jun 13 17:29:14 snn IPaddr: INFO: eval ifconfig eth0:0 192.168.1.7 netmask 255.255.255.0 broadcast 192.168.1.255
  Jun 13 17:29:14 snn IPaddr: INFO:Success
  Jun 13 17:29:14 snn ResourceManager: info: Running /etc/init.d/httpd start    // httpd is started
  

  # netstat -tlunp | grep 80
  tcp      0      0 :::80                     :::*                        LISTEN      3464/httpd
  

  # ifconfig
  eth0      Link encap:Ethernet  HWaddr 00:0C:29:B1:89:48
            inet addr:192.168.1.5  Bcast:192.168.1.255  Mask:255.255.255.0
            inet6 addr: fe80::20c:29ff:feb1:8948/64 Scope:Link
            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
            RX packets:35659 errors:0 dropped:0 overruns:0 frame:0
            TX packets:10024 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:1000
            RX bytes:4539049 (4.3 MiB)  TX bytes:2100109 (2.0 MiB)

  eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:B1:89:48
            inet addr:192.168.1.7  Bcast:192.168.1.255  Mask:255.255.255.0
            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
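
Checking by eye which node holds the VIP gets tedious during failover tests. The sketch below scans ifconfig output for the address; it assumes the older net-tools layout shown above (an "inet addr:" field), and the helper name holds_vip is hypothetical.

```shell
# holds_vip VIP: succeed if the ifconfig-style text on stdin
# contains an "addr:VIP" field.
holds_vip() {
    awk -v want="addr:$1" '
        { for (i = 1; i <= NF; i++) if ($i == want) found = 1 }
        END { exit !found }
    '
}

# Sample taken from the eth0:0 stanza in this post.
sample='eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:B1:89:48
          inet addr:192.168.1.7  Bcast:192.168.1.255  Mask:255.255.255.0'

if printf '%s\n' "$sample" | holds_vip 192.168.1.7; then
    echo "VIP is on this node"
else
    echo "VIP is elsewhere"
fi
```

On a live node the input would come from `ifconfig | holds_vip 192.168.1.7`; comparing whole fields avoids a false match on an address such as 192.168.1.70.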
  

  VII. Simulate a primary/standby switchover with a script
  # sh /usr/lib64/heartbeat/hb_standby
  2015/06/13_17:42:27 Going standby .
  

  # tail -f /var/log/messages
  Jun 13 17:42:28 snn ResourceManager: info: Running /etc/init.d/httpd stop
  Jun 13 17:42:28 snn ResourceManager: info: Running /etc/ha.d/resource.d/IPaddr 192.168.1.7/24/eth0 stop
  Jun 13 17:42:29 snn IPaddr: INFO: ifconfig eth0:0 down
  Jun 13 17:42:29 snn IPaddr: INFO:Success
  Jun 13 17:42:29 snn heartbeat: : info: all HA resource release completed (standby).
  Jun 13 17:42:29 snn heartbeat: : info: Local standby process completed .
  Jun 13 17:42:30 snn heartbeat: : WARN: 1 lost packet(s) for
  Jun 13 17:42:30 snn heartbeat: : info: remote resource transition completed.
  Jun 13 17:42:30 snn heartbeat: : info: No pkts missing from datanode4.abc.com!
  Jun 13 17:42:30 snn heartbeat: : info: Other node completed standby takeover of all resources.    // the other node has taken over all resources
  

  Check on the .6 host:
  
  # ifconfig
  eth0      Link encap:Ethernet  HWaddr 00:0C:29:E1:2F:66
            inet addr:192.168.1.6  Bcast:192.168.1.255  Mask:255.255.255.0
            inet6 addr: fe80::20c:29ff:fee1:2f66/64 Scope:Link
            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
            RX packets:37277 errors:0 dropped:0 overruns:0 frame:0
            TX packets:3812 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:1000
            RX bytes:5065186 (4.8 MiB)  TX bytes:648956 (633.7 KiB)

  eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:E1:2F:66
            inet addr:192.168.1.7  Bcast:192.168.1.255  Mask:255.255.255.0
            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

  lo        Link encap:Local Loopback
            inet addr:127.0.0.1  Mask:255.0.0.0
            inet6 addr: ::1/128 Scope:Host
            UP LOOPBACK RUNNING  MTU:65536  Metric:1
            RX packets:0 errors:0 dropped:0 overruns:0 frame:0
            TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:0
            RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
  

  # netstat -ltunp | grep 80
  tcp      0      0 :::80                     :::*                        LISTEN      2782/httpd
  

  VIII. Serving the web content from an NFS mount
  

  1. Bring up another host, 192.168.1.4 (datanode.abc.com, alias datanode), as the NFS server
  # mkdir -p /web/htdocs
  

  2. Export the shared directory
  # vim /etc/exports
  /web/htdocs   192.168.1.0/24(ro)
  

  3. Start the NFS service
  # service nfs start
  Starting NFS services:                                     [  OK  ]
  Starting NFS quotas:                                       [  OK  ]
  Starting NFS mountd:                                       [  OK  ]
  Starting NFS daemon:                                       [  OK  ]
  Starting RPC idmapd:                                       [  OK  ]
  

  # showmount -e 192.168.1.4
  Export list for 192.168.1.4:
  /web/htdocs 192.168.1.0/24
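
An export entry only admits clients whose address falls inside the listed network, so it is worth checking the cluster nodes against it before trying to mount. A minimal sketch in plain shell arithmetic; the helper names ip_to_int and ip_in_cidr are made up for this example.

```shell
# ip_to_int A.B.C.D: print the address as a 32-bit integer.
# The body runs in a subshell so the IFS change does not leak.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# ip_in_cidr IP NET/PREFIX: succeed if IP lies inside the network.
ip_in_cidr() {
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
    [ $(( ip & mask )) -eq $(( net & mask )) ]
}

for net in 192.168.0.0/24 192.168.1.0/24; do
    if ip_in_cidr 192.168.1.5 "$net"; then
        echo "$net allows 192.168.1.5"
    else
        echo "$net excludes 192.168.1.5"
    fi
done
```

The loop shows that a /24 on 192.168.0.0 would not cover the cluster node 192.168.1.5, while a /24 on 192.168.1.0 does.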
  

  4. On the primary host (snn), stop heartbeat first, then edit the resource configuration file

  # ssh datanode4 '/etc/init.d/heartbeat stop'
  Stopping High-Availability services:
  Done.
  

  # service heartbeat stop
  Stopping High-Availability services:
  Done.
  

  # vim haresources
  

  

  # mount -t nfs 192.168.1.4:/web/htdocs /mnt
  # mount -l | grep mnt
  192.168.1.4:/web/htdocs on /mnt type nfs (rw,vers=4,addr=192.168.1.4,clientaddr=192.168.1.5)
  

  # cat /mnt/index.html
  datanode.abc.com
  

  The test mount works, so unmount it:
  # umount /mnt
  

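  IX. Have the resource manager mount the filesystem on the primary host (snn)

  The order of the resources matters:
  configure the IP first, then the filesystem, then the service.
  The filesystem must always come before the service.
  # vim /etc/ha.d/haresources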
  snn.abc.com IPaddr::192.168.1.7/24/eth0 Filesystem::192.168.1.4:/web/htdocs::/var/www/html::nfs httpd
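
Resources are started left to right and released in the opposite direction, which is why the earlier standby log shows httpd stopping before IPaddr. The sketch below prints the release order for a haresources line; the helper name stop_order is hypothetical and the code only illustrates the ordering rule.

```shell
# stop_order LINE: print the agent names of a haresources line
# in the order they are released (reverse of the start order).
stop_order() {
    set -- $1
    shift                                # drop the node name
    rev=
    for res in "$@"; do
        # keep only the agent name, then prepend it to the list
        rev="${res%%::*}${rev:+ $rev}"
    done
    echo "$rev"
}

stop_order 'snn.abc.com IPaddr::192.168.1.7/24/eth0 Filesystem::192.168.1.4:/web/htdocs::/var/www/html::nfs httpd'
```

With this order httpd is stopped before /var/www/html is unmounted, and the VIP is dropped last.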
  

  # scp /etc/ha.d/haresources datanode4:/etc/ha.d/haresources
  

  X. After starting heartbeat, check the log
  // This fails; the cause is explained in part 2 of this heartbeat series.



Attachment: http://down.运维网.com/data/2365804
