Posted by iszjw on 2018-11-14 06:29:43

nfs+nginx deployment based on corosync+pacemaker

  High availability for nfs+nginx based on corosync+pacemaker (managed with crm) on CentOS 7
  A note on pcs: on version 7, pcs is well supported, while crmsh takes more work to set up.
  Lab hosts (CentOS 7): node1: 172.25.0.29, node2: 172.25.0.30
  Prerequisites for the cluster:
  1. Time synchronization
  2. The hosts can reach each other by name
  3. Decide whether to use a quorum device
  The main lifecycle management tools are:
  pcs: agent (pcsd), used with corosync+pacemaker
  crmsh: agent (pssh), used for pssh/ansible-style services
  I. Install corosync+pacemaker and the crm management package
  1. Configure the hosts file and time synchronization on each node:
  node1:
# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
172.25.0.29 node1
172.25.0.30 node2
# crontab -e
*/5 * * * * ntpdate cn.pool.ntp.org   ### add this job
  node2:
# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
172.25.0.29 node1
172.25.0.30 node2
# crontab -e
*/5 * * * * ntpdate cn.pool.ntp.org   ### add this job
  On node1 and node2 you can verify that the cron job was added:
# crontab -l
*/5 * * * * ntpdate cn.pool.ntp.org
# crontab -l
*/5 * * * * ntpdate cn.pool.ntp.org
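  The cron entry can also be installed from a script instead of editing the crontab by hand. A minimal idempotent sketch (assuming `crontab` is available and the job line is exactly the one used here; the final install line is left commented out):

```shell
#!/bin/sh
# Append the ntpdate job to the current user's crontab only if it is
# not there yet, so running the script twice does not duplicate it.
line='*/5 * * * * ntpdate cn.pool.ntp.org'
tmp=$(mktemp)
crontab -l 2>/dev/null > "$tmp" || true          # current entries, if any
grep -qF "$line" "$tmp" || printf '%s\n' "$line" >> "$tmp"
# crontab "$tmp"    # uncomment to actually install the table
grep -c 'ntpdate' "$tmp"
```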
  Set up SSH trust between node1 and node2
# ssh-keygen
# ssh-copy-id node2
The authenticity of host 'node2 (172.25.0.30)' can't be established.
ECDSA key fingerprint is ae:88:02:59:f9:7f:e9:4f:48:8d:78:d2:6f:c7:7a:f1.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
  The warning appears here only because I had already copied the key before.
  2. Run on both node1 and node2:
# yum install -y pacemaker pcs psmisc policycoreutils-python  
# yum install -y pacemaker pcs psmisc policycoreutils-python
  3. Start pcsd on node1 and node2 and enable it at boot:
# systemctl start pcsd.service
# systemctl enable pcsd
# systemctl start pcsd.service
# systemctl enable pcsd
  4. Set the password for the user hacluster on both hosts:
# echo 123456 | passwd --stdin hacluster  
# echo 123456 | passwd --stdin hacluster
  From here on, the configuration can be done from one host and synchronized
  On node1:
  5. Authenticate the cluster hosts with pcs (by default with the user hacluster and its password):
# pcs cluster auth node1 node2   ## authorize the cluster nodes
node2: Already authorized
node1: Already authorized
  6. Create the cluster with the two nodes:
# pcs cluster setup --name mycluster node1 node2 --force   ## create the cluster
  7. This generates the corosync configuration file on the node:
# cd /etc/corosync/   ## enter the corosync directory
# ls
corosync.conf  corosync.conf.example  corosync.conf.example.udpu  corosync.xml.example  uidgid.d
  ## you can see the generated corosync.conf configuration file
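  For reference, a trimmed example of what `pcs cluster setup` typically writes into corosync.conf for a two-node lab like this one (the values are assumptions based on the names used above, not a capture from this run):

```
totem {
    version: 2
    secauth: off
    cluster_name: mycluster
    transport: udpu
}

nodelist {
    node {
        ring0_addr: node1
        nodeid: 1
    }
    node {
        ring0_addr: node2
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}
```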
  8. Start the cluster:
# pcs cluster start --all
node1: Starting Cluster...
node2: Starting Cluster...
## this effectively starts pacemaker and corosync:
# ps -ef | grep corosync
root      19586      1  0 18:05 ?      00:00:40 corosync
root      29230  21295  0 19:13 pts/1  00:00:00 grep --color=auto corosync
# ps -ef | grep pacemaker
root       1843      1  0 11:21 ?      00:00:04 /usr/libexec/pacemaker/lrmd
haclust+   1845      1  0 11:21 ?      00:00:03 /usr/libexec/pacemaker/pengine
root      19593      1  0 18:05 ?      00:00:01 /usr/sbin/pacemakerd -f
haclust+  19594  19593  0 18:05 ?      00:00:01 /usr/libexec/pacemaker/cib
root      19595  19593  0 18:05 ?      00:00:00 /usr/libexec/pacemaker/stonithd
haclust+  19596  19593  0 18:05 ?      00:00:00 /usr/libexec/pacemaker/attrd
haclust+  19597  19593  0 18:05 ?      00:00:01 /usr/libexec/pacemaker/crmd
root      29288  21295  0 19:14 pts/1  00:00:00 grep --color=auto pacemaker
### corosync and pacemaker are now running
  9. Check the cluster status
# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id     = 172.25.0.29
status = ring 0 active with no faults
# ssh node2 corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
id     = 172.25.0.30
status = ring 0 active with no faults
### the rings on both node1 and node2 are up.
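  The ring check can also be scripted so a monitor catches faults automatically. A minimal sketch; the here-doc sample stands in for live `corosync-cfgtool -s` output, and `ring_status` is a helper name of my own:

```shell
#!/bin/sh
# Extract the status line from corosync-cfgtool ring output.
ring_status() {
  awk -F' *= *' '/^status/ { print $2 }'
}
# Sample output (as captured above); pipe the real command in production:
sample='Printing ring status.
Local node ID 1
RING ID 0
id = 172.25.0.29
status = ring 0 active with no faults'
printf '%s\n' "$sample" | ring_status
```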
  10. At this point, check the cluster for errors:
# crm_verify -L -V
   error: unpack_resources:   Resource start-up disabled since no STONITH resources have been defined
   error: unpack_resources:   Either configure some or disable STONITH with the stonith-enabled option
   error: unpack_resources:   NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
## there are errors: STONITH is enabled but not configured, so disable stonith-enabled to avoid problems in the next steps
# pcs property set stonith-enabled=false
# crm_verify -L -V
# pcs property list
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: mycluster
dc-version: 1.1.16-12.el7_4.2-94ff4df
have-watchdog: false
stonith-enabled: false
  11. Now download and install crmsh (download from GitHub, extract, and install):
  https://codeload.github.com/ClusterLabs/crmsh/tar.gz/2.3.2
  On node1:
# cd /usr/local/src/
# ls
crmsh-2.3.2.tar
# tar xvf crmsh-2.3.2.tar
# ls
crmsh-2.3.2.tar  crmsh-2.3.2
# cd crmsh-2.3.2
# python setup.py install   ## build and install
  On node2: repeat the same steps as on node1
  II. Install nginx from source and set up nfs
  ### install nginx on node1 and node2; the steps below are on node1:
  1. Install the nginx build dependencies:
yum -y groupinstall "Development Tools" "Server Platform Development"
yum -y install openssl-devel pcre-devel
  2. On all hosts, download the nginx package
# yum install wget -y               ## install the wget tool
  3. Download the nginx package:
# wget http://nginx.org/download/nginx-1.12.0.tar.gz
  4. Add the user nginx runs as:
# useradd nginx
  5. Extract the nginx package:
# tar zxvf nginx-1.12.0.tar.gz
# cd nginx-1.12.0/
  6. Configure and install nginx:
# ./configure --prefix=/usr/local/nginx --user=nginx --group=nginx --with-http_ssl_module --with-http_flv_module --with-http_stub_status_module --with-http_gzip_static_module --with-pcre
### build and install
# make && make install
  Test nginx after it is installed on node1 and node2
  7. Test nginx:
  On node1:
# cd /usr/local/nginx/
# echo node1 > html/index.html
# /usr/local/nginx/sbin/nginx
  On node2:
# cd /usr/local/nginx/
# echo node2 > html/index.html
# /usr/local/nginx/sbin/nginx
  Access the web service:
# curl 172.25.0.29
node1
# curl 172.25.0.30
node2
  node1 and node2 both serve pages normally
  Stop nginx on both nodes, since corosync and pacemaker will manage it automatically later
  Create an nginx unit file for starting nginx later; it must be created on both node1 and node2
# cat /etc/systemd/system/nginx.service
[Unit]
Description=nginx
After=network.target

[Service]
Type=forking
ExecStart=/usr/local/nginx/sbin/nginx
ExecReload=/usr/local/nginx/sbin/nginx -s reload
ExecStop=/usr/local/nginx/sbin/nginx -s quit
PrivateTmp=true

[Install]
WantedBy=multi-user.target
## do the same on node2
  Give the unit file execute permission and enable it
# chmod a+x /etc/systemd/system/nginx.service
# chmod a+x /etc/systemd/system/nginx.service
# systemctl enable nginx
# systemctl enable nginx   ## with the systemd resource agent, the unit must be enabled for crm to recognize it
  Set up nfs:
  The role of nfs is clear, so it only needs to be installed on one host; here I install it on node1
# yum install -y rpcbind nfs-utils
# mkdir /www   ### create the /www directory to be shared
# cat /etc/exports
/www *(rw,async,no_root_squash)
# systemctl restart nfs   ### restart nfs
# showmount -e 172.25.0.29
Export list for 172.25.0.29:
/www *                     ## the /www directory is now exported
# echo node > /www/index.html    ### put an index.html in the shared directory for access through the virtual IP
  III. Implementing HA for nfs+nginx
  1. How to use the resource agents:
  Configure on node1:
# crm ra
crm(live)ra# info systemd:nginx
systemd unit file for nginx (systemd:nginx)
Cluster Controlled nginx
Operations' defaults (advisory minimum):
    start         timeout=100
    stop          timeout=100
    status        timeout=100
    monitor       timeout=100 interval=60
  2. Enter the configure mode:
crm(live)ra# cd
crm(live)# cd configure
crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=172.25.0.100   ### add the virtual IP
## after configuring, inspect it with show
crm(live)configure# show
node 1: node1
node 2: node2
primitive webip IPaddr \
      params ip=172.25.0.100
property cib-bootstrap-options: \
      have-watchdog=false \
      dc-version=1.1.16-12.el7_4.2-94ff4df \
      cluster-infrastructure=corosync \
      cluster-name=mycluster \
      stonith-enabled=false
crm(live)configure# verify      # check the configuration for errors
crm(live)configure# commit      ## commit and save
crm(live)configure# cd
  3. Define the web service resource:
  Enter configure mode:
crm(live)configure# primitive webserver systemd:nginx      ## add the nginx service
crm(live)configure# verify
WARNING: webserver: default timeout 20s for start is smaller than the advised 100
WARNING: webserver: default timeout 20s for stop is smaller than the advised 100
### timeouts below the advised values produce warnings; they can be ignored.
crm(live)configure# commit
WARNING: webserver: default timeout 20s for start is smaller than the advised 100
WARNING: webserver: default timeout 20s for stop is smaller than the advised 100
  ## the warning on commit can also be ignored:
crm(live)configure# show
node 1: node1 \
      attributes standby=off
node 2: node2
primitive webip IPaddr \
      params ip=172.25.0.100
primitive webserver systemd:nginx \
      op monitor interval=30s timeout=100s \
      op start timeout=100s interval=0 \
      op stop timeout=100s interval=0
property cib-bootstrap-options: \
      have-watchdog=false \
      dc-version=1.1.16-12.el7_4.4-94ff4df \
      cluster-infrastructure=corosync \
      cluster-name=mycluster \
      stonith-enabled=false
  ## check: there are now two resources:
crm(live)configure# cd
crm(live)# status
Stack: corosync
Current DC: node1 (version 1.1.16-12.el7_4.2-94ff4df) - partition with quorum
Last updated: Sat Oct 14 21:20:59 2017
Last change: Sat Oct 14 21:17:43 2017 by root via cibadmin on node1
2 nodes configured
2 resources configured
Online: [ node1 node2 ]
Full list of resources:
webip      (ocf::heartbeat:IPaddr):      Started node2
webserver  (systemd:nginx):              Started node1
  ## by default the resources are balanced across the nodes, but they need to run together, so put the two resources into one group (to implement high availability)
  Add both resources to the same group:
crm(live)# configure
crm(live)configure# group webservice webip webserver   ## put webip and webserver into the webservice group
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd ..
crm(live)# status
Stack: corosync
Current DC: node1 (version 1.1.16-12.el7_4.2-94ff4df) - partition with quorum
Last updated: Sat Oct 14 21:24:17 2017
Last change: Sat Oct 14 21:24:12 2017 by root via cibadmin on node1
2 nodes configured
2 resources configured
Online: [ node1 node2 ]
Full list of resources:
Resource Group: webservice
   webip      (ocf::heartbeat:IPaddr):      Started node1
   webserver  (systemd:nginx):              Started node1      ## webip and webserver are now in the same group
  4. Define the nfs resource:
  Look at the Filesystem resource agent's parameters
crm(live)ra# info ocf:heartbeat:Filesystem
device* (string): block device
    The name of block device for the filesystem, or -U, -L options for mount, or NFS mount specification.
directory* (string): mount point
    The mount point for the filesystem.
fstype* (string): filesystem type
    The type of filesystem to be mounted.
  ### there are three required parameters
  ## start configuring
crm(live)configure# primitive webstore ocf:heartbeat:Filesystem params device="172.25.0.29:/www" directory="/usr/local/nginx/html" fstype="nfs" op start timeout=60s op stop timeout=60s op monitor interval=20s timeout=40s      ### mount /www on /usr/local/nginx/html
  5. Define the colocation constraint:
crm(live)configure# colocation webserver_with_webstore_and_webip inf: webserver ( webip webstore )
crm(live)configure# verify
WARNING: webserver_with_webstore_and_webip: resource webserver is grouped, constraints should apply to the group
WARNING: webserver_with_webstore_and_webip: resource webip is grouped, constraints should apply to the group
crm(live)configure# commit
  ## check the configuration:
crm(live)configure# show
node 1: node1 \
      attributes standby=off
node 2: node2
primitive webip IPaddr \
      params ip=172.25.0.100
primitive webserver systemd:nginx \
      op monitor interval=30s timeout=100s \
      op start timeout=60s interval=0 \
      op stop timeout=60s interval=0
primitive webstore Filesystem \
      params device="172.25.0.29:/www" directory="/usr/local/nginx/html" fstype=nfs \
      op start timeout=60s interval=0 \
      op stop timeout=60s interval=0 \
      op monitor interval=20s timeout=40s
group webservice webip webserver
colocation webserver_with_webstore_and_webip inf: webserver ( webip webstore )
property cib-bootstrap-options: \
      have-watchdog=false \
      dc-version=1.1.16-12.el7_4.4-94ff4df \
      cluster-infrastructure=corosync \
      cluster-name=mycluster \
      stonith-enabled=false
  6. Define the start order:
crm(live)configure# order webstore_after_webip Mandatory: webip webstore
crm(live)configure# verify
crm(live)configure# order webserver_after_webstore Mandatory: webstore webserver
crm(live)configure# commit
  ### check the status
crm(live)# status
Stack: corosync
Current DC: node1 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Wed Oct 25 20:46:41 2017
Last change: Wed Oct 25 16:56:52 2017 by root via cibadmin on node1
2 nodes configured
3 resources configured
Online: [ node1 node2 ]
Full list of resources:
Resource Group: webservice
   webip      (ocf::heartbeat:IPaddr):      Started node1
   webserver  (systemd:nginx):              Started node1
   webstore   (ocf::heartbeat:Filesystem):  Started node1
## per the constraints, the start order is webip, webstore, webserver
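  For reference, the resources, group, colocation, and order constraints defined step by step above can also be loaded in one batch with `crm configure load update <file>`. A sketch assuming the same names and addresses; the timeouts mirror the values shown in the `show` output:

```
primitive webip ocf:heartbeat:IPaddr \
    params ip=172.25.0.100
primitive webserver systemd:nginx \
    op monitor interval=30s timeout=100s \
    op start timeout=60s interval=0 \
    op stop timeout=60s interval=0
primitive webstore ocf:heartbeat:Filesystem \
    params device="172.25.0.29:/www" directory="/usr/local/nginx/html" fstype=nfs \
    op start timeout=60s op stop timeout=60s \
    op monitor interval=20s timeout=40s
group webservice webip webserver
colocation webserver_with_webstore_and_webip inf: webserver ( webip webstore )
order webstore_after_webip Mandatory: webip webstore
order webserver_after_webstore Mandatory: webstore webserver
```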
  7. Testing
# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:49:e9:da brd ff:ff:ff:ff:ff:ff
    inet 172.25.0.29/24 brd 172.25.0.255 scope global ens33
       valid_lft forever preferred_lft forever
    inet 172.25.0.100/24 brd 172.25.0.255 scope global secondary ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe49:e9da/64 scope link
  The VIP is up
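  Checking for the VIP can be scripted as well; a small sketch (the sample line is taken from the `ip addr` output above; in production, pipe `ip -4 addr show` instead):

```shell
#!/bin/sh
# Succeeds when the VIP appears among the interface addresses.
vip='172.25.0.100'
sample='    inet 172.25.0.100/24 brd 172.25.0.255 scope global secondary ens33'
printf '%s\n' "$sample" | grep -q "inet $vip/" && echo "VIP present"
```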
  Next, access the web service:
# curl 172.25.0.100
node
  The response is the content of /www/index.html
  Now stop pacemaker and corosync on node1
# systemctl stop pacemaker    ## stop pacemaker first
# systemctl stop corosync
  On node2 you can see that node2 has taken over
# crm
crm(live)# status
Stack: corosync
Current DC: node2 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Wed Oct 25 20:54:33 2017
Last change: Wed Oct 25 16:56:52 2017 by root via cibadmin on node1
2 nodes configured
3 resources configured
Online: [ node2 ]
OFFLINE: [ node1 ]
Full list of resources:
Resource Group: webservice
   webip      (ocf::heartbeat:IPaddr):      Started node2
   webserver  (systemd:nginx):              Started node2
   webstore   (ocf::heartbeat:Filesystem):  Started node2
crm(live)# exit
  
# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:64:00:b1 brd ff:ff:ff:ff:ff:ff
    inet 172.25.0.30/24 brd 172.25.0.255 scope global ens33
       valid_lft forever preferred_lft forever
    inet 172.25.0.100/24 brd 172.25.0.255 scope global secondary ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe64:b1/64 scope link
  ## the VIP has moved to node2
# df -h
Filesystem            Size  Used  Avail  Use%  Mounted on
/dev/mapper/cl-root    18G  2.5G    16G   14%  /
devtmpfs              226M     0   226M    0%  /dev
tmpfs                 237M   86M   151M   37%  /dev/shm
tmpfs                 237M  8.6M   228M    4%  /run
tmpfs                 237M     0   237M    0%  /sys/fs/cgroup
/dev/sda1            1014M  197M   818M   20%  /boot
tmpfs                  48M     0    48M    0%  /run/user/0
172.25.0.29:/www       18G  2.5G    16G   14%  /usr/local/nginx/html
  ### /www is mounted on /usr/local/nginx/html
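  The mount check can be scripted too; a sketch that scans df-style output (the sample line is from the node2 capture above; in production, pipe the real `df` instead):

```shell
#!/bin/sh
# Print "mounted" when the NFS export sits at the nginx docroot.
sample='172.25.0.29:/www       18G  2.5G    16G   14%  /usr/local/nginx/html'
printf '%s\n' "$sample" | \
  awk '$1 ~ /:\/www$/ && $NF == "/usr/local/nginx/html" { print "mounted" }'
```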
# curl 172.25.0.100
node
  ### the web resource is reachable, so the failover works
  Restart pacemaker and corosync on node1
# crm
crm(live)# status
Stack: corosync
Current DC: node2 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Wed Oct 25 21:00:40 2017
Last change: Wed Oct 25 16:56:52 2017 by root via cibadmin on node1
2 nodes configured
3 resources configured
Online: [ node1 node2 ]
Full list of resources:
Resource Group: webservice
   webip      (ocf::heartbeat:IPaddr):      Started node2
   webserver  (systemd:nginx):              Started node2
   webstore   (ocf::heartbeat:Filesystem):  Started node2
crm(live)#
  ### node2 is still running the resources.
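  Rather than eyeballing `crm status` after each failover, the resource placement can be extracted with a short script. A sketch; the here-doc sample is the capture above, so pipe the live command in production:

```shell
#!/bin/sh
# Map each resource in the group to the node it currently runs on.
sample='Resource Group: webservice
   webip      (ocf::heartbeat:IPaddr):      Started node2
   webserver  (systemd:nginx):              Started node2
   webstore   (ocf::heartbeat:Filesystem):  Started node2'
printf '%s\n' "$sample" | awk '/Started/ { print $1 " -> " $NF }'
```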
  IV. Other tuning
  To make a node preempt (fail back), you can set a location constraint:
crm(live)configure# location nginx_in_node1 nginx inf: node1   ### location binding; use with caution
  Service management:
crm(live)configure# property migration-limit=1      ### when the local service fails, it is restarted locally once; if it still cannot start, the service moves to the other host.
  Editing the configuration with crm:
crm(live)# configure
crm(live)configure# edit    ### opens the configuration in an editor, much like vim
  
node 1: node1 \
  
      attributes standby=off
  
node 2: node2
  
primitive webip IPaddr \
  
      params ip=172.25.0.100
  
primitive webserver systemd:nginx \
  
      op monitor interval=30s timeout=100s \
  
      op start timeout=60s interval=0 \
  
      op stop timeout=60s interval=0
  
primitive webstore Filesystem \
  
      params device="172.25.0.29:/www" directory="/usr/local/nginx/html" fstype=nfs \
  
      op start timeout=60s interval=0 \
  
      op stop timeout=60s interval=0 \
  
      op monitor interval=20s timeout=40s
  
group webservice webip webserver
  
order webserver_after_webstore Mandatory: webstore webserver
  
colocation webserver_with_webstore_and_webip inf: webserver ( webip webstore )
  
order webstore_after_webip Mandatory: webip webstore
  
property cib-bootstrap-options: \
  
      have-watchdog=false \
  
      dc-version=1.1.16-12.el7_4.4-94ff4df \
  
      cluster-infrastructure=corosync \
  
      cluster-name=mycluster \
  
      stonith-enabled=false \
  
      migration-limit=1
  ###可以看到刚刚配的内容,可以增删修改。
  That is the whole nfs+nginx deployment based on pacemaker+corosync.

