Posted by iszjw on 2018-11-14 06:29:43

nfs+nginx deployment based on corosync+pacemaker

  High availability for nfs+nginx based on corosync+pacemaker (managed with crm) on CentOS 7
  A note on pcs: on version 7, pcs is well supported, while crmsh takes more work to set up.
  Lab hosts (CentOS 7): node1: 172.25.0.29, node2: 172.25.0.30
  Prerequisites for the cluster:
  1. Time synchronization
  2. The hosts can reach each other by name
  3. Decide whether to use a quorum device
  The main lifecycle management tools are:
  pcs: agent (pcsd), used with corosync+pacemaker
  crmsh: agent (pssh), used for pssh/ansible-style services
  I. Install corosync+pacemaker and the crm management package
  1. Configure the hosts file and time synchronization on each node:
  node1:
# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
172.25.0.29 node1
172.25.0.30 node2
# crontab -e
*/5 * * * * ntpdate cn.pool.ntp.org   ### add this job
  node2:
# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
172.25.0.29 node1
172.25.0.30 node2
# crontab -e
*/5 * * * * ntpdate cn.pool.ntp.org   ### add this job
  On node1 and node2 you can verify that the cron job was added:
# crontab -l
*/5 * * * * ntpdate cn.pool.ntp.org
# crontab -l
*/5 * * * * ntpdate cn.pool.ntp.org
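  The cron entry can also be installed from a script instead of editing the crontab by hand. A minimal idempotent sketch (assuming `crontab` is available and the job line is exactly the one used here; the final install line is left commented out):

```shell
#!/bin/sh
# Append the ntpdate job to the current user's crontab only if it is
# not there yet, so running the script twice does not duplicate it.
line='*/5 * * * * ntpdate cn.pool.ntp.org'
tmp=$(mktemp)
crontab -l 2>/dev/null > "$tmp" || true          # current entries, if any
grep -qF "$line" "$tmp" || printf '%s\n' "$line" >> "$tmp"
# crontab "$tmp"    # uncomment to actually install the table
grep -c 'ntpdate' "$tmp"
```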
  Set up SSH trust between node1 and node2
# ssh-keygen
# ssh-copy-id node2
The authenticity of host 'node2 (172.25.0.30)' can't be established.
ECDSA key fingerprint is ae:88:02:59:f9:7f:e9:4f:48:8d:78:d2:6f:c7:7a:f1.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
  The warning appears here only because I had already copied the key before.
  2. Run on both node1 and node2:
# yum install -y pacemaker pcs psmisc policycoreutils-python  
# yum install -y pacemaker pcs psmisc policycoreutils-python
  3. Start pcsd on node1 and node2 and enable it at boot:
# systemctl start pcsd.service
# systemctl enable pcsd
# systemctl start pcsd.service
# systemctl enable pcsd
  4. Set the password for the user hacluster on both hosts:
# echo 123456 | passwd --stdin hacluster  
# echo 123456 | passwd --stdin hacluster
  From here on, the configuration can be done from one host and synchronized
  On node1:
  5. Authenticate the cluster hosts with pcs (by default with the user hacluster and its password):
# pcs cluster auth node1 node2   ## authorize the cluster nodes
node2: Already authorized
node1: Already authorized
  6. Create the cluster with the two nodes:
# pcs cluster setup --name mycluster node1 node2 --force   ## create the cluster
  7. This generates the corosync configuration file on the node:
# cd /etc/corosync/   ## enter the corosync directory
# ls
corosync.conf  corosync.conf.example  corosync.conf.example.udpu  corosync.xml.example  uidgid.d
  ## you can see the generated corosync.conf configuration file
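  For reference, a trimmed example of what `pcs cluster setup` typically writes into corosync.conf for a two-node lab like this one (the values are assumptions based on the names used above, not a capture from this run):

```
totem {
    version: 2
    secauth: off
    cluster_name: mycluster
    transport: udpu
}

nodelist {
    node {
        ring0_addr: node1
        nodeid: 1
    }
    node {
        ring0_addr: node2
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}
```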
  8. Start the cluster:
# pcs cluster start --all
node1: Starting Cluster...
node2: Starting Cluster...
## this effectively starts pacemaker and corosync:
# ps -ef | grep corosync
root      19586      1  0 18:05 ?      00:00:40 corosync
root      29230  21295  0 19:13 pts/1  00:00:00 grep --color=auto corosync
# ps -ef | grep pacemaker
root       1843      1  0 11:21 ?      00:00:04 /usr/libexec/pacemaker/lrmd
haclust+   1845      1  0 11:21 ?      00:00:03 /usr/libexec/pacemaker/pengine
root      19593      1  0 18:05 ?      00:00:01 /usr/sbin/pacemakerd -f
haclust+  19594  19593  0 18:05 ?      00:00:01 /usr/libexec/pacemaker/cib
root      19595  19593  0 18:05 ?      00:00:00 /usr/libexec/pacemaker/stonithd
haclust+  19596  19593  0 18:05 ?      00:00:00 /usr/libexec/pacemaker/attrd
haclust+  19597  19593  0 18:05 ?      00:00:01 /usr/libexec/pacemaker/crmd
root      29288  21295  0 19:14 pts/1  00:00:00 grep --color=auto pacemaker
### corosync and pacemaker are now running
  9. Check the cluster status
# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id     = 172.25.0.29
status = ring 0 active with no faults
# ssh node2 corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
id     = 172.25.0.30
status = ring 0 active with no faults
### the rings on both node1 and node2 are up.
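  The ring check can also be scripted so a monitor catches faults automatically. A minimal sketch; the here-doc sample stands in for live `corosync-cfgtool -s` output, and `ring_status` is a helper name of my own:

```shell
#!/bin/sh
# Extract the status line from corosync-cfgtool ring output.
ring_status() {
  awk -F' *= *' '/^status/ { print $2 }'
}
# Sample output (as captured above); pipe the real command in production:
sample='Printing ring status.
Local node ID 1
RING ID 0
id = 172.25.0.29
status = ring 0 active with no faults'
printf '%s\n' "$sample" | ring_status
```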
  10. At this point, check the cluster for errors:
# crm_verify -L -V
   error: unpack_resources:   Resource start-up disabled since no STONITH resources have been defined
   error: unpack_resources:   Either configure some or disable STONITH with the stonith-enabled option
   error: unpack_resources:   NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
## there are errors: STONITH is enabled but not configured, so disable stonith-enabled to avoid problems in the next steps
# pcs property set stonith-enabled=false
# crm_verify -L -V
# pcs property list
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: mycluster
dc-version: 1.1.16-12.el7_4.2-94ff4df
have-watchdog: false
stonith-enabled: false
  11. Now download and install crmsh (download from GitHub, extract, and install):
  https://codeload.github.com/ClusterLabs/crmsh/tar.gz/2.3.2
  On node1:
# cd /usr/local/src/
# ls
crmsh-2.3.2.tar
# tar xvf crmsh-2.3.2.tar
# ls
crmsh-2.3.2.tar  crmsh-2.3.2
# cd crmsh-2.3.2
# python setup.py install   ## build and install
  On node2: repeat the same steps as on node1
  II. Install nginx from source and set up nfs
  ### install nginx on node1 and node2; the steps below are on node1:
  1. Install the nginx build dependencies:
yum -y groupinstall "Development Tools" "Server Platform Development"
yum -y install openssl-devel pcre-devel
  2. On all hosts, download the nginx package
# yum install wget -y               ## install the wget tool
  3. Download the nginx package:
# wget http://nginx.org/download/nginx-1.12.0.tar.gz
  4. Add the user nginx runs as:
# useradd nginx
  5. Extract the nginx package:
# tar zxvf nginx-1.12.0.tar.gz
# cd nginx-1.12.0/
  6. Configure and install nginx:
# ./configure --prefix=/usr/local/nginx --user=nginx --group=nginx --with-http_ssl_module --with-http_flv_module --with-http_stub_status_module --with-http_gzip_static_module --with-pcre
### build and install
# make && make install
  Test nginx after it is installed on node1 and node2
  7. Test nginx:
  On node1:
# cd /usr/local/nginx/
# echo node1 > html/index.html
# /usr/local/nginx/sbin/nginx
  On node2:
# cd /usr/local/nginx/
# echo node2 > html/index.html
# /usr/local/nginx/sbin/nginx
  Access the web service:
# curl 172.25.0.29
node1
# curl 172.25.0.30
node2
  node1 and node2 both serve pages normally
  Stop nginx on both nodes, since corosync and pacemaker will manage it automatically later
  Create an nginx unit file for starting nginx later; it must be created on both node1 and node2
# cat /etc/systemd/system/nginx.service
[Unit]
Description=nginx
After=network.target

[Service]
Type=forking
ExecStart=/usr/local/nginx/sbin/nginx
ExecReload=/usr/local/nginx/sbin/nginx -s reload
ExecStop=/usr/local/nginx/sbin/nginx -s quit
PrivateTmp=true

[Install]
WantedBy=multi-user.target
## do the same on node2
  Give the unit file execute permission and enable it
# chmod a+x /etc/systemd/system/nginx.service
# chmod a+x /etc/systemd/system/nginx.service
# systemctl enable nginx
# systemctl enable nginx   ## with the systemd resource agent, the unit must be enabled for crm to recognize it
  Set up nfs:
  The role of nfs is clear, so it only needs to be installed on one host; here I install it on node1
# yum install -y rpcbind nfs-utils
# mkdir /www   ### create the /www directory to be shared
# cat /etc/exports
/www *(rw,async,no_root_squash)
# systemctl restart nfs   ### restart nfs
# showmount -e 172.25.0.29
Export list for 172.25.0.29:
/www *                     ## the /www directory is now exported
# echo node > /www/index.html    ### put an index.html in the shared directory for access through the virtual IP
  III. Implementing HA for nfs+nginx
  1. How to use the resource agents:
  Configure on node1:
# crm ra
crm(live)ra# info systemd:nginx
systemd unit file for nginx (systemd:nginx)
Cluster Controlled nginx
Operations' defaults (advisory minimum):
    start         timeout=100
    stop          timeout=100
    status        timeout=100
    monitor       timeout=100 interval=60
  2. Enter the configure mode:
crm(live)ra# cd
crm(live)# cd configure
crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=172.25.0.100   ### add the virtual IP
## after configuring, inspect it with show
crm(live)configure# show
node 1: node1
node 2: node2
primitive webip IPaddr \
      params ip=172.25.0.100
property cib-bootstrap-options: \
      have-watchdog=false \
      dc-version=1.1.16-12.el7_4.2-94ff4df \
      cluster-infrastructure=corosync \
      cluster-name=mycluster \
      stonith-enabled=false
crm(live)configure# verify      # check the configuration for errors
crm(live)configure# commit      ## commit and save
crm(live)configure# cd
  3. Define the web service resource:
  Enter configure mode:
crm(live)configure# primitive webserver systemd:nginx      ## add the nginx service
crm(live)configure# verify
WARNING: webserver: default timeout 20s for start is smaller than the advised 100
WARNING: webserver: default timeout 20s for stop is smaller than the advised 100
### timeouts below the advised values produce warnings; they can be ignored.
crm(live)configure# commit
WARNING: webserver: default timeout 20s for start is smaller than the advised 100
WARNING: webserver: default timeout 20s for stop is smaller than the advised 100
  ## the warning on commit can also be ignored:
crm(live)configure# show
node 1: node1 \
      attributes standby=off
node 2: node2
primitive webip IPaddr \
      params ip=172.25.0.100
primitive webserver systemd:nginx \
      op monitor interval=30s timeout=100s \
      op start timeout=100s interval=0 \
      op stop timeout=100s interval=0
property cib-bootstrap-options: \
      have-watchdog=false \
      dc-version=1.1.16-12.el7_4.4-94ff4df \
      cluster-infrastructure=corosync \
      cluster-name=mycluster \
      stonith-enabled=false
  ## check: there are now two resources:
crm(live)configure# cd
crm(live)# status
Stack: corosync
Current DC: node1 (version 1.1.16-12.el7_4.2-94ff4df) - partition with quorum
Last updated: Sat Oct 14 21:20:59 2017
Last change: Sat Oct 14 21:17:43 2017 by root via cibadmin on node1
2 nodes configured
2 resources configured
Online: [ node1 node2 ]
Full list of resources:
webip      (ocf::heartbeat:IPaddr):      Started node2
webserver  (systemd:nginx):              Started node1
  ## by default the resources are balanced across the nodes, but they need to run together, so put the two resources into one group (to implement high availability)
  Add both resources to the same group:
crm(live)# configure
crm(live)configure# group webservice webip webserver   ## put webip and webserver into the webservice group
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd ..
crm(live)# status
Stack: corosync
Current DC: node1 (version 1.1.16-12.el7_4.2-94ff4df) - partition with quorum
Last updated: Sat Oct 14 21:24:17 2017
Last change: Sat Oct 14 21:24:12 2017 by root via cibadmin on node1
2 nodes configured
2 resources configured
Online: [ node1 node2 ]
Full list of resources:
Resource Group: webservice
   webip      (ocf::heartbeat:IPaddr):      Started node1
   webserver  (systemd:nginx):              Started node1      ## webip and webserver are now in the same group
  4. Define the nfs resource:
  Look at the Filesystem resource agent's parameters
crm(live)ra# info ocf:heartbeat:Filesystem
device* (string): block device
    The name of block device for the filesystem, or -U, -L options for mount, or NFS mount specification.
directory* (string): mount point
    The mount point for the filesystem.
fstype* (string): filesystem type
    The type of filesystem to be mounted.
  ### there are three required parameters
  ## start configuring
crm(live)configure# primitive webstore ocf:heartbeat:Filesystem params device="172.25.0.29:/www" directory="/usr/local/nginx/html" fstype="nfs" op start timeout=60s op stop timeout=60s op monitor interval=20s timeout=40s      ### mount /www on /usr/local/nginx/html
  5. Define the colocation constraint:
crm(live)configure# colocation webserver_with_webstore_and_webip inf: webserver ( webip webstore )
crm(live)configure# verify
WARNING: webserver_with_webstore_and_webip: resource webserver is grouped, constraints should apply to the group
WARNING: webserver_with_webstore_and_webip: resource webip is grouped, constraints should apply to the group
crm(live)configure# commit
  ## check the configuration:
crm(live)configure# show
node 1: node1 \
      attributes standby=off
node 2: node2
primitive webip IPaddr \
      params ip=172.25.0.100
primitive webserver systemd:nginx \
      op monitor interval=30s timeout=100s \
      op start timeout=60s interval=0 \
      op stop timeout=60s interval=0
primitive webstore Filesystem \
      params device="172.25.0.29:/www" directory="/usr/local/nginx/html" fstype=nfs \
      op start timeout=60s interval=0 \
      op stop timeout=60s interval=0 \
      op monitor interval=20s timeout=40s
group webservice webip webserver
colocation webserver_with_webstore_and_webip inf: webserver ( webip webstore )
property cib-bootstrap-options: \
      have-watchdog=false \
      dc-version=1.1.16-12.el7_4.4-94ff4df \
      cluster-infrastructure=corosync \
      cluster-name=mycluster \
      stonith-enabled=false
  6. Define the start order:
crm(live)configure# order webstore_after_webip Mandatory: webip webstore
crm(live)configure# verify
crm(live)configure# order webserver_after_webstore Mandatory: webstore webserver
crm(live)configure# commit
  ### check the status
crm(live)# status
Stack: corosync
Current DC: node1 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Wed Oct 25 20:46:41 2017
Last change: Wed Oct 25 16:56:52 2017 by root via cibadmin on node1
2 nodes configured
3 resources configured
Online: [ node1 node2 ]
Full list of resources:
Resource Group: webservice
   webip      (ocf::heartbeat:IPaddr):      Started node1
   webserver  (systemd:nginx):              Started node1
   webstore   (ocf::heartbeat:Filesystem):  Started node1
## per the constraints, the start order is webip, webstore, webserver
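  For reference, the resources, group, colocation, and order constraints defined step by step above can also be loaded in one batch with `crm configure load update <file>`. A sketch assuming the same names and addresses; the timeouts mirror the values shown in the `show` output:

```
primitive webip ocf:heartbeat:IPaddr \
    params ip=172.25.0.100
primitive webserver systemd:nginx \
    op monitor interval=30s timeout=100s \
    op start timeout=60s interval=0 \
    op stop timeout=60s interval=0
primitive webstore ocf:heartbeat:Filesystem \
    params device="172.25.0.29:/www" directory="/usr/local/nginx/html" fstype=nfs \
    op start timeout=60s op stop timeout=60s \
    op monitor interval=20s timeout=40s
group webservice webip webserver
colocation webserver_with_webstore_and_webip inf: webserver ( webip webstore )
order webstore_after_webip Mandatory: webip webstore
order webserver_after_webstore Mandatory: webstore webserver
```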
  7. Testing
# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:49:e9:da brd ff:ff:ff:ff:ff:ff
    inet 172.25.0.29/24 brd 172.25.0.255 scope global ens33
       valid_lft forever preferred_lft forever
    inet 172.25.0.100/24 brd 172.25.0.255 scope global secondary ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe49:e9da/64 scope link
  The VIP is up
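  Checking for the VIP can be scripted as well; a small sketch (the sample line is taken from the `ip addr` output above; in production, pipe `ip -4 addr show` instead):

```shell
#!/bin/sh
# Succeeds when the VIP appears among the interface addresses.
vip='172.25.0.100'
sample='    inet 172.25.0.100/24 brd 172.25.0.255 scope global secondary ens33'
printf '%s\n' "$sample" | grep -q "inet $vip/" && echo "VIP present"
```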
  Next, access the web service:
# curl 172.25.0.100
node
  The response is the content of /www/index.html
  Now stop pacemaker and corosync on node1
# systemctl stop pacemaker    ## stop pacemaker first
# systemctl stop corosync
  On node2 you can see that node2 has taken over
# crm
crm(live)# status
Stack: corosync
Current DC: node2 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Wed Oct 25 20:54:33 2017
Last change: Wed Oct 25 16:56:52 2017 by root via cibadmin on node1
2 nodes configured
3 resources configured
Online: [ node2 ]
OFFLINE: [ node1 ]
Full list of resources:
Resource Group: webservice
   webip      (ocf::heartbeat:IPaddr):      Started node2
   webserver  (systemd:nginx):              Started node2
   webstore   (ocf::heartbeat:Filesystem):  Started node2
crm(live)# exit
  
# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:64:00:b1 brd ff:ff:ff:ff:ff:ff
    inet 172.25.0.30/24 brd 172.25.0.255 scope global ens33
       valid_lft forever preferred_lft forever
    inet 172.25.0.100/24 brd 172.25.0.255 scope global secondary ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe64:b1/64 scope link
  ## the VIP has moved to node2
# df -h
Filesystem            Size  Used  Avail  Use%  Mounted on
/dev/mapper/cl-root    18G  2.5G    16G   14%  /
devtmpfs              226M     0   226M    0%  /dev
tmpfs                 237M   86M   151M   37%  /dev/shm
tmpfs                 237M  8.6M   228M    4%  /run
tmpfs                 237M     0   237M    0%  /sys/fs/cgroup
/dev/sda1            1014M  197M   818M   20%  /boot
tmpfs                  48M     0    48M    0%  /run/user/0
172.25.0.29:/www       18G  2.5G    16G   14%  /usr/local/nginx/html
  ### /www is mounted on /usr/local/nginx/html
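  The mount check can be scripted too; a sketch that scans df-style output (the sample line is from the node2 capture above; in production, pipe the real `df` instead):

```shell
#!/bin/sh
# Print "mounted" when the NFS export sits at the nginx docroot.
sample='172.25.0.29:/www       18G  2.5G    16G   14%  /usr/local/nginx/html'
printf '%s\n' "$sample" | \
  awk '$1 ~ /:\/www$/ && $NF == "/usr/local/nginx/html" { print "mounted" }'
```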
# curl 172.25.0.100
node
  ### the web resource is reachable, so the failover works
  Restart pacemaker and corosync on node1
# crm
crm(live)# status
Stack: corosync
Current DC: node2 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Wed Oct 25 21:00:40 2017
Last change: Wed Oct 25 16:56:52 2017 by root via cibadmin on node1
2 nodes configured
3 resources configured
Online: [ node1 node2 ]
Full list of resources:
Resource Group: webservice
   webip      (ocf::heartbeat:IPaddr):      Started node2
   webserver  (systemd:nginx):              Started node2
   webstore   (ocf::heartbeat:Filesystem):  Started node2
crm(live)#
  ### node2 is still running the resources.
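  Rather than eyeballing `crm status` after each failover, the resource placement can be extracted with a short script. A sketch; the here-doc sample is the capture above, so pipe the live command in production:

```shell
#!/bin/sh
# Map each resource in the group to the node it currently runs on.
sample='Resource Group: webservice
   webip      (ocf::heartbeat:IPaddr):      Started node2
   webserver  (systemd:nginx):              Started node2
   webstore   (ocf::heartbeat:Filesystem):  Started node2'
printf '%s\n' "$sample" | awk '/Started/ { print $1 " -> " $NF }'
```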
  IV. Other tuning
  To make a node preempt (fail back), you can set a location constraint:
crm(live)configure# location nginx_in_node1 nginx inf: node1   ### location binding; use with caution
  Service management:
crm(live)configure# property migration-limit=1      ### when the local service fails, it is restarted locally once; if it still cannot start, the service moves to the other host.
  Editing the configuration with crm:
crm(live)# configure
crm(live)configure# edit    ### opens the configuration in an editor, much like vim
  
node 1: node1 \
  
      attributes standby=off
  
node 2: node2
  
primitive webip IPaddr \
  
      params ip=172.25.0.100
  
primitive webserver systemd:nginx \
  
      op monitor interval=30s timeout=100s \
  
      op start timeout=60s interval=0 \
  
      op stop timeout=60s interval=0
  
primitive webstore Filesystem \
  
      params device="172.25.0.29:/www" directory="/usr/local/nginx/html" fstype=nfs \
  
      op start timeout=60s interval=0 \
  
      op stop timeout=60s interval=0 \
  
      op monitor interval=20s timeout=40s
  
group webservice webip webserver
  
order webserver_after_webstore Mandatory: webstore webserver
  
colocation webserver_with_webstore_and_webip inf: webserver ( webip webstore )
  
order webstore_after_webip Mandatory: webip webstore
  
property cib-bootstrap-options: \
  
      have-watchdog=false \
  
      dc-version=1.1.16-12.el7_4.4-94ff4df \
  
      cluster-infrastructure=corosync \
  
      cluster-name=mycluster \
  
      stonith-enabled=false \
  
      migration-limit=1
  ###可以看到刚刚配的内容,可以增删修改。
  That is the whole nfs+nginx deployment based on pacemaker+corosync.

