Highly Available Distributed Storage (MFS + DRBD + Corosync + Pacemaker)
Introduction to MooseFS: MooseFS is a distributed file system. A MooseFS deployment consists of the following four roles:
1. Managing server (master)
2. Metalogger server (metalogger)
3. Data servers (chunkservers)
4. Client computers that mount the file system
The previous posts covered load balancing and high availability; this one builds a highly available distributed storage system. DRBD handles the primary/backup switchover: when the primary host fails, its data is taken over by the backup node. Clients mount the file system locally through a VIP, and the mfschunkservers also point at the VIP for data storage, so the service keeps running without interruption.
Lab environment:

Role                                          IP           Hostname
(Master) MFS + DRBD + Pacemaker + Corosync    10.0.0.128   zxb
(Backup) MFS + DRBD + Pacemaker + Corosync    10.0.0.129   zxb2
mfschunkserver                                10.0.0.130   zxb3
mfschunkserver                                10.0.0.131   zxb4
mfsclient                                     10.0.0.50    client

Lab topology
Implementation

Prerequisites:

① Time synchronization (on both nodes)
# crontab -l
* * * * * ntpdate cn.pool.ntp.org
# systemctl start crond

② Hostname resolution between the nodes (on both nodes)
# cat /etc/hosts
10.0.0.128 zxb
10.0.0.129 zxb2
# hostnamectl set-hostname zxb       ## on 10.0.0.128
# hostnamectl set-hostname zxb2      ## on 10.0.0.129

③ Passwordless SSH (from zxb to zxb2)
# ssh-keygen
# ssh-copy-id 10.0.0.129
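A quick check (not part of the original steps) that the passwordless login works as expected:
# ssh 10.0.0.129 'hostname'     ## should print zxb2 without prompting for a password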
Configuration steps:

① Configure DRBD

1. Install DRBD (on both nodes):
# cd /etc/yum.repos.d/
# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
# rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
# yum install -y kmod-drbd84 drbd84-utils

Configuration files:
/etc/drbd.conf                   ## main configuration file
/etc/drbd.d/global_common.conf   ## global configuration file

View the main configuration file:
# cat /etc/drbd.conf
# You can find an example in /usr/share/doc/drbd.../drbd.conf.example
include "drbd.d/global_common.conf";
include "drbd.d/*.res";

2. Edit the global configuration file (explained with inline comments):
# cat /etc/drbd.d/global_common.conf
global {
    usage-count no;  # whether to take part in DRBD usage statistics (default yes); used upstream to count installations
    # minor-count dialog-refresh disable-ip-verification
}
common {
    protocol C;  # DRBD replication protocol
    handlers {
        pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
    }
    startup {
        # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
    }
    options {
        # cpu-mask on-no-data-accessible
    }
    disk {
        on-io-error detach;  # I/O error handling policy: detach the backing device
        # size max-bio-bvecs on-io-error fencing disk-barrier disk-flushes
        # disk-drain md-flushes resync-rate resync-after al-extents
        # c-plan-ahead c-delay-target c-fill-target c-max-rate
        # c-min-rate disk-timeout
    }
    net {
        # protocol timeout max-epoch-size max-buffers unplug-watermark
        # connect-int ping-int sndbuf-size rcvbuf-size ko-count
        # allow-two-primaries cram-hmac-alg shared-secret after-sb-0pri
        # after-sb-1pri after-sb-2pri always-asbp rr-conflict
        # ping-timeout data-integrity-alg tcp-cork on-congestion
        # congestion-fill congestion-extents csums-alg verify-alg
        # use-rle
    }
    syncer {
        rate 1024M;  # network sync rate between the primary and secondary nodes
    }
}
Note: the on-io-error policy can be one of the following:
- detach: the default and recommended option. If a low-level disk I/O error occurs on a node, DRBD detaches the backing device and keeps the resource running in diskless mode.
- pass_on: DRBD reports the I/O error to the upper layer. On the primary node it is reported to the mounted file system; on the secondary node it is simply ignored (there is no upper layer to report to).
- call-local-io-error: invokes the command defined by the local-io-error handler. This requires a local-io-error handler to be defined for the resource, which gives the administrator the freedom to run any command or script to deal with I/O errors.
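For example, to use the call-local-io-error policy instead of detach, the disk section would reference the handler that is already defined above (a sketch, not the configuration used in this lab):
disk {
    on-io-error call-local-io-error;   ## hand I/O errors to the local-io-error handler
}
handlers {
    local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
}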
3. Add an extra disk to each of the two machines and partition it (on both nodes):
# fdisk /dev/sdb
Device     Boot   Start      End   Blocks  Id  System
/dev/sdb1          2048  2099199  1048576  8e  Linux LVM

4. Create the resource configuration file:
# cat /etc/drbd.d/mysql.res
resource mysql {                 # resource name
    protocol C;                  # replication protocol
    meta-disk internal;
    device /dev/drbd1;           # DRBD device name
    syncer {
        verify-alg sha1;         # checksum algorithm used for online verification
    }
    net {
        allow-two-primaries;
    }
    on zxb {
        disk /dev/sdb1;          # backing disk partition used by drbd1
        address 10.0.0.128:7789;
    }
    on zxb2 {
        disk /dev/sdb1;
        address 10.0.0.129:7789;
    }
}
5. Copy the configuration files to the peer node:
# scp -rp /etc/drbd.d/* zxb2:/etc/drbd.d/

6. Bring the resource up on zxb:
# drbdadm create-md mysql
initializing activity log
initializing bitmap (160 KB) to all zero
Writing meta data...
New drbd meta data block successfully created.
# modprobe drbd              ## check that the kernel module is loaded
# drbdadm up mysql
# drbdadm -- --force primary mysql
# cat /proc/drbd             ## check the status
version: 8.4.10-1 (api:1/proto:86-101)
GIT-hash: a4d5de01fffd7e4cde48a080e2c686f9e8cebf4c build by mockbuild@, 2017-09-15 14:23:22
1: cs:Connected ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
    ns:2136 nr:1099935 dw:1102071 dr:12794 al:11 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
7. On the peer node (zxb2):
[root@zxb2 ~]# drbdadm create-md mysql
[root@zxb2 ~]# modprobe drbd
[root@zxb2 ~]# drbdadm up mysql
[root@zxb2 ~]# cat /proc/drbd       ## check the status after the initial sync has finished
version: 8.4.10-1 (api:1/proto:86-101)
GIT-hash: a4d5de01fffd7e4cde48a080e2c686f9e8cebf4c build by mockbuild@, 2017-09-15 14:23:22
1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:1099935 nr:2136 dw:53563 dr:1052363 al:18 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
8. Create a file system on the DRBD device (on the primary node, zxb):
# mkfs.xfs /dev/drbd1
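Before handing the device over to the cluster, an optional sanity check (not in the original steps) can confirm the DRBD roles and that the file system is writable; /mnt is used here only as a temporary mount point:
# drbdadm role mysql        ## expect Primary/Secondary on zxb
# drbdadm dstate mysql      ## expect UpToDate/UpToDate
# mount /dev/drbd1 /mnt && touch /mnt/testfile && umount /mnt   ## unmount again so pacemaker can manage the mount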
② Configure Pacemaker + Corosync

1. Install the required packages on both nodes:
# yum install -y pacemaker pcs psmisc policycoreutils-python

2. Start pcsd and enable it at boot (on both nodes):
# systemctl start pcsd.service
# systemctl enable pcsd.service

3. Set the password for the hacluster user (created automatically when the packages are installed), on both nodes:
# echo 1 | passwd --stdin hacluster
The next steps are executed on one node only.

4. Authenticate the cluster hosts with pcs (using the hacluster user and the password set above):
# pcs cluster auth zxb zxb2
zxb: authorized
zxb2: authorized
5. Create a cluster from the two nodes:
# pcs cluster setup --name mycluster zxb zxb2 --force
6. Check the generated corosync configuration files:
# ls /etc/corosync/
corosync.conf  corosync.conf.example  corosync.conf.example.udpu  corosync.xml.example  uidgid.d
7. Verify that corosync.conf contains the nodes that were just registered:
# cat corosync.conf
totem {
    version: 2
    secauth: off
    cluster_name: mycluster
    transport: udpu
}

nodelist {
    node {
        ring0_addr: zxb
        nodeid: 1
    }

    node {
        ring0_addr: zxb2
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}
8. Start the cluster:
# pcs cluster start --all
zxb: Starting Cluster...
zxb2: Starting Cluster...
## this is equivalent to starting both pacemaker and corosync
# ps -ef |grep pacemaker
root      1757      1  0 14:54 ?        00:00:02 /usr/sbin/pacemakerd -f
haclust+  1758   1757  0 14:54 ?        00:00:50 /usr/libexec/pacemaker/cib
root      1759   1757  0 14:54 ?        00:00:01 /usr/libexec/pacemaker/stonithd
root      1760   1757  0 14:54 ?        00:00:10 /usr/libexec/pacemaker/lrmd
haclust+  1761   1757  0 14:54 ?        00:00:15 /usr/libexec/pacemaker/attrd
haclust+  1762   1757  0 14:54 ?        00:00:21 /usr/libexec/pacemaker/pengine
haclust+  1763   1757  0 14:54 ?        00:00:19 /usr/libexec/pacemaker/crmd
root     35211  34400  0 17:59 pts/0    00:00:00 grep --color=auto pacemaker
# ps -ef |grep corosync
root      1750      1  2 14:54 ?        00:05:00 corosync
root     35213  34400  0 17:59 pts/0    00:00:00 grep --color=auto corosync
9. Check the cluster ring status ("no faults" means it is healthy):
# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
        id      = 10.0.0.128
        status  = ring 0 active with no faults
# corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
        id      = 10.0.0.129
        status  = ring 0 active with no faults
10. Install crmsh to manage the cluster (download it from GitHub, extract, and install). Install it on both nodes for easier management:
# ls
crmsh-2.3.2.tar
# tar xf crmsh-2.3.2.tar
# cd crmsh-2.3.2
# python setup.py install
11. Check the cluster status:
crm(live)# status
Stack: corosync
Current DC: zxb (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Fri Oct 27 18:15:49 2017
Last change: Fri Oct 27 18:15:39 2017 by root via cibadmin on zxb
2 nodes configured
0 resources configured
Online: [ zxb zxb2 ]
No resources
crm(live)#
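Because this is a two-node lab cluster without a fencing device, the cluster properties that show up later in crm configure show (stonith-enabled=false, no-quorum-policy=ignore, migration-limit=1) need to be set before defining resources, otherwise resource actions will be blocked waiting for fencing and quorum. A sketch using the same crmsh shell:
crm(live)# configure
crm(live)configure# property stonith-enabled=false
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# property migration-limit=1
crm(live)configure# verify
crm(live)configure# commit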
③ Configure MooseFS (MFS)

1. Configuration on the master

Create the mfs user (note: the uid and gid must be identical on all of the following machines):
# useradd mfs
# id mfs
uid=1002(mfs) gid=1002(mfs) groups=1002(mfs)
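To guarantee matching IDs on every node, the user can also be created with an explicit uid/gid instead of relying on useradd defaults (a sketch, assuming uid/gid 1002 as shown above; the nologin shell is optional):
# groupadd -g 1002 mfs
# useradd -u 1002 -g mfs -s /sbin/nologin mfs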
Create the installation directory and give it to the mfs user:
# mkdir /usr/local/mfs
# chown -R mfs:mfs /usr/local/mfs/
Install the build dependency and download the source package:
# cd /usr/local/src
# yum install zlib-devel -y
# wget https://github.com/moosefs/moosefs/archive/v3.0.96.tar.gz
Extract and compile:
# tar xf v3.0.96.tar.gz
# cd moosefs-3.0.96/
# ./configure --prefix=/usr/local/mfs --with-default-user=mfs --with-default-group=mfs --disable-mfschunkserver --disable-mfsmount
# make && make install
Configure the master: copy the required .sample files to .cfg files:
# cd /usr/local/mfs/etc/mfs/
# ls
mfsexports.cfg.sample  mfsmaster.cfg.sample  mfsmetalogger.cfg.sample  mfstopology.cfg.sample
# cp mfsexports.cfg.sample mfsexports.cfg
# cp mfsmaster.cfg.sample mfsmaster.cfg
Edit the access-control file mfsexports.cfg (mfsmaster.cfg is used with its defaults):
# cat mfsexports.cfg
* / rw,alldirs,mapall=mfs:mfs,password=1
* . rw
## In mfsexports.cfg each entry is one access rule, and each entry has three parts: the first is the MFS client IP address or address range, the second is the directory being exported, and the third sets the access options granted to that client.
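For illustration only (these rules are not used in this lab), access can be limited to a subnet or a single client and made read-only:
10.0.0.0/24   /   ro,alldirs
10.0.0.50     /   rw,alldirs,maproot=0,password=1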
Initialize the metadata file (only metadata.mfs.empty exists by default; it has to be created by hand):
# cd /usr/local/mfs/var/mfs/
# cp metadata.mfs.empty metadata.mfs
Start the master:
# /usr/local/mfs/sbin/mfsmaster
open files limit has been set to: 16384
working directory: /usr/local/mfs/var/mfs
lockfile created and locked
initializing mfsmaster modules ...
exports file has been loaded
mfstopology configuration file (/usr/local/mfs/etc/mfstopology.cfg) not found - using defaults
loading metadata ...
metadata file has been loaded
no charts data file - initializing empty charts
master <-> metaloggers module: listen on *:9419
master <-> chunkservers module: listen on *:9420
main master server module: listen on *:9421
mfsmaster daemon initialized properly
## check that the process and its ports are up
# ps -ef |grep mfs
mfs      44608      1  1 18:44 ?        00:00:00 /usr/local/mfs/sbin/mfsmaster
root     44630  34400  0 18:45 pts/0    00:00:00 grep --color=auto mfs
# netstat -ntlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:9419 0.0.0.0:* LISTEN 44608/mfsmaster
tcp 0 0 0.0.0.0:9420 0.0.0.0:* LISTEN 44608/mfsmaster
tcp 0 0 0.0.0.0:9421 0.0.0.0:* LISTEN 44608/mfsmaster
Create a systemd unit so mfsmaster is managed by a service script, and enable it at boot:
# cat /etc/systemd/system/mfsmaster.service
[Unit]
Description=mfs
After=network.target

[Service]
Type=forking
ExecStart=/usr/local/mfs/sbin/mfsmaster start
ExecStop=/usr/local/mfs/sbin/mfsmaster stop
PrivateTmp=true

[Install]
WantedBy=multi-user.target
# /usr/local/mfs/sbin/mfsmaster stop
sending SIGTERM to lock owner (pid:44899)
waiting for termination terminated
# systemctl enable mfsmaster
Created symlink from /etc/systemd/system/multi-user.target.wants/mfsmaster.service to /etc/systemd/system/mfsmaster.service.
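If systemd does not pick up the new unit file automatically, reload it before enabling (standard systemd housekeeping, not shown in the original log):
# systemctl daemon-reload
# systemctl status mfsmaster     ## confirm the unit is recognized; pacemaker will start and stop it later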
Configure high availability

Configure the DRBD resource:
# crm
crm(live)# configure
crm(live)configure# primitive mfs_drbd ocf:linbit:drbd params drbd_resource=mysql op monitor role=Master interval=10 timeout=20 op monitor role=Slave interval=20 timeout=20 op start timeout=240 op stop timeout=100
crm(live)configure# verify
crm(live)configure# ms ms_mfs_drbd mfs_drbd meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
crm(live)configure# verify
crm(live)configure# commit
Configure the file-system (mount) resource:
crm(live)configure# primitive mystore ocf:heartbeat:Filesystem params device=/dev/drbd1 directory=/usr/local/mfs fstype=xfs op start timeout=60 op stop timeout=60
crm(live)configure# verify
Add a colocation constraint between the DRBD resource and the mount resource, plus an ordering constraint (mount only after DRBD is up):
crm(live)configure# colocation mfs_drbd_with_mystore inf: ms_mfs_drbd mystore
crm(live)configure# order ms_mfs_drbd_before_mystore Mandatory: ms_mfs_drbd mystore:start
crm(live)configure# verify
Configure the mfs resource:
crm(live)configure# primitive mfs systemd:mfsmaster op monitor timeout=100 interval=100 op start timeout=100 interval=0 op stop timeout=100 interval=0
Add a colocation constraint between mfs and the mount resource, plus an ordering constraint (start mfs only after the file system is mounted):
crm(live)configure# colocation mfs_with_mystore inf: mystore mfs
crm(live)configure# order mystore_with_mfs Mandatory: mystore mfs
crm(live)configure# verify
crm(live)configure# commit
Configure the VIP and bind it together with the mfs resource:
crm(live)configure# primitive vip ocf:heartbeat:IPaddr params ip=10.0.0.250
crm(live)configure# colocation vip_with_mfs inf: vip mfs
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node 1: zxb \
attributes standby=off
node 2: zxb2 \
attributes standby=off
primitive mfs systemd:mfsmaster \
op monitor timeout=100 interval=30 \
op start timeout=100 interval=0 \
op stop timeout=100 interval=0
primitive mfs_drbd ocf:linbit:drbd \
params drbd_resource=mysql \
op monitor role=Master interval=10 timeout=20 \
op monitor role=Slave interval=20 timeout=20 \
op start timeout=240 interval=0 \
op stop timeout=100 interval=0
primitive mystore Filesystem \
params device="/dev/drbd1" directory="/usr/local/mfs" fstype=xfs \
op start timeout=60 interval=0 \
op stop timeout=60 interval=0
primitive vip IPaddr \
params ip=10.0.0.250
ms ms_mfs_drbd mfs_drbd \
meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
colocation mfs_with_mystore inf: mystore mfs
order ms_mfs_drbd_before_mystore Mandatory: ms_mfs_drbd:promote mystore:start
colocation ms_mfs_drbd_with_mystore inf: mystore ms_mfs_drbd
order mystore_with_mfs Mandatory: mystore mfs
colocation vip_with_mfs inf: vip mfs
property cib-bootstrap-options: \
have-watchdog=false \
dc-version=1.1.16-12.el7_4.4-94ff4df \
cluster-infrastructure=corosync \
cluster-name=mycluster \
stonith-enabled=false \
no-quorum-policy=ignore \
migration-limit=1
2. On the backup node: DRBD replicates the data, so only the mfs user, the installation directory, and the systemd unit need to be created:
# id mfs
uid=1002(mfs) gid=1002(mfs) groups=1002(mfs)
# mkdir /usr/local/mfs
# chown -R mfs:mfs /usr/local/mfs/
# cat /etc/systemd/system/mfsmaster.service
[Unit]
Description=mfs
After=network.target

[Service]
Type=forking
ExecStart=/usr/local/mfs/sbin/mfsmaster start
ExecStop=/usr/local/mfs/sbin/mfsmaster stop
PrivateTmp=true

[Install]
WantedBy=multi-user.target
3. Configure the mfschunkservers (the configuration is identical on zxb3 and zxb4):
# tar xf v3.0.96.tar.gz
# cd moosefs-3.0.96/
# ./configure --prefix=/usr/local/mfs --with-default-user=mfs --with-default-group=mfs --disable-mfsmaster --disable-mfsmount
# make && make install
Point MASTER_HOST at the VIP:
# vim mfschunkserver.cfg
MASTER_HOST = 10.0.0.250
Configure mfshdd.cfg:
# mkdir /mfstest
# chown -R mfs:mfs /mfstest
# cp mfshdd.cfg.sample mfshdd.cfg
# vim mfshdd.cfg
/mfstest
## mfshdd.cfg defines which directory on the chunkserver is shared out for the master server to manage. Although this is just a directory, it is best backed by a dedicated partition.
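For example (an illustrative layout, not part of the original steps, assuming a spare /dev/sdb1 on each chunkserver), a dedicated partition can back the shared directory, and mfshdd.cfg can optionally reserve free space with a negative size suffix:
# mkfs.xfs /dev/sdb1
# echo '/dev/sdb1 /mfstest xfs defaults 0 0' >> /etc/fstab
# mount -a
# chown -R mfs:mfs /mfstest
# cat mfshdd.cfg
/mfstest -10GiB      ## share /mfstest but always keep 10GiB free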
Start the service:
# /usr/local/mfs/sbin/mfschunkserver start
open files limit has been set to: 16384
working directory: /usr/local/mfs/var/mfs
lockfile created and locked
setting glibc malloc arena max to 4
setting glibc malloc arena test to 4
initializing mfschunkserver modules ...
hdd space manager: path to scan: /mfstest/
hdd space manager: start background hdd scanning (searching for available chunks)
main server module: listen on *:9422
no charts data file - initializing empty charts
mfschunkserver daemon initialized properly
# netstat -ntlp |grep mfs
tcp 0 0 0.0.0.0:9422 0.0.0.0:* LISTEN 44473/mfschunkserve
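To confirm that both chunkservers have registered with the master, a simple check (not from the original write-up) on whichever node currently holds the VIP and runs mfsmaster:
# netstat -ntp | grep 9420     ## expect one ESTABLISHED connection per chunkserver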
4. Client configuration

Install FUSE:
# yum install fuse fuse-devel
# lsmod|grep fuse
fuse                   79892  2
Build and install the mount client:
# id mfs
uid=1002(mfs) gid=1002(mfs) groups=1002(mfs)
# tar xf v3.0.96.tar.gz
# cd moosefs-3.0.96/
# ./configure --prefix=/usr/local/mfs --with-default-user=mfs --with-default-group=mfs --disable-mfsmaster --disable-mfschunkserver --enable-mfsmount
# make && make install
Create the mount point:
# mkdir /mfstest
# chown -R mfs:mfs /mfstest/
Testing:

Mount the file system on the client; the remote address is the VIP:
# /usr/local/mfs/bin/mfsmount /mfstest/ -H 10.0.0.250 -p
MFS Password:
mfsmaster accepted connection with parameters: read-write,restricted_ip,admin ; root mapped to root:root
# df -h
Filesystem                    Size  Used  Avail  Use%  Mounted on
/dev/mapper/vg_zxb4-lv_root    26G  3.5G    22G   14%  /
tmpfs                         931M     0   931M    0%  /dev/shm
/dev/sda1                     477M   28M   425M    7%  /boot
10.0.0.250:9421                34G  5.7G    29G   17%  /mfstest
Write a test file through the mount:
# echo "master" >/mfstest/1.txt
# cat /mfstest/1.txt
master
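Optionally, the client tools built alongside mfsmount can show where the chunks of the test file ended up and what its replication goal is (a sketch, assuming the install prefix used above):
# /usr/local/mfs/bin/mfsfileinfo /mfstest/1.txt
# /usr/local/mfs/bin/mfsgetgoal /mfstest/1.txt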
Check the changes on the master side:
# pwd
/usr/local/mfs/var/mfs
# cat changelog.0.mfs
10: 1509166409|ACCESS(1)
11: 1509166411|CREATE(1,1.txt,1,438,18,0,0,0):4
12: 1509166411|ACQUIRE(4,4)
13: 1509166411|WRITE(4,0,1,0):7
14: 1509166411|LENGTH(4,7,0)
15: 1509166411|UNLOCK(7)
16: 1509166414|ACCESS(1)
17: 1509166415|AMTIME(4,1509166414,1509166411,1509166411)
When the master node is stopped, the resources fail over to the backup node. Put the master into standby and test:
crm(live)# node standby
crm(live)# status
Stack: corosync
Current DC: zxb (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Sat Oct 28 21:37:02 2017
Last change: Sat Oct 28 21:37:01 2017 by root via crm_attribute on zxb
2 nodes configured
5 resources configured
Node zxb: standby
Online: [ zxb2 ]
Full list of resources:
Master/Slave Set: ms_mfs_drbd
Masters: [ zxb2 ]
Stopped: [ zxb ]
mystore (ocf::heartbeat:Filesystem): Started zxb2
mfs (systemd:mfsmaster): Started zxb2
vip (ocf::heartbeat:IPaddr): Started zxb2
# echo "2.txt" >/mfstest/2.txt
# pwd
/usr/local/mfs/var/mfs
# cat changelog.0.mfs
26: 1509200272|CREATE(1,2.txt,1,438,18,1002,1002,0):3
27: 1509200272|ACQUIRE(5,3)
28: 1509200274|WRITE(3,0,1,0):2
29: 1509200277|LENGTH(3,6,0)
30: 1509200277|UNLOCK(2)
31: 1509200329|SESDISCONNECTED(5)
32: 1509200445|SESDISCONNECTED(5)
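To finish the failover test (not shown in the original log), bring the standby node back online and check that both nodes are active again:
crm(live)# node online zxb
crm(live)# status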
That's it; finally done.