logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
    debug: off
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}
Check the service startup status:
# crm_mon
Last updated: Tue Dec 8 03:58:37 2015
Last change: Tue Dec 8 03:58:27 2015
Stack: corosync
Current DC: controller3 (167772173) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
0 Resources configured
The meaning of the token parameter in /etc/corosync/corosync.conf:
The token value specifies the time, in milliseconds, during which the Corosync token is expected to be transmitted around the ring. When this timeout expires, the token is declared lost, and after token_retransmits_before_loss_const lost tokens, the non-responding processor (cluster node) is declared dead. In other words, token × token_retransmits_before_loss_const is the maximum time a node is allowed to not respond to cluster messages before being considered dead. The default for token is 1000 milliseconds (1 second), with 4 allowed retransmits. These defaults are intended to minimize failover times, but can cause frequent “false alarms” and unintended failovers in case of short network interruptions. The values used here are safer, albeit with slightly extended failover times.
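The arithmetic described above is worth spelling out. A minimal sketch of the failure-detection window (the function name is mine, not part of Corosync):

```python
def failure_detection_window_ms(token_ms, retransmits):
    """Maximum time a node may stay silent before being declared dead:
    token * token_retransmits_before_loss_const, per the explanation above."""
    return token_ms * retransmits

# Corosync defaults: token = 1000 ms, 4 retransmits -> 4 seconds
print(failure_detection_window_ms(1000, 4))    # 4000

# The safer values used in the totem section of this document
print(failure_detection_window_ms(10000, 10))  # 100000
```

This shows why the defaults (4 s) fail over quickly but are sensitive to short network interruptions, while the configured values (100 s) trade slower failover for fewer false alarms.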
There are two ways for nodes to communicate: multicast and unicast. In my tests I used unicast, i.e. I specified a nodelist.
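The unicast variant is not shown in these notes, so here is a minimal sketch for two nodes. The node addresses and IDs are assumptions for illustration; adjust them to your environment:

```
totem {
    version: 2
    # udpu = UDP unicast; with a nodelist, no mcastaddr is needed
    transport: udpu
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.42.0
        mcastport: 5405
    }
}
nodelist {
    node {
        ring0_addr: 192.168.42.11
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.42.12
        nodeid: 2
    }
}
```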
Multicast configuration:
totem {
    version: 2
    # Time (in ms) to wait for a token (1)
    token: 10000
    # How many token retransmits before forming a new
    # configuration
    token_retransmits_before_loss_const: 10
    # Turn off the virtual synchrony filter
    vsftype: none
    # Enable encryption (2)
    secauth: on
    # How many threads to use for encryption/decryption
    threads: 0
    # This specifies the redundant ring protocol, which may be
    # none, active, or passive. (3)
    rrp_mode: active
    # The following is a two-ring multicast configuration. (4)
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.42.0
        mcastaddr: 239.255.42.1
        mcastport: 5405
    }
    interface {
        ringnumber: 1
        bindnetaddr: 10.0.42.0
        mcastaddr: 239.255.42.2
        mcastport: 5405
    }
}
amf {
    mode: disabled
}
service {
    # Load the Pacemaker Cluster Resource Manager (5)
    ver: 1
    name: pacemaker
}
votequorum configuration
The votequorum library is part of the Corosync project. It is used to avoid split-brain, and also to:
1. Query the quorum status;
2. Get the list of nodes known to the quorum service;
3. Receive notifications of quorum state changes;
4. Change the number of votes assigned to a node;
5. Change the number of expected votes for a cluster to be quorate;
6. Connect an additional quorum device to allow small clusters to remain quorate during node outages.
The votequorum library was created to replace and supersede qdisk (the quorum disk).
A corosync.conf using votequorum looks like this:
quorum {
    provider: corosync_votequorum (1)
    expected_votes: 7 (2)
    wait_for_all: 1 (3)
    last_man_standing: 1 (4)
    last_man_standing_window: 10000 (5)
}
Notes:
corosync_votequorum enables the votequorum service.
expected_votes: 7 means the cluster has 7 nodes and quorum is 4. If a nodelist is configured, expected_votes is ignored.
wait_for_all: 1 means that when the cluster starts, quorum is held back until all nodes are online and have joined the cluster. This parameter is new in Corosync 2.0.
last_man_standing: 1 enables the LMS feature, which is off by default (0). When enabled, if the cluster stays at the quorum edge (e.g. expected_votes=7 but only 4 nodes online) for longer than last_man_standing_window, quorum is recalculated; this can repeat until only 2 nodes remain online. To allow a single remaining node (online nodes=1) to keep quorum, auto_tie_breaker must also be enabled, which is not recommended for production.
last_man_standing_window is in milliseconds: the time to wait before recalculating quorum after one or more hosts are lost from the cluster.
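The interaction between expected_votes and last_man_standing described in the notes above can be sketched numerically. This is a simplification of Corosync's actual logic, assuming one vote per node:

```python
def quorum(expected_votes):
    """Majority quorum: strictly more than half of the expected votes."""
    return expected_votes // 2 + 1

# expected_votes = 7 -> quorum is 4, matching the note above
print(quorum(7))  # 4

# With last_man_standing, after each last_man_standing_window spent at the
# quorum edge, expected_votes is effectively recalculated down to the number
# of nodes still online, so quorum shrinks step by step:
for online in (4, 3, 2):
    print(online, "->", quorum(online))  # 4 -> 3, 3 -> 2, 2 -> 2
```

This illustrates why LMS stops at 2 online nodes: quorum(2) is still 2, so a lone survivor cannot be quorate without auto_tie_breaker.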