MySQL MHA高可用环境部署

踏雪寻梅 发表于 2018-10-4 11:11:36

　　一，安装MHA基本环境
　　安装MHA节点
　　（1）基本环境说明
　　角色IP地址主机名
　　=========================================
　　主机192.168.1.121节点1
　　从机192.168.1.122节点2
　　从机192.168.1.123节点3
　　监视主机192.168.1.125节点5
　　（2）在node1，node2，node3，node5中操作：
　　＃vi / etc / hosts
　　192.168.1.121 node1
　　192.168.1.122 node2
　　192.168.1.123 node3
　　192.168.1.125 node5
　　安装MHA节点节点软件包：
　　＃rpm -ivh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm＃yum
　　install perl-DBD-MySQL perl-CPAN -y
　　＃tar xf mha4mysql -node-0.56.tar.gz
　　＃cd mha4mysql-node-0.56
　　＃perl Makefile.PL
　　＃make && make install
　　安装MHA
　　在节点5管理节点上操作：注：MHA管理器主机也是需要安装MHA节点，MHA管理器
　　＃yum install perl-DBD-MySQL perl-CPAN perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-time-HiRes -y
　　＃tar xf mha4mysql-manager-0.56.tar.gz
　　＃cd mha4mysql-manager -0.56
　　＃perl Makefile.PL
　　＃make && make install
　　＃说明：安装的脚本程序都在/ usr / local / bin /目录下。
　　3.节点间配置SSH登录无密码验证（MHA主机之间使用密钥登录）
　　在node5（Monitor）中：
　　＃ssh-keygen -t rsa
　　＃ssh-copy-id -i /root/.ssh/id_rsa.pub root @ node1
　　＃ssh-copy-id -i /root/.ssh/id_rsa.pub root @ node2
　　＃ssh- copy-id -i /root/.ssh/id_rsa.pub root @ node3
　　在node1（Master）中：
　　＃ssh-keygen -t rsa
　　＃ssh-copy-id -i /root/.ssh/id_rsa.pub root @ node2
　　＃ssh-copy-id -i /root/.ssh/id_rsa.pub root @ node3
　　＃ssh- copy-id -i /root/.ssh/id_rsa.pub root @ node5
　　在node2（slave）中：
　　＃ssh-keygen -t rsa
　　＃ssh-copy-id -i /root/.ssh/id_rsa.pub root @ node1
　　＃ssh-copy-id -i /root/.ssh/id_rsa.pub root @ node3
　　＃ssh- copy-id -i /root/.ssh/id_rsa.pub root @ node5
　　在node3（slave）中：
　　＃ssh-keygen -t rsa
　　＃ssh-copy-id -i /root/.ssh/id_rsa.pub root @ node1
　　＃ssh-copy-id -i /root/.ssh/id_rsa.pub root @ node2
　　＃ssh- copy-id -i /root/.ssh/id_rsa.pub root @ node5
　　二，搭建主从复制环境
　　主从复制环境配置过程
　　（1）mysql安装过程略，但是三节点要创建如下链接
　　node1（主），node2（主备从），node3（从）
　　注意：创建如下链接：
　　ln -s / usr / local / mysql / bin / * / usr / local / bin /
　　node1 my.cnf
　　server-id = 1
　　binlog-format = ROW
　　log-bin = master-bin
　　log-bin-index = master-bin.index
　　log-slave-updates = true
　　relay_log_purge = 0
　　node2 my.cnf
　　server-id = 2
　　binlog-format = ROW
　　log-bin = master-bin
　　log-bin-index = master-bin.index
　　log-slave-updates = true
　　relay_log_purge = 0
　　node3 my.cnf
　　binlog-format = ROW
　　log-bin = mysql-bin
　　relay-log = slave-relay-bin
　　relay-log-index = slave-relay-bin.index
　　log-slave-updates = true
　　server-id = 11
　　skip-name- resolve
　　relay_log_purge = 0
　　（2）在node1（Master）上备份一份完整的数据：
　　＃mysqldump -uroot -p123456 --master-data = 2 - 单事务-R - triggers -A> all.sql
　　其中--master-data = 2代表备份时刻记录主的Binlog位置和位置。
　　（3）在node1（Master）上创建复制用户：
　　mysql>授予复制从机*。*到'123456'确定的'repl'@'192.168.1.%';
　　刷新权限;
　　（4）查看主库备份时的binlog名称和位置，MASTER_LOG_FILE和MASTER_LOG_POS：
　　＃head -n 30 all.sql | grep'CHANGE MASTER TO'
　　- CHANGE MASTER TO MASTER_LOG_FILE ='master-bin.000004'，MASTER_LOG_POS = 120;
　　（5）把备份复制到192.168.1.122和192.168.1.123
　　＃scp all.sql
　　192.168.1.122:/root/＃scp all.sql 192.168.1.123:/root/
　　（6）分别在两台服务器上导入备份，执行复制相关命令
　　在node2，node3主机上操作：
　　＃mysql -uroot -p123456start slave;
　　显示从属状态\ G
　　创建MHA管理用户，在主上创建。
　　将*。*的所有权限授予'123456'标识为'root'@'192.168.1.%'的权限;
　　刷新权限;
　　三，配置Keepal VIP
　　vip配置可以采用两种方式，一种通过keepalived的方式管理虚拟ip的浮动;一人是通过脚本方式，本文通过keepalived方式实现
　　1.在node1（Master）与node2（备选主节点）安装keepalived。
　　＃wget的http://www.keepalived.org/software/keepalived-1.2.12.tar.gz
　　＃焦油XF的keepalived-1.2.12.tar.gz
　　＃CD的keepalived-1.2.12
　　＃的./configure前缀= / usr / local / keepalived
　　＃make && make install
　　＃cp /usr/local/keepalived/etc/rc.d/init.d/keepalived /etc/init.d/＃cp
　　/ usr / local / keepalived / etc / sysconfig / keepalived / etc / sysconfig /
　　＃mkdir / etc / keepalived
　　＃cp /usr/local/keepalived/etc/keepalived/keepalived.conf / etc / keepalived /
　　＃cp / usr / local / keepalived / sbin / keepalived / usr / sbin /
　　配置keepalived的配置文件，在node1（master）上配置操作如下：
　　注意：keepalived配置成备份 - 备份，即IP地址切换后，主起来后IP地址不切换，本文监控脚本由MHA提供，keepalived不提供对mysqld的监控。
　　＃vi /etc/keepalived/keepalived.conf
　　！保持配置文件
　　global_defs {
　　notification_email {
　　abc@163.com
　　}
　　notification_email_from dba@dbserver.com
　　smtp_server 127.0.0.1
　　smtp_connect_timeout 30
　　router_id MySQL-HA
　　}
　　vrrp_instance VI_1 {
　　state BACKUP
　　interface eth0
　　virtual_router_id 51
　　priority 150
　　advert_int 1
　　nopreempt
　　身份验证{
　　auth_type PASS
　　auth_pass 1111
　　}
　　virtual_ipaddress {
　　192.168.1.130
　　}
　　}
　　配置keepalived的配置文件，在node2（备用节点）上配置操作如下：
　　＃vi /etc/keepalived/keepalived.conf
　　！保持配置文件
　　global_defs {
　　notification_email {
　　abc@163.com
　　}
　　notification_email_from dba@dbserver.com
　　smtp_server 127.0.0.1
　　smtp_connect_timeout 30
　　router_id MySQL-HA
　　}
　　vrrp_instance VI_1 {
　　state BACKUP
　　interface eth0
　　virtual_router_id 51
　　priority 120
　　advert_int 1
　　nopreempt
　　身份验证{
　　auth_type PASS
　　auth_pass 1111
　　}
　　virtual_ipaddress {
　　192.168.1.130
　　}
　　}
　　4. node1，node2启动keepalived服务
　　＃service keepalived start
　　＃chkconfig keepalived on
　　5. node1查看VIP启动情况
　　[root @
　　node1]＃ip a 1：lo： mtu 65536 qdisc noqueue state UNKNOWN
　　link / loopback 00：00：00：00：00：00 brd 00：00：00：00：00 ：00
　　inet 127.0.0.1/8 scope host lo
　　inet6 :: 1/128 scope host
　　valid_lft forever preferred_lft forever
　　2：eth0： mtu 1500 qdisc pfifo_fast state UP qlen 1000
　　link / ether 00：0c ：29：4e：53：71 brd ff：ff：ff：ff：ff：ff
　　inet 192.168.1.121/24 brd 192.168.1.255范围全局eth0
　　inet 192.168.1.130/32范围全局eth0
　　inet6 fe80 :: 20c：29ff： fe4e：5371/64范围链接
　　valid_lft永远preferred_lft永远
　　四，配置MHA
　　监控创建MHA的工作目录，并且创建相关配置文件（在软件包解压后的目录里面有样例配置文件）。
　　＃mkdir -p / etc / masterha
　　＃mkdir -p / var / log / masterha / app1
　　＃cp mha4mysql-manager-0.56 / samples / conf / app1.cnf / etc / masterha /
　　修改app1.cnf配置文件，修改后的文件内容如下：
　　＃cat /etc/masterha/app1.cnf
　　
　　manager_workdir = / var / log / masterha / app1
　　manager_log = / var / log / masterha / app1 / manager.log
　　master_binlog_dir = / usr / local / mysql / data /
　　master_ip_failover_script = / usr / local / bin / master_ip_failover
　　master_ip_online_change_script = / usr / local / bin / master_ip_online_change
　　password = 123456
　　user = root
　　ping_interval = 1
　　remote_workdir = / tmp
　　repl_password = 123456
　　repl_user = repl
　　report_script = / usr / local / bin / send_report
　　ssh_user = root
　　
　　hostname = 192.168.1.121
　　port = 3306
　　
　　hostname = 192.168.1.122
　　port = 3306
　　candidate_master = 1
　　check_repl_delay = 0
　　
　　hostname = 192.168.1.123
　　port = 3306
　　说明：
　　master_ip_failover_script = / usr / local / bin / master_ip_failover          #MHA自动切换执行的脚本，需要修改
　　master_ip_online_change_script = / usr / local / bin / master_ip_online_change ＃手动
　　切换需要执行的脚本，需要修改 report_script = / usr / local / bin / send_report ＃切换                            时发送邮件进行报告，需要修改
　　2.设置中继日志的清除方式（在每个从节点上）：
　　（1）在节点2，节点3从节点上操作：
　　将relay_log_purge = 0加入my.cnf配置文件，前面已经配置。
　　（2）设定定期清理继电器脚本（node2，node3上操作）：
　　＃猫purge_relay_log.sh
　　＃！/斌/ bash的
　　用户=根
　　的passwd = 123456
　　端口= 3306
　　LOG_DIR = '/数据/ masterha /日志'
　　WORK_DIR = '/数据'
　　清除= '在/ usr / local / bin目录/ purge_relay_logs'
　　如果[！-d $ log_dir]
　　then
　　mkdir $ log_dir -p
　　fi
　　$ purge --user = $ user --password = $ passwd --disable_relay_log_purge --port = $ port --workdir = $ work_dir >> $ log_dir / purge_relay_logs.log 2>＆1
　　配置定时计划任务
　　＃crontab -e 0 4 * * * / bin / bash /root/purge_relay_log.sh
　　要求把keepalived服务引入MHA，我们只需要修改切换是触发的脚本文件master_ip_failover即可，在该脚本中添加在master发生宕机时对keepalived的处理。
　　（1）编辑脚本/ usr / local / bin / master_ip_failover，修改后如下：
　　＃vi / usr / local / bin / master_ip_failover
　　＃！/ usr / bin / env perl
　　use strict;
　　使用警告FATAL =>'all';
　　使用Getopt :: Long;
　　我的（
　　$命令，$ ssh_user，$ orig_master_host，$ orig_master_ip，
　　$ orig_master_port，$ new_master_host，$ new_master_ip，$ new_master_port
　　）;
　　我的$ vip ='192.168.1.130';
　　我的$ ssh_start_vip =“/etc/init.d/keepalived start”;
　　我的$ ssh_stop_vip =“/etc/init.d/keepalived stop”;
　　GetOptions（
　　'command = s'=> \ $ command，
　　'ssh_user = s'=> \ $ ssh_user，
　　'orig_master_host = s'=> \ $ orig_master_host，
　　'orig_master_ip = s'=> \ $ orig_master_ip，
　　'orig_master_port = '=> \ $ orig_master_port，
　　'new_master_host = s'=> \ $ new_master_host，
　　'new_master_ip = s'=> \ $ new_master_ip，
　　'new_master_port = i'=> \ $ new_master_port，
　　）;
　　exit＆main（）;
　　sub main {
　　print“\ n \ nIN SCRIPT TEST ==== $ ssh_stop_vip == $ ssh_start_vip === \ n \ n”;
　　if（$ command eq“stop”|| $ command eq“stopssh”）{
　　my $ exit_code = 1;
　　eval {
　　print“禁用旧主机上的VIP：$ orig_master_host \ n”;
　　＆stop_vip（）;
　　$ exit_code = 0;
　　};
　　if（$ @）{
　　warn“Got Error：$ @ \ n”;
　　退出$ exit_code;
　　}
　　exit $ exit_code;
　　}
　　elsif（$ command eq“start”）{
　　我的$ exit_code = 10;
　　eval {
　　print“启用VIP - $ vip on the new master - $ new_master_host \ n”;
　　＆start_vip（）;
　　$ exit_code = 0;
　　};
　　if（$ @）{
　　warn $ @;
　　退出$ exit_code;
　　}
　　exit $ exit_code;
　　} {
　　el } {
　　“ } ”{ “ } ”\“”
　　退出0;
　　}
　　else {
　　＆usage（）;
　　出口1;
　　}
　　}
　　sub start_vip（）{
　　`ssh $ ssh_user \ @ $ new_master_host \“$ ssh_start_vip \”`;
　　}
　　＃一个简单的系统调用，禁用在old_master
　　子上的VIP stop_vip（）{
　　`ssh $ ssh_user \ @ $ orig_master_host \“$ ssh_stop_vip \”`;
　　}}
　　sub usage {
　　print
　　“用法：master_ip_failover --command = start | stop | stopssh | status --orig_master_host = host --orig_master_ip = ip --orig_master_port = port --new_master_host = host --new_master_ip = ip --new_master_port = port \ n“;
　　}
　　（2）编辑脚本master_ip_online_change，修改后如下：
　　＃！/ usr / bin / env perl
　　＃版权所有（C）2011 DeNA有限公司
　　＃＃
　　这个程序是免费的软件; 您可以
　　根据
　　＃自由软件基金会发布的GNU通用公共许可证的条款重新分配和/或修改＃许可证的版本2或
　　＃（根据您的选择）任何更高版本。
　　＃＃
　　这个程序是分发的，希望它是有用的，
　　但没有任何的保证; 甚至没有
　　＃适销性或适用于特定用途的默示保证。有关
　　详细信息，请参阅＃GNU通用公共许可证。
　　＃＃
　　您应该已经收到了GNU通用公共许可证
　　＃的副本以及该程序; 如果不，
　　##注意：这是一个示例脚本，不完整。根据您的环境修改脚本。
　　使用严格
　　使用警告FATAL =>'all';
　　使用Getopt :: Long;
　　使用MHA :: DBHelper;
　　使用MHA :: NodeUtil;
　　使用Time :: HiRes qw（sleep gettimeofday tv_interval）;
　　使用Data :: Dumper;
　　我的$ _tstart;
　　我的$ _running_interval = 0.1;
　　我的（
　　$命令，$ orig_master_is_new_slave，$ orig_master_host，
　　$ orig_master_ip，$ orig_master_port，$ orig_master_user，
　　$ orig_master_password，$ orig_master_ssh_user，$ new_master_host，
　　$ new_master_ip，$ new_master_port，$ new_master_user，
　　$ new_master_password，$ new_master_ssh_user
　　）
　　我的$ vip ='192.168.1.130/24';
　　我的$ key ='1';
　　我的$ ssh_start_vip =“/ sbin / ifconfig eth0：$ key $ vip”;
　　我的$ ssh_stop_vip =“/ sbin / ifconfig eth0：$ key down”;
　　我的$ orig_master_ssh_port = 22;
　　我的$ new_master_ssh_port = 22;
　　exit＆main（）;
　　sub current_time_us {
　　my（$ sec，$ microsec）= gettimeofday（）;
　　我的$ curdate = localtime（$ sec）;
　　返回$ curdate。“”。sprintf（“％06d”，$ microsec）;
　　}
　　sub sleep_until {
　　my $ elapsed = tv_interval（$ _ tstart）;
　　if（$ _running_interval> $ elapsed）{
　　sleep（$ _running_interval - $ elapsed）;
　　}
　　}
　　sub get_threads_util {
　　my $ dbh = shift;
　　我的$ my_connection_id = shift;
　　我的$ running_time_threshold = shift;
　　我的$ type = shift;
　　$ running_time_threshold = 0，除非（$ running_time_threshold）;
　　$ type = 0，除非（$ type）;
　　我的@threads;
　　我的$ sth = $ dbh-> prepare（“SHOW PROCESSLIST”）;
　　$ sth-> execute（）;
　　while（my $ ref = $ sth-> fetchrow_hashref（））{

　　my $>　　我的$ user = $ ref - > {User};
　　我的$ host = $ ref - > {Host};
　　我的$ command = $ ref - > {Command};
　　我的$ state = $ ref - > {State};
　　我的$ query_time = $ ref - > {Time};
　　我的$ info = $ ref - > {Info};
　　$ info =s / ^ \ s *（。*？）\ s * $ / $ 1 / if defined（$ info）;

　　next if（$ my_connection_id == $>　　next if（defined（$ query_time）&& $ query_time= 1）{
　　next if（defined（$ command）&& $ command eq“Sleep”）;
　　next if（defined（$ command）&& $ command eq“Connect”）;
　　}
　　if（$ type> = 2）{
　　next if（defined（$ info）&& $ info =m / ^ select / i）;
　　next if（defined（$ info）&& $ info =m / ^ show / i）;
　　}
　　推送@threads，$ ref;
　　}
　　return @threads;
　　}
　　sub main {
　　if（$ command eq“stop”）{
　　##正当地杀死当前主机上的连接
　　＃1.在新主机
　　＃2 上设置read_only = 1。DROP USER使得没有应用用户可以建立新的连接
　　＃3。在当前主机
　　＃4 上设置read_only = 1。杀死当前查询
　　＃*任何数据库访问失败都会导致脚本死机。
　　我的$ exit_code = 1;
　　eval {
　　##在新主机上设置read_only = 1（以避免意外）
　　我的$ new_master_handler = new MHA :: DBHelper（）;
　　＃args：hostname，port，user，password，raise_error（die_on_error）_or_not
　　$ new_master_handler-> connect（$ new_master_ip，$ new_master_port，
　　$ new_master_user，$ new_master_password，1）;
　　打印current_time_us（）。“设置read_only在新的主人..”;
　　$ new_master_handler-> enable_read_only（）;
　　if（$ new_master_handler-> is_read_only（））{
　　print“ok。\ n”;
　　}
　　else {
　　die“Failed！\ n”;
　　}
　　$ new_master_handler-> disconnect（）;
　　＃连接到原始主机，如果发生任何数据库错误，则会死亡
　　$ orig_master_handler = new MHA :: DBHelper（）;
　　$ orig_master_handler-> connect（$ orig_master_ip，$ orig_master_port，
　　$ orig_master_user，$ orig_master_password，1）;
　　##删除应用程序用户，以便没有人可以连接。事先禁用每会话binlog
　　$ orig_master_handler-> disable_log_bin_local（）;
　　打印current_time_us（）。“在原始主机上吸引应用用户.. \ n”;
　　#FIXME_xxx_drop_app_user（$ orig_master_handler）;
　　等待N * 100毫秒，以便当前的连接可以退出
　　我的$ time_until_read_only = 15;
　　$ _tstart = ;
　　我的@threads = get_threads_util（$ orig_master_handler - > {dbh}，
　　$ orig_master_handler - > {connection_id}）;
　　while（$ time_until_read_only> 0 && $＃threads> = 0）{
　　if（$ time_until_read_only％5 == 0）{
　　printf
　　“％s等待所有正在运行的％d线程断开连接..（最大％d毫秒）\ n”
　　current_time_us（），$＃threads + 1，$ time_until_read_only * 100;
　　if（$＃threadsnew（[$ _]） - > Indent（0） - > Terse（1） - > Dump。
　　“\ n” foreach（@threads）;
　　}
　　}
　　sleep_until（）;
　　$ _tstart = ;
　　$ time_until_read_only--;
　　@threads = get_threads_util（$ orig_master_handler - > {dbh}，
　　$ orig_master_handler - > {connection_id}）;
　　}}
　　##在当前主设备上设置read_only = 1，以便没有人（SUPER除外）可以写入
　　print_time_us（）。“在原始主机上设置read_only = 1”。
　　$ orig_master_handler-> enable_read_only（）;
　　if（$ orig_master_handler-> is_read_only（））{
　　print“ok。\ n”;
　　}
　　else {
　　die“Failed！\ n”;
　　}}
　　等待M * 100毫秒，以便当前的更新查询可以完成
　　我的$ time_until_kill_threads = 5;
　　@threads = get_threads_util（$ orig_master_handler - > {dbh}，
　　$ orig_master_handler - > {connection_id}）;
　　while（$ time_until_kill_threads> 0 && $＃threads> = 0）{
　　if（$ time_until_kill_threads％5 == 0）{
　　printf
　　“％s等待所有运行的％d查询断开连接..（最大％d毫秒）\ n”
　　current_time_us（），$＃threads + 1，$ time_until_kill_threads * 100;
　　if（$＃threadsnew（[$ _]） - > Indent（0） - > Terse（1） - > Dump。“\ n”
　　foreach（@threads）;
　　}
　　}
　　sleep_until（）;
　　$ _tstart = ;
　　$ time_until_kill_threads--;
　　@threads = get_threads_util（$ orig_master_handler - > {dbh}，
　　$ orig_master_handler - > {connection_id}）;
　　}}
　　## Terminating all threads
　　print current_time_us() . " Killing all application threads..\n";
　　$orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
　　print current_time_us() . " done.\n";
　　$orig_master_handler->enable_log_bin_local();
　　$orig_master_handler->disconnect();
　　## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
　　eval {
　　`ssh -p$orig_master_ssh_port $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
　　};
　　if ($@) {
　　warn $@;
　　}
　　$exit_code = 0;
　　};
　　if ($@) {
　　warn "Got Error: $@\n";
　　exit $exit_code;
　　}
　　exit $exit_code;
　　}
　　elsif ( $command eq "start" ) {
　　## Activating master ip on the new master
　　# 1. Create app user with write privileges
　　# 2. Moving backup script if needed
　　# 3. Register new master's ip to the catalog database
　　# We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.
　　# If exit code is 0 or 10, MHA does not abort
　　my $exit_code = 10;
　　eval {
　　my $new_master_handler = new MHA::DBHelper();
　　# args: hostname, port, user, password, raise_error_or_not
　　$new_master_handler->connect( $new_master_ip, $new_master_port,
　　$new_master_user, $new_master_password, 1 );
　　## Set read_only=0 on the new master
　　$new_master_handler->disable_log_bin_local();
　　print current_time_us() . " Set read_only=0 on the new master.\n";
　　$new_master_handler->disable_read_only();
　　## Creating an app user on the new master
　　print current_time_us() . " Creating app user on the new master..\n";
　　#FIXME_xxx_create_app_user($new_master_handler);
　　$new_master_handler->enable_log_bin_local();
　　$new_master_handler->disconnect();
　　## Update master ip on the catalog database, etc
　　`ssh -p$new_master_ssh_port $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;
　　$exit_code = 0;
　　};
　　if ($@) {
　　warn "Got Error: $@\n";
　　exit $exit_code;
　　}
　　exit $exit_code;
　　}
　　elsif ( $command eq "status" ) {
　　# do nothing
　　exit 0;
　　}
　　else {
　　&usage();
　　exit 1;
　　}
　　}
　　sub usage {
　　print
　　"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
　　die;
　　}
　　(3) 编辑脚本send_report，修改后如下:
　　#!/usr/bin/perl
　　use strict;
　　use warnings FATAL => 'all';
　　use Mail::Sender;
　　use Getopt::Long;
　　#new_master_host and new_slave_hosts are set only when recovering master succeeded
　　my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );
　　my $smtp='smtp.163.com';
　　my $mail_from='xxxx';
　　my $mail_user='xxxxx';
　　my $mail_pass='xxxxx';
　　my $mail_to=['xxxx','xxxx'];
　　GetOptions(
　　'orig_master_host=s' => \$dead_master_host,
　　'new_master_host=s'=> \$new_master_host,
　　'new_slave_hosts=s'=> \$new_slave_hosts,
　　'subject=s'       => \$subject,
　　'body=s'          => \$body,
　　);
　　mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);
　　sub mailToContacts {
　　my ( $smtp, $mail_from, $user, $passwd, $mail_to, $subject, $msg ) = @_;
　　open my $DEBUG, "> /tmp/monitormail.log"
　　or die "Can't open the debug    file:$!\n";
　　my $sender = new Mail::Sender {
　　ctype    => 'text/plain; ',
　　encoding => 'utf-8',
　　smtp    => $smtp,
　　from    => $mail_from,
　　auth    => 'LOGIN',
　　TLS_allowed => '0',
　　authid    => $user,
　　authpwd => $passwd,
　　to       => $mail_to,
　　subject => $subject,
　　debug    => $DEBUG
　　};
　　$sender->MailMsg(
　　{ msg => $msg,
　　debug => $DEBUG
　　}
　　) or print $Mail::Sender::Error;
　　return 1;
　　}
　　# Do whatever you want here
　　exit 0;
　　五、MHA的日常管理
　　1. 检查SSH配置(node5 Monitor 监控节点上操作)，如下：
　　# masterha_check_ssh --conf=/etc/masterha/app1.cnf
　　Sun May1 22:05:12 2016 - Global configuration file /etc/masterha_default.cnf not found. Skipping.
　　Sun May1 22:05:12 2016 - Reading application default configuration from /etc/masterha/app1.cnf..
　　Sun May1 22:05:12 2016 - Reading server configuration from /etc/masterha/app1.cnf..
　　Sun May1 22:05:12 2016 - Starting SSH connection tests..
　　Sun May1 22:05:14 2016 -
　　Sun May1 22:05:12 2016 - Connecting via SSH from root@192.168.1.121(192.168.1.121:22) to root@192.168.1.122(192.168.1.122:22)..
　　Sun May1 22:05:13 2016 - ok.
　　Sun May1 22:05:13 2016 - Connecting via SSH from root@192.168.1.121(192.168.1.121:22) to root@192.168.1.123(192.168.1.123:22)..
　　Sun May1 22:05:13 2016 - ok.
　　Sun May1 22:05:14 2016 -
　　Sun May1 22:05:13 2016 - Connecting via SSH from root@192.168.1.122(192.168.1.122:22) to root@192.168.1.121(192.168.1.121:22)..
　　Sun May1 22:05:13 2016 - ok.
　　Sun May1 22:05:13 2016 - Connecting via SSH from root@192.168.1.122(192.168.1.122:22) to root@192.168.1.123(192.168.1.123:22)..
　　Sun May1 22:05:14 2016 - ok.
　　Sun May1 22:05:14 2016 -
　　Sun May1 22:05:13 2016 - Connecting via SSH from root@192.168.1.123(192.168.1.123:22) to root@192.168.1.121(192.168.1.121:22)..
　　Sun May1 22:05:14 2016 - ok.
　　Sun May1 22:05:14 2016 - Connecting via SSH from root@192.168.1.123(192.168.1.123:22) to root@192.168.1.122(192.168.1.122:22)..
　　Sun May1 22:05:14 2016 - ok.
　　Sun May1 22:05:14 2016 - All SSH connection tests passed successfully.
　　2. 检查整个复制环境状况(node5 监控节点上操作)，如下：
　　# masterha_check_repl --conf=/etc/masterha/app1.cnf
　　Sun May1 22:46:44 2016 - Global configuration file /etc/masterha_default.cnf not found. Skipping.
　　Sun May1 22:46:44 2016 - Reading application default configuration from /etc/masterha/app1.cnf..
　　Sun May1 22:46:44 2016 - Reading server configuration from /etc/masterha/app1.cnf..
　　Sun May1 22:46:44 2016 - MHA::MasterMonitor version 0.56.
　　Sun May1 22:46:45 2016 - GTID failover mode = 0
　　Sun May1 22:46:45 2016 - Dead Servers:
　　Sun May1 22:46:45 2016 - Alive Servers:
　　Sun May1 22:46:45 2016 - 192.168.1.121(192.168.1.121:3306)
　　Sun May1 22:46:45 2016 - 192.168.1.122(192.168.1.122:3306)
　　Sun May1 22:46:45 2016 - 192.168.1.123(192.168.1.123:3306)
　　Sun May1 22:46:45 2016 - Alive Slaves:
　　Sun May1 22:46:45 2016 - 192.168.1.122(192.168.1.122:3306)Version=5.6.29-log (oldest major version between slaves) log-bin:enabled
　　Sun May1 22:46:45 2016 - Replicating from 192.168.1.121(192.168.1.121:3306)
　　Sun May1 22:46:45 2016 - Primary candidate for the new Master (candidate_master is set)
　　Sun May1 22:46:45 2016 - 192.168.1.123(192.168.1.123:3306)Version=5.6.29-log (oldest major version between slaves) log-bin:enabled
　　Sun May1 22:46:45 2016 - Replicating from 192.168.1.121(192.168.1.121:3306)
　　Sun May1 22:46:45 2016 - Current Alive Master: 192.168.1.121(192.168.1.121:3306)
　　Sun May1 22:46:45 2016 - Checking slave configurations..
　　Sun May1 22:46:45 2016 - read_only=1 is not set on slave 192.168.1.122(192.168.1.122:3306).

　　Sun May1 22:46:45 2016 - >　　Sun May1 22:46:45 2016 - read_only=1 is not set on slave 192.168.1.123(192.168.1.123:3306).

　　Sun May1 22:46:45 2016 - >　　Sun May1 22:46:45 2016 - Checking replication filtering settings..
　　Sun May1 22:46:45 2016 - binlog_do_db= , binlog_ignore_db=
　　Sun May1 22:46:45 2016 - Replication filtering check ok.
　　Sun May1 22:46:45 2016 - GTID (with auto-pos) is not supported
　　Sun May1 22:46:45 2016 - Starting SSH connection tests..
　　Sun May1 22:46:46 2016 - All SSH connection tests passed successfully.
　　Sun May1 22:46:46 2016 - Checking MHA Node version..
　　Sun May1 22:46:47 2016 - Version check ok.
　　Sun May1 22:46:47 2016 - Checking SSH publickey authentication settings on the current master..
　　Sun May1 22:46:47 2016 - HealthCheck: SSH to 192.168.1.121 is reachable.
　　Sun May1 22:46:47 2016 - Master MHA Node version is 0.56.
　　Sun May1 22:46:47 2016 - Checking recovery script configurations on 192.168.1.121(192.168.1.121:3306)..
　　Sun May1 22:46:47 2016 - Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/usr/local/mysql/data/ --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --start_file=master-bin.000008
　　Sun May1 22:46:47 2016 - Connecting to root@192.168.1.121(192.168.1.121:22)..
　　Creating /tmp if not exists.. ok.
　　Checking output directory is accessible or not..
　　ok.
　　Binlog found at /usr/local/mysql/data/, up to master-bin.000008
　　Sun May1 22:46:48 2016 - Binlog setting check done.
　　Sun May1 22:46:48 2016 - Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
　　Sun May1 22:46:48 2016 - Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.1.122 --slave_ip=192.168.1.122 --slave_port=3306 --workdir=/tmp --target_version=5.6.29-log --manager_version=0.56 --relay_log_info=/usr/local/mysql/data/relay-log.info--relay_dir=/usr/local/mysql/data/--slave_pass=xxx
　　Sun May1 22:46:48 2016 - Connecting to root@192.168.1.122(192.168.1.122:22)..
　　Checking slave recovery environment settings..
　　Opening /usr/local/mysql/data/relay-log.info ... ok.

　　>
　　Temporary>　　Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
　　done.
　　Testing mysqlbinlog output.. done.
　　Cleaning up test file(s).. done.
　　Sun May1 22:46:48 2016 - Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.1.123 --slave_ip=192.168.1.123 --slave_port=3306 --workdir=/tmp --target_version=5.6.29-log --manager_version=0.56 --relay_log_info=/usr/local/mysql/data/relay-log.info--relay_dir=/usr/local/mysql/data/--slave_pass=xxx
　　Sun May1 22:46:48 2016 - Connecting to root@192.168.1.123(192.168.1.123:22)..
　　Checking slave recovery environment settings..
　　Opening /usr/local/mysql/data/relay-log.info ... ok.

　　>
　　Temporary>　　Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
　　done.
　　Testing mysqlbinlog output.. done.
　　Cleaning up test file(s).. done.
　　Sun May1 22:46:48 2016 - Slaves settings check done.
　　Sun May1 22:46:48 2016 -
　　192.168.1.121(192.168.1.121:3306) (current master)
　　+--192.168.1.122(192.168.1.122:3306)
　　+--192.168.1.123(192.168.1.123:3306)
　　Sun May1 22:46:48 2016 - Checking replication health on 192.168.1.122..
　　Sun May1 22:46:48 2016 - ok.
　　Sun May1 22:46:48 2016 - Checking replication health on 192.168.1.123..
　　Sun May1 22:46:48 2016 - ok.
　　Sun May1 22:46:48 2016 - Checking master_ip_failover_script status:
　　Sun May1 22:46:48 2016 - /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.121 --orig_master_ip=192.168.1.121 --orig_master_port=3306
　　IN SCRIPT TEST====/etc/init.d/keepalived stop==/etc/init.d/keepalived start===
　　Checking the Status of the script.. OK
　　Sun May1 22:46:48 2016 - OK.
　　Sun May1 22:46:48 2016 - shutdown_script is not defined.
　　Sun May1 22:46:48 2016 - Got exit code 0 (Not master dead).
　　MySQL Replication Health is OK.
　　#
　　3. 开启MHA Manager监控(node5操作)如下：
　　# mkdir -p/var/log/masterha/app1/
　　# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
　　参数说明：
　　--remove_dead_master_conf #该参数代表当发生主从切换后，老的主库的ip将会从配置文件中移除。
　　--manger_log                #日志存放位置
　　--ignore_last_failover    #在缺省情况下，如果MHA检测到连续发生宕机，会生成app1.failover.complete文件，会造成MHA管理进程无法启动。
　　4. 查看MHA Manager监控是否正常：
　　# masterha_check_status --conf=/etc/masterha/app1.cnf
　　app1 (pid:2480) is running(0:PING_OK), master:192.168.1.121
　　5. 查看启动日志(node5操作)如下：
　　# tail -n20 /var/log/masterha/app1/manager.log
　　6. 关闭MHA Manage监控：
　　(1) 关闭
　　# masterha_stop --conf=/etc/masterha/app1.cnf
　　(2) 启动
　　# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1
　　六、MHA Failover切换
　　1. 自动Failover切换
　　(1) 模拟master mysql关闭
　　(2) VIP将会切换到node2
　　(3) /etc/masterha/app1.cnf中将原主服务器配置文件清掉。
　　(4) masterha_manager监控进程会自动退出关闭，并在/var/log/masterha/app1下生成app1.failover.complete文件，manager.log会记录全过程，从服务器会自动从新的主服务器复制。
　　(5) 原主服务器mysqld启动的，需要清掉/var/log/masterha/app1下生成app1.failover.complete文件，添加node1配置文件到/etc/masterha/app1.cnf，通过manager.log中的记录的故障点，重新同步主服务器，成为从节点。
　　2. 手动Failover切换
　　(1) 先停MHA Manager进程。
　　masterha_stop --conf=/etc/masterha/app1.cnf
　　(2) 停掉master mysqld
　　(3) 手动切换，在Manager主机上操作如下：
　　# masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=192.168.1.122 --dead_master_port=3306 --new_master_host=192.168.1.121 --new_master_port=3306 --ignore_last_failover
　　通过观察日志可以观察切换全过程。
　　(4) 如上节方式恢复节点为从服务器。
　　3.正常运行情况下切换(Master正在运行)
　　等补充。
　　4. 小结
　　通过对MMM，MHA的环境搭建测试，MHA由于采用复制架构，原理简单，在一些对数据要求比较高的环境，为了保证可靠性，最好与半同步结合使用。

页: [1]

运维网's Archiver

MySQL MHA高可用环境部署