MHA MySQL Quick Start Guide

Stella981
• 阅读 970

MHA MySQL Quick Start Guide

02 Sep 2016

By Miguel Angel Nieto Insight for DBAs, MySQL MHA, MHA QuickStart Guide, MySQL 6 Comments

MHA (Master High Availability Manager and tools for MySQL) is one of the most important pieces of our managed services. When properly set up, it can check replication health, move writer and reader virtual IPs, perform failovers, and have its output constantly monitored by Nagios. Is it easy to deploy and follows the KISS (Keep It Simple, Stupid) philosophy that I love so much.

This blog post is a quick start guide to try it out and play with it in your own testing environment. I assume that you already know how to install software, deal with SSH keys and setup replication in MySQL. The post just covers MHA configuration.

Testing environment

Taken from /etc/hosts

Shell

192.168.1.116 mysql-server1 192.168.1.117 mysql-server2 192.168.1.118 mysql-server3 192.168.1.119 mha-manager

1

2

3

4

192.168.1.116 mysql - server1

192.168.1.117 mysql - server2

192.168.1.118 mysql - server3

192.168.1.119 mha - manager

mysql-server1: Our master MySQL server with 5.6
mysql-server2: Slave server
mysql-server3: Slave server
mha-manager: The server monitors the replication and from where we manage MHA. The installation is also required to meet some Perl dependencies.

We just introduced some new concepts, the MHA Node and MHA Manager:

MHA Node

It is installed and runs on each MySQL server. This is the piece of software that it is invoked by the manager every time we want to do something, like for example a failover or a check.

MHA Manager

As explained before, this is our operations center. The manager monitors the services, replication, and includes several administrative command lines.

Pre-requisites

  • Replication must already be running. MHA manages replication and monitors it, but it is not a tool to deploy it. So MySQL and replication need to be running already.
  • All hosts should be able to connect to each other using public SSH keys.
  • All nodes need to be able to connect to each other’s MySQL servers.
  • All nodes should have the same replication user and password.
  • In the case of multi-master setups, only one writable node is allowed. All others need to be configured with read_only.
  • MySQL version has to be 5.0 or later.
  • Candidates for master failover should have binary log enabled. The replication user must exist there too.
  • Binary log filtering variables should be the same on all servers (replicate-wild%, binlog-do-db…).
  • Disable automatic relay-log purge and do it regularly from a cron task. You can use an MHA-included script called “purge_relay_logs”.

While that is a large list of requisites, I think that they are pretty standard and logical.

MHA installation

As explained before, the MHA Node needs to be installed on all the nodes. You can download it from this Google Drive link.

This post shows you how to install it using the source code, but there are RPM packages available. Deb too, but only for older versions. Use the installation method you prefer. This is how to compile it:

Shell

tar -xzf mha4mysql-node-0.57.tar.gz perl Makefile.PL make make install

1

2

3

4

tar - xzf mha4mysql - node - 0.57.tar.gz

perl Makefile .PL

make

make install

The commands included in the node package are save_binary_logs, filter_mysqlbinlog, purge_relay_logs, apply_diff_relay_logs. Mostly tools that the manager needs to call in order to perform a failover, while trying to minimize or avoid any data loss.

On the manager server, you need to install MHA Node plus MHA Manager. This is due to MHA Manager dependance on a Perl library that comes with MHA Node. The installation process is just the same.

Configuration

We only need one configuration file on the Manager node. The example below is a good starting point:

Shell

# cat /etc/app1.cnf [server default] # mysql user and password user=root password=supersecure ssh_user=root # working directory on the manager manager_workdir=/var/log/masterha/app1 # working directory on MySQL servers remote_workdir=/var/log/masterha/app1 [server1] hostname=mysql-server1 candidate_master=1 [server2] hostname=mysql-server2 candidate_master=1 [server3] hostname=mysql-server3 no_master=1

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

# cat /etc/app1.cnf

[ server default ]

# mysql user and password

user = root

password = supersecure

ssh_user = root

# working directory on the manager

manager_workdir = / var / log / masterha / app1

# working directory on MySQL servers

remote_workdir = / var / log / masterha / app1

[ server1 ]

hostname = mysql - server1

candidate_master = 1

[ server2 ]

hostname = mysql - server2

candidate_master = 1

[ server3 ]

hostname = mysql - server3

no_master = 1

So pretty straightforward. It specifies that there are three servers, two that can be master and one that can’t be promoted to master.

Let’s check if we meet some of the pre-requisites. We are going to test if replication is working, can be monitored, and also if SSH connectivity works.

Shell

# masterha_check_ssh --conf=/etc/app1.cnf [...] [info] All SSH connection tests passed successfully.

1

2

3

# masterha_check_ssh --conf=/etc/app1.cnf

[ . . . ]

[ info ] All SSH connection tests passed successfully .

It works. Now let’s check MySQL:

Shell

# masterha_check_repl --conf=/etc/app1.cnf [...] MySQL Replication Health is OK.

1

2

3

# masterha_check_repl --conf=/etc/app1.cnf

[ . . . ]

MySQL Replication Health is OK .

Start the manager and operations

Everything is setup, we meet the pre-requisites. We can start our manager:

Shell

# masterha_manager --remove_dead_master_conf --conf=/etc/app1.cnf [...] [info] Starting ping health check on mysql-server1(192.168.1.116:3306).. [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

1

2

3

4

# masterha_manager --remove_dead_master_conf --conf=/etc/app1.cnf

[ . . . ]

[ info ] Starting ping health check on mysql - server1 ( 192.168.1.116 : 3306 ) . .

[ info ] Ping ( SELECT ) succeeded , waiting until MySQL doesn' t respond . .

The manager found our master and it is now actively monitoring it using a SELECT command. –remove_dead_master_conf tells the manager that if the master goes down, it must edit the config file and remove the master’s configuration from it after a successful failover. This avoids the “there is a dead slave” error when you restart the manager. All servers listed in the conf should be part of the replication and in good health, or the manager will refuse to work.

Automatic and manual failover

Good, everything is running as expected. What happens if the MySQL master dies!?!

Shell

[...] [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away) [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.57 --binlog_prefix=mysql-bin Creating /var/log/masterha/app1 if not exists.. ok. Checking output directory is accessible or not.. ok. Binlog found at /var/log/mysql, up to mysql-bin.000002 [info] HealthCheck: SSH to mha-server1 is reachable. [...]

1

2

3

4

5

6

7

8

9

[ . . . ]

[ warning ] Got error on MySQL select ping : 2006 ( MySQL server has gone away )

[ info ] Executing SSH check script : save_binary_logs -- command = test -- start_pos = 4 -- binlog_dir = / var / lib / mysql , / var / log / mysql -- output_file = / var / log / masterha / app1 / save_binary_logs_test -- manager_version = 0.57 -- binlog_prefix = mysql - bin

Creating / var / log / masterha / app1 if not exists . . ok .

Checking output directory is accessible or not . .

ok .

Binlog found at / var / log / mysql , up to mysql - bin . 000002

[ info ] HealthCheck : SSH to mha - server1 is reachable .

[ . . . ]

First, it tries to connect by SSH to read the binary log and save it. MHA can apply the missing binary log events to the remaining slaves so they are up to date with all the before-failover info. Nice!

Theses different phases follow:

Shell

* Phase 1: Configuration Check Phase.. * Phase 2: Dead Master Shutdown Phase.. * Phase 3: Master Recovery Phase.. * Phase 3.1: Getting Latest Slaves Phase.. * Phase 3.2: Saving Dead Master's Binlog Phase.. * Phase 3.3: Determining New Master Phase.. [info] Finding the latest slave that has all relay logs for recovering other slaves.. [info] All slaves received relay logs to the same position. No need to resync each other. [info] Starting master failover.. [info] From: mysql-server1(192.168.1.116:3306) (current master) +--mysql-server2(192.168.1.117:3306) +--mysql-server3(192.168.1.118:3306) To: mysql-server2(192.168.1.117:3306) (new master) +--mysql-server3(192.168.1.118:3306) * Phase 3.3: New Master Diff Log Generation Phase.. * Phase 3.4: Master Log Apply Phase.. * Phase 4: Slaves Recovery Phase.. * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. * Phase 4.2: Starting Parallel Slave Log Apply Phase.. * Phase 5: New master cleanup phase..

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

* Phase 1 : Configuration Check Phase . .

* Phase 2 : Dead Master Shutdown Phase . .

* Phase 3 : Master Recovery Phase . .

* Phase 3.1 : Getting Latest Slaves Phase . .

* Phase 3.2 : Saving Dead Master' s Binlog Phase . .

* Phase 3.3 : Determining New Master Phase . .

[ info ] Finding the latest slave that has all relay logs for recovering other slaves . .

[ info ] All slaves received relay logs to the same position . No need to resync each other .

[ info ] Starting master failover . .

[ info ]

From :

mysql - server1 ( 192.168.1.116 : 3306 ) ( current master )

  • -- mysql - server2 ( 192.168.1.117 : 3306 )

  • -- mysql - server3 ( 192.168.1.118 : 3306 )

To :

mysql - server2 ( 192.168.1.117 : 3306 ) ( new master )

  • -- mysql - server3 ( 192.168.1.118 : 3306 )

* Phase 3.3 : New Master Diff Log Generation Phase . .

* Phase 3.4 : Master Log Apply Phase . .

* Phase 4 : Slaves Recovery Phase . .

* Phase 4.1 : Starting Parallel Slave Diff Log Generation Phase . .

* Phase 4.2 : Starting Parallel Slave Log Apply Phase . .

* Phase 5 : New master cleanup phase . .

The phases are pretty self-explanatory. MHA tries to get all the data possible from the master’s binary log and slave’s relay log (the one that is more advanced) to avoid losing any data or promote a slave that it was far behind the master. So it tries to promote a slave with the most current data as possible. We see that server2 has been promoted to master, because in our configuration we specified that server3 shouldn’t be promoted.

After the failover, the manager service stops itself. If we check the config file, the failed server is not there anymore. Now the recovery is up to you. You need to get the old master back in the replication chain, then add it again to the config file and start the manager.

It is also possible to perform a manual failover (if, for example, you need to do some maintenance on the master server). To do that you need to:

  • Stop masterha_manager.
  • Run masterha_master_switch –master_state=alive –conf=/etc/app1.cnf. The line says that you want to switch the master, but the actual master is still alive, so no need to mark it as dead or remove it from the conf file.

And that’s it. Here is part of the output. It shows the tool making the decision on the new topology and asking the user for confirmation:

Shell

[info] From: mysql-server1(192.168.1.116:3306) (current master) +--mysql-server2(192.168.1.117:3306) +--mysql-server3(192.168.1.118:3306) To: mysql-server2(192.168.1.117:3306) (new master) +--mysql-server3(192.168.1.118:3306) Starting master switch from mha-server1(192.168.1.116:3306) to mha-server2(192.168.1.117:3306)? (yes/NO): yes [...] [info] Switching master to mha-server2(192.168.1.117:3306) completed successfully.

1

2

3

4

5

6

7

8

9

10

11

12

13

[ info ]

From :

mysql - server1 ( 192.168.1.116 : 3306 ) ( current master )

  • -- mysql - server2 ( 192.168.1.117 : 3306 )

  • -- mysql - server3 ( 192.168.1.118 : 3306 )

To :

mysql - server2 ( 192.168.1.117 : 3306 ) ( new master )

  • -- mysql - server3 ( 192.168.1.118 : 3306 )

Starting master switch from mha - server1 ( 192.168.1.116 : 3306 ) to mha - server2 ( 192.168.1.117 : 3306 ) ? ( yes / NO ) : yes

[ . . . ]

[ info ] Switching master to mha - server2 ( 192.168.1.117 : 3306 ) completed successfully .

You can also employ some extra parameters that are really useful in some cases:

–orig_master_is_new_slave: if you want to make the old master a slave of the new one.

–running_updates_limit: if the current master executes write queries that take more than this parameter’s setting, or if any of the MySQL slaves behind master take more than this parameter, the master switch aborts. By default, it’s 1 (1 second). All these checks are for safety reasons.

–interactive=0: if you want to skip all the confirmation requests and questions masterha_master_switch could ask.

Check this link in case you use GTID and want to avoid problems with errant transactions during the failover:

https://www.percona.com/blog/2015/12/02/gtid-failover-with-mysqlslavetrx-fix-errant-transactions/

Custom scripts

Since this is a quick guide to start playing around with MHA, I won’t cover advanced topics in detail. But I will mention a few:

    • Custom scripts. MHA can move IPs around, shutdown a server and send you a report in case something happens. It needs a custom script, however. MHA comes with some example scripts, but you would need to write one that fits your environment.The directives are master_ip_failover_script, shutdown_script, report_script. With them configured, MHA will send you an email or a message to your mobile device in the case of a failover, shutdown the server and move IPs between servers. Pretty nice!

Hope you found this quickstart guide useful for your own tests. Remember, one of the most important things: don’t overdo automation!  😉 These tools are good for checking health and performing the first initial failover. But you must still investigate what happened, why, fix it and work to avoid it from happening again. In high availability (HA) environments, automate everything and cause it to stop being HA.

Have fun!

Consul, ProxySQL and MySQL HASeptember 16, 2016In "MySQL"

Tips from the trenches for over-extended MySQL DBAsDecember 2, 2014In "Insight for DBAs"

ProxySQL and MHA IntegrationSeptember 13, 2016In "Insight for DBAs"

Get the Inside Scoop from Percona

Join 33,000+ fellow open-source enthusiasts! Our newsletter provides monthly updates on Percona open source software releases, technical resources, and valuable MySQL, MariaDB, PostgreSQL and MongoDB-related posts from the blog. Get information on Percona Live, our technical webinars, and upcoming events and meetups where you can talk with our experts.

Enter your email address: *

By submitting my information I agree that Percona may use my personal data in send communication to me about Percona services. I understand that I can unsubscribe from the communication at any time in accordance with the Percona Privacy Policy.

Author

Miguel Angel Nieto

Miguel joined Percona in October 2011. He has worked as a System Administrator for a Free Software consultant and in the supporting area of the biggest hosting company in Spain. His current focus is improving MySQL and helping the community of Free Software to grow.


Share this post

Facebook Twitter LinkedIn Email

Comments (6)

Simon J Mudd Reply

Hi Miguel Angel,

A nice write-up on MHA.

I looked at MHA some time ago and found it to work quite well. Setting up the required ssh and MySQL access was more of a nuisance especially if the number of servers grows, but the main thing that put a brake on MHA was the fact it could not fix broken intermediate masters. That is if the topology is only a master with directly connected slaves everything is ok. If you have a master connected to an intermediate master which has a slave MHA can not handle this, or that’s my recollection. Perhaps it has changed now. If you have servers spread over multiple datacentres then it is much more likely that intermediate masters may exist, and that’s also true if you migrate from one version of MySQL to another (major one) and can not take downtime. In that situation you’re likely to have a new intermediate master of the higher version running under the older master and this new master will have slaves of the newer version.

This lead me to look at Orchestrator which is able to resolve the issues mentioned and also only requires MySQL access to the servers to do its work. Percona have had a couple of blogs on Orchestrator so that’s covered elsewhere but I think this “limitation” I saw in MHA if it is still present should be mentioned.

September 3, 2016 at 5:21 am

    • Doug Wiggin Reply

      Simon, Thanks for the post. We are going to give Orchestrator a run in one of our environments seeking to replace MMM Monitor.

      September 5, 2016 at 4:37 pm

  • Miguel Angel Nieto Reply

    Thanks for the comment Simon.

    Multi-tier environments are not really supported:

    https://code.google.com/p/mysql-master-ha/wiki/Requirements#For_managing_three_or_more_tier_replication_environment..

    But master-master environments are. So, as usual in free software, you have to use the tool that helps you better 🙂

    September 6, 2016 at 4:45 am

  • Vinay Reply

    Hi Miguel Angel,

    Please help me to understand the limitation of MHA .

    February 14, 2017 at 8:27 am

  • Dylan Reply

    I configured it the way you want it, but I got an error when checking the replication:
    https://pastebin.ubuntu.com/p/8X3cDtFZ24/
    Where is it wrong? TAT

    January 16, 2019 at 9:19 am

  • Indubhushan Jha Reply

    Hello,
    Can we install MHA manager on all nodes? I know it is recommended to install it on SLAVE host only. But what is the side effect if we install it on all node (including Master node). This can help us to initiate the switchover from any node.

    January 30, 2019 at 4:52 am

Leave a Reply


How Can We Help?

Percona's experts can maximize your application performance with our open source database support, managed services or consulting.

Subscribe

Want to get weekly updates listing the latest blog posts? Subscribe now and we'll send you an update every Friday at 1pm ET.

Polls

How has the traffic and load on your database or application been trending over the last few weeks?

  • Significantly decreased
  • Decreased
  • No significant change
  • Increased
  • Significantly Increased

View Results

Categories

Upcoming Webinars

All Webinars »

点赞
收藏
评论区
推荐文章
blmius blmius
3年前
MySQL:[Err] 1292 - Incorrect datetime value: ‘0000-00-00 00:00:00‘ for column ‘CREATE_TIME‘ at row 1
文章目录问题用navicat导入数据时,报错:原因这是因为当前的MySQL不支持datetime为0的情况。解决修改sql\mode:sql\mode:SQLMode定义了MySQL应支持的SQL语法、数据校验等,这样可以更容易地在不同的环境中使用MySQL。全局s
皕杰报表之UUID
​在我们用皕杰报表工具设计填报报表时,如何在新增行里自动增加id呢?能新增整数排序id吗?目前可以在新增行里自动增加id,但只能用uuid函数增加UUID编码,不能新增整数排序id。uuid函数说明:获取一个UUID,可以在填报表中用来创建数据ID语法:uuid()或uuid(sep)参数说明:sep布尔值,生成的uuid中是否包含分隔符'',缺省为
待兔 待兔
3个月前
手写Java HashMap源码
HashMap的使用教程HashMap的使用教程HashMap的使用教程HashMap的使用教程HashMap的使用教程22
Jacquelyn38 Jacquelyn38
3年前
2020年前端实用代码段,为你的工作保驾护航
有空的时候,自己总结了几个代码段,在开发中也经常使用,谢谢。1、使用解构获取json数据let jsonData  id: 1,status: "OK",data: 'a', 'b';let  id, status, data: number   jsonData;console.log(id, status, number )
Wesley13 Wesley13
3年前
Java获得今日零时零分零秒的时间(Date型)
publicDatezeroTime()throwsParseException{    DatetimenewDate();    SimpleDateFormatsimpnewSimpleDateFormat("yyyyMMdd00:00:00");    SimpleDateFormatsimp2newS
Wesley13 Wesley13
3年前
mysql设置时区
mysql设置时区mysql\_query("SETtime\_zone'8:00'")ordie('时区设置失败,请联系管理员!');中国在东8区所以加8方法二:selectcount(user\_id)asdevice,CONVERT\_TZ(FROM\_UNIXTIME(reg\_time),'08:00','0
Wesley13 Wesley13
3年前
00:Java简单了解
浅谈Java之概述Java是SUN(StanfordUniversityNetwork),斯坦福大学网络公司)1995年推出的一门高级编程语言。Java是一种面向Internet的编程语言。随着Java技术在web方面的不断成熟,已经成为Web应用程序的首选开发语言。Java是简单易学,完全面向对象,安全可靠,与平台无关的编程语言。
Stella981 Stella981
3年前
Django中Admin中的一些参数配置
设置在列表中显示的字段,id为django模型默认的主键list_display('id','name','sex','profession','email','qq','phone','status','create_time')设置在列表可编辑字段list_editable
Wesley13 Wesley13
3年前
MySQL部分从库上面因为大量的临时表tmp_table造成慢查询
背景描述Time:20190124T00:08:14.70572408:00User@Host:@Id:Schema:sentrymetaLast_errno:0Killed:0Query_time:0.315758Lock_
为什么mysql不推荐使用雪花ID作为主键
作者:毛辰飞背景在mysql中设计表的时候,mysql官方推荐不要使用uuid或者不连续不重复的雪花id(long形且唯一),而是推荐连续自增的主键id,官方的推荐是auto_increment,那么为什么不建议采用uuid,使用uuid究
Python进阶者 Python进阶者
9个月前
Excel中这日期老是出来00:00:00,怎么用Pandas把这个去除
大家好,我是皮皮。一、前言前几天在Python白银交流群【上海新年人】问了一个Pandas数据筛选的问题。问题如下:这日期老是出来00:00:00,怎么把这个去除。二、实现过程后来【论草莓如何成为冻干莓】给了一个思路和代码如下:pd.toexcel之前把这