How to add a slave to a MySQL replication setup with no downtime

Recently I had to add a new MySQL slave server to an existing master -> slave setup, without causing any downtime. Having done that only a couple of times in the past and never having played the central role in the process, I wanted to make sure I was getting everything right, so I did a lot of research but couldn’t find all the best (or what I consider the best) bits and pieces in one single place, so I am documenting everything here for future reference and hopefully they will also help others.

Saving time

I’ll start by mentioning something that saved me a ton of time: don’t use mysqldump to copy the data to the new slave.

Lots of articles around the web mention that you should create a dump of your existing slave using mysqldump, copy it to the new slave and then import it. Although this works, it takes a lot of time if your dataset is big. Think about it: your server will be reading all the data from disk, processing it into SQL queries, and then writing those queries back to disk. Then you have to copy the dump to the new slave (possibly over the internet, which makes it even slower) and finally import the dump into the new Mysql slave, which means it will read all the statements from the dump file and write them into the actual database. That’s a lot of reading and writing and even with today’s disks, it’s a slow operation, especially if your server has a lot of data - in my case it was around 250GB and a previous attempt by my colleagues took over 48 hours for the import process alone.

A better way is to just copy the MySQL data directory directly to the new slave and ajust a few things. This is actually the method recommended in the MySQL manual, although it is lacking in details.

I stopped the existing slave before copying the data directory over to the new slave to make sure I was getting all the data but I have also seen it mentioned in an article that you can copy the data with rsync without stopping the slave, then stop the slave and execute the rsync copy command again. In theory this second run should be faster, because rsync just has to synchronize the changes that took place since you copied the files, meaning your existing slave will be offline less time. I haven’t tried it but I think it’s an interesting idea and I’ll give it a try next time.

An important thing to note about this method (copying data directory directly vs. dumping+copying+importing) is that MySQL versions on the existing slave and the new slave should be the same, otherwise you will probably get into trouble. A difference in the revision number (e.g. 5.5.10 and 5.5.22) should not be a problem, but I would check before using different minor versions (e.g. 5.5 and 5.4). Major version differences (e.g. 5.x and 4.x) are definitely a no go.

Jacob Holmes noted in a comment that in newer MySQL versions (the manual says it’s from 5.6 onward) there’s a auto.cnf file that holds the server_uuid. It should be in /var/lib/mysql, which means it would be moved to the new slave along with the rest of the files, causing the master to throw an error due to seeing two slaves with the same uuid.

The solution is to not copy the file or delete it in the new slave, since MySQL will recreate it as needed.

The process

This process was done on CentOS 6.4 (not my personal favorite but I wasn’t the one who installed the OS) but you could easily apply it on other distributions as well. If you’re looking into this matter, you probably know what to do, like replace yum install with apt-get install and whatnot.

MySQL was Percona’s MySQL 5.5 but this should apply to the regular MySQL 5.5.

Also, for long running commands, I suggest you use screen or byobu, so you can detatch your session, close your SSH connection, shutdown your computer, go drink a cup of tea or get some sleep. Just don’t start the rsync copy command (for example) on your regular SSH shell, because after a couple of hours in you may realise you need to stop it and all that time will be wasted. I won’t go into the details of how to use screen or byobu here, there’s enough documentation about that everywhere.

Also, when you see a $ in front of a command here, it means it’s a command to be executed in the shell, whereas a mysql> means it’s a command to be executed inside mysql.

Right, let’s get down to business.

This first step is optional, but it can potentially save you time when copying the data over to the new slave. If you have a big dataset, though, the time it will take to optimize your database(s) will probably negate the quicker transfer time. On your existing slave:

$ screen
$ time mysqlcheck -u <username> -p --optimize <database_name>

Stop the new slave, so we don’t mess up the data we’re going to copy:

$ service mysql stop

On the existing slave, add the following line to /etc/my.cnf, under the [mysqld] section:

sync_binlog=1

Then restart MySQL:

$ service mysql restart

This is not required and may hurt the performance of the server, so you may want to skip this step. Read about it and decide for yourself.

On the existing slave, check the replication status:

mysql> show slave status\G;

Make sure the line “Seconds_Behind_Master:” shows “0” (zero). If it doesn’t, wait until it does (assuming there’s no error).

Stop replication on existing slave:

mysql> stop slave;
mysql> show slave status\G;

The lines “Slave_IO_Running:” and “Slave_SQL_Running:” should both say “No”.

Copy /etc/my.cnf from the existing slave to the new slave. In the existing slave, execute:

$ scp /etc/my.cnf <username>@<new_slave_hostname>:/etc/my.cnf

Copy the mysql data directory from existing slave to new slave. This is the big one, so make sure you execute it in screen. In the existing slave, execute:

$ screen
$ cd /var/lib/mysql
$ rsync -Sa --progress ./* <username>@<new_slave_hostname>:/var/lib/mysql/

This step is only required for MySQL 5.6 and above. In the new slave, you need to delete the file /var/lib/mysql/auto.cnf, so that the new slave’s UUID doesn’t match the one in the old slave (MySQL will generate a new one if the file is not present).
While the data is being copied, go to the master, create a user for the new slave and grant it replication permissions:

mysql> CREATE USER 'new_slave_username'@'new_slave_hostname' IDENTIFIED BY 'new_password';
mysql> GRANT REPLICATION SLAVE ON *.* TO 'new_slave_username'@'new_slave_hostname' IDENTIFIED BY 'new_password';

You may also need to execute FLUSH PRIVILEGES on the master to make sure the replication user has the required privileges. After the data directory has been copied from the existing slave to the new slave, it is then safe to restart replication on the existing slave:

mysql> start slave;
mysql> show slave status\G;

The lines “Slave_IO_Running:” and “Slave_SQL_Running:” should both say “Yes”.

All the following commands are executed on the new slave, after the data directory has been copied. Edit /etc/my.cnf and increment the number on the line “server-id”, making sure no other slave is using the same identifier.
In /var/lib/mysql, rename the files master.info to master.info.oldslave and relay-log.info to relay-log.info.oldslave. These may be in a different location in your setup, so I’ll assume you know how to find them.
Edit /etc/my.cnf and add skip-slave-start (if it doesn’t exist) under the [mysqld] section.
Start the new slave. Replication won’t start immediately because of the line we added in the previous step, but MySQL will be running and will allow us to change some settings in a safe way:

$ service mysql start

Now we need to get some data from the old slave files. Do this:

$ cat relay-log.info.oldslave

You’ll see something like this:

./db2-relay-bin.000861
2244366
master-bin.000285
447348971
1

Take note of the first line, which is a file name for the binlog that the old slave was processing when you stopped it. That file should exist in /var/lib/mysql in the new slave. Also take note of the second line, which is the position in the binlog the old slave was at.

To update the new slave log settings, run this:

$ mysqlbinlog --start-position=2244366 ./db2-relay-bin.000861 | mysql -u <user> -p

Don’t forget to replace the file name and the start position with the values you got on the previous step.

Now we need to get the master information on the new slave. Run this:

$ cat master.info.oldslave

You’ll see something like this (with a few blank lines that I deleted for brevity):

18
master-bin.000285
447348971
10.38.56.7
replprd
socialrepl
3306
60
0
0
1800.000
0

Take note of the second line, which is the file name for the master binlog the old slave was processing when you stopped it. Also take note of the third line, which is the position in the binlog the old slave was at.

To update the new slave master settings, go into mysql and run this:

mysql> CHANGE MASTER TO MASTER_HOST='<master_hostname>', MASTER_USER='new_slave_username', MASTER_PASSWORD='<new_password>', MASTER_LOG_FILE='master-bin.000285', MASTER_LOG_POS=447348971;

Don’t forget to replace the log file name and the start position with the values you got on the previous step.

Still in MySQL, start the slave again:

mysql> start slave;

Check that there are no errors and that the slave is working:

mysql> show slave status\G;

The lines “Slave_IO_Running:” and “Slave_SQL_Running:” should both say “Yes”. The line “Seconds_Behind_Master:” may have a non 0 value but that’s fine. It will go down to zero as the new slave catches up with the master.”

You can now edit /etc/my.cnf and remove or comment out the skip-slave-start, so that the slave will start automatically when MySQL is restarted.

And you’re done! Now you just have to wait for new slave to catch up with the master and you will have a brand new MySQL slave on your setup!

Comments, suggestions, questions?

I am by no means an expert on MySQL and replication is a complex topic, so any suggestions you may have to improve on this are very welcome. Any questions you may have about this process are also welcome and I’ll try to help out as much as I can.

Menu

Saving time

The process

Comments, suggestions, questions?