How to Configure AWS Aurora with an External MySQL Instance for Replication

Sujit Patel · Jun 18, 2022

MySQL replication is an important part of keeping database backups up to date. You might need multiple slaves and want to use a self-hosted MySQL instance for replication, or you might want to migrate to a different cloud or datacenter. In this article, we'll look at how to achieve this.

AWS Aurora Replication with an External MySQL

Overview

  1. Enable binary logging on the master.
  2. Create a replication user on the master.
  3. Create a snapshot and restore it.
  4. Dump the schema from the restored instance.
  5. Dump the data from the restored instance.
  6. Dump the users from the restored instance.
  7. Import the schema into the external MySQL instance.
  8. Import the users into the external MySQL instance.
  9. Import the data into the external MySQL instance.
  10. Start replication.

Steps

I'm assuming you already have an AWS Aurora instance up and running, and that you already have a self-hosted MySQL instance (you can host MySQL on an EC2 instance or on a bare-metal machine).

  1. The first step is to enable binary logging in Aurora. Go to the cluster-level parameter group and make sure binlog_format is set to ROW; this change requires a restart of the writer instance. To verify, run this query on the master:

SHOW MASTER LOGS;

It will show you the log file names and sizes.

Otherwise, you will get an error:

ERROR 1381 (HY000): You are not using binary logging
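You can also confirm that the parameter change took effect after the restart by checking the binlog format directly; it should report ROW:

SHOW VARIABLES LIKE 'binlog_format';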

We also need to ensure a long enough binary log retention period. For example, if we expect the initial data export/import to take one day, we can set the retention period to something like three days to be on the safe side. This helps ensure the binary logs we need to roll the restored data forward are still available. Use the command below to check the current retention period:

CALL mysql.rds_show_configuration;

Use the command below to update the retention period to 3 days (72 hours):

CALL mysql.rds_set_configuration('binlog retention hours', 72);

2. Create a dedicated user for replication:

CREATE USER 'replica'@'%' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO 'replica'@'%';
FLUSH PRIVILEGES;
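You can verify the new user's privileges with:

SHOW GRANTS FOR 'replica'@'%';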

3. Create a snapshot and restore it. A snapshot is the only way to get a consistent backup at a specific binary log position, since Aurora does not allow "super" privileges and we cannot take a global READ LOCK.
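If you prefer the AWS CLI over the console, here is a minimal sketch of the snapshot and restore; the cluster, snapshot, and instance identifiers are placeholders, and note that a restored Aurora cluster also needs a DB instance added before you can connect to it:

# take a snapshot of the source cluster (identifiers are placeholders)
aws rds create-db-cluster-snapshot \
  --db-cluster-identifier my-aurora-cluster \
  --db-cluster-snapshot-identifier my-aurora-snapshot

# restore the snapshot into a new cluster
aws rds restore-db-cluster-from-snapshot \
  --db-cluster-identifier my-restored-cluster \
  --snapshot-identifier my-aurora-snapshot \
  --engine aurora-mysql

# add an instance to the restored cluster so it is reachable
aws rds create-db-instance \
  --db-instance-identifier my-restored-instance \
  --db-cluster-identifier my-restored-cluster \
  --db-instance-class db.r5.large \
  --engine aurora-mysql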

Next, capture the binary log position for replication. This is done by inspecting the Logs & events section of the restored instance in the console, where you'll see something like this:

Binlog position from crash recovery is mysql-bin-changelog.014824 107977934

Take a note of the binlog file name and position; we'll need them later.

4. Create a configuration file with the login details of the restored instance:

tee /data/aurora.cnf <<EOF
[client]
user=username
password=password
host=your-restored-database.ap-south-1.rds.amazonaws.com
EOF
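Since this file contains credentials, you may want to restrict its permissions:

chmod 600 /data/aurora.cnf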

I'm using mydumper to dump the schema:

mydumper --no-data \
--triggers \
--routines \
--events \
-v 3 \
--no-locks \
--outputdir /data/schema \
--logfile /data/mydumper-schema.log \
--regex '^(?!(mysql|test|performance_schema|information_schema|sys))' \
--defaults-file /data/aurora.cnf

The above command takes a schema dump of all databases, excluding mysql, test, performance_schema, information_schema, and sys.
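You can sanity-check the dump by listing the output directory; mydumper writes a <db>-schema-create.sql file per database and a <db>.<table>-schema.sql file per table:

ls /data/schema | head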

5. Run the command below to take a dump of the data:

mydumper --threads 2 \
--no-schemas \
--compress \
--triggers \
--routines \
--events \
--rows="10000000" \
--verbose 3 \
--long-query-guard 999999 \
--no-locks \
--outputdir /data/export \
--logfile /data/mydumper-data.log \
--regex '^(?!(mysql|test|performance_schema|information_schema|sys))' \
-O skip.txt \
--defaults-file /data/aurora.cnf

You can increase the number of threads according to the number of CPUs on your instance.
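The -O skip.txt option points to a file listing tables to omit from the dump, one database.table entry per line; the table names below are just hypothetical examples:

mydb.audit_log
mydb.user_sessions_tmp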

6. To take a dump of the users, I'm using percona-toolkit:

sudo apt install percona-toolkit
pt-show-grants --ask-pass --host your-restored-database.ap-south-1.rds.amazonaws.com --user admin > users.sql
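If you want to leave out AWS-managed accounts such as rdsadmin, pt-show-grants accepts a comma-separated --ignore list:

pt-show-grants --ask-pass --host your-restored-database.ap-south-1.rds.amazonaws.com --user admin --ignore rdsadmin@localhost > users.sql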

7. Now it's time to import the schema. Before starting the import, we'll update some configuration to speed it up; after the import, we'll revert those changes (see the note after step 9).

Updating the configuration is an optional step. Add the settings below under the [mysqld] section:

vi /etc/mysql/mysql.conf.d/mysqld.cnf

log_slave_updates
innodb_buffer_pool_size=10G #you can adjust this according to your instance memory size, for 16GB instance I'm using 10GB.
innodb_log_file_size=1G
innodb_flush_method=O_DIRECT
innodb_flush_log_at_trx_commit=0
sync_binlog=0
master_info_repository=TABLE
relay_log_info_repository=TABLE
innodb_flush_neighbors=0
innodb_io_capacity_max=10000
innodb_stats_on_metadata=off
max_allowed_packet=1G
net_read_timeout=60
performance_schema=off
innodb_adaptive_hash_index=off
expire_logs_days=3
innodb_doublewrite=off

Save, exit, and restart your MySQL server:

systemctl restart mysql

Now, use the command below to import the schema:

myloader \
--directory /data/schema \
--verbose 3 \
--host localhost \
--user username \
--ask-password
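You can verify that the schemas were created:

mysql -u username -p -e 'SHOW DATABASES;'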

8. Let's import the users:

mysql -p < users.sql
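You can confirm the accounts were created by listing them:

mysql -p -e 'SELECT user, host FROM mysql.user;'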

9. Before importing the data, I would recommend running this inside a screen session, as it is going to take time depending on the size of your database.

myloader --threads 2 \
--directory /data/export \
--queries-per-transaction 100 \
--verbose 3 \
--logfile /data/myloader-data.log \
--host localhost \
--user root \
--ask-password

You can increase the number of threads according to the number of CPUs on your instance.
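Once the data import finishes, remember to revert the speed-over-durability settings from step 7 and restart MySQL. At a minimum, the durability-related values should go back to their safe defaults (a sketch; exact defaults can vary by MySQL version):

# restore durability-related settings to their safe defaults
innodb_flush_log_at_trx_commit=1
sync_binlog=1
innodb_doublewrite=on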

10. Finally, we can start replication. Log in to your self-hosted MySQL instance and run the command below to add the master information, replacing the log file name and position with the values you noted earlier:

CHANGE MASTER TO
MASTER_HOST='your-aws-aurora-instance.ap-south-1.rds.amazonaws.com',
MASTER_USER='replica',
MASTER_PASSWORD='password',
MASTER_LOG_FILE='mysql-bin-changelog.014824', -- change this to your binlog file name
MASTER_LOG_POS=107977934; -- change this to your binlog position
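Note that on MySQL 8.0.23 and later, CHANGE MASTER TO is deprecated in favor of the equivalent CHANGE REPLICATION SOURCE TO syntax (and START SLAVE becomes START REPLICA):

CHANGE REPLICATION SOURCE TO
SOURCE_HOST='your-aws-aurora-instance.ap-south-1.rds.amazonaws.com',
SOURCE_USER='replica',
SOURCE_PASSWORD='password',
SOURCE_LOG_FILE='mysql-bin-changelog.014824',
SOURCE_LOG_POS=107977934;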

To start the slave, run the command below:

START SLAVE;

You can check the status of the slave by running the command below:

SHOW SLAVE STATUS \G

Bonus

To make sure replication is happening, you can compare Master_Log_File and Read_Master_Log_Pos on the slave against the master's current position, or check Seconds_Behind_Master. If there is an error, you'll see values in Last_Errno and Last_Error.
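A quick way to pull just those fields on the slave (assuming you can log in locally):

mysql -p -e 'SHOW SLAVE STATUS \G' | grep -E 'Master_Log_File|Read_Master_Log_Pos|Seconds_Behind_Master|Last_Errno|Last_Error'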

To check the master's current log file name and position, run the query below on the master:

SHOW MASTER STATUS \G

Final Words

I attempted to make the preceding examples as straightforward as possible.
Replication is not a simple activity; it entails a number of complex processes, and there's no guarantee that you won't run into problems when carrying out the steps outlined above.
You can post your problems in the comments section, and I’ll do my best to respond.

Large datasets take time to process; patience is required, because there is no quick way out.
