How to Configure AWS Aurora with an External MySQL Instance for Replication
MySQL replication is an important part of keeping database backups up to date. You might need multiple slaves and want to use your self-hosted MySQL instance for replication, or you might want to migrate to a different cloud or datacenter. In this article, we'll look at how to achieve this.
Overview
- Enable the binary log on the master.
- Create a user for replication on the master.
- Create a snapshot and restore it.
- Dump the schema from the restored instance.
- Dump the data from the restored instance.
- Dump the users from the restored instance.
- Import the schema into the external MySQL instance.
- Import the users into the external MySQL instance.
- Import the data into the external MySQL instance.
- Start replication.
Steps
I'm assuming you already have an AWS Aurora instance up and running and that you already have your self-hosted MySQL instance. (You can host MySQL on an EC2 instance or on your bare-metal machine.)
1. The first step is to enable binary logs in Aurora. Go to the cluster-level parameter group and make sure binlog_format is set to ROW. This change requires a restart of the writer instance. Run this query on the master:
SHOW MASTER LOGS;
It will show you the log name and log size. Otherwise, you will get an error:
ERROR 1381 (HY000): You are not using binary logging
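To double-check that the parameter change has taken effect after the restart, you can run a quick sanity check on the master:
SHOW VARIABLES LIKE 'binlog_format';
It should return ROW.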
We also need to ensure a proper binary log retention period. For example, if we expect the initial data export/import to take one day, we can set the retention period to something like three days to be on the safe side. This will help ensure we can roll forward the restored data. Use the below command to check the current retention period:
CALL mysql.rds_show_configuration;
Use the below command to update the retention period to 3 days (72 hours):
CALL mysql.rds_set_configuration('binlog retention hours', 72);
2. Create a dedicated user for replication.
CREATE USER 'replica'@'%' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO 'replica'@'%';
FLUSH PRIVILEGES;
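You can verify the user was created with the right privilege by checking its grants:
SHOW GRANTS FOR 'replica'@'%';
The output should include REPLICATION SLAVE ON *.*.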
3. Create a snapshot and restore it. The snapshot is the only way to get a consistent backup with a specific binary log position, as Aurora does not allow "super" privileges and we cannot take a READ LOCK.
Next, capture the binary log position for replication. This is done by inspecting the Logs & events section in the console, where you'll see something like this:
Binlog position from crash recovery is mysql-bin-changelog.014824 107977934
Take a note of the binlog file name and position; we'll need them later.
4. Create a configuration file with the login details of the restored instance.
tee /data/aurora.cnf <<EOF
[client]
user=username
password=password
host=your-restored-database.ap-south-1.rds.amazonaws.com
EOF
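Since this file contains a password, it's worth restricting its permissions, and you can optionally verify the credentials before running the dump:
chmod 600 /data/aurora.cnf
mysql --defaults-file=/data/aurora.cnf -e "SELECT 1;"
If the SELECT succeeds, mydumper will be able to connect using the same file.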
I'm using mydumper to dump the schema.
mydumper --no-data \
--triggers \
--routines \
--events \
-v 3 \
--no-locks \
--outputdir /data/schema \
--logfile /data/mydumper-schema.log \
--regex '^(?!(mysql|test|performance_schema|information_schema|sys))' \
--defaults-file /data/aurora.cnf
The above command will take a schema dump of all databases, excluding mysql, test, performance_schema, information_schema, and sys.
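Before moving on, it doesn't hurt to confirm the schema dump completed cleanly, for example by checking the log for errors and glancing at the output directory:
grep -i error /data/mydumper-schema.log
ls /data/schema | head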
5. Run the below command to take a dump of the data.
mydumper --threads 2 \
--no-schemas \
--compress \
--triggers \
--routines \
--events \
--rows="10000000" \
--verbose 3 \
--long-query-guard 999999 \
--no-locks \
--outputdir /data/export \
--logfile /data/mydumper-data.log \
--regex '^(?!(mysql|test|performance_schema|information_schema|sys))' \
-O skip.txt \
--defaults-file /data/aurora.cnf
You can increase the number of threads according to the number of CPUs on your instance.
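For example, you could pass --threads $(nproc) instead of a fixed number to match the thread count to your hardware. The dump can take a while, so a simple way to keep an eye on progress from another terminal is:
tail -f /data/mydumper-data.log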
6. To take a dump of the users, I'm using percona-toolkit.
sudo apt install percona-toolkit
pt-show-grants --ask-pass --host your-restored-database.ap-south-1.rds.amazonaws.com --user admin > users.sql
7. Now it's time to import the schema. Before starting the import, we'll update some configuration to speed it up; after the import, we'll revert those configuration changes.
Updating the configuration is an optional step.
vi /etc/mysql/mysql.conf.d/mysqld.cnf
log_slave_updates
innodb_buffer_pool_size=10G # you can adjust this according to your instance memory size; for a 16GB instance I'm using 10GB
innodb_log_file_size=1G
innodb_flush_method=O_DIRECT
innodb_flush_log_at_trx_commit=0
sync_binlog=0
master_info_repository=TABLE
relay_log_info_repository=TABLE
innodb_flush_neighbors=0
innodb_io_capacity_max=10000
innodb_stats_on_metadata=off
max_allowed_packet=1G
net_read_timeout=60
performance_schema=off
innodb_adaptive_hash_index=off
expire_logs_days=3
innodb_doublewrite=off
Save and exit, then restart your MySQL server.
systemctl restart mysql
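To confirm the new values are in effect, you can query a few of them (assuming your local client credentials are configured, e.g. in ~/.my.cnf; otherwise add -u/-p):
mysql -e "SELECT @@innodb_buffer_pool_size, @@sync_binlog, @@innodb_flush_log_at_trx_commit;"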
Now, use the below command to import the schema.
myloader \
--directory /data/schema \
--verbose 3 \
--host localhost \
--user username \
--ask-password
8. Let’s import users.
mysql -p < users.sql
9. Before importing the data, I would recommend running this inside a screen session, as the import is going to take time depending on the size of your database.
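If you haven't used screen before, a minimal sketch looks like this (the session name is just an example); you can detach with Ctrl-a d and reattach later with screen -r data-import:
screen -S data-import
Then run the import inside the session: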
myloader --threads 2 \
--directory /data/export \
--queries-per-transaction 100 \
--verbose 3 \
--logfile /data/myloader-data.log \
--host localhost \
--user root \
--ask-password
You can increase the number of threads according to the number of CPUs on your instance.
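Once the data import has finished, remember to revert the temporary configuration changes from step 7 (in particular innodb_flush_log_at_trx_commit, sync_binlog, and innodb_doublewrite) in /etc/mysql/mysql.conf.d/mysqld.cnf and restart MySQL once more:
systemctl restart mysql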
10. Finally, we can start the replication. Log in to your self-hosted MySQL instance and run the below command to add the master information, using the binlog file name and position we noted in step 3.
CHANGE MASTER TO
MASTER_HOST='your-aws-aurora-instance.ap-south-1.rds.amazonaws.com',
MASTER_USER='replica',
MASTER_PASSWORD='password',
MASTER_LOG_FILE = 'mysql-bin-changelog.014824',  -- change this to your binlog file name
MASTER_LOG_POS = 107977934;  -- change this to your binlog position
To start the slave, run the below command.
start slave;
You can check the status of the slave by running the below command.
show slave status \G
Bonus
To make sure replication is happening, you can compare Master_Log_File and Read_Master_Log_Pos, or check Seconds_Behind_Master; if there is an error, you'll see values in Last_Errno and Last_Error.
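For a quick look at just the important fields, a one-liner like this on the slave works (assuming your local client credentials are configured):
mysql -e "SHOW SLAVE STATUS\G" | grep -E 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master|Last_Errno|Last_Error'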
To check the master's current log file name and log position, run the below query on the master.
show master status \G
Final Words
I attempted to make the preceding examples as straightforward as possible.
Replication is not a simple activity; it entails a number of complex processes, and there's no guarantee that you won't run into problems when carrying out the stages outlined above.
You can post your problems in the comments section, and I’ll do my best to respond.
Large datasets take time to process, so patience is required; there is no quick way out.