MySQL Master Master Setup

Building a Redundant Multi Master MySQL Infrastructure

Architectural redundancy represents the bedrock of modern data persistence in high-availability environments. For critical systems such as regional energy grids, municipal water utility monitoring, or global cloud service providers, a single point of failure within the database layer is unacceptable. The MySQL Master Master Setup, often referred to as multi-source or circular replication, addresses this by allowing two distinct database nodes to act as both a provider and a consumer of data simultaneously. This bidirectional flow ensures that if Node A fails, Node B remains fully operational with a mirrored data set, ready to accept writes without a manual promotion process. However, this configuration introduces complex challenges regarding data consistency and conflict resolution. When implemented correctly, it minimizes latency for geographically distributed users and provides a robust failover mechanism that maintains high throughput even during localized infrastructure outages. The following manual outlines the rigorous engineering standards required to deploy and audit such a system.

Technical Specifications

| Requirement | Default Port/Range | Protocol/Standard | Impact Level | Recommended Resources |
| :— | :— | :— | :— | :— |
| MySQL Server 8.0+ | 3306 | TCP/IP (SQL) | 10 | 4 vCPU / 16GB RAM |
| Replication Traffic | 3306 | Binary Log Stream | 9 | Dedicated 1Gbps NIC |
| OS: Ubuntu 22.04 | N/A | POSIX / Linux | 8 | 500GB NVMe (RAID 1) |
| Firewall (UFW) | 3306, 22 | State-based Filter | 7 | Hardware Firewall |
| Binary Log Format | N/A | ROW-based | 10 | High-speed IOPS |

Configuration Protocol

Environment Prerequisites:

Successful deployment requires two identical server instances with synchronized system clocks via NTP to prevent timestamp drift. The systems must run MySQL 8.0 or higher to leverage advanced crash-safe replication features. Users must possess sudo privileges and access to the mysql root account. Ensure that the network path between nodes exhibits minimal signal-attenuation and zero packet-loss to prevent replication lag. All configuration changes must be documented in the infrastructure audit log to maintain an idempotent record of the system state.

Section A: Implementation Logic:

The engineering logic of a MySQL Master Master Setup relies on the offset of auto-increment values and the unique identification of server instances. By configuring Node 1 to generate odd primary keys and Node 2 to generate even primary keys, we prevent primary key collisions during concurrent writes. The binary log (binlog) functions as a transaction ledger; it records minden change as an encapsulation of the SQL payload. These logs are then transmitted to the secondary node’s relay log, where they are executed to achieve state consistency. This design reduces the thermal-inertia of the recovery process by keeping both nodes “hot” and ready for immediate traffic redirection.

Step-By-Step Execution

1. Global Installation and Dependency Check

Execute the command sudo apt-get update && sudo apt-get install mysql-server -y on both nodes. After the process completes, verify the service status using systemctl status mysql.
System Note: This action initializes the mysqld daemon and populates the /var/lib/mysql directory with the base system tables; it triggers the kernel to allocate initial memory buffers for the InnoDB storage engine.

2. Networking and Interface Binding

Navigate to the directory /etc/mysql/mysql.conf.d/ and open the file mysqld.cnf. Locate the bind-address variable and modify it from 127.0.0.1 to 0.0.0.0 or the specific private IP address of the node.
System Note: Modifying this parameter alters the network socket binding of the service: it allows the process to listen for ingress packets on the external network interface rather than just the local loopback.

3. Server Identification and Binary Logging

In the same mysqld.cnf file, uncomment and set the following variables for Node 1:
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
auto_increment_increment = 2
auto_increment_offset = 1
For Node 2, set server-id = 2, auto_increment_increment = 2, and auto_increment_offset = 2.
System Note: The server-id provides a unique identifier within the replication topology; unique offsets are critical to prevent write-concurrency conflicts that would otherwise lead to duplicate entry errors in the storage layer.

4. Service Restart and State Initialization

Apply the configuration changes by executing sudo systemctl restart mysql. Validate that the service is running correctly by checking the error log at /var/log/mysql/error.log.
System Note: A restart forces the mysqld process to release its current memory-mapped files and re-read the configuration parameters from the disk; this is a non-idempotent action that briefly interrupts availability.

5. Replication User Creation

Log into the MySQL shell using sudo mysql and create a dedicated replication user. Enter the command: CREATE USER ‘replica_user’@’%’ IDENTIFIED WITH mysql_native_password BY ‘secure_password_here’; followed by GRANT REPLICATION SLAVE ON . TO ‘replica_user’@’%’;.
System Note: This step populates the mysql.user and mysql.db tables: it defines the authorization levels required for the remote node to pull the binary log payload across the network.

6. Synchronizing the Binary Log Coordinates

On Node 1, execute SHOW MASTER STATUS; and record the File and Position values. Repeat this on Node 2. These values represent the exact offset in the binary log where the other node should begin reading data.
System Note: The binlog position acts as a pointer in the sequential transaction file; identifying this coordinate is essential for ensuring that no existing data is skipped during the initial handshake.

7. Establishing Bi-Directional Links

On Node 1, point the replication engine to Node 2:
CHANGE MASTER TO MASTER_HOST=’NODE_2_IP’, MASTER_USER=’replica_user’, MASTER_PASSWORD=’secure_password_here’, MASTER_LOG_FILE=’recorded_file_name_from_node_2′, MASTER_LOG_POS=recorded_position_from_node_2;
Execute START SLAVE; then repeat the process on Node 2 pointing back to Node 1.
System Note: This command initializes the I/O thread and the SQL thread. The I/O thread handles the network payload and write-to-relay-log tasks, while the SQL thread executes the encapsulated commands to update the local data state.

Section B: Dependency Fault-Lines:

The most frequent failure in this architecture is the “Split-Brain” scenario. This occurs when network partitions cause nodes to diverge so significantly that automatic reconciliation becomes impossible. Another bottleneck is high disk I/O overhead. If the binary log is stored on slow mechanical media, the throughput of the entire system will be throttled by the latency of the write-ahead log. Furthermore, if the max_allowed_packet size is too small, large transaction payloads will be truncated, leading to replication breakage and packet-loss errors in the communication stream.

THE TROUBLESHOOTING MATRIX

Section C: Logs & Debugging:

When replication stalls, the primary diagnostic tool is the command SHOW SLAVE STATUS\G. Admins must inspect the Last_IO_Error and Last_SQL_Error strings for specific fault codes.
1. Error 1062 (Duplicate Entry): Indicates that the auto_increment offsets were not configured correctly, resulting in collision.
2. Error 1045 (Access Denied): Suggests an authentication failure between nodes; verify the replica_user credentials and firewall rules on port 3306.
3. Log Path: Always verify the physical sensor of the database via /var/log/mysql/error.log. If the log shows “Disk full” or “Read-only file system,” the underlying hardware or mount point has failed.
Visual cues for network issues include high values in Seconds_Behind_Master, which signifies that the workload is exceeding the network capacity or local CPU concurrency limits.

OPTIMIZATION & HARDENING

Performance Tuning: To maximize throughput, set innodb_flush_log_at_trx_commit = 2. This reduces the frequency of physical disk flushes, though it introduces a slight risk of losing one second of data during a power failure. Adjust the innodb_buffer_pool_size to 70 percent of total system RAM to minimize disk swaps and reduce latency.
Security Hardening: Implement TLS/SSL for the replication stream to protect data in transit from sniffing. Use the REQUIRE SSL clause in the GRANT statement for the replication user. Restrict access to port 3306 using iptables or ufw so that only the IP address of the peer node is permitted.
Scaling Logic: As traffic grows, consider adding a load balancer such as ProxySQL or HAProxy in front of the masters. This allows for intelligent query routing based on type (read vs. write) and provides health checks to automatically divert traffic away from a degraded node. To maintain performance under load, monitor the CPU thermal-inertia and ensure cooling systems are adequate for sustained high-concurrency operations.

THE ADMIN DESK

How do I handle a reboot of one node?
MySQL is configured to start on boot via systemctl. Upon restart, the slave threads will attempt to reconnect automatically. Verify the status with SHOW SLAVE STATUS to ensure the I/O thread has resumed the sync process.

Can I skip a replication error?
If a non-critical error occurs, use SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE;. Exercise caution as this creates data inconsistency between nodes. It is better to resolve the underlying data conflict manually to remain idempotent.

What happens if both nodes update the same row?
The last write wins based on the execution time of the SQL thread. However, this can lead to data divergence. Use application-level logic to ensure that specific data sets are only written to one master at a time.

How do I monitor replication lag?
Monitor the Seconds_Behind_Master variable. If this value increases steadily, it indicates that the replica cannot keep up with the master’s throughput. Check for long-running queries or network congestion and optimize the underlying hardware resources.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top