AlmaLinux Database Replication & Clustering: High Availability Guide
Welcome to the world of bulletproof database systems! Whether you're ensuring zero downtime, scaling for millions of users, or protecting critical data, this comprehensive guide will transform you into a database high availability expert who can build systems that never sleep!
Database downtime can cost thousands per minute, but with proper replication and clustering your databases will survive anything from hardware failures to entire datacenter outages. Let's build unbreakable database infrastructure!
Why is Database Replication & Clustering Important?
Imagine your database staying online even if half your servers explode; that's the power of clustering! Here's why mastering database HA on AlmaLinux is absolutely essential:
- Zero Downtime - Keep services running 24/7/365
- Data Protection - Multiple copies prevent data loss
- Load Balancing - Distribute reads across multiple servers
- Geographic Distribution - Serve users from nearby locations
- Horizontal Scaling - Add nodes to handle more traffic
- Automatic Failover - Instant recovery from failures
- Business Continuity - Avoid revenue loss from outages
- Disaster Recovery - Survive datacenter failures
What You Need
Let's prepare your environment for database high availability!
Hardware Requirements (Per Node):
- Minimum 3 AlmaLinux servers for true HA
- 4GB+ RAM per database node
- 50GB+ fast storage (SSD recommended)
- Gigabit network between nodes
- Static IP addresses for all nodes
Software We'll Configure:
- MariaDB with Galera Cluster
- MySQL master-slave replication
- PostgreSQL streaming replication
- HAProxy for load balancing
- Keepalived for virtual IP
Setting Up MariaDB Galera Cluster
Let's build a multi-master database cluster that's virtually indestructible!
Installing MariaDB and Galera
# On all nodes - Install MariaDB with Galera support
sudo dnf install -y mariadb-server mariadb mariadb-server-galera galera rsync
# Configure MariaDB for Galera (Node 1)
sudo tee /etc/my.cnf.d/galera.cnf << 'EOF'
[mysqld]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0
wsrep_on=ON
# Cluster configuration
wsrep_provider=/usr/lib64/galera-4/libgalera_smm.so
wsrep_cluster_name="almalinux_cluster"
wsrep_cluster_address="gcomm://192.168.1.10,192.168.1.11,192.168.1.12"
wsrep_node_address="192.168.1.10"
wsrep_node_name="galera1"
# SST method
wsrep_sst_method=rsync
# Tuning
wsrep_slave_threads=4
innodb_flush_log_at_trx_commit=0
EOF
# Configure firewall on all nodes
sudo firewall-cmd --permanent --add-service=mysql
sudo firewall-cmd --permanent --add-port=4567/tcp # Galera replication
sudo firewall-cmd --permanent --add-port=4568/tcp # IST port
sudo firewall-cmd --permanent --add-port=4444/tcp # SST port
sudo firewall-cmd --reload
# Bootstrap the cluster (Node 1 only)
sudo galera_new_cluster
# Start MariaDB on the other nodes once their galera.cnf is in place (see the next section)
# On Node 2 and 3:
sudo systemctl start mariadb
# Secure MariaDB installation
sudo mysql_secure_installation
Configuring Galera Nodes
# On Node 2 - Configure galera.cnf
sudo tee /etc/my.cnf.d/galera.cnf << 'EOF'
[mysqld]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0
wsrep_on=ON
wsrep_provider=/usr/lib64/galera-4/libgalera_smm.so
wsrep_cluster_name="almalinux_cluster"
wsrep_cluster_address="gcomm://192.168.1.10,192.168.1.11,192.168.1.12"
wsrep_node_address="192.168.1.11"
wsrep_node_name="galera2"
wsrep_sst_method=rsync
wsrep_slave_threads=4
innodb_flush_log_at_trx_commit=0
EOF
# On Node 3 - Configure galera.cnf
sudo tee /etc/my.cnf.d/galera.cnf << 'EOF'
[mysqld]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0
wsrep_on=ON
wsrep_provider=/usr/lib64/galera-4/libgalera_smm.so
wsrep_cluster_name="almalinux_cluster"
wsrep_cluster_address="gcomm://192.168.1.10,192.168.1.11,192.168.1.12"
wsrep_node_address="192.168.1.12"
wsrep_node_name="galera3"
wsrep_sst_method=rsync
wsrep_slave_threads=4
innodb_flush_log_at_trx_commit=0
EOF
# Verify cluster status
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep%';"
# Check cluster size (should be 3)
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size';"
Testing Galera Replication
# Create test database on any node
mysql -u root -p << 'EOF'
CREATE DATABASE test_replication;
USE test_replication;
CREATE TABLE users (
id INT AUTO_INCREMENT PRIMARY KEY,
username VARCHAR(50),
email VARCHAR(100),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO users (username, email) VALUES
('alice', '[email protected]'),
('bob', '[email protected]');
EOF
# Verify on other nodes
mysql -u root -p -e "SELECT * FROM test_replication.users;"
# Create monitoring user
mysql -u root -p << 'EOF'
CREATE USER 'monitor'@'%' IDENTIFIED BY 'MonitorPass123';
GRANT USAGE, REPLICATION CLIENT ON *.* TO 'monitor'@'%';
FLUSH PRIVILEGES;
EOF
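With the monitor account in place, a quick loop confirms that every node is ready and agrees on the member count. A minimal sketch, run from any machine that can reach the nodes (addresses taken from the cluster configuration above):
# Every node should report wsrep_ready = ON and wsrep_cluster_size = 3
for node in 192.168.1.10 192.168.1.11 192.168.1.12; do
  echo "--- $node ---"
  mysql -h "$node" -u monitor -pMonitorPass123 -e \
    "SHOW STATUS WHERE Variable_name IN ('wsrep_ready','wsrep_cluster_size');"
done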
MySQL Master-Slave Replication
Let's set up traditional MySQL replication for read scaling!
Configuring Master Server
# On Master server
# Create the binary-log directory first (it does not exist by default on AlmaLinux)
sudo mkdir -p /var/log/mysql
sudo chown mysql:mysql /var/log/mysql
sudo tee /etc/my.cnf.d/master.cnf << 'EOF'
[mysqld]
server-id = 1
log_bin = /var/log/mysql/mysql-bin
binlog_format = ROW
binlog_do_db = production_db
max_binlog_size = 100M
expire_logs_days = 7
EOF
# Apply the configuration (use mysqld instead if you run MySQL Community packages)
sudo systemctl restart mariadb
# Create replication user
mysql -u root -p << 'EOF'
CREATE USER 'replicator'@'%' IDENTIFIED BY 'ReplicaPass123';
GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'%';
FLUSH PRIVILEGES;
EOF
# Get master status
mysql -u root -p -e "SHOW MASTER STATUS;"
# Note the File and Position values!
# Backup database for initial sync (--single-transaction avoids locking InnoDB tables)
mysqldump -u root -p --all-databases --single-transaction --master-data > master_dump.sql
scp master_dump.sql slave-server:/tmp/
Configuring Slave Server
# On Slave server
# Create the log directory here as well
sudo mkdir -p /var/log/mysql
sudo chown mysql:mysql /var/log/mysql
sudo tee /etc/my.cnf.d/slave.cnf << 'EOF'
[mysqld]
server-id = 2
relay-log = /var/log/mysql/mysql-relay-bin
log_bin = /var/log/mysql/mysql-bin
binlog_format = ROW
read_only = 1
EOF
# Apply the configuration
sudo systemctl restart mariadb
# Restore master dump
mysql -u root -p < /tmp/master_dump.sql
# Configure slave replication
mysql -u root -p << 'EOF'
STOP SLAVE;
RESET SLAVE;
CHANGE MASTER TO
MASTER_HOST='192.168.1.10',
MASTER_USER='replicator',
MASTER_PASSWORD='ReplicaPass123',
MASTER_LOG_FILE='mysql-bin.000001',
MASTER_LOG_POS=154;
START SLAVE;
EOF
# Check slave status
mysql -u root -p -e "SHOW SLAVE STATUS\G"
# Verify replication is working:
# Slave_IO_Running: Yes
# Slave_SQL_Running: Yes
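As a quick end-to-end check, write a row on the master and read it back on the slave. A minimal sketch, assuming production_db (the database named in binlog_do_db) already exists on both servers; repl_test is just a throwaway table name:
# On the master: create and populate a test table inside the replicated database
mysql -u root -p production_db << 'EOF'
CREATE TABLE IF NOT EXISTS repl_test (id INT PRIMARY KEY, note VARCHAR(50));
INSERT INTO repl_test VALUES (1, 'written on master');
EOF
# On the slave, a moment later, the row should be there
mysql -u root -p -e "SELECT * FROM production_db.repl_test;"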
PostgreSQL Streaming Replication
Let's set up PostgreSQL with hot standby for high availability!
Configuring Primary PostgreSQL Server
# Install PostgreSQL on all nodes
sudo dnf install -y postgresql postgresql-server postgresql-contrib
# Initialize database (primary only)
sudo postgresql-setup --initdb
# Configure primary server
sudo tee -a /var/lib/pgsql/data/postgresql.conf << 'EOF'
# Replication settings
wal_level = replica
max_wal_senders = 3
wal_keep_size = 1GB             # PostgreSQL 13+; on version 12 or older use: wal_keep_segments = 64
archive_mode = on
archive_command = 'cp %p /var/lib/pgsql/archive/%f'
listen_addresses = '*'
EOF
# Create archive directory
sudo -u postgres mkdir -p /var/lib/pgsql/archive
# Configure authentication
sudo tee -a /var/lib/pgsql/data/pg_hba.conf << 'EOF'
# Replication connections
host replication replicator 192.168.1.0/24 md5
host all all 192.168.1.0/24 md5
EOF
# Open the PostgreSQL port and start the service
sudo firewall-cmd --permanent --add-service=postgresql
sudo firewall-cmd --reload
sudo systemctl enable --now postgresql
# Create replication user
sudo -u postgres psql << 'EOF'
CREATE USER replicator WITH REPLICATION LOGIN PASSWORD 'ReplicaPass123';
EOF
# Create base backup for the standby and copy it over
sudo -u postgres pg_basebackup -h localhost -D /tmp/standby -U replicator -W -v -P
sudo scp -r /tmp/standby standby-server:/tmp/
Configuring Standby PostgreSQL Server
# On standby server
# Stop PostgreSQL if running
sudo systemctl stop postgresql
# Copy base backup
sudo rm -rf /var/lib/pgsql/data/*
sudo cp -R /tmp/standby/* /var/lib/pgsql/data/
sudo chown -R postgres:postgres /var/lib/pgsql/data
# Create standby signal file
sudo -u postgres touch /var/lib/pgsql/data/standby.signal
# Configure recovery settings
sudo -u postgres tee /var/lib/pgsql/data/postgresql.auto.conf << 'EOF'
primary_conninfo = 'host=192.168.1.10 port=5432 user=replicator password=ReplicaPass123'
restore_command = 'cp /var/lib/pgsql/archive/%f %p'
EOF
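Optionally, a physical replication slot stops the primary from recycling WAL segments the standby has not replayed yet, which is more robust than relying on wal_keep_size alone. A minimal sketch, using an assumed slot name of standby1:
# On the primary: create a physical replication slot
sudo -u postgres psql -c "SELECT pg_create_physical_replication_slot('standby1');"
# On the standby: point it at the slot
sudo -u postgres tee -a /var/lib/pgsql/data/postgresql.auto.conf << 'EOF'
primary_slot_name = 'standby1'
EOF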
# Start standby server
sudo systemctl start postgresql
# Verify replication status on primary
sudo -u postgres psql -c "SELECT * FROM pg_stat_replication;"
# Check standby status
sudo -u postgres psql -c "SELECT pg_is_in_recovery();"
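If the primary is ever lost, the standby can be promoted to take over writes. A minimal sketch of a manual promotion (PostgreSQL 12 or later):
# On the standby: promote it to a full read-write primary
sudo -u postgres psql -c "SELECT pg_promote();"
# Verify it has left recovery mode (should now return 'f')
sudo -u postgres psql -c "SELECT pg_is_in_recovery();"
Remember to repoint applications (or HAProxy) at the promoted node, and rebuild the old primary as a standby before bringing it back into service.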
Load Balancing with HAProxy
Let's distribute database connections across multiple nodes!
Installing and Configuring HAProxy
# Install HAProxy (and socat, used later to query its admin socket)
sudo dnf install -y haproxy socat
# Configure HAProxy for database load balancing
sudo tee /etc/haproxy/haproxy.cfg << 'EOF'
global
    log         127.0.0.1 local0
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon
    # Runtime admin socket (used by the monitoring commands later in this guide)
    stats socket /var/lib/haproxy/admin.sock mode 660 level admin

defaults
    mode                    tcp
    log                     global
    option                  tcplog
    option                  dontlognull
    option                  redispatch
    retries                 3
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout check           10s
    maxconn                 3000

# Statistics page (served over HTTP on its own port)
listen stats
    bind *:8404
    mode http
    stats enable
    stats uri /stats
    stats realm HAProxy\ Statistics
    stats auth admin:admin123
# MySQL Load Balancing
listen mysql-cluster
    bind *:3306
    mode tcp
    # mysql-check logs in without a password, so it needs a dedicated
    # passwordless account on the database nodes (created below)
    option mysql-check user haproxy_check
    balance roundrobin
    server galera1 192.168.1.10:3306 check
    server galera2 192.168.1.11:3306 check
    server galera3 192.168.1.12:3306 check

# PostgreSQL Load Balancing
listen postgresql-cluster
    bind *:5432
    mode tcp
    option pgsql-check user replicator
    balance roundrobin
    server pg-primary  192.168.1.20:5432 check
    server pg-standby1 192.168.1.21:5432 check backup
    server pg-standby2 192.168.1.22:5432 check backup
EOF
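HAProxy's mysql-check logs in without sending a password, so the account it uses must exist with an empty password on each Galera node (the monitor account created earlier has a password and would fail the check). A minimal sketch, assuming the HAProxy hosts sit on 192.168.1.0/24; run it once on any node and Galera replicates it to the rest:
mysql -u root -p << 'EOF'
CREATE USER 'haproxy_check'@'192.168.1.%';
FLUSH PRIVILEGES;
EOF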
# Open the balanced ports and the stats page in the HAProxy host's firewall
sudo firewall-cmd --permanent --add-service=mysql --add-service=postgresql --add-port=8404/tcp
sudo firewall-cmd --reload
# Start HAProxy
sudo systemctl enable --now haproxy
# Check HAProxy statistics
# Browse to: http://your-server-ip:8404/stats
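To confirm connections really are being spread across the cluster, ask a handful of proxied connections which backend answered. A minimal sketch, using the monitor account from earlier (replace your-haproxy-ip with the load balancer's address):
# Each new connection should land on a different Galera node in turn
for i in {1..6}; do
  mysql -h your-haproxy-ip -u monitor -pMonitorPass123 -N -e "SELECT @@hostname;"
done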
Virtual IP with Keepalived
# Install Keepalived on HAProxy servers
sudo dnf install -y keepalived
# Configure Keepalived (Master)
sudo tee /etc/keepalived/keepalived.conf << 'EOF'
vrrp_instance VI_1 {
    state MASTER
    interface eth0              # adjust to your interface name (e.g. ens192)
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass SecretPass
    }
    virtual_ipaddress {
        192.168.1.100/24
    }
}
EOF
# Configure Keepalived (Backup)
sudo tee /etc/keepalived/keepalived.conf << 'EOF'
vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass SecretPass
    }
    virtual_ipaddress {
        192.168.1.100/24
    }
}
EOF
# Allow VRRP traffic between the load balancers
sudo firewall-cmd --permanent --add-protocol=vrrp
sudo firewall-cmd --reload
# Start Keepalived
sudo systemctl enable --now keepalived
# Verify the virtual IP (it appears on the current MASTER)
ip addr show | grep 192.168.1.100
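As configured, Keepalived only moves the virtual IP when a whole server drops off the network; it will happily keep the VIP on a host whose HAProxy process has died. A common fix is a vrrp_script health check; a minimal sketch to add on both load balancers (chk_haproxy is an assumed name):
# Add to /etc/keepalived/keepalived.conf, then reference it from vrrp_instance VI_1
# with:  track_script { chk_haproxy }
vrrp_script chk_haproxy {
    script "/usr/sbin/pidof haproxy"   # exits non-zero when HAProxy is not running
    interval 2
    weight -20                         # failing check drops MASTER priority (100) below BACKUP (90)
}
Restart keepalived on both nodes afterwards; when HAProxy dies on the MASTER, its effective priority falls to 80 and the BACKUP takes over the virtual IP.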
Quick Examples
Example 1: Automatic Failover Testing
# Create failover test script
cat > /usr/local/bin/test-failover.sh << 'EOF'
#!/bin/bash
echo "Testing Database Failover..."
# Test Galera failover
echo "Stopping one Galera node..."
ssh galera1 "sudo systemctl stop mariadb"
sleep 5
echo "Testing cluster availability..."
mysql -h 192.168.1.100 -u monitor -pMonitorPass123 -e "SHOW STATUS LIKE 'wsrep_cluster_size';"
echo "Restarting stopped node..."
ssh galera1 "sudo systemctl start mariadb"
sleep 10
echo "Verifying cluster recovery..."
mysql -h 192.168.1.100 -u monitor -pMonitorPass123 -e "SHOW STATUS LIKE 'wsrep_cluster_size';"
echo "Failover test complete!"
EOF
chmod +x /usr/local/bin/test-failover.sh
Example 2: Database Performance Monitoring
# Create monitoring script
cat > /usr/local/bin/monitor-databases.sh << 'EOF'
#!/bin/bash
echo "=== Database Cluster Status ==="
echo "Timestamp: $(date)"
echo ""
echo "=== Galera Cluster Status ==="
mysql -h 192.168.1.10 -u monitor -pMonitorPass123 -e "
SELECT
VARIABLE_NAME,
VARIABLE_VALUE
FROM information_schema.GLOBAL_STATUS
WHERE VARIABLE_NAME IN (
'wsrep_cluster_size',
'wsrep_cluster_status',
'wsrep_connected',
'wsrep_ready',
'wsrep_local_state_comment'
);"
echo -e "\n=== PostgreSQL Replication Status ==="
sudo -u postgres psql -h 192.168.1.20 -c "
SELECT
client_addr,
state,
sync_state,
replay_lag
FROM pg_stat_replication;"
echo -e "\n=== HAProxy Statistics ==="
echo "show stat" | sudo socat stdio /var/lib/haproxy/admin.sock | \
cut -d',' -f1,2,18,19 | head -10
echo -e "\n=== Connection Statistics ==="
mysql -h 192.168.1.100 -u monitor -pMonitorPass123 -e "
SHOW STATUS WHERE Variable_name IN (
'Threads_connected',
'Connections',
'Aborted_connects',
'Max_used_connections'
);"
EOF
chmod +x /usr/local/bin/monitor-databases.sh
# Add a cron job for monitoring (append so existing entries are preserved)
(sudo crontab -l 2>/dev/null; echo "*/5 * * * * /usr/local/bin/monitor-databases.sh >> /var/log/db-monitor.log") | sudo crontab -
Example 3: Backup Strategy for Clustered Databases
# Create backup script for Galera cluster
cat > /usr/local/bin/backup-galera.sh << 'EOF'
#!/bin/bash
BACKUP_DIR="/backup/mysql/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$BACKUP_DIR"
# Run this script ON the Galera node you want to back up - mariabackup copies
# the local data directory, so it cannot back up a remote server.
# (Ideally pick a lightly loaded node; compare wsrep_local_recv_queue_avg across nodes first.)
echo "Backing up local Galera node..."
# Perform backup using mariabackup (from the mariadb-backup package)
mariabackup --backup \
    --target-dir="$BACKUP_DIR" \
    --user=root \
    --password=YourRootPassword
# Compress backup
tar czf "$BACKUP_DIR.tar.gz" -C "$BACKUP_DIR" .
rm -rf "$BACKUP_DIR"
echo "Backup completed: $BACKUP_DIR.tar.gz"
# Rotate old backups (keep last 7 days)
find /backup/mysql -name "*.tar.gz" -mtime +7 -delete
EOF
chmod +x /usr/local/bin/backup-galera.sh
# Schedule daily backups (append so existing entries are preserved)
(sudo crontab -l 2>/dev/null; echo "0 2 * * * /usr/local/bin/backup-galera.sh") | sudo crontab -
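A backup is only useful if it can be restored, so keep a tested restore procedure next to it. A minimal sketch with mariabackup, assuming the node being rebuilt can have its data directory replaced (substitute the actual archive name):
# Restore a compressed backup onto a stopped node
sudo systemctl stop mariadb
sudo mkdir -p /tmp/restore
sudo tar xzf /backup/mysql/<backup-timestamp>.tar.gz -C /tmp/restore
sudo mariabackup --prepare --target-dir=/tmp/restore      # apply the redo log
sudo rm -rf /var/lib/mysql/*
sudo mariabackup --copy-back --target-dir=/tmp/restore
sudo chown -R mysql:mysql /var/lib/mysql
sudo systemctl start mariadb                              # the node catches up via IST/SST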
Fix Common Clustering Problems
Let's solve the most frequent database clustering issues!
Problem 1: Split Brain in Galera Cluster
Symptoms: Nodes can't agree on cluster state
Solution:
# Check cluster status on all nodes
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_status';"
# If split-brain detected, bootstrap from most advanced node
# Find most advanced node
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_last_committed';"
# Stop MariaDB on all nodes
sudo systemctl stop mariadb
# On the most advanced node, mark it safe to bootstrap
sudo sed -i 's/safe_to_bootstrap: 0/safe_to_bootstrap: 1/' /var/lib/mysql/grastate.dat
# Bootstrap from that node
sudo galera_new_cluster
# Start the other nodes normally
sudo systemctl start mariadb
Problem 2: Replication Lag
Symptoms: Slave servers falling behind the master
Solution:
# Check replication lag
mysql -u root -p -e "SHOW SLAVE STATUS\G" | grep Seconds_Behind_Master
# Optimize slave performance with MariaDB parallel replication
# (the slave threads must be stopped while these are changed)
mysql -u root -p << 'EOF'
STOP SLAVE;
SET GLOBAL slave_parallel_threads = 4;
SET GLOBAL slave_parallel_mode = 'optimistic';
SET GLOBAL slave_domain_parallel_threads = 2;
START SLAVE;
EOF
# Skip problematic transaction if needed
mysql -u root -p << 'EOF'
STOP SLAVE;
SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
START SLAVE;
EOF
Problem 3: Connection Refused Through HAProxy
Symptoms: Can't connect to the database through the load balancer
Solution:
# Check HAProxy status
sudo systemctl status haproxy
# Verify backend server status
echo "show servers state" | sudo socat stdio /var/lib/haproxy/admin.sock
# Check firewall rules
sudo firewall-cmd --list-all
# Test direct connection to backends
mysql -h 192.168.1.10 -u monitor -pMonitorPass123 -e "SELECT 1;"
# Restart HAProxy
sudo systemctl restart haproxy
Problem 4: Node Won't Join Cluster
Symptoms: New node can't join the existing cluster
Solution:
# Check network connectivity
ping -c 3 other-cluster-nodes
# Verify cluster address configuration
grep wsrep_cluster_address /etc/my.cnf.d/galera.cnf
# Check for port blocks
nc -zv 192.168.1.10 4567
# Force a full SST (State Snapshot Transfer) by clearing the node's local state
sudo systemctl stop mariadb
sudo rm -rf /var/lib/mysql/*
sudo systemctl start mariadb   # the empty node requests a full SST from a donor
Database Clustering Commands Summary
Essential clustering commands at your fingertips!
Command | Purpose
---|---
galera_new_cluster | Bootstrap Galera cluster
SHOW STATUS LIKE 'wsrep%' | Check Galera status
SHOW SLAVE STATUS\G | Check MySQL replication
pg_stat_replication | PostgreSQL replication status
mariabackup --backup | Back up a Galera node
CHANGE MASTER TO | Configure MySQL slave
pg_basebackup | PostgreSQL base backup
show stat (HAProxy admin socket) | Load balancer statistics
Best Practices for Database HA
Master these tips for bulletproof database clusters!
- Odd Number of Nodes - Use 3, 5, or 7 nodes to avoid split-brain
- Network Quality - Low latency is crucial for synchronous replication
- Monitor Everything - Track replication lag, cluster status, and performance
- Regular Backups - Even with replication, backups are essential
- Test Failover - Regularly test automatic failover procedures
- Document Procedures - Create runbooks for common issues
- Tune Performance - Optimize for your specific workload
- Security First - Encrypt replication traffic
- Load Balance Reads - Distribute read queries across replicas (see the sketch after this list)
- Automate Recovery - Script common recovery procedures
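For the master-slave setup described earlier, read/write splitting can be done with two HAProxy listeners: one port that only ever reaches the master, and one that round-robins across replicas. A minimal sketch with illustrative ports and addresses (it assumes the same passwordless haproxy_check account exists on these servers):
# Writes: applications point their write connections at port 3307
listen mysql-write
    bind *:3307
    mode tcp
    option mysql-check user haproxy_check
    server master1 192.168.1.10:3306 check

# Reads: applications point their read connections at port 3308
listen mysql-read
    bind *:3308
    mode tcp
    option mysql-check user haproxy_check
    balance roundrobin
    server slave1  192.168.1.11:3306 check
    server master1 192.168.1.10:3306 check backup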
What You've Accomplished
Congratulations on mastering database high availability! You've achieved:
- Galera Cluster deployment with multi-master replication
- MySQL master-slave replication configured
- PostgreSQL streaming replication established
- HAProxy load balancing for connection distribution
- Keepalived virtual IP for automatic failover
- Monitoring and alerting systems implemented
- Backup strategies for clustered databases
- Performance optimization techniques applied
- Troubleshooting skills for common issues
- Disaster recovery procedures documented
Why These Skills Matter
Your database HA expertise ensures business continuity! With these skills, you can:
Immediate Benefits:
- Achieve 99.99% uptime for critical databases
- Scale read performance by 10x or more
- Survive hardware failures without data loss
- Save thousands per hour of prevented downtime
Long-term Value:
- Become the database reliability expert
- Design enterprise-grade database architectures
- Build globally distributed database systems
- Enable business growth without limits
You're now equipped to build database systems that never go down, scale without limits, and recover automatically from disasters. Your databases are now as reliable as the sunrise!
Keep clustering, keep scaling!