🔄 Managing Service Failover: Simple Guide

Let me show you how to set up service failover in Alpine Linux! This ensures your services keep running even when things go wrong. It’s like having a backup plan for your important services!

🤔 What is Service Failover?

Service failover automatically switches to a backup service when the main one fails. Think of it like having a spare tire: when the main tire goes flat, you can quickly switch to the spare and keep driving. In Alpine Linux, keepalived handles this by moving a shared virtual IP address from the failed server to the backup.

Why use failover?

Keep services running 24/7
Minimize downtime
Automatic recovery
Better reliability
Peace of mind

🎯 What You Need

Before starting, you’ll need:

Two Alpine Linux servers
Network connectivity between them
A service to protect (like nginx)
Basic terminal knowledge
About 20 minutes

📋 Step 1: Install Monitoring Tools

First, let’s install what we need:

# On both servers
apk update

# Install keepalived for failover
apk add keepalived

# Install monitoring tools
apk add monit

# Install networking tools
apk add ipvsadm iproute2

# Enable services
rc-update add keepalived
rc-update add monit

📋 Step 2: Configure Primary Server

Set up the main server:

# Create keepalived config
cat > /etc/keepalived/keepalived.conf << 'EOF'
global_defs {
    router_id SERVER1
    script_user root
    enable_script_security
}

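vrrp_script check_service {
    script "/usr/local/bin/check_service.sh"
    interval 2
    # Negative weight: subtract 20 from the priority while the check fails,
    # so a failed primary (100 - 20 = 80) drops below the backup (90)
    weight -20
}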

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    
    authentication {
        auth_type PASS
        # keepalived uses only the first 8 characters of auth_pass
        auth_pass alpine123
    }
    
    virtual_ipaddress {
        192.168.1.100/24
    }
    
    track_script {
        check_service
    }
    
    notify_master "/usr/local/bin/notify_master.sh"
    notify_backup "/usr/local/bin/notify_backup.sh"
    notify_fault "/usr/local/bin/notify_fault.sh"
}
EOF

# Create service check script
cat > /usr/local/bin/check_service.sh << 'EOF'
#!/bin/sh
# Check if nginx is running
if pgrep nginx > /dev/null; then
    exit 0
else
    exit 1
fi
EOF

chmod +x /usr/local/bin/check_service.sh

📋 Step 3: Configure Backup Server

Set up the backup server:

# Create keepalived config (on backup server)
cat > /etc/keepalived/keepalived.conf << 'EOF'
global_defs {
    router_id SERVER2
    script_user root
    enable_script_security
}

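vrrp_script check_service {
    script "/usr/local/bin/check_service.sh"
    interval 2
    # Negative weight: subtract 20 from the priority while the check fails
    # (must match the weight used on the primary)
    weight -20
}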

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 90
    advert_int 1
    
    authentication {
        auth_type PASS
        # keepalived uses only the first 8 characters of auth_pass
        auth_pass alpine123
    }
    
    virtual_ipaddress {
        192.168.1.100/24
    }
    
    track_script {
        check_service
    }
    
    notify_master "/usr/local/bin/notify_master.sh"
    notify_backup "/usr/local/bin/notify_backup.sh"
    notify_fault "/usr/local/bin/notify_fault.sh"
}
EOF

# Create the same check script as on the primary:
# repeat the check_service.sh commands from Step 2 on this server
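
How the two priorities and the script weight interact is easy to get wrong, so here is a small sketch of the arithmetic keepalived applies for one tracked script. The function names are made up for illustration; only the rule itself (a positive weight is added while the check succeeds, a negative weight is subtracted while it fails) comes from keepalived.

```shell
#!/bin/sh
# effective_priority BASE WEIGHT CHECK_EXIT
#   BASE       - the "priority" value from vrrp_instance
#   WEIGHT     - the "weight" value from vrrp_script
#   CHECK_EXIT - exit status of the check script (0 = healthy)
effective_priority() {
    base=$1
    weight=$2
    check_exit=$3
    if [ "$weight" -gt 0 ] && [ "$check_exit" -eq 0 ]; then
        # Positive weight: added while the check succeeds
        echo $((base + weight))
    elif [ "$weight" -lt 0 ] && [ "$check_exit" -ne 0 ]; then
        # Negative weight: subtracted while the check fails
        echo $((base + weight))
    else
        echo "$base"
    fi
}

# elected_master PRIMARY_PRIO BACKUP_PRIO
# Prints which node wins. Real VRRP breaks ties by IP address,
# which this sketch does not model.
elected_master() {
    if [ "$1" -gt "$2" ]; then
        echo primary
    elif [ "$1" -lt "$2" ]; then
        echo backup
    else
        echo tie
    fi
}

# Failed primary (base 100) against healthy backup (base 90), weight -20
elected_master "$(effective_priority 100 -20 1)" "$(effective_priority 90 -20 0)"   # prints "backup"

# Same situation with a small positive weight of 2: no failover
elected_master "$(effective_priority 100 2 1)" "$(effective_priority 90 2 0)"       # prints "primary"
```

The second call shows why the weight has to be larger than the gap between the two base priorities: with bases of 100 and 90, a weight of 2 never lets the backup overtake a failed primary.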

📋 Step 4: Create Notification Scripts

Create these scripts on both servers so every state change is logged:

# Master notification script
cat > /usr/local/bin/notify_master.sh << 'EOF'
#!/bin/sh
echo "$(date): Became MASTER" >> /var/log/keepalived-state.log
# Start services if needed
rc-service nginx start
EOF

# Backup notification script
cat > /usr/local/bin/notify_backup.sh << 'EOF'
#!/bin/sh
echo "$(date): Became BACKUP" >> /var/log/keepalived-state.log
# Optional: Stop non-critical services
EOF

# Fault notification script
cat > /usr/local/bin/notify_fault.sh << 'EOF'
#!/bin/sh
echo "$(date): FAULT detected" >> /var/log/keepalived-state.log
# Alert admin
logger "Keepalived FAULT state detected!"
EOF

# Make scripts executable
chmod +x /usr/local/bin/notify_*.sh
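
As an alternative to three separate scripts, keepalived also has a generic notify directive that calls one script with the type, the instance name and the new state as arguments. The sketch below shows what such a handler could look like. The function name and the STATE_LOG override are assumptions added for illustration, and you would still need to point a notify line in keepalived.conf at wherever you save it.

```shell
#!/bin/sh
# Single handler for keepalived's "notify" directive.
# keepalived calls it as: <script> TYPE NAME STATE [PRIORITY]
# STATE_LOG can be overridden so the function can be tried without root.
STATE_LOG="${STATE_LOG:-/var/log/keepalived-state.log}"

handle_transition() {
    type=$1
    name=$2
    state=$3
    case "$state" in
        MASTER|BACKUP|FAULT)
            echo "$(date): $type $name entered $state" >> "$STATE_LOG"
            ;;
        *)
            echo "unknown state: $state" >&2
            return 1
            ;;
    esac
}

# Run the handler only when arguments are supplied,
# which is the case when keepalived invokes the script
if [ "$#" -ge 3 ]; then
    handle_transition "$@"
fi
```

One handler keeps the log format identical for every state, at the cost of putting all state-specific actions (such as starting nginx on MASTER) into one case statement.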

📋 Step 5: Configure Service Monitoring

Set up Monit to restart failed services:

# Configure Monit
cat > /etc/monitrc << 'EOF'
set daemon 30
set log /var/log/monit.log

set httpd port 2812 and
    use address localhost
    allow localhost
    allow admin:monit

# Confirm the pidfile path for your nginx package: grep -i pid /etc/init.d/nginx
check process nginx with pidfile /run/nginx.pid
    start program = "/etc/init.d/nginx start"
    stop program = "/etc/init.d/nginx stop"
    if failed host 127.0.0.1 port 80 protocol http then restart
    if 3 restarts within 5 cycles then unmonitor

check process keepalived with pidfile /run/keepalived.pid
    start program = "/etc/init.d/keepalived start"
    stop program = "/etc/init.d/keepalived stop"
    if 3 restarts within 5 cycles then unmonitor

check system $HOST
    if loadavg (1min) > 4 then alert
    if loadavg (5min) > 2 then alert
    if cpu usage > 95% for 10 cycles then alert
    if memory usage > 75% then alert
EOF

chmod 600 /etc/monitrc
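
The chmod matters because Monit refuses to start if its control file is accessible to group or others. A quick way to confirm the permissions is sketched below. The function name is made up for this example, and it relies on stat -c, which both BusyBox and GNU stat provide.

```shell
#!/bin/sh
# control_file_private FILE
# Succeeds only if FILE grants no permissions to group or others
# (for example 600 or 700).
control_file_private() {
    mode=$(stat -c %a "$1") || return 1
    case "$mode" in
        *00) return 0 ;;
        *)   return 1 ;;
    esac
}

# Typical use:
#   control_file_private /etc/monitrc || echo "fix with: chmod 600 /etc/monitrc"
```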

📋 Step 6: Test Basic Failover

Let’s test if failover works:

# Start services on both servers
rc-service keepalived start
rc-service monit start
rc-service nginx start

# Check virtual IP (on primary)
ip addr show | grep 192.168.1.100

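# Test failover
# On primary server: pause Monit's nginx check first, otherwise Monit
# restarts nginx on its next 30-second cycle and the VIP moves back
monit unmonitor nginx
rc-service nginx stop

# Check if IP moved to backup
# On backup server:
ip addr show | grep 192.168.1.100

# Verify in logs
tail -f /var/log/keepalived-state.log

# When finished, restore the primary
rc-service nginx start
monit monitor nginx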
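
A plain grep for the address also matches longer addresses that merely start with the same digits, and it treats the dots as wildcards. If you want to script the check, a stricter match is sketched below. The function name holds_vip is an assumption for this example; it reads the one-line-per-address output of ip -o -4 addr show on standard input.

```shell
#!/bin/sh
# holds_vip VIP
# Reads `ip -o -4 addr show` output on stdin and succeeds only if VIP
# is assigned as a complete address.
holds_vip() {
    # -F matches literally, so dots are not wildcards. The leading "inet "
    # and trailing "/" anchor both ends of the address.
    grep -qF "inet $1/"
}

# Typical use on either server:
#   ip -o -4 addr show | holds_vip 192.168.1.100 && echo "this node holds the VIP"
```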

📋 Step 7: Advanced Configuration

Add more sophisticated checks:

# Enhanced service check (needs curl: apk add curl)
cat > /usr/local/bin/check_service_advanced.sh << 'EOF'
#!/bin/sh
# Multiple service checks

# Check that nginx actually answers HTTP requests
curl -f -s -o /dev/null http://localhost || exit 1

# Check disk space (-P keeps each filesystem on one line)
DISK_USAGE=$(df -P / | awk 'NR==2 {print $5}' | sed 's/%//')
[ "$DISK_USAGE" -gt 90 ] && exit 1

# Check load average (integer part of the 1-minute value)
LOAD=$(cut -d. -f1 /proc/loadavg)
[ "$LOAD" -gt 10 ] && exit 1

# All checks passed
exit 0
EOF

chmod +x /usr/local/bin/check_service_advanced.sh

# Update keepalived to use advanced check, then restart to apply it
sed -i 's/check_service\.sh/check_service_advanced.sh/g' /etc/keepalived/keepalived.conf
rc-service keepalived restart
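
When a check script tests several things, it helps to know which one caused the failover. The helper below is a sketch of one way to do that. The name over_limit is made up for this example, and it treats an unreadable value as a failure, which is a design choice rather than anything keepalived requires.

```shell
#!/bin/sh
# over_limit NAME VALUE LIMIT
# Succeeds (returns 0) when VALUE exceeds LIMIT and reports which check
# tripped. An empty or non-numeric VALUE also counts as tripped, because
# the measurement itself could not be taken.
over_limit() {
    name=$1
    value=$2
    limit=$3
    case "$value" in
        ''|*[!0-9]*)
            echo "$name: could not read a value" >&2
            return 0
            ;;
    esac
    if [ "$value" -gt "$limit" ]; then
        echo "$name: $value exceeds $limit" >&2
        return 0
    fi
    return 1
}

# Inside a check script this could replace the bare tests, for example:
#   over_limit disk "$DISK_USAGE" 90 && exit 1
#   over_limit load "$LOAD" 10 && exit 1
```

The messages go to standard error, so where they end up depends on how keepalived's output is logged on your system.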

📋 Step 8: Set Up Split-Brain Prevention

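Prevent both servers from becoming master. The script below needs key-based SSH access to the peer, and keepalived only runs it once you reference it from a vrrp_script block: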

# Add fencing script
cat > /usr/local/bin/fence_peer.sh << 'EOF'
#!/bin/sh
# Simple fencing to prevent split-brain

PEER_IP="192.168.1.2"  # Other server's IP
VIP="192.168.1.100"

# Check if peer is responding
ping -c 1 -W 1 $PEER_IP > /dev/null 2>&1
PEER_ALIVE=$?

# Check if peer has VIP
ssh -o ConnectTimeout=2 root@$PEER_IP "ip addr show | grep -q $VIP" 2>/dev/null
PEER_HAS_VIP=$?

if [ $PEER_ALIVE -eq 0 ] && [ $PEER_HAS_VIP -eq 0 ]; then
    # Peer is alive and has VIP, we should be backup
    echo "Peer has VIP, staying as backup"
    exit 1
fi

exit 0
EOF

chmod +x /usr/local/bin/fence_peer.sh
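
A common cause of split-brain is a network that drops VRRP multicast, so each node stops hearing the other and both claim the VIP. One mitigation is to send the advertisements by unicast instead. The fragment below is a sketch for the primary's vrrp_instance. The address 192.168.1.1 for the primary itself is an assumption (only the peer address 192.168.1.2 appears in this guide), and the backup would mirror it with the two addresses swapped.

```
vrrp_instance VI_1 {
    # ... existing settings stay as they are ...

    # Assumed address of this (primary) server
    unicast_src_ip 192.168.1.1

    # Address of the other server, as used in fence_peer.sh
    unicast_peer {
        192.168.1.2
    }
}
```

Restart keepalived on both servers after changing this, since a node still using multicast will not see unicast advertisements from the other.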

🎮 Practice Exercise

Try this failover scenario:

Set up two test services
Configure failover between them
Test different failure modes
Monitor the results

# Practice setup
# Create test service
cat > /usr/local/bin/test_service.sh << 'EOF'
#!/bin/sh
while true; do
    echo "Service running on $(hostname)" > /tmp/service.status
    sleep 1
done
EOF

chmod +x /usr/local/bin/test_service.sh

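# Add to monit (Monit runs programs without a shell, so wrap the
# background start in sh -c and use absolute paths)
cat >> /etc/monitrc << 'EOF'

check process test_service matching "test_service.sh"
    start program = "/bin/sh -c '/usr/local/bin/test_service.sh &'"
    stop program = "/usr/bin/pkill -f test_service.sh"
EOF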

🚨 Troubleshooting Common Issues

Failover Not Working

Check these common problems:

# Verify keepalived is running
ps aux | grep keepalived

# Check configuration syntax
keepalived -t

# Look for errors
tail -f /var/log/messages | grep keepalived

# Test network connectivity
ping -c 3 <other_server_ip>

# Check firewall rules (VRRP is IP protocol 112, which -n may show as a number)
iptables -L -n | grep -Ei 'vrrp|112'

Split-Brain Situation

Fix when both servers think they’re master:

# Check who has VIP
ip addr show | grep 192.168.1.100

# Force one to backup
rc-service keepalived restart

# Check VRRP communication
tcpdump -i eth0 -n vrrp

# Verify passwords match
grep auth_pass /etc/keepalived/keepalived.conf

Service Won’t Restart

Debug service monitoring:

# Check monit status
monit status

# Test service manually
/etc/init.d/nginx restart

# Check monit logs
tail -f /var/log/monit.log

# Validate monit config
monit -t

💡 Pro Tips

Tip 1: Email Alerts

Set up email notifications:

# Add to notify scripts
cat >> /usr/local/bin/notify_master.sh << 'EOF'
# Send email alert
echo "Server $(hostname) is now MASTER" | \
    mail -s "Failover Alert" admin@example.com  # placeholder: use your own address
EOF

Tip 2: Multiple Virtual IPs

Handle multiple services:

# Add more VIPs in keepalived.conf
virtual_ipaddress {
    192.168.1.100/24
    192.168.1.101/24
    192.168.1.102/24
}

Tip 3: Priority Tuning

Adjust failover sensitivity:

# In keepalived.conf
vrrp_instance VI_1 {
    # Faster failover
    advert_int 1
    
    # Preempt settings
    preempt_delay 30
    
    # Priority adjustment
    priority 100  # Higher = preferred master
}

✅ Monitoring Dashboard

Create a simple status page:

# Status check script
cat > /var/www/localhost/htdocs/status.html << 'EOF'
<!DOCTYPE html>
<html>
<head>
    <title>Failover Status</title>
    <meta http-equiv="refresh" content="5">
</head>
<body>
    <h1>Service Failover Status</h1>
    <pre id="status"></pre>
    <script>
    fetch('/status.txt')
        .then(r => r.text())
        .then(t => document.getElementById('status').textContent = t);
    </script>
</body>
</html>
EOF

# Generate status file
cat > /usr/local/bin/update_status.sh << 'EOF'
#!/bin/sh
{
    echo "Generated: $(date)"
    echo "Hostname: $(hostname)"
    echo "Keepalived: $(rc-status | grep keepalived)"
    echo "Virtual IPs:"
    ip addr show | grep "inet.*scope global"
    echo "Service Status:"
    rc-status
} > /var/www/localhost/htdocs/status.txt
EOF

chmod +x /usr/local/bin/update_status.sh

# Add to cron without overwriting existing entries
(crontab -l 2>/dev/null; echo "* * * * * /usr/local/bin/update_status.sh") | crontab -

🏆 What You Learned

Excellent work! You can now:

✅ Set up keepalived for failover
✅ Configure service monitoring
✅ Handle automatic failover
✅ Prevent split-brain issues
✅ Monitor failover status

Your services are now highly available!

🎯 What’s Next?

Now that you have failover working, explore:

Load balancing with HAProxy
Database replication
Clustered file systems
Advanced monitoring with Prometheus

Keep your services running 24/7! 🔄
