Let me show you how to set up service failover in Alpine Linux! This ensures your services keep running even when things go wrong. It's like having a backup plan for your important services!
What is Service Failover?
Service failover automatically switches to a backup service when the main one fails. Think of it like having a spare tire - when the main tire goes flat, you can quickly switch to the spare and keep driving. In Alpine Linux, we can set this up easily!
Why use failover?
- Keep services running 24/7
- Minimize downtime
- Automatic recovery
- Better reliability
- Peace of mind
What You Need
Before starting, you'll need:
- Two Alpine Linux servers
- Network connectivity between them
- A service to protect (like nginx)
- Basic terminal knowledge
- About 20 minutes
Step 1: Install Monitoring Tools
First, let's install what we need:
# On both servers
apk update
# Install keepalived for failover
apk add keepalived
# Install monitoring tools
apk add monit
# Install networking tools
apk add ipvsadm iproute2
# Enable services
rc-update add keepalived
rc-update add monit
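Before moving on, it can help to confirm that the tools actually landed on both servers. The following is a minimal sketch: `missing_tools` is a hypothetical helper name (not part of any package), and it relies only on the standard `command -v` builtin.

```shell
#!/bin/sh
# missing_tools CMD...: print each command name that is not found in PATH.
# (Hypothetical helper, for illustration only.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" > /dev/null 2>&1 || echo "$tool"
    done
}

# Report anything from Step 1 that is still missing.
missing=$(missing_tools keepalived monit ipvsadm ip)
if [ -n "$missing" ]; then
    echo "Not installed yet:"
    echo "$missing"
fi
```

If nothing is printed, every tool from this step is available.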
Step 2: Configure Primary Server
Set up the main server:
# Create keepalived config
cat > /etc/keepalived/keepalived.conf << 'EOF'
global_defs {
router_id SERVER1
script_user root
enable_script_security
}
vrrp_script check_service {
script "/usr/local/bin/check_service.sh"
interval 2
weight -20 # subtract 20 from priority while the check fails (100 -> 80, below the backup's 90)
}
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 51
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass alpine123
}
virtual_ipaddress {
192.168.1.100/24
}
track_script {
check_service
}
notify_master "/usr/local/bin/notify_master.sh"
notify_backup "/usr/local/bin/notify_backup.sh"
notify_fault "/usr/local/bin/notify_fault.sh"
}
EOF
# Create service check script
cat > /usr/local/bin/check_service.sh << 'EOF'
#!/bin/sh
# Check if nginx is running
if pgrep nginx > /dev/null; then
exit 0
else
exit 1
fi
EOF
chmod +x /usr/local/bin/check_service.sh
Step 3: Configure Backup Server
Set up the backup server:
# Create keepalived config (on backup server)
cat > /etc/keepalived/keepalived.conf << 'EOF'
global_defs {
router_id SERVER2
script_user root
enable_script_security
}
vrrp_script check_service {
script "/usr/local/bin/check_service.sh"
interval 2
weight -20 # same weight as on the primary
}
vrrp_instance VI_1 {
state BACKUP
interface eth0
virtual_router_id 51
priority 90
advert_int 1
authentication {
auth_type PASS
auth_pass alpine123
}
virtual_ipaddress {
192.168.1.100/24
}
track_script {
check_service
}
notify_master "/usr/local/bin/notify_master.sh"
notify_backup "/usr/local/bin/notify_backup.sh"
notify_fault "/usr/local/bin/notify_fault.sh"
}
EOF
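# Create /usr/local/bin/check_service.sh on this server with the same
# contents as on the primary (Step 2), then make it executable:
chmod +x /usr/local/bin/check_service.sh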
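How the `weight` on `vrrp_script` interacts with the two priorities decides whether a failed check actually moves the virtual IP. keepalived adds a positive weight to the priority while the script succeeds, and subtracts the magnitude of a negative weight while it fails. The sketch below is a simplified model of that rule, not keepalived's own code; `effective_priority` is a hypothetical name, and the weight tried here is illustrative.

```shell
#!/bin/sh
# effective_priority BASE WEIGHT CHECK_OK
# CHECK_OK is 1 when the tracked script exits 0, otherwise 0.
# Simplified model of how a vrrp_script weight adjusts VRRP priority.
effective_priority() {
    base=$1; weight=$2; ok=$3
    if [ "$weight" -gt 0 ] && [ "$ok" -eq 1 ]; then
        # Positive weight: bonus while the check passes.
        echo $((base + weight))
    elif [ "$weight" -lt 0 ] && [ "$ok" -eq 0 ]; then
        # Negative weight: penalty while the check fails.
        echo $((base + weight))
    else
        echo "$base"
    fi
}

echo "primary, check failing, weight -20: $(effective_priority 100 -20 0)"
echo "backup,  check passing, weight -20: $(effective_priority 90 -20 1)"
```

With these numbers the primary falls to 80 while its check fails, below the backup's 90, so the backup takes over. A small positive weight such as 2 would leave a failing primary at 100 against a healthy backup's 92, and the virtual IP would stay put.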
Step 4: Create Notification Scripts
Set up alerts when failover happens:
# Master notification script
cat > /usr/local/bin/notify_master.sh << 'EOF'
#!/bin/sh
echo "$(date): Became MASTER" >> /var/log/keepalived-state.log
# Start services if needed
rc-service nginx start
EOF
# Backup notification script
cat > /usr/local/bin/notify_backup.sh << 'EOF'
#!/bin/sh
echo "$(date): Became BACKUP" >> /var/log/keepalived-state.log
# Optional: Stop non-critical services
EOF
# Fault notification script
cat > /usr/local/bin/notify_fault.sh << 'EOF'
#!/bin/sh
echo "$(date): FAULT detected" >> /var/log/keepalived-state.log
# Alert admin
logger "Keepalived FAULT state detected!"
EOF
# Make scripts executable
chmod +x /usr/local/bin/notify_*.sh
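As an alternative to three separate scripts, keepalived also accepts a single generic `notify` script, which it calls with four arguments: the object type (`GROUP` or `INSTANCE`), its name, the target state, and the priority. Here is a hedged sketch of how one script could turn those arguments into a log line; the function name `describe_transition` is made up for this example.

```shell
#!/bin/sh
# describe_transition TYPE NAME STATE PRIORITY
# Builds one log line from the four arguments keepalived passes to a
# generic "notify" script. (Function name is hypothetical.)
describe_transition() {
    case $3 in
        MASTER|BACKUP|FAULT)
            echo "$2 ($1) entered $3 state at priority $4" ;;
        *)
            echo "$2 ($1) reported unknown state: $3" ;;
    esac
}

# Example with made-up values; in a real notify script these would be "$@".
describe_transition INSTANCE VI_1 MASTER 100
```

Inside a script referenced by `notify` in the `vrrp_instance` block, the call would be `describe_transition "$@" >> /var/log/keepalived-state.log`.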
Step 5: Configure Service Monitoring
Set up Monit to restart failed services:
# Configure Monit
cat > /etc/monitrc << 'EOF'
set daemon 30
set log /var/log/monit.log
set httpd port 2812 and
use address localhost
allow localhost
allow admin:monit
check process nginx with pidfile /run/nginx.pid
start program = "/etc/init.d/nginx start"
stop program = "/etc/init.d/nginx stop"
if failed host 127.0.0.1 port 80 protocol http then restart
if 3 restarts within 5 cycles then unmonitor
check process keepalived with pidfile /run/keepalived.pid
start program = "/etc/init.d/keepalived start"
stop program = "/etc/init.d/keepalived stop"
if 3 restarts within 5 cycles then unmonitor
check system $HOST
if loadavg (1min) > 4 then alert
if loadavg (5min) > 2 then alert
if cpu usage > 95% for 10 cycles then alert
if memory usage > 75% then alert
EOF
chmod 600 /etc/monitrc
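The `check process ... with pidfile` lines above depend on the pidfile pointing at a live process. The sketch below shows roughly what such a check amounts to, so you can try it by hand. It is not Monit's actual implementation, and `pid_alive` is a hypothetical name.

```shell
#!/bin/sh
# pid_alive PIDFILE: succeed when PIDFILE holds the PID of a running process.
# (Hypothetical helper, a rough stand-in for a pidfile-based check.)
pid_alive() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    # Reject empty or non-numeric contents.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Signal 0 tests for existence without disturbing the process.
    kill -0 "$pid" 2>/dev/null
}

if pid_alive /run/nginx.pid; then
    echo "nginx pidfile points at a running process"
else
    echo "nginx pidfile is missing or stale"
fi
```

If this reports a stale pidfile while nginx is running, the `pidfile` path in `/etc/monitrc` probably does not match the one nginx writes.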
Step 6: Test Basic Failover
Let's test if failover works:
# Start services on both servers
rc-service keepalived start
rc-service monit start
rc-service nginx start
# Check virtual IP (on primary)
ip addr show | grep 192.168.1.100
# Test failover
# On primary server:
rc-service nginx stop
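# Check if IP moved to backup (look promptly: Monit checks every 30 seconds
# and will restart nginx on the primary, after which the IP can move back)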
# On backup server:
ip addr show | grep 192.168.1.100
# Verify in logs
tail -f /var/log/keepalived-state.log
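A plain `grep` for an address treats each dot as "any character" and also matches longer addresses that merely start with the same digits. For scripted checks, a stricter match can be sketched as follows; `has_vip` is a hypothetical helper that reads `ip addr` style output on standard input, and the sample line is made up for the demonstration.

```shell
#!/bin/sh
# has_vip ADDRESS: succeed when stdin contains "inet ADDRESS/" literally.
# (Hypothetical helper; feed it the output of `ip addr show`.)
has_vip() {
    grep -qF "inet $1/"
}

# Made-up sample line standing in for real `ip addr show` output.
sample='    inet 192.168.1.100/24 scope global secondary eth0'
if printf '%s\n' "$sample" | has_vip 192.168.1.100; then
    echo "VIP present"
fi
```

On a real server the call would be `ip addr show | has_vip 192.168.1.100`.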
Step 7: Advanced Configuration
Add more sophisticated checks:
# Enhanced service check
cat > /usr/local/bin/check_service_advanced.sh << 'EOF'
#!/bin/sh
# Multiple service checks
# Check nginx over HTTP (requires curl: apk add curl)
curl -f -s -o /dev/null http://localhost || exit 1
# Check disk space (df -P keeps each filesystem on one line)
DISK_USAGE=$(df -P / | awk 'NR==2 {print $5}' | sed 's/%//')
[ "$DISK_USAGE" -gt 90 ] && exit 1
# Check 1-minute load average (integer part only)
LOAD=$(cut -d' ' -f1 /proc/loadavg | cut -d. -f1)
[ "$LOAD" -gt 10 ] && exit 1
# All checks passed
exit 0
EOF
chmod +x /usr/local/bin/check_service_advanced.sh
# Update keepalived to use advanced check
sed -i 's/check_service.sh/check_service_advanced.sh/g' /etc/keepalived/keepalived.conf
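Because a wrong threshold in this script can trigger an unwanted failover, it may be worth exercising the limits with fake readings before trusting them. This is a small sketch under that assumption: `within_limits` is a hypothetical function that applies the same 90% disk and load-10 limits as the script above to values you pass in.

```shell
#!/bin/sh
# within_limits DISK_PERCENT LOAD_INT
# Succeeds when both readings are at or below the limits used in
# check_service_advanced.sh (disk 90%, load 10). Hypothetical helper.
within_limits() {
    [ "$1" -gt 90 ] && return 1
    [ "$2" -gt 10 ] && return 1
    return 0
}

# Dry run with invented readings.
for reading in "50 1" "95 1" "50 12"; do
    set -- $reading
    if within_limits "$1" "$2"; then
        echo "disk=$1% load=$2: healthy"
    else
        echo "disk=$1% load=$2: would fail the check"
    fi
done
```

Note that a reading of exactly 90 or 10 still passes, since the script uses `-gt` (strictly greater than).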
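Step 8: Set Up Split-Brain Prevention
Prevent both servers from becoming master. The script below assumes key-based (passwordless) SSH from each server to its peer, and it only takes effect once keepalived actually calls it, for example from a vrrp_script block listed under track_script: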
# Add fencing script
cat > /usr/local/bin/fence_peer.sh << 'EOF'
#!/bin/sh
# Simple fencing to prevent split-brain
PEER_IP="192.168.1.2" # Other server's IP
VIP="192.168.1.100"
# Check if peer is responding
ping -c 1 -W 1 $PEER_IP > /dev/null 2>&1
PEER_ALIVE=$?
# Check if peer has VIP
ssh -o ConnectTimeout=2 root@$PEER_IP "ip addr show | grep -q $VIP" 2>/dev/null
PEER_HAS_VIP=$?
if [ $PEER_ALIVE -eq 0 ] && [ $PEER_HAS_VIP -eq 0 ]; then
# Peer is alive and has VIP, we should be backup
echo "Peer has VIP, staying as backup"
exit 1
fi
exit 0
EOF
chmod +x /usr/local/bin/fence_peer.sh
Practice Exercise
Try this failover scenario:
- Set up two test services
- Configure failover between them
- Test different failure modes
- Monitor the results
# Practice setup
# Create test service
cat > /usr/local/bin/test_service.sh << 'EOF'
#!/bin/sh
while true; do
echo "Service running on $(hostname)" > /tmp/service.status
sleep 1
done
EOF
chmod +x /usr/local/bin/test_service.sh
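# Add to monit. Monit runs programs directly, not through a shell,
# so "&" needs an explicit sh -c and every program needs an absolute path.
cat >> /etc/monitrc << 'EOF'

check process test_service matching "test_service.sh"
start program = "/bin/sh -c '/usr/local/bin/test_service.sh &'"
stop program = "/usr/bin/pkill -f test_service.sh"
EOF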
Troubleshooting Common Issues
Failover Not Working
Check these common problems:
# Verify keepalived is running
ps aux | grep keepalived
# Check configuration syntax
keepalived -t
# Look for errors
tail -f /var/log/messages | grep keepalived
# Test network connectivity
ping -c 3 <other_server_ip>
# Check firewall rules (VRRP is IP protocol 112, which -n may print as a number)
iptables -L -n | grep -Ei 'vrrp|112'
Split-Brain Situation
Fix when both servers think they're master:
# Check who has VIP
ip addr show | grep 192.168.1.100
# Force one to backup
rc-service keepalived restart
# Check VRRP communication
tcpdump -i eth0 -n vrrp
# Verify passwords match
grep auth_pass /etc/keepalived/keepalived.conf
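Both servers must agree on `virtual_router_id` and `auth_pass`, so comparing the two configs directly is often quicker than reading `grep` output on each machine. The sketch below assumes you have copied the peer's config to a local file; `conf_value` and `same_setting` are hypothetical helpers that do a simple first-word match rather than real parsing of keepalived's block structure.

```shell
#!/bin/sh
# conf_value KEY FILE: print the word following KEY on the first line
# whose first word is KEY. (Hypothetical helper; no real config parsing.)
conf_value() {
    awk -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# same_setting KEY FILE_A FILE_B: succeed when both files agree on KEY.
same_setting() {
    [ "$(conf_value "$1" "$2")" = "$(conf_value "$1" "$3")" ]
}

# Demonstration with two throwaway files standing in for the real configs.
a=$(mktemp); b=$(mktemp)
printf 'virtual_router_id 51\nauth_pass alpine123\n' > "$a"
printf 'virtual_router_id 52\nauth_pass alpine123\n' > "$b"
for key in virtual_router_id auth_pass; do
    if same_setting "$key" "$a" "$b"; then
        echo "$key: match"
    else
        echo "$key: MISMATCH"
    fi
done
rm -f "$a" "$b"
```

In practice the two files would be `/etc/keepalived/keepalived.conf` and a copy of the peer's config saved under a path of your choosing.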
Service Won't Restart
Debug service monitoring:
# Check monit status
monit status
# Test service manually
/etc/init.d/nginx restart
# Check monit logs
tail -f /var/log/monit.log
# Validate monit config
monit -t
Pro Tips
Tip 1: Email Alerts
Set up email notifications:
# Add to notify scripts
cat >> /usr/local/bin/notify_master.sh << 'EOF'
# Send email alert
echo "Server $(hostname) is now MASTER" | \
mail -s "Failover Alert" [email protected]
EOF
Tip 2: Multiple Virtual IPs
Handle multiple services:
# Add more VIPs in keepalived.conf
virtual_ipaddress {
192.168.1.100/24
192.168.1.101/24
192.168.1.102/24
}
Tip 3: Priority Tuning
Adjust failover sensitivity:
# In keepalived.conf
vrrp_instance VI_1 {
# Faster failover
advert_int 1
# Preempt settings (keepalived applies preempt_delay only when the initial state is BACKUP)
preempt_delay 30
# Priority adjustment
priority 100 # Higher = preferred master
}
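To get a feel for how `advert_int` and `priority` affect takeover time when the master stops advertising altogether (crash or network loss), the VRRPv2 timing formula from RFC 3768 can be evaluated directly. This is a sketch assuming VRRPv2 timing; `master_down_interval` is a hypothetical name, and the result does not include the time the check script needs to notice a failed service.

```shell
#!/bin/sh
# master_down_interval ADVERT_INT PRIORITY
# VRRPv2 (RFC 3768):
#   Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time
#   Skew_Time            = (256 - Priority) / 256 seconds
# Prints the interval in seconds, rounded to two decimals.
master_down_interval() {
    awk -v a="$1" -v p="$2" 'BEGIN { printf "%.2f\n", 3 * a + (256 - p) / 256 }'
}

master_down_interval 1 90   # backup from Step 3: prints 3.65
```

A higher priority shortens the skew slightly, but `advert_int` dominates: doubling it roughly doubles the wait before a backup declares the master dead.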
Monitoring Dashboard
Create a simple status page:
# Status check script
cat > /var/www/localhost/htdocs/status.html << 'EOF'
<!DOCTYPE html>
<html>
<head>
<title>Failover Status</title>
<meta http-equiv="refresh" content="5">
</head>
<body>
<h1>Service Failover Status</h1>
<pre id="status"></pre>
<script>
fetch('/status.txt')
.then(r => r.text())
.then(t => document.getElementById('status').textContent = t);
</script>
</body>
</html>
EOF
# Generate status file
cat > /usr/local/bin/update_status.sh << 'EOF'
#!/bin/sh
{
echo "Generated: $(date)"
echo "Hostname: $(hostname)"
echo "Keepalived: $(rc-status | grep keepalived)"
echo "Virtual IPs:"
ip addr show | grep "inet.*scope global"
echo "Service Status:"
rc-status
} > /var/www/localhost/htdocs/status.txt
EOF
chmod +x /usr/local/bin/update_status.sh
# Add to cron (append, so existing crontab entries are kept)
(crontab -l 2>/dev/null; echo "* * * * * /usr/local/bin/update_status.sh") | crontab -
What You Learned
Excellent work! You can now:
- Set up keepalived for failover
- Configure service monitoring
- Handle automatic failover
- Prevent split-brain issues
- Monitor failover status
Your services are now highly available!
What's Next?
Now that you have failover working, explore:
- Load balancing with HAProxy
- Database replication
- Clustered file systems
- Advanced monitoring with Prometheus
Keep your services running 24/7!