Let me show you how to set up process health checks on Alpine Linux! Health checks are like regular doctor visits for your programs - they make sure everything is running smoothly and alert you when something's wrong. This helps prevent downtime and keeps your services reliable!
What are Process Health Checks?
Process health checks monitor your running programs to ensure they're working correctly. They regularly test if services are responding, using reasonable resources, and performing their tasks. If something fails, they can automatically restart the service or notify you!
Why use health checks?
- Detect problems early
- Automatic recovery from failures
- Better service reliability
- Reduced manual monitoring
- Peace of mind
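Before installing anything, it helps to see the core idea in plain POSIX shell: test a condition, report the result, and signal failure so something can react. Below is a minimal sketch (the pidfile path is a placeholder; monit automates exactly this loop for you, plus the restart):

```shell
#!/bin/sh
# Minimal hand-rolled health check: is the process in a pidfile alive?

is_alive() {
    # kill -0 sends no signal; it only tests whether the PID exists
    kill -0 "$1" 2>/dev/null
}

check_service() {
    pidfile="$1"
    if [ ! -f "$pidfile" ]; then
        echo "DOWN (no pidfile)"
        return 1
    fi
    if is_alive "$(cat "$pidfile")"; then
        echo "UP"
    else
        echo "DOWN"
        return 1
    fi
}

# A monitoring loop would call check_service on a timer and run the
# service's restart command whenever it returns non-zero.
```

This liveness test is what monit's `check process ... with pidfile` performs every cycle.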
What You Need
Before starting, you'll need:
- Alpine Linux installed
- Running services to monitor
- Root access
- Basic command line knowledge
- About 20 minutes
Step 1: Install Monitoring Tools
Let's get the tools we need:
# Update packages
apk update
# Install monit for process monitoring
apk add monit
# Install process utilities
apk add procps htop
# Install notification tools
apk add msmtp mailx curl
# Enable monit at boot and start it
rc-update add monit default
rc-service monit start
# Check monit version
monit -V
Step 2: Configure Basic Health Checks
Set up your first health check:
# Create monit configuration directory
mkdir -p /etc/monit/conf.d
# Configure monit
cat > /etc/monitrc << 'EOF'
# Monit Configuration
set daemon 30 # Check services every 30 seconds
set log /var/log/monit.log
# Web interface (optional)
set httpd port 2812
allow localhost
allow admin:monit # Change password!
# Include service configurations
include /etc/monit/conf.d/*
EOF
# Set proper permissions
chmod 600 /etc/monitrc
# Create first health check - SSH service
cat > /etc/monit/conf.d/sshd << 'EOF'
check process sshd with pidfile /var/run/sshd.pid
start program = "/etc/init.d/sshd start"
stop program = "/etc/init.d/sshd stop"
if failed port 22 protocol ssh then restart
if 3 restarts within 5 cycles then alert
if cpu usage > 80% for 2 cycles then alert
if memory usage > 200 MB then alert
EOF
# Reload monit
monit reload
Step 3: Monitor Web Services
Add health checks for web servers:
# Nginx health check
cat > /etc/monit/conf.d/nginx << 'EOF'
check process nginx with pidfile /var/run/nginx.pid
start program = "/etc/init.d/nginx start"
stop program = "/etc/init.d/nginx stop"
# Check if nginx is listening
if failed host localhost port 80
protocol http
request "/"
status = 200
timeout 10 seconds
then restart
# Resource limits
if cpu > 60% for 2 cycles then alert
if memory > 300 MB then alert
# Note: loadavg tests only work in a "check system" entry,
# not inside a process check
# Too many restarts
if 3 restarts within 5 cycles then unmonitor
EOF
# Apache health check
cat > /etc/monit/conf.d/apache << 'EOF'
check process apache with pidfile /var/run/apache2.pid
start program = "/etc/init.d/apache2 start"
stop program = "/etc/init.d/apache2 stop"
# HTTP check
if failed host localhost port 80
protocol http
request "/server-status"
then restart
# HTTPS check
if failed host localhost port 443
protocol https
then restart
# Performance checks
if cpu > 80% for 3 cycles then restart
if totalmem > 500 MB then restart
if children > 250 then restart
EOF
# Reload configuration
monit reload
Step 4: Database Health Checks
Monitor database services:
# MySQL/MariaDB health check
cat > /etc/monit/conf.d/mysql << 'EOF'
check process mysql with pidfile /var/run/mysqld/mysqld.pid
start program = "/etc/init.d/mariadb start"
stop program = "/etc/init.d/mariadb stop"
# Connection test
if failed unixsocket /var/run/mysqld/mysqld.sock
protocol mysql
then restart
# Port test
if failed port 3306 protocol mysql then restart
# Resource monitoring
if cpu > 80% for 2 cycles then alert
if memory > 1 GB then alert
# Connection limit
if failed host localhost port 3306
protocol mysql username "monit" password "monitor123"
then alert
EOF
# PostgreSQL health check
cat > /etc/monit/conf.d/postgresql << 'EOF'
check process postgresql with pidfile /var/run/postgresql/postgresql.pid
start program = "/etc/init.d/postgresql start"
stop program = "/etc/init.d/postgresql stop"
if failed port 5432 protocol pgsql then restart
if cpu > 75% for 2 cycles then alert
if memory > 800 MB then alert
EOF
# Redis health check
cat > /etc/monit/conf.d/redis << 'EOF'
check process redis with pidfile /var/run/redis.pid
start program = "/etc/init.d/redis start"
stop program = "/etc/init.d/redis stop"
if failed host localhost port 6379
send "PING\r\n"
expect "PONG"
then restart
if memory > 2 GB then alert
EOF
Step 5: Custom Application Checks
Create health checks for your apps:
# Node.js application
cat > /etc/monit/conf.d/nodeapp << 'EOF'
check process nodeapp matching "node.*app.js"
start program = "/usr/bin/npm start" as uid "nodeuser"
stop program = "/usr/bin/pkill -f 'node.*app.js'"
# HTTP endpoint check
if failed host localhost port 3000
protocol http
request "/health"
status = 200
content = "OK"
timeout 5 seconds
then restart
# Resource limits
if cpu > 50% for 3 cycles then restart
if memory > 500 MB then restart
EOF
# Python application
cat > /etc/monit/conf.d/pythonapp << 'EOF'
check process pythonapp with pidfile /var/run/pythonapp.pid
start program = "/usr/bin/python3 /opt/app/main.py"
as uid "appuser" and gid "appgroup"
stop program = "/bin/sh -c 'kill -TERM $(cat /var/run/pythonapp.pid)'"
# Custom health endpoint
if failed host localhost port 8080
protocol http
request "/api/health"
with timeout 10 seconds
then restart
EOF
# Container health check
cat > /etc/monit/conf.d/docker-app << 'EOF'
check program docker-app with path "/usr/bin/docker inspect myapp"
if status != 0 then exec "/usr/bin/docker start myapp"
check host docker-app-http with address localhost
if failed port 8080 protocol http then
exec "/usr/bin/docker restart myapp"
EOF
Step 6: Configure Notifications
Set up alerts for failures:
# Email configuration
cat > /etc/msmtprc << 'EOF'
defaults
auth on
tls on
tls_trust_file /etc/ssl/certs/ca-certificates.crt
account default
host smtp.gmail.com
port 587
from [email protected]
user [email protected]
password your-app-password
EOF
chmod 600 /etc/msmtprc
# Configure monit alerts
cat >> /etc/monitrc << 'EOF'
# Alert settings
set alert [email protected]
set mail-format {
from: monit@$HOST
subject: $SERVICE $EVENT at $DATE
message: Monit $ACTION $SERVICE at $DATE on $HOST: $DESCRIPTION.
}
# Specific alert rules
set alert [email protected] only on { timeout, nonexist }
set alert [email protected] not on { instance, action }
EOF
# Webhook notifications
# Note: monit only sets the MONIT_* environment variables for programs
# run as an exec action, so attach the script with "then exec":
cat > /etc/monit/conf.d/webhooks << 'EOF'
check host webhook-alert with address localhost
if failed port 80 protocol http
then exec "/usr/local/bin/monit-webhook.sh"
EOF
# Create webhook script
cat > /usr/local/bin/monit-webhook.sh << 'EOF'
#!/bin/sh
# Send alerts to Slack/Discord/etc
EVENT="$MONIT_EVENT"
SERVICE="$MONIT_SERVICE"
DESCRIPTION="$MONIT_DESCRIPTION"
# Slack webhook
curl -X POST -H 'Content-type: application/json' \
--data "{\"text\":\"Alert: $SERVICE - $EVENT - $DESCRIPTION\"}" \
https://hooks.slack.com/services/YOUR/WEBHOOK/URL
# Discord webhook
curl -X POST -H 'Content-type: application/json' \
--data "{\"content\":\"**Alert:** $SERVICE - $EVENT\\n$DESCRIPTION\"}" \
https://discord.com/api/webhooks/YOUR/WEBHOOK/URL
EOF
chmod +x /usr/local/bin/monit-webhook.sh
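One gotcha with the raw curl payloads above: if $MONIT_DESCRIPTION ever contains a double quote, the hand-built JSON breaks. A small escaping helper keeps the payload valid (a sketch; `jq` is cleaner if you would rather `apk add jq`):

```shell
#!/bin/sh
# Escape a string for safe embedding in a JSON double-quoted value
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the Slack-style payload from service name + description
payload() {
    printf '{"text":"Alert: %s - %s"}' \
        "$(json_escape "$1")" "$(json_escape "$2")"
}

payload "nginx" 'check said "failed"'
# -> {"text":"Alert: nginx - check said \"failed\""}
```

In the webhook script you would then pass `--data "$(payload "$SERVICE" "$DESCRIPTION")"` to curl.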
Step 7: Advanced Health Checks
Implement sophisticated monitoring:
# Response time monitoring
# Monit has no direct response-time test; a strict timeout on the
# check approximates "alert when the API is slow"
cat > /etc/monit/conf.d/response-time << 'EOF'
check host api-response with address localhost
if failed port 80 protocol http
request "/api/status"
with timeout 2 seconds
then alert
EOF
# File system checks
cat > /etc/monit/conf.d/filesystem << 'EOF'
check filesystem rootfs with path /
if space usage > 80% then alert
if space usage > 90% then exec "/usr/local/bin/cleanup.sh"
if inode usage > 80% then alert
check filesystem logs with path /var/log
if space usage > 5 GB then exec "/usr/bin/find /var/log -name '*.gz' -delete"
EOF
# Network connectivity
cat > /etc/monit/conf.d/network << 'EOF'
check host google with address google.com
if failed ping count 3 within 5 cycles then alert
check network eth0 with interface eth0
if failed link then alert
if changed link then alert
if saturation > 90% then alert
EOF
# Process dependencies
cat > /etc/monit/conf.d/dependencies << 'EOF'
check process app with pidfile /var/run/app.pid
depends on database, cache
start program = "/etc/init.d/app start"
stop program = "/etc/init.d/app stop"
check process database with pidfile /var/run/mysql.pid
start program = "/etc/init.d/mysql start"
stop program = "/etc/init.d/mysql stop"
check process cache with pidfile /var/run/redis.pid
start program = "/etc/init.d/redis start"
stop program = "/etc/init.d/redis stop"
EOF
Step 8: Monitoring Dashboard
Create a status dashboard:
# Status script
cat > /usr/local/bin/health-status.sh << 'EOF'
#!/bin/sh
# Process Health Status Dashboard
echo "Process Health Status"
echo "====================="
echo ""
# Get monit summary
echo "Service Status:"
monit summary | tail -n +3 | while read line; do
  service=$(echo "$line" | awk '{print $1}')
  status=$(echo "$line" | awk '{print $2}')
  case $status in
    "OK"|"Running")
      echo "  [OK] $service - Healthy"
      ;;
    "Not")
      echo "  [--] $service - Not monitored"
      ;;
    *)
      echo "  [!!] $service - $status"
      ;;
  esac
done
echo ""
echo "System Resources:"
echo "  CPU Load: $(uptime | awk -F'load average:' '{print $2}')"
echo "  Memory: $(free -h | awk '/^Mem:/ {print $3 " / " $2}')"
echo "  Disk: $(df -h / | awk 'NR==2 {print $3 " / " $2 " (" $5 ")"}')"
echo ""
echo "Recent Alerts:"
tail -5 /var/log/monit.log | grep -E "error|alert|restart" || echo "  No recent alerts"
echo ""
echo "Service Uptimes:"
for pid in /var/run/*.pid; do
  if [ -f "$pid" ]; then
    pidnum=$(cat "$pid")
    service=$(basename "$pid" .pid)
    if [ -d "/proc/$pidnum" ]; then
      uptime=$(ps -o etime= -p "$pidnum" 2>/dev/null | xargs)
      echo "  $service: $uptime"
    fi
  fi
done
EOF
chmod +x /usr/local/bin/health-status.sh
# Web dashboard
mkdir -p /var/www/health
cat > /var/www/health/index.html << 'EOF'
<!DOCTYPE html>
<html>
<head>
<title>Health Monitor</title>
<meta http-equiv="refresh" content="30">
<style>
body { font-family: Arial; margin: 20px; background: #f0f0f0; }
.container { max-width: 800px; margin: 0 auto; }
.status {
background: white;
padding: 15px;
margin: 10px 0;
border-radius: 5px;
box-shadow: 0 2px 5px rgba(0,0,0,0.1);
}
.healthy { border-left: 5px solid #4CAF50; }
.warning { border-left: 5px solid #FFC107; }
.error { border-left: 5px solid #F44336; }
h1 { text-align: center; }
</style>
</head>
<body>
<div class="container">
<h1>Service Health Monitor</h1>
<div id="status">Loading...</div>
</div>
<script>
// Auto-refresh status
setInterval(() => location.reload(), 30000);
</script>
</body>
</html>
EOF
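The static page above never fills in its status div by itself. One lightweight approach is to regenerate the page from the status script on a timer (a sketch; update-dashboard.sh and the cron entry are names chosen here, not anything monit provides):

```shell
#!/bin/sh
# Rebuild the dashboard page around the current plain-text status.
# Run from cron, e.g.:  * * * * * /usr/local/bin/update-dashboard.sh

render_page() {
    # $1 = plain-text status to embed; it comes from our own status
    # script, so it is treated as trusted output
    cat << HTML
<!DOCTYPE html>
<html>
<head>
<title>Health Monitor</title>
<meta http-equiv="refresh" content="30">
</head>
<body>
<div class="container">
<h1>Service Health Monitor</h1>
<pre id="status">$1</pre>
</div>
</body>
</html>
HTML
}

# render_page "$(/usr/local/bin/health-status.sh)" > /var/www/health/index.html
```

Since the page already refreshes itself every 30 seconds, a one-minute cron keeps it close enough to live.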
Practice Exercise
Try this monitoring setup:
- Install a web server
- Create health check
- Test failure recovery
- Monitor the logs
# Install test service
apk add lighttpd
rc-service lighttpd start
# Create health check
cat > /etc/monit/conf.d/lighttpd << 'EOF'
check process lighttpd with pidfile /var/run/lighttpd.pid
start program = "/etc/init.d/lighttpd start"
stop program = "/etc/init.d/lighttpd stop"
if failed port 80 protocol http then restart
if 3 restarts within 5 cycles then unmonitor
EOF
# Test failure
rc-service lighttpd stop
sleep 35 # Wait for monit to detect
# Check if restarted
rc-service lighttpd status
tail /var/log/monit.log
Troubleshooting Common Issues
Service Won't Start
Debug startup issues:
# Check monit log
tail -f /var/log/monit.log
# Test command manually
/etc/init.d/service-name start
# Check permissions
ls -la /etc/init.d/service-name
# Verify PID file location
ls -la /var/run/
False Positives
Tune your checks:
# Increase timeout
if failed port 80 protocol http
with timeout 30 seconds
then restart
# Add retry logic
if failed port 80 protocol http
for 3 cycles
then restart
# Adjust thresholds
if cpu > 90% for 5 cycles then alert
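The `for N cycles` qualifier is worth internalizing: monit only acts after N consecutive failed checks, which is what filters out one-off blips. The same logic in plain shell, with simulated check results and a threshold of 3:

```shell
#!/bin/sh
# "for 3 cycles" in plain shell: act only after 3 consecutive failures

threshold=3

next_streak() {
    # $1 = current failure streak, $2 = check exit code (0 = healthy)
    if [ "$2" -eq 0 ]; then echo 0; else echo $(( $1 + 1 )); fi
}

streak=0
for result in 1 1 0 1 1 1; do   # fail, fail, recover, fail, fail, fail
    streak=$(next_streak "$streak" "$result")
    if [ "$streak" -ge "$threshold" ]; then
        echo "restart triggered"
        streak=0
    fi
done
# -> prints "restart triggered" exactly once: the recovery at cycle 3
#    resets the streak, so only the final three failures add up
```

This is why a transient slow response under a `for 3 cycles` rule never restarts anything.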
Not Receiving Alerts
Fix notification issues:
# Test email
echo "Test" | mail -s "Test Alert" [email protected]
# Check mail configuration
cat /var/log/mail.log
# Validate the configuration (verbose)
monit -v validate
# Trigger a real alert: stop a monitored service behind monit's back
rc-service lighttpd stop
tail -f /var/log/monit.log
Pro Tips
Tip 1: Group Monitoring
Organize related services:
cat > /etc/monit/conf.d/groups << 'EOF'
group web nginx, apache, php-fpm
group database mysql, postgresql, redis
group apps nodeapp, pythonapp
check process nginx with pidfile /var/run/nginx.pid
group web
start program = "/etc/init.d/nginx start"
stop program = "/etc/init.d/nginx stop"
EOF
# Control by group
monit -g web restart all
monit -g database status
Tip 2: Custom Scripts
Create specific health checks:
cat > /usr/local/bin/check-api.sh << 'EOF'
#!/bin/sh
# Custom API health check
response=$(curl -s -o /dev/null -w "%{http_code}" http://localhost/api/health)
if [ "$response" != "200" ]; then
exit 1
fi
# Check response time (awk avoids depending on bc, which Alpine
# doesn't install by default)
time=$(curl -s -o /dev/null -w "%{time_total}" http://localhost/api/health)
if [ "$(echo "$time" | awk '{print ($1 > 1.0) ? 1 : 0}')" -eq 1 ]; then
exit 2
fi
exit 0
EOF
chmod +x /usr/local/bin/check-api.sh
# Use in monit
check program api-health with path "/usr/local/bin/check-api.sh"
if status != 0 then restart
Tip 3: Gradual Degradation
Handle overload gracefully:
# Reduce service under load
check process app with pidfile /var/run/app.pid
start program = "/etc/init.d/app start"
stop program = "/etc/init.d/app stop"
if cpu > 60% for 2 cycles then
exec "/usr/local/bin/reduce-workers.sh"
if cpu > 80% for 3 cycles then
exec "/usr/local/bin/enable-cache.sh"
if cpu > 95% for 5 cycles then restart
Best Practices
- Start simple: basic checks first, add complexity gradually, test each check
- Avoid alert fatigue:
  if cpu > 80% for 3 cycles then alert  # not: if cpu > 50% then alert
- Use dependencies:
  check process app depends on database
- Log everything:
  set log /var/log/monit.log
  set eventqueue basedir /var/monit
- Regular reviews:
  0 9 * * 1 /usr/local/bin/health-report.sh  # weekly cron
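That weekly cron line assumes a health-report.sh exists; here is a minimal sketch (the recipient address reuses the one from the alert config above, and the mail line is commented out so the script is safe to run standalone):

```shell
#!/bin/sh
# Hypothetical weekly report script for the cron entry above

build_report() {
    echo "Health report for $(uname -n) - $(date)"
    echo ""
    # Falls back gracefully when monit isn't running/installed
    monit summary 2>/dev/null || echo "(monit summary unavailable)"
    echo ""
    echo "Root filesystem:"
    df -h / | tail -n 1
}

# build_report | mail -s "Weekly health report" [email protected]
```

Run `build_report` by hand first to check the output before wiring up the cron entry.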
What You Learned
Great job! You can now:
- Install and configure Monit
- Create process health checks
- Monitor system resources
- Set up automatic recovery
- Configure alert notifications
Your services now have a health monitoring system!
What's Next?
Now that you have health checks, explore:
- Distributed monitoring with Prometheus
- Log aggregation with ELK stack
- Performance monitoring with Grafana
- Incident management systems
Keep your services healthy!