Let me show you how to set up process health checks on Alpine Linux! Health checks are like regular doctor visits for your programs - they make sure everything is running smoothly and alert you when something's wrong. This helps prevent downtime and keeps your services reliable!
What are Process Health Checks?
Process health checks monitor your running programs to ensure they're working correctly. They regularly test if services are responding, using reasonable resources, and performing their tasks. If something fails, they can automatically restart the service or notify you!
Why use health checks?
- Detect problems early
- Automatic recovery from failures
- Better service reliability
- Reduced manual monitoring
- Peace of mind
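Before installing anything, it helps to see the core idea in plain POSIX shell: test a condition, report the result, and signal failure so something can react. Below is a minimal sketch (the pidfile path is a placeholder; monit automates exactly this loop for you, plus the restart):

```shell
#!/bin/sh
# Minimal hand-rolled health check: is the process in a pidfile alive?

is_alive() {
    # kill -0 sends no signal; it only tests whether the PID exists
    kill -0 "$1" 2>/dev/null
}

check_service() {
    pidfile="$1"
    if [ ! -f "$pidfile" ]; then
        echo "DOWN (no pidfile)"
        return 1
    fi
    if is_alive "$(cat "$pidfile")"; then
        echo "UP"
    else
        echo "DOWN"
        return 1
    fi
}

# A monitoring loop would call check_service on a timer and run the
# service's restart command whenever it returns non-zero.
```

This liveness test is what monit's `check process ... with pidfile` performs every cycle.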
What You Need
Before starting, you'll need:
- Alpine Linux installed
- Running services to monitor
- Root access
- Basic command line knowledge
- About 20 minutes
Step 1: Install Monitoring Tools
Let's get the tools we need:
# Update packages
apk update
# Install monit for process monitoring
apk add monit
# Install process utilities
apk add procps htop
# Install notification tools
apk add msmtp mailx curl
# Enable monit at boot and start it
rc-update add monit default
rc-service monit start
# Check monit version
monit -V
Step 2: Configure Basic Health Checks
Set up your first health check:
# Create monit configuration directory
mkdir -p /etc/monit/conf.d
# Configure monit
cat > /etc/monitrc << 'EOF'
# Monit Configuration
set daemon 30 # Check services every 30 seconds
set log /var/log/monit.log
# Web interface (optional)
set httpd port 2812
allow localhost
allow admin:monit # Change password!
# Include service configurations
include /etc/monit/conf.d/*
EOF
# Set proper permissions
chmod 600 /etc/monitrc
# Create first health check - SSH service
cat > /etc/monit/conf.d/sshd << 'EOF'
check process sshd with pidfile /var/run/sshd.pid
start program = "/etc/init.d/sshd start"
stop program = "/etc/init.d/sshd stop"
if failed port 22 protocol ssh then restart
if 3 restarts within 5 cycles then alert
if cpu usage > 80% for 2 cycles then alert
if memory usage > 200 MB then alert
EOF
# Reload monit
monit reload
Step 3: Monitor Web Services
Add health checks for web servers:
# Nginx health check
cat > /etc/monit/conf.d/nginx << 'EOF'
check process nginx with pidfile /var/run/nginx.pid
start program = "/etc/init.d/nginx start"
stop program = "/etc/init.d/nginx stop"
# Check if nginx is listening
if failed host localhost port 80
protocol http
request "/"
status = 200
timeout 10 seconds
then restart
# Resource limits
if cpu > 60% for 2 cycles then alert
if memory > 300 MB then alert
# Note: loadavg tests only work in a "check system" entry,
# not inside a process check
# Too many restarts
if 3 restarts within 5 cycles then unmonitor
EOF
# Apache health check
cat > /etc/monit/conf.d/apache << 'EOF'
check process apache with pidfile /var/run/apache2.pid
start program = "/etc/init.d/apache2 start"
stop program = "/etc/init.d/apache2 stop"
# HTTP check
if failed host localhost port 80
protocol http
request "/server-status"
then restart
# HTTPS check
if failed host localhost port 443
protocol https
then restart
# Performance checks
if cpu > 80% for 3 cycles then restart
if totalmem > 500 MB then restart
if children > 250 then restart
EOF
# Reload configuration
monit reload
Step 4: Database Health Checks
Monitor database services:
# MySQL/MariaDB health check
cat > /etc/monit/conf.d/mysql << 'EOF'
check process mysql with pidfile /var/run/mysqld/mysqld.pid
start program = "/etc/init.d/mariadb start"
stop program = "/etc/init.d/mariadb stop"
# Connection test
if failed unixsocket /var/run/mysqld/mysqld.sock
protocol mysql
then restart
# Port test
if failed port 3306 protocol mysql then restart
# Resource monitoring
if cpu > 80% for 2 cycles then alert
if memory > 1 GB then alert
# Connection limit
if failed host localhost port 3306
protocol mysql username "monit" password "monitor123"
then alert
EOF
# PostgreSQL health check
cat > /etc/monit/conf.d/postgresql << 'EOF'
check process postgresql with pidfile /var/run/postgresql/postgresql.pid
start program = "/etc/init.d/postgresql start"
stop program = "/etc/init.d/postgresql stop"
if failed port 5432 protocol pgsql then restart
if cpu > 75% for 2 cycles then alert
if memory > 800 MB then alert
EOF
# Redis health check
cat > /etc/monit/conf.d/redis << 'EOF'
check process redis with pidfile /var/run/redis.pid
start program = "/etc/init.d/redis start"
stop program = "/etc/init.d/redis stop"
if failed host localhost port 6379
send "PING\r\n"
expect "PONG"
then restart
if memory > 2 GB then alert
EOF
Step 5: Custom Application Checks
Create health checks for your apps:
# Node.js application
cat > /etc/monit/conf.d/nodeapp << 'EOF'
check process nodeapp matching "node.*app.js"
start program = "/usr/bin/npm start" as uid "nodeuser"
stop program = "/usr/bin/pkill -f 'node.*app.js'"
# HTTP endpoint check
if failed host localhost port 3000
protocol http
request "/health"
status = 200
content = "OK"
timeout 5 seconds
then restart
# Resource limits
if cpu > 50% for 3 cycles then restart
if memory > 500 MB then restart
EOF
# Python application
cat > /etc/monit/conf.d/pythonapp << 'EOF'
check process pythonapp with pidfile /var/run/pythonapp.pid
start program = "/usr/bin/python3 /opt/app/main.py"
as uid "appuser" and gid "appgroup"
stop program = "/bin/sh -c 'kill -TERM $(cat /var/run/pythonapp.pid)'"
# Custom health endpoint
if failed host localhost port 8080
protocol http
request "/api/health"
with timeout 10 seconds
then restart
EOF
# Container health check
cat > /etc/monit/conf.d/docker-app << 'EOF'
check program docker-app with path "/usr/bin/docker inspect myapp"
if status != 0 then exec "/usr/bin/docker start myapp"
check host docker-app-http with address localhost
if failed port 8080 protocol http then
exec "/usr/bin/docker restart myapp"
EOF
Step 6: Configure Notifications
Set up alerts for failures:
# Email configuration
cat > /etc/msmtprc << 'EOF'
defaults
auth on
tls on
tls_trust_file /etc/ssl/certs/ca-certificates.crt
account default
host smtp.gmail.com
port 587
from [email protected]
user [email protected]
password your-app-password
EOF
chmod 600 /etc/msmtprc
# Configure monit alerts
cat >> /etc/monitrc << 'EOF'
# Alert settings
set alert [email protected]
set mail-format {
from: monit@$HOST
subject: $SERVICE $EVENT at $DATE
message: Monit $ACTION $SERVICE at $DATE on $HOST: $DESCRIPTION.
}
# Specific alert rules
set alert [email protected] only on { timeout, nonexist }
set alert [email protected] not on { instance, action }
EOF
# Webhook notifications
# Note: monit only sets the MONIT_* environment variables for programs
# run as an exec action, so attach the script with "then exec":
cat > /etc/monit/conf.d/webhooks << 'EOF'
check host webhook-alert with address localhost
if failed port 80 protocol http
then exec "/usr/local/bin/monit-webhook.sh"
EOF
# Create webhook script
cat > /usr/local/bin/monit-webhook.sh << 'EOF'
#!/bin/sh
# Send alerts to Slack/Discord/etc
EVENT="$MONIT_EVENT"
SERVICE="$MONIT_SERVICE"
DESCRIPTION="$MONIT_DESCRIPTION"
# Slack webhook
curl -X POST -H 'Content-type: application/json' \
--data "{\"text\":\"Alert: $SERVICE - $EVENT - $DESCRIPTION\"}" \
https://hooks.slack.com/services/YOUR/WEBHOOK/URL
# Discord webhook
curl -X POST -H 'Content-type: application/json' \
--data "{\"content\":\"**Alert:** $SERVICE - $EVENT\\n$DESCRIPTION\"}" \
https://discord.com/api/webhooks/YOUR/WEBHOOK/URL
EOF
chmod +x /usr/local/bin/monit-webhook.sh
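One gotcha with the raw curl payloads above: if $MONIT_DESCRIPTION ever contains a double quote, the hand-built JSON breaks. A small escaping helper keeps the payload valid (a sketch; `jq` is cleaner if you would rather `apk add jq`):

```shell
#!/bin/sh
# Escape a string for safe embedding in a JSON double-quoted value
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the Slack-style payload from service name + description
payload() {
    printf '{"text":"Alert: %s - %s"}' \
        "$(json_escape "$1")" "$(json_escape "$2")"
}

payload "nginx" 'check said "failed"'
# -> {"text":"Alert: nginx - check said \"failed\""}
```

In the webhook script you would then pass `--data "$(payload "$SERVICE" "$DESCRIPTION")"` to curl.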
Step 7: Advanced Health Checks
Implement sophisticated monitoring:
# Response time monitoring
# Monit has no direct response-time test; a strict timeout on the
# check approximates "alert when the API is slow"
cat > /etc/monit/conf.d/response-time << 'EOF'
check host api-response with address localhost
if failed port 80 protocol http
request "/api/status"
with timeout 2 seconds
then alert
EOF
# File system checks
cat > /etc/monit/conf.d/filesystem << 'EOF'
check filesystem rootfs with path /
if space usage > 80% then alert
if space usage > 90% then exec "/usr/local/bin/cleanup.sh"
if inode usage > 80% then alert
check filesystem logs with path /var/log
if space usage > 5 GB then exec "/usr/bin/find /var/log -name '*.gz' -delete"
EOF
# Network connectivity
cat > /etc/monit/conf.d/network << 'EOF'
check host google with address google.com
if failed ping count 3 within 5 cycles then alert
check network eth0 with interface eth0
if failed link then alert
if changed link then alert
if saturation > 90% then alert
EOF
# Process dependencies
cat > /etc/monit/conf.d/dependencies << 'EOF'
check process app with pidfile /var/run/app.pid
depends on database, cache
start program = "/etc/init.d/app start"
stop program = "/etc/init.d/app stop"
check process database with pidfile /var/run/mysql.pid
start program = "/etc/init.d/mysql start"
stop program = "/etc/init.d/mysql stop"
check process cache with pidfile /var/run/redis.pid
start program = "/etc/init.d/redis start"
stop program = "/etc/init.d/redis stop"
EOF
Step 8: Monitoring Dashboard
Create a status dashboard:
# Status script
cat > /usr/local/bin/health-status.sh << 'EOF'
#!/bin/sh
# Process Health Status Dashboard
echo "Process Health Status"
echo "====================="
echo ""
# Get monit summary
echo "Service Status:"
monit summary | tail -n +3 | while read line; do
  service=$(echo "$line" | awk '{print $1}')
  status=$(echo "$line" | awk '{print $2}')
  case $status in
    "OK"|"Running")
      echo "  [OK] $service - Healthy"
      ;;
    "Not")
      echo "  [--] $service - Not monitored"
      ;;
    *)
      echo "  [!!] $service - $status"
      ;;
  esac
done
echo ""
echo "System Resources:"
echo "  CPU Load: $(uptime | awk -F'load average:' '{print $2}')"
echo "  Memory: $(free -h | awk '/^Mem:/ {print $3 " / " $2}')"
echo "  Disk: $(df -h / | awk 'NR==2 {print $3 " / " $2 " (" $5 ")"}')"
echo ""
echo "Recent Alerts:"
tail -5 /var/log/monit.log | grep -E "error|alert|restart" || echo "  No recent alerts"
echo ""
echo "Service Uptimes:"
for pid in /var/run/*.pid; do
  if [ -f "$pid" ]; then
    pidnum=$(cat "$pid")
    service=$(basename "$pid" .pid)
    if [ -d "/proc/$pidnum" ]; then
      uptime=$(ps -o etime= -p "$pidnum" 2>/dev/null | xargs)
      echo "  $service: $uptime"
    fi
  fi
done
EOF
chmod +x /usr/local/bin/health-status.sh
# Web dashboard
mkdir -p /var/www/health
cat > /var/www/health/index.html << 'EOF'
<!DOCTYPE html>
<html>
<head>
<title>Health Monitor</title>
<meta http-equiv="refresh" content="30">
<style>
body { font-family: Arial; margin: 20px; background: #f0f0f0; }
.container { max-width: 800px; margin: 0 auto; }
.status {
background: white;
padding: 15px;
margin: 10px 0;
border-radius: 5px;
box-shadow: 0 2px 5px rgba(0,0,0,0.1);
}
.healthy { border-left: 5px solid #4CAF50; }
.warning { border-left: 5px solid #FFC107; }
.error { border-left: 5px solid #F44336; }
h1 { text-align: center; }
</style>
</head>
<body>
<div class="container">
<h1>Service Health Monitor</h1>
<div id="status">Loading...</div>
</div>
<script>
// Auto-refresh status
setInterval(() => location.reload(), 30000);
</script>
</body>
</html>
EOF
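The static page above never fills in its status div by itself. One lightweight approach is to regenerate the page from the status script on a timer (a sketch; update-dashboard.sh and the cron entry are names chosen here, not anything monit provides):

```shell
#!/bin/sh
# Rebuild the dashboard page around the current plain-text status.
# Run from cron, e.g.:  * * * * * /usr/local/bin/update-dashboard.sh

render_page() {
    # $1 = plain-text status to embed; it comes from our own status
    # script, so it is treated as trusted output
    cat << HTML
<!DOCTYPE html>
<html>
<head>
<title>Health Monitor</title>
<meta http-equiv="refresh" content="30">
</head>
<body>
<div class="container">
<h1>Service Health Monitor</h1>
<pre id="status">$1</pre>
</div>
</body>
</html>
HTML
}

# render_page "$(/usr/local/bin/health-status.sh)" > /var/www/health/index.html
```

Since the page already refreshes itself every 30 seconds, a one-minute cron keeps it close enough to live.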
Practice Exercise
Try this monitoring setup:
- Install a web server
- Create health check
- Test failure recovery
- Monitor the logs
# Install test service
apk add lighttpd
rc-service lighttpd start
# Create health check
cat > /etc/monit/conf.d/lighttpd << 'EOF'
check process lighttpd with pidfile /var/run/lighttpd.pid
start program = "/etc/init.d/lighttpd start"
stop program = "/etc/init.d/lighttpd stop"
if failed port 80 protocol http then restart
if 3 restarts within 5 cycles then unmonitor
EOF
# Test failure
rc-service lighttpd stop
sleep 35 # Wait for monit to detect
# Check if restarted
rc-service lighttpd status
tail /var/log/monit.log
Troubleshooting Common Issues
Service Won't Start
Debug startup issues:
# Check monit log
tail -f /var/log/monit.log
# Test command manually
/etc/init.d/service-name start
# Check permissions
ls -la /etc/init.d/service-name
# Verify PID file location
ls -la /var/run/
False Positives
Tune your checks:
# Increase timeout
if failed port 80 protocol http
with timeout 30 seconds
then restart
# Add retry logic
if failed port 80 protocol http
for 3 cycles
then restart
# Adjust thresholds
if cpu > 90% for 5 cycles then alert
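The `for N cycles` qualifier is worth internalizing: monit only acts after N consecutive failed checks, which is what filters out one-off blips. The same logic in plain shell, with simulated check results and a threshold of 3:

```shell
#!/bin/sh
# "for 3 cycles" in plain shell: act only after 3 consecutive failures

threshold=3

next_streak() {
    # $1 = current failure streak, $2 = check exit code (0 = healthy)
    if [ "$2" -eq 0 ]; then echo 0; else echo $(( $1 + 1 )); fi
}

streak=0
for result in 1 1 0 1 1 1; do   # fail, fail, recover, fail, fail, fail
    streak=$(next_streak "$streak" "$result")
    if [ "$streak" -ge "$threshold" ]; then
        echo "restart triggered"
        streak=0
    fi
done
# -> prints "restart triggered" exactly once: the recovery at cycle 3
#    resets the streak, so only the final three failures add up
```

This is why a transient slow response under a `for 3 cycles` rule never restarts anything.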
Not Receiving Alerts
Fix notification issues:
# Test email
echo "Test" | mail -s "Test Alert" [email protected]
# Check mail configuration
cat /var/log/mail.log
# Validate the configuration (verbose)
monit -v validate
# Trigger a real alert: stop a monitored service behind monit's back
rc-service lighttpd stop
tail -f /var/log/monit.log
Pro Tips
Tip 1: Group Monitoring
Organize related services:
cat > /etc/monit/conf.d/groups << 'EOF'
group web nginx, apache, php-fpm
group database mysql, postgresql, redis
group apps nodeapp, pythonapp
check process nginx with pidfile /var/run/nginx.pid
group web
start program = "/etc/init.d/nginx start"
stop program = "/etc/init.d/nginx stop"
EOF
# Control by group
monit -g web restart all
monit -g database status
Tip 2: Custom Scripts
Create specific health checks:
cat > /usr/local/bin/check-api.sh << 'EOF'
#!/bin/sh
# Custom API health check
response=$(curl -s -o /dev/null -w "%{http_code}" http://localhost/api/health)
if [ "$response" != "200" ]; then
exit 1
fi
# Check response time (awk avoids depending on bc, which Alpine
# doesn't install by default)
time=$(curl -s -o /dev/null -w "%{time_total}" http://localhost/api/health)
if [ "$(echo "$time" | awk '{print ($1 > 1.0) ? 1 : 0}')" -eq 1 ]; then
exit 2
fi
exit 0
EOF
chmod +x /usr/local/bin/check-api.sh
# Use in monit
check program api-health with path "/usr/local/bin/check-api.sh"
if status != 0 then restart
Tip 3: Gradual Degradation
Handle overload gracefully:
# Reduce service under load
check process app with pidfile /var/run/app.pid
start program = "/etc/init.d/app start"
stop program = "/etc/init.d/app stop"
if cpu > 60% for 2 cycles then
exec "/usr/local/bin/reduce-workers.sh"
if cpu > 80% for 3 cycles then
exec "/usr/local/bin/enable-cache.sh"
if cpu > 95% for 5 cycles then restart
Best Practices
- Start simple: basic checks first, add complexity gradually, test each check
- Avoid alert fatigue:
  if cpu > 80% for 3 cycles then alert  # not: if cpu > 50% then alert
- Use dependencies:
  check process app depends on database
- Log everything:
  set log /var/log/monit.log
  set eventqueue basedir /var/monit
- Regular reviews:
  0 9 * * 1 /usr/local/bin/health-report.sh  # weekly cron
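That weekly cron line assumes a health-report.sh exists; here is a minimal sketch (the recipient address reuses the one from the alert config above, and the mail line is commented out so the script is safe to run standalone):

```shell
#!/bin/sh
# Hypothetical weekly report script for the cron entry above

build_report() {
    echo "Health report for $(uname -n) - $(date)"
    echo ""
    # Falls back gracefully when monit isn't running/installed
    monit summary 2>/dev/null || echo "(monit summary unavailable)"
    echo ""
    echo "Root filesystem:"
    df -h / | tail -n 1
}

# build_report | mail -s "Weekly health report" [email protected]
```

Run `build_report` by hand first to check the output before wiring up the cron entry.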
What You Learned
Great job! You can now:
- Install and configure Monit
- Create process health checks
- Monitor system resources
- Set up automatic recovery
- Configure alert notifications
Your services now have a health monitoring system!
What's Next?
Now that you have health checks, explore:
- Distributed monitoring with Prometheus
- Log aggregation with ELK stack
- Performance monitoring with Grafana
- Incident management systems
Keep your services healthy!