Alpine Linux Performance Monitoring: Complete Setup Guide
Performance monitoring is crucial for any Alpine Linux system, whether it’s a tiny container or a production server. I’ll show you how to set up proper monitoring that actually helps you spot problems before they become disasters.
Introduction
Alpine Linux’s minimal nature means you need to be smart about monitoring - you want comprehensive visibility without bloating your system. The tools I’ll cover give you everything you need while staying true to Alpine’s lightweight philosophy.
Why You Need This
- Catch performance problems before users notice them
- Understand your system’s resource usage patterns
- Get alerts when things go wrong
- Make data-driven decisions about scaling and optimization
Prerequisites
You’ll need these things first:
- Alpine Linux system with root access
- At least 512MB available RAM for monitoring tools
- Network connectivity for installing packages
- Basic understanding of system metrics (CPU, memory, disk)
Step 1: Installing Core Monitoring Tools
Setting Up Essential System Monitoring
Let’s start with the basic tools that every Alpine Linux system should have.
What we’re doing: Installing fundamental monitoring utilities and system information tools.
# Update package repositories
apk update
# Install basic monitoring tools
apk add htop iotop nethogs sysstat procps
# Install additional system utilities
apk add lsof strace tcpdump
Code explanation:
- htop: Interactive process viewer with a better interface than top
- iotop: Shows disk I/O usage by process
- nethogs: Network usage monitor by process
- sysstat: Collection of system performance tools (iostat, vmstat, etc.)
- procps: Process management utilities
- lsof: Shows which files are open by which processes
Expected Output:
(1/12) Installing htop (3.2.1-r1)
(2/12) Installing iotop (0.6-r3)
...
OK: 145 MiB in 98 packages
What this means: You now have essential monitoring tools for immediate system analysis.
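Before moving on, it’s worth a quick smoke test that the new tools respond. Any of them will do; a couple of one-off sysstat/procps runs like these (purely illustrative) should print live statistics:
# CPU utilization, three one-second samples
iostat -c 1 3
# Memory, swap, and paging summary, three one-second samples
vmstat 1 3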
Configuring System Statistics Collection
What we’re doing: Setting up automated collection of system performance data.
# Enable system activity data collection
rc-service sysstat start
rc-update add sysstat
# Configure data collection interval (every 10 minutes)
echo "*/10 * * * * /usr/lib/sysstat/sa1 1 1" >> /etc/crontabs/root
# Generate daily reports
echo "53 23 * * * /usr/lib/sysstat/sa2 -A" >> /etc/crontabs/root
# Restart cron to apply changes
rc-service crond restart
Code explanation:
- rc-service sysstat start: Starts the system activity reporter
- sa1 1 1: Collects system activity data (1 sample, 1 second apart)
- sa2 -A: Generates comprehensive daily reports
- Cron jobs automate regular data collection
Tip: The collected data is stored in /var/log/sysstat/ and can be analyzed with sar commands.
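Once a few collection intervals have passed, you can replay that history with sar’s -f flag. A quick sketch, assuming the saDD file naming convention and the data directory mentioned in the tip above (some builds use /var/log/sa/ instead, so adjust the path to match your system):
# Replay today's CPU history from the collected data file
sar -u -f /var/log/sysstat/sa$(date +%d)
# Same for memory utilization
sar -r -f /var/log/sysstat/sa$(date +%d)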
Step 2: Setting Up Prometheus and Node Exporter
Installing Prometheus Node Exporter
For more advanced monitoring, let’s set up Prometheus exporters to collect detailed metrics.
What we’re doing: Installing Node Exporter to collect system metrics for Prometheus.
# Install Node Exporter
apk add prometheus-node-exporter
# Configure Node Exporter
cat > /etc/conf.d/prometheus-node-exporter << 'EOF'
ARGS="--web.listen-address=:9100 --path.rootfs=/host"
EOF
# Start and enable Node Exporter
rc-service prometheus-node-exporter start
rc-update add prometheus-node-exporter
Code explanation:
- prometheus-node-exporter: Exports system metrics in Prometheus format
- --web.listen-address=:9100: Sets the port for the metrics endpoint
- --path.rootfs=/host: Tells the exporter where the host's root filesystem is mounted; this matters when the exporter runs inside a container with the host mounted at /host, while on a bare-metal Alpine install you can drop the flag and let it default to /
Testing Node Exporter
What we’re doing: Verifying that Node Exporter is collecting and exposing metrics correctly.
# Check if Node Exporter is running
rc-service prometheus-node-exporter status
# Test metrics endpoint
curl http://localhost:9100/metrics | head -20
# Check specific CPU metrics
curl -s http://localhost:9100/metrics | grep "node_cpu_seconds_total"
Code explanation:
- rc-service prometheus-node-exporter status: Shows the current status of the service
- curl http://localhost:9100/metrics: Fetches metrics in Prometheus format
- grep "node_cpu_seconds_total": Filters for CPU-specific metrics
Expected Output:
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
node_cpu_seconds_total{cpu="0",mode="system"} 89.12
What this means: Node Exporter is successfully collecting and exposing detailed system metrics.
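Node Exporter only exposes metrics; a Prometheus server still has to scrape them. If you run Prometheus on another machine, a minimal scrape job for this host might look like the fragment below (the job name and target address are placeholders to adapt, and the fragment belongs in that server's existing prometheus.yml):
# Fragment to merge into the Prometheus server's prometheus.yml (example values)
scrape_configs:
  - job_name: 'alpine-node'
    static_configs:
      - targets: ['<alpine-host-ip>:9100']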
Step 3: Lightweight Dashboard Setup
Installing and Configuring Netdata
Netdata provides real-time monitoring with a web interface that’s perfect for Alpine Linux.
What we’re doing: Setting up Netdata for real-time system monitoring with minimal resource usage.
# Install Netdata
apk add netdata
# Configure Netdata for minimal resource usage
cat > /etc/netdata/netdata.conf << 'EOF'
[global]
memory mode = dbengine
page cache size = 32
dbengine disk space = 256
[web]
web files owner = root
web files group = root
bind to = *:19999
[plugins]
enable running new plugins = no
check for new plugins every = 60
EOF
# Start and enable Netdata
rc-service netdata start
rc-update add netdata
Configuration explanation:
- memory mode = dbengine: Uses the efficient database engine for storing metrics
- page cache size = 32: Limits memory usage to 32MB
- dbengine disk space = 256: Limits disk usage to 256MB
- bind to = *:19999: Makes Netdata accessible on port 19999
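A quick way to confirm the dashboard is actually serving (assuming the default 19999 binding above) is to ask for the HTTP status code:
# Should print 200 once Netdata is up
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:19999/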
Customizing Netdata Monitoring
What we’re doing: Configuring specific monitoring modules and alerts for your Alpine system.
# Enable specific monitoring modules
cat > /etc/netdata/go.d.conf << 'EOF'
modules:
  nginx: yes
  apache: yes
  mysql: yes
  redis: yes
  docker: yes
EOF
# Configure basic alerting
cat > /etc/netdata/health.d/custom.conf << 'EOF'
# CPU usage alert
alarm: cpu_usage
on: system.cpu
lookup: average -3m unaligned of user,system,softirq,irq,guest
every: 10s
units: %
warn: $this > 75
crit: $this > 90
info: CPU utilization is too high
# Memory usage alert
alarm: ram_usage
on: system.ram
lookup: average -1m unaligned of used
every: 10s
units: %
warn: $this > 80
crit: $this > 95
info: Memory usage is too high
EOF
# Restart Netdata to apply configuration
rc-service netdata restart
Configuration explanation:
- modules: Enables monitoring for specific services
- alarm: Defines alert conditions
- lookup: average -3m: Checks 3-minute average values
- warn/crit: Sets warning and critical thresholds
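To confirm the custom alarms were actually loaded after the restart, you can query Netdata's health API. The exact JSON layout varies between Netdata versions, but the alarm names defined above should appear in the output:
# List configured alarms (JSON); look for cpu_usage and ram_usage
curl -s "http://localhost:19999/api/v1/alarms?all" | head -40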
Step 4: Command-Line Monitoring Tools
Real-Time System Monitoring
What we’re doing: Using command-line tools for immediate system analysis and troubleshooting.
# Monitor CPU and memory usage
htop
# Check disk I/O by process
iotop -o
# Monitor network usage by process
nethogs
# System activity report
sar -u 1 5 # CPU usage every second for 5 times
sar -r 1 5 # Memory usage
sar -d 1 5 # Disk activity
Code explanation:
- htop: Interactive process monitor with sorting and filtering
- iotop -o: Shows only processes with active I/O
- nethogs: Real-time network usage per process
- sar -u 1 5: Shows CPU utilization statistics
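The sysstat package installed in Step 1 also ships mpstat and pidstat, which complement sar when you need per-CPU or per-process detail:
# Per-CPU utilization, three one-second samples
mpstat -P ALL 1 3
# Per-process CPU statistics, three one-second samples
pidstat 1 3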
Advanced Performance Analysis
What we’re doing: Using specialized tools for detailed performance investigation.
# Check what files processes are using
lsof | head -20
# Monitor system calls made by a process
strace -p PID
# Network packet analysis
tcpdump -i any -n -c 10
# Disk usage analysis
df -h
du -sh /var/log/*
# Memory usage breakdown
cat /proc/meminfo
free -h
Code explanation:
- lsof: Lists open files and network connections
- strace -p PID: Traces system calls for a specific process
- tcpdump -i any -n -c 10: Captures 10 network packets
- df -h: Shows disk space usage in human-readable format
- du -sh: Shows directory sizes
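Two patterns that come up constantly in practice are summarizing a busy process's system calls and finding what is eating a filesystem. Both sketches below stick to the tools installed earlier; replace <PID> with a real process ID:
# Attach to a process and count its system calls; press Ctrl-C to stop and print the summary table
strace -c -p <PID>
# Rank the largest items under /var when df shows a filesystem filling up
du -sk /var/* | sort -rn | head -5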
Practical Examples
Example 1: Monitoring a Web Server
What we’re doing: Setting up comprehensive monitoring for an Nginx web server.
# Install and configure Nginx with monitoring
apk add nginx
# Create Nginx status page for monitoring
cat > /etc/nginx/conf.d/status.conf << 'EOF'
server {
    listen 8080;
    server_name localhost;
    location /nginx_status {
        stub_status on;
        access_log off;
        allow 127.0.0.1;
        deny all;
    }
}
EOF
# Configure Netdata to monitor Nginx
cat > /etc/netdata/go.d/nginx.conf << 'EOF'
jobs:
  - name: local
    url: http://127.0.0.1:8080/nginx_status
EOF
# Restart services
rc-service nginx restart
rc-service netdata restart
# Test monitoring
curl http://localhost:8080/nginx_status
Code explanation:
- stub_status on: Enables the Nginx status module
- allow 127.0.0.1: Restricts access to localhost only
- Netdata automatically collects Nginx metrics from the status page
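For reference, a healthy stub_status page returns a short plain-text report along these lines (your counters will differ):
Expected Output:
Active connections: 1
server accepts handled requests
 16 16 31
Reading: 0 Writing: 1 Waiting: 0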
Example 2: Container Monitoring Setup
What we’re doing: Monitoring Docker containers running on Alpine Linux.
# Install Docker
apk add docker
# Start Docker service
rc-service docker start
rc-update add docker
# Install cAdvisor for container monitoring
docker run -d \
--name=cadvisor \
--restart=always \
--volume=/:/rootfs:ro \
--volume=/var/run:/var/run:ro \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:ro \
--publish=8080:8080 \
gcr.io/cadvisor/cadvisor:latest
# Check container metrics
curl http://localhost:8080/metrics | grep container_cpu
Code explanation:
- cAdvisor: Google’s container monitoring tool
- Volume mounts provide access to system and Docker information
- Exposes container metrics on port 8080; note that this collides with the Nginx status server from Example 1, so publish cAdvisor on a different host port (for example --publish=8081:8080) if both run on the same machine
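As a cross-check alongside cAdvisor, Docker's built-in stats command gives a one-shot snapshot, and a status-code probe confirms cAdvisor itself is answering (adjust the port if you remapped it to avoid the clash noted above):
# One-shot CPU/memory snapshot of every running container
docker stats --no-stream
# Should print 200 once cAdvisor is serving metrics
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8080/metrics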
Advanced Monitoring Configuration
Setting Up Log Monitoring
What we’re doing: Monitoring system and application logs for errors and patterns.
# Install log monitoring tools
apk add logrotate rsyslog
# Configure centralized logging
cat > /etc/rsyslog.conf << 'EOF'
*.info;mail.none;authpriv.none;cron.none /var/log/messages
authpriv.* /var/log/secure
mail.* /var/log/maillog
cron.* /var/log/cron
*.emerg *
uucp,news.crit /var/log/spooler
local7.* /var/log/boot.log
EOF
# Set up log rotation
cat > /etc/logrotate.d/syslog << 'EOF'
/var/log/messages /var/log/secure /var/log/maillog /var/log/cron {
daily
missingok
rotate 30
compress
delaycompress
postrotate
/bin/kill -HUP `cat /var/run/rsyslogd.pid 2> /dev/null` 2> /dev/null || true
endscript
}
EOF
# Start logging services
rc-service rsyslog start
rc-update add rsyslog
Configuration explanation:
- *.info: Logs info level and above to messages
- authpriv.*: Authentication logs to secure
- rotate 30: Keeps 30 days of old logs
- compress: Compresses old log files
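Before trusting the rotation rules, give them a dry run: logrotate's -d flag parses the config and shows what it would do without touching any files, and -f forces one real rotation if you want to confirm the postrotate HUP works:
# Parse the rules and show the planned actions without rotating anything
logrotate -d /etc/logrotate.d/syslog
# Optionally force one real rotation to exercise the postrotate script
logrotate -f /etc/logrotate.d/syslog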
Custom Monitoring Scripts
What we’re doing: Creating custom scripts for application-specific monitoring.
# Create custom monitoring script
cat > /usr/local/bin/system-health.sh << 'EOF'
#!/bin/sh
# System health monitoring script
DATE=$(date)
CPU_USAGE=$(top -bn1 | grep "CPU:" | awk '{print $2}' | cut -d'%' -f1)
MEM_USAGE=$(free | grep Mem | awk '{printf "%.0f", $3/$2 * 100.0}')  # integer percent so the -gt test below works
DISK_USAGE=$(df / | tail -1 | awk '{print $5}' | cut -d'%' -f1)
# Log system stats
echo "$DATE - CPU: ${CPU_USAGE}%, Memory: ${MEM_USAGE}%, Disk: ${DISK_USAGE}%" >> /var/log/system-health.log
# Alert if any metric is too high
if [ "$CPU_USAGE" -gt 80 ] || [ "$MEM_USAGE" -gt 80 ] || [ "$DISK_USAGE" -gt 85 ]; then
echo "ALERT: High resource usage detected at $DATE" | logger -t system-health
fi
EOF
# Make script executable
chmod +x /usr/local/bin/system-health.sh
# Add to cron for regular execution
echo "*/5 * * * * /usr/local/bin/system-health.sh" >> /etc/crontabs/root
Code explanation:
- Script collects CPU, memory, and disk usage metrics
- Logs data to /var/log/system-health.log
- Sends alerts via syslog when thresholds are exceeded
- Runs every 5 minutes via cron
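Before leaving it to cron, run the script once by hand and confirm a line lands in the log:
# One manual run, then show the newest entry
/usr/local/bin/system-health.sh
tail -n 1 /var/log/system-health.log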
Troubleshooting
Common Issue 1: High Memory Usage from Monitoring
Problem: Monitoring tools consuming too much memory
Solution: Optimize monitoring tool configurations
# Reduce Netdata memory usage
cat >> /etc/netdata/netdata.conf << 'EOF'
[global]
page cache size = 16
dbengine disk space = 128
[plugins]
apps = no
charts.d = no
python.d = no
EOF
# Restart Netdata
rc-service netdata restart
Common Issue 2: Missing Metrics
Problem: Some system metrics not appearing
Solution: Check kernel modules and permissions
# Check if required kernel modules are loaded
lsmod | grep -E "(hwmon|thermal|cpufreq)"
# Load missing modules
modprobe coretemp
modprobe hwmon
# Check file permissions for metric collection
ls -la /proc/stat /proc/meminfo /proc/loadavg
Best Practices
- Resource-Aware Monitoring: Keep monitoring overhead under 5% of system resources
# Monitor the monitors
ps aux | grep -E "(netdata|prometheus|node_exporter)"
- Alert Fatigue Prevention: Set reasonable thresholds to avoid too many alerts
  - CPU: Warning at 75%, Critical at 90%
  - Memory: Warning at 80%, Critical at 95%
  - Disk: Warning at 80%, Critical at 90%
- Regular Maintenance:
  - Review and clean old log files weekly
  - Update monitoring tools monthly
  - Test alert systems quarterly
Verification
To verify your monitoring setup is working:
# Check all monitoring services
rc-service netdata status
rc-service prometheus-node-exporter status
# Test metric collection
curl -s http://localhost:9100/metrics | wc -l
curl -s http://localhost:19999/api/v1/info
# Verify log collection
tail -f /var/log/system-health.log
Expected Output:
* status: started
* status: started
2847
{"version":"v1.35.1","uid":"12345678-1234-1234-1234-123456789012"}
Wrapping Up
You just learned how to:
- Set up comprehensive performance monitoring on Alpine Linux
- Configure real-time dashboards with minimal resource overhead
- Create custom monitoring scripts and alerts
- Troubleshoot common monitoring issues
That’s it! You now have a robust monitoring setup that gives you complete visibility into your Alpine Linux system without bloating it. This setup scales from tiny containers to production servers, and I’ve used variations of it across hundreds of systems. The key is starting simple and adding complexity only when you need it.