📊 Installing Prometheus and Grafana on Alpine Linux: Complete Monitoring Guide
Alpine Linux Prometheus Grafana

Published Jun 18, 2025

Comprehensive tutorial for system administrators to set up powerful monitoring and visualization with Prometheus and Grafana on Alpine Linux. Perfect for infrastructure monitoring and alerting!

19 min read

Let’s build a powerful monitoring and visualization stack with Prometheus and Grafana on Alpine Linux! 🚀 This comprehensive tutorial shows you how to set up complete infrastructure monitoring with metrics collection, alerting, and beautiful dashboards. Perfect for DevOps teams and system administrators! 😊

🤔 What are Prometheus and Grafana?

Prometheus is a powerful monitoring and alerting system that collects metrics from your infrastructure, while Grafana provides stunning visualizations and dashboards for your data!

This monitoring stack is like:

  • 🔍 Smart surveillance systems that watch over your entire infrastructure
  • 📈 Business intelligence dashboards that turn raw data into insights
  • 🚨 Early warning systems that alert you before problems become critical

🎯 What You Need

Before we start, you need:

  • ✅ Alpine Linux system with sufficient resources (4GB+ RAM recommended)
  • ✅ Understanding of system monitoring concepts and metrics
  • ✅ Basic knowledge of networking and service configuration
  • ✅ Root access for system service installation

📋 Step 1: Install and Configure Prometheus

Install Prometheus Server

Let’s install Prometheus, the metrics collection and storage engine! 😊

What we’re doing: Installing Prometheus server for comprehensive metrics collection and monitoring.

# Update package list
apk update

# Install Prometheus server
apk add prometheus

# Install additional monitoring tools
apk add prometheus-node-exporter prometheus-alertmanager

# Check Prometheus version
prometheus --version

# Check installation paths
ls -la /etc/prometheus/
ls -la /var/lib/prometheus/

# Create Prometheus user and directories
adduser -D -s /bin/false prometheus 2>/dev/null || true
mkdir -p /var/lib/prometheus
mkdir -p /etc/prometheus/rules
mkdir -p /etc/prometheus/file_sd
chown -R prometheus:prometheus /var/lib/prometheus /etc/prometheus

# Start Prometheus service
rc-service prometheus start

# Enable Prometheus to start at boot
rc-update add prometheus default

# Test Prometheus web interface
echo "Prometheus should be available at: http://localhost:9090"

What this does: 📖 Installs Prometheus with all necessary components for monitoring.

Example output:

prometheus, version 2.45.0 (branch: HEAD, revision: 8b2f6b4)
  build user:       builduser@buildhost
  build date:       20231124-14:56:23
  go version:       go1.21.5

What this means: Prometheus is installed and ready for configuration! ✅
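
Before configuring anything, it helps to confirm the freshly started service actually answers. A minimal sanity check, assuming curl is installed (apk add curl):

# Confirm the OpenRC service is running and the HTTP endpoint responds
rc-service prometheus status

# The built-in health endpoint returns a short confirmation message
curl -s http://localhost:9090/-/healthy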

Configure Prometheus for Production

Let’s create a comprehensive Prometheus configuration! 🎯

What we’re doing: Configuring Prometheus with targets, rules, and optimal settings for production monitoring.

# Backup original Prometheus configuration
cp /etc/prometheus/prometheus.yml /etc/prometheus/prometheus.yml.backup

# Create comprehensive Prometheus configuration
cat > /etc/prometheus/prometheus.yml << 'EOF'
# Prometheus Configuration for Complete Monitoring
global:
  scrape_interval: 15s          # How frequently to scrape targets
  evaluation_interval: 15s      # How frequently to evaluate rules
  external_labels:
    cluster: 'alpine-production'
    environment: 'production'

# Alerting configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - localhost:9093

# Rules configuration
rule_files:
  - "/etc/prometheus/rules/*.yml"

# Scrape configurations
scrape_configs:
  # Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
    scrape_interval: 5s
    metrics_path: /metrics

  # Node Exporter for system metrics
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:9100']
    scrape_interval: 10s
    metrics_path: /metrics

  # Alpine Linux specific monitoring
  - job_name: 'alpine-system'
    static_configs:
      - targets: ['localhost:9100']
    scrape_interval: 15s
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: 'node_((cpu|memory|disk|network).*)'
        target_label: __name__
        replacement: 'alpine_${1}'

  # Application monitoring
  - job_name: 'application-metrics'
    file_sd_configs:
      - files:
        - '/etc/prometheus/file_sd/applications.yml'
    scrape_interval: 30s

  # Docker container monitoring (if Docker is installed)
  - job_name: 'docker-containers'
    static_configs:
      - targets: ['localhost:9323']
    scrape_interval: 30s
    metrics_path: /metrics

  # Custom service monitoring
  - job_name: 'custom-services'
    file_sd_configs:
      - files:
        - '/etc/prometheus/file_sd/services.yml'
    scrape_interval: 60s

# NOTE: Storage retention and web options are not valid keys in prometheus.yml;
# they must be passed to the prometheus binary as command-line flags, for example:
#   --storage.tsdb.path=/var/lib/prometheus/data
#   --storage.tsdb.retention.time=30d --storage.tsdb.retention.size=10GB
#   --web.listen-address=0.0.0.0:9090 --web.enable-lifecycle
# The --web.enable-lifecycle flag is what allows the /-/reload calls used later in this guide.
EOF

# Create application discovery configuration
cat > /etc/prometheus/file_sd/applications.yml << 'EOF'
# Application Service Discovery
- targets:
  - 'localhost:8080'
  - 'localhost:8081'
  labels:
    service: 'web-application'
    environment: 'production'
    team: 'backend'

- targets:
  - 'localhost:3000'
  labels:
    service: 'frontend-application'
    environment: 'production'
    team: 'frontend'
EOF

# Create services discovery configuration
cat > /etc/prometheus/file_sd/services.yml << 'EOF'
# Service Discovery for Custom Services
- targets:
  - 'localhost:6379'
  labels:
    service: 'redis'
    environment: 'production'
    type: 'database'

- targets:
  - 'localhost:11211'
  labels:
    service: 'memcached'
    environment: 'production'
    type: 'cache'
EOF

# Set proper ownership
chown -R prometheus:prometheus /etc/prometheus/

# Validate Prometheus configuration
promtool check config /etc/prometheus/prometheus.yml

# Restart Prometheus with new configuration
rc-service prometheus restart

echo "Prometheus configured for production monitoring! 📊"

What this creates: Production-ready Prometheus configuration with service discovery! ✅
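
One caveat: the retention and lifecycle settings mentioned in the configuration comments only take effect as command-line flags. A minimal sketch, assuming the OpenRC init script reads extra arguments from a prometheus_args variable in /etc/conf.d/prometheus (the variable name is an assumption; check /etc/init.d/prometheus for the one your package actually uses):

# Pass storage and lifecycle flags to the service (variable name assumed - verify against /etc/init.d/prometheus)
cat > /etc/conf.d/prometheus << 'EOF'
prometheus_args="--config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus/data \
  --storage.tsdb.retention.time=30d \
  --storage.tsdb.retention.size=10GB \
  --web.listen-address=0.0.0.0:9090 \
  --web.enable-lifecycle"
EOF

# Restart so the new flags are picked up
rc-service prometheus restart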

Create Alerting Rules

Let’s set up intelligent alerting rules! 🚨

What we’re doing: Creating comprehensive alerting rules for system health, performance, and availability monitoring.

# Create alerting rules for system monitoring
cat > /etc/prometheus/rules/system-alerts.yml << 'EOF'
# System Alerting Rules for Alpine Linux
groups:
  - name: system.rules
    rules:
      # High CPU usage alert
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
          service: system
        annotations:
          summary: "High CPU usage detected"
          description: "CPU usage is above 80% for more than 5 minutes on {{ $labels.instance }}"

      # High memory usage alert
      - alert: HighMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 85
        for: 5m
        labels:
          severity: warning
          service: system
        annotations:
          summary: "High memory usage detected"
          description: "Memory usage is above 85% on {{ $labels.instance }}"

      # Low disk space alert
      - alert: LowDiskSpace
        expr: (node_filesystem_size_bytes - node_filesystem_free_bytes) / node_filesystem_size_bytes * 100 > 90
        for: 10m
        labels:
          severity: critical
          service: system
        annotations:
          summary: "Low disk space warning"
          description: "Disk usage is above 90% on {{ $labels.instance }} {{ $labels.mountpoint }}"

      # System load alert
      - alert: HighSystemLoad
        expr: node_load15 > 2
        for: 10m
        labels:
          severity: warning
          service: system
        annotations:
          summary: "High system load detected"
          description: "15-minute load average is {{ $value }} on {{ $labels.instance }}"

      # Instance down alert
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
          service: monitoring
        annotations:
          summary: "Instance is down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute"

  - name: application.rules
    rules:
      # Application response time alert
      - alert: HighResponseTime
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
          service: application
        annotations:
          summary: "High application response time"
          description: "95th percentile response time is {{ $value }}s on {{ $labels.instance }}"

      # Error rate alert
      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m]) * 100 > 5
        for: 5m
        labels:
          severity: critical
          service: application
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value }}% on {{ $labels.instance }}"

  - name: infrastructure.rules
    rules:
      # Redis connection alert
      - alert: RedisDown
        expr: up{job="redis"} == 0
        for: 1m
        labels:
          severity: critical
          service: redis
        annotations:
          summary: "Redis is down"
          description: "Redis service is not responding on {{ $labels.instance }}"

      # Network connectivity alert
      - alert: HighNetworkTraffic
        expr: rate(node_network_receive_bytes_total[5m]) > 100000000  # 100MB/s
        for: 10m
        labels:
          severity: warning
          service: network
        annotations:
          summary: "High network traffic detected"
          description: "Network receive traffic is {{ $value | humanize }}B/s on {{ $labels.instance }}"
EOF

# Create recording rules for performance optimization
cat > /etc/prometheus/rules/recording-rules.yml << 'EOF'
# Recording Rules for Performance Optimization
groups:
  - name: performance.rules
    interval: 30s
    rules:
      # CPU usage recording rule
      - record: instance:cpu_utilization:rate5m
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
        labels:
          metric_type: performance

      # Memory usage recording rule
      - record: instance:memory_utilization:ratio
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes
        labels:
          metric_type: performance

      # Disk I/O recording rule
      - record: instance:disk_io:rate5m
        expr: rate(node_disk_io_time_seconds_total[5m])
        labels:
          metric_type: performance

      # Network traffic recording rule
      - record: instance:network_traffic:rate5m
        expr: rate(node_network_receive_bytes_total[5m]) + rate(node_network_transmit_bytes_total[5m])
        labels:
          metric_type: performance

  - name: application.recording
    interval: 60s
    rules:
      # Request rate recording rule
      - record: application:request_rate:rate5m
        expr: rate(http_requests_total[5m])
        labels:
          metric_type: application

      # Error rate recording rule
      - record: application:error_rate:rate5m
        expr: rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m])
        labels:
          metric_type: application
EOF

# Validate alerting rules
promtool check rules /etc/prometheus/rules/*.yml

# Set proper ownership for rules
chown -R prometheus:prometheus /etc/prometheus/rules/

# Reload Prometheus configuration
curl -X POST http://localhost:9090/-/reload

echo "Prometheus alerting rules configured! 🚨"

What this creates: Comprehensive alerting system for proactive monitoring! 🌟
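
To confirm the rules actually loaded (and to see whether anything is already firing), you can ask the Prometheus HTTP API directly. A small check, assuming jq is installed (apk add jq):

# List the rule groups Prometheus has loaded
curl -s http://localhost:9090/api/v1/rules | jq -r '.data.groups[].name'

# Show any alerts that are currently pending or firing
curl -s http://localhost:9090/api/v1/alerts | jq -r '.data.alerts[] | "\(.labels.alertname)  \(.state)"'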

🛠️ Step 2: Install Node Exporter

Configure Node Exporter for System Metrics

Let’s set up Node Exporter to collect detailed system metrics! 😊

What we’re doing: Installing and configuring Node Exporter for comprehensive system monitoring.

# Start Node Exporter service
rc-service prometheus-node-exporter start

# Enable Node Exporter to start at boot
rc-update add prometheus-node-exporter default

# Create Node Exporter configuration
cat > /etc/conf.d/prometheus-node-exporter << 'EOF'
# Node Exporter Configuration
NODE_EXPORTER_OPTS="--web.listen-address=0.0.0.0:9100 \
                    --path.procfs=/proc \
                    --path.sysfs=/sys \
                    --path.rootfs=/ \
                    --collector.filesystem.mount-points-exclude='^/(dev|proc|sys|var/lib/docker/.+)($|/)' \
                    --collector.filesystem.fs-types-exclude='^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$' \
                    --collector.textfile.directory=/var/lib/node_exporter/textfile_collector \
                    --collector.cpu \
                    --collector.diskstats \
                    --collector.filesystem \
                    --collector.loadavg \
                    --collector.meminfo \
                    --collector.netdev \
                    --collector.netstat \
                    --collector.stat \
                    --collector.time \
                    --collector.uname \
                    --collector.vmstat"
EOF

# Create textfile collector directory
mkdir -p /var/lib/node_exporter/textfile_collector
chown prometheus:prometheus /var/lib/node_exporter/textfile_collector

# Create custom metrics collection script
cat > /usr/local/bin/custom-metrics-collector.sh << 'EOF'
#!/bin/sh
# Custom Metrics Collector for Alpine Linux

TEXTFILE_DIR="/var/lib/node_exporter/textfile_collector"
TEMP_FILE=$(mktemp)

# Collect Alpine package information
echo "# HELP alpine_packages_total Total number of installed packages" >> $TEMP_FILE
echo "# TYPE alpine_packages_total gauge" >> $TEMP_FILE
PACKAGE_COUNT=$(apk info | wc -l)
echo "alpine_packages_total $PACKAGE_COUNT" >> $TEMP_FILE

# Collect service status
echo "# HELP alpine_service_status Service status (1=running, 0=stopped)" >> $TEMP_FILE
echo "# TYPE alpine_service_status gauge" >> $TEMP_FILE

SERVICES="sshd chronyd syslog prometheus"
for service in $SERVICES; do
    if rc-service $service status >/dev/null 2>&1; then
        echo "alpine_service_status{service=\"$service\"} 1" >> $TEMP_FILE
    else
        echo "alpine_service_status{service=\"$service\"} 0" >> $TEMP_FILE
    fi
done

# Collect system uptime in seconds
echo "# HELP alpine_uptime_seconds System uptime in seconds" >> $TEMP_FILE
echo "# TYPE alpine_uptime_seconds gauge" >> $TEMP_FILE
UPTIME=$(awk '{print $1}' /proc/uptime)
echo "alpine_uptime_seconds $UPTIME" >> $TEMP_FILE

# Collect temperature if available
if [ -r /sys/class/thermal/thermal_zone0/temp ]; then
    echo "# HELP alpine_temperature_celsius CPU temperature in Celsius" >> $TEMP_FILE
    echo "# TYPE alpine_temperature_celsius gauge" >> $TEMP_FILE
    TEMP=$(cat /sys/class/thermal/thermal_zone0/temp)
    TEMP_C=$(awk -v t="$TEMP" 'BEGIN {printf "%.1f", t/1000}')  # busybox awk avoids a bc dependency
    echo "alpine_temperature_celsius $TEMP_C" >> $TEMP_FILE
fi

# Atomically move the file to the textfile directory
mv $TEMP_FILE $TEXTFILE_DIR/custom_metrics.prom
EOF

chmod +x /usr/local/bin/custom-metrics-collector.sh

# Create cron job for custom metrics (busybox crond reads /etc/crontabs/<user>)
echo "*/1 * * * * /usr/local/bin/custom-metrics-collector.sh" | crontab -u prometheus -

# Make sure the cron daemon is running so the job actually executes
rc-update add crond default
rc-service crond start

# Restart Node Exporter with new configuration
rc-service prometheus-node-exporter restart

# Test Node Exporter endpoint
echo "Testing Node Exporter metrics..."
curl -s http://localhost:9100/metrics | head -20

echo "Node Exporter configured for system monitoring! 📈"

What this does: Sets up comprehensive system metrics collection with custom Alpine-specific metrics! ✅
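
The textfile collector is also a handy drop box for metrics from your own scripts: any *.prom file written into the collector directory is exposed on the next scrape. A tiny example (the metric name here is made up for illustration):

# Publish a one-off metric through the textfile collector (illustrative metric name)
echo "alpine_backup_last_success_timestamp $(date +%s)" > /var/lib/node_exporter/textfile_collector/backup.prom

# It should appear in the Node Exporter output shortly afterwards
curl -s http://localhost:9100/metrics | grep alpine_backup_last_success_timestamp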

🎨 Step 3: Install and Configure Grafana

Install Grafana Visualization Platform

Let’s install Grafana for beautiful data visualization! 🎮

What we’re doing: Installing Grafana for creating stunning monitoring dashboards and visualizations.

# Install Grafana
apk add grafana

# Check Grafana version
grafana-server --version

# Create Grafana configuration directories
mkdir -p /etc/grafana/provisioning/{dashboards,datasources,notifiers}
mkdir -p /var/lib/grafana/{dashboards,plugins}
mkdir -p /var/log/grafana

# Set proper ownership
chown -R grafana:grafana /var/lib/grafana /var/log/grafana /etc/grafana

# Create Grafana configuration
cat > /etc/grafana/grafana.ini << 'EOF'
# Grafana Configuration for Alpine Linux Monitoring

[default]
# Instance name
instance_name = alpine-monitoring

[paths]
# Data directory
data = /var/lib/grafana
# Logs directory
logs = /var/log/grafana
# Plugins directory
plugins = /var/lib/grafana/plugins
# Provisioning directory
provisioning = /etc/grafana/provisioning

[server]
# Server settings
http_addr = 0.0.0.0
http_port = 3000
domain = localhost
root_url = http://localhost:3000/
serve_from_sub_path = false
router_logging = false
enable_gzip = true

[database]
# SQLite configuration for simplicity (host/user/password are only needed for MySQL or Postgres)
type = sqlite3
path = /var/lib/grafana/grafana.db

[session]
# Session configuration
provider = file
provider_config = sessions
cookie_name = grafana_sess
cookie_secure = false
session_life_time = 86400

[security]
# Security settings
admin_user = admin
admin_password = alpine_monitoring_2025
secret_key = alpine_grafana_secret_key_12345
login_remember_days = 7
cookie_username = grafana_user
cookie_remember_name = grafana_remember
disable_gravatar = true

[users]
# User management
allow_sign_up = false
allow_org_create = false
auto_assign_org = true
auto_assign_org_role = Viewer
default_theme = dark

[auth.anonymous]
# Anonymous access
enabled = false

[log]
# Logging configuration
mode = file
level = info
format = text

[log.file]
# File logging
log_rotate = true
max_lines = 1000000
max_size_shift = 28
daily_rotate = true
max_days = 7

[alerting]
# Alerting settings
enabled = true
execute_alerts = true

[unified_alerting]
# Unified alerting
enabled = true

[metrics]
# Metrics settings
enabled = true
interval_seconds = 10

[grafana_net]
url = https://grafana.net

[external_image_storage]
provider = local

[plugins]
# Plugin settings
enable_alpha = false
app_tls_skip_verify_insecure = false
EOF

# Start Grafana service
rc-service grafana start

# Enable Grafana to start at boot
rc-update add grafana default

echo "Grafana installed and configured! 🎨"
echo "Access Grafana at: http://localhost:3000"
echo "Default login: admin / alpine_monitoring_2025"

What this does: Installs and configures Grafana with optimal settings for Alpine Linux monitoring! 🌟
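
A quick way to confirm Grafana is up is its health endpoint, plus one authenticated call using the admin credentials set in grafana.ini above:

# Unauthenticated health check (returns database status and version)
curl -s http://localhost:3000/api/health

# Authenticated check against the org API with the admin account configured above
curl -s -u admin:alpine_monitoring_2025 http://localhost:3000/api/org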

Configure Grafana Data Sources

Let’s set up Prometheus as a data source in Grafana! 🔧

What we’re doing: Configuring Grafana to connect to Prometheus and other data sources automatically.

# Create Prometheus data source configuration
cat > /etc/grafana/provisioning/datasources/prometheus.yml << 'EOF'
# Grafana Data Sources Configuration
apiVersion: 1

datasources:
  # Primary Prometheus data source
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
    editable: true
    jsonData:
      httpMethod: POST
      queryTimeout: 60s
      timeInterval: 30s
      exemplarTraceIdDestinations:
        - name: traceID
          url: http://localhost:16686/trace/${__value.raw}
    secureJsonData: {}

  # Node Exporter metrics are queried through the Prometheus data source above;
  # the exporter exposes raw metrics only and does not implement the Prometheus
  # query API, so it is not added as a separate data source.

  # TestData for examples and testing
  - name: TestData
    type: testdata
    access: proxy
    isDefault: false
    editable: true
EOF

# Create dashboard provisioning configuration
cat > /etc/grafana/provisioning/dashboards/default.yml << 'EOF'
# Dashboard Provisioning Configuration
apiVersion: 1

providers:
  # System monitoring dashboards
  - name: 'alpine-system'
    orgId: 1
    folder: 'System Monitoring'
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      path: /var/lib/grafana/dashboards/system

  # Application monitoring dashboards
  - name: 'alpine-apps'
    orgId: 1
    folder: 'Applications'
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      path: /var/lib/grafana/dashboards/applications

  # Infrastructure monitoring dashboards
  - name: 'alpine-infrastructure'
    orgId: 1
    folder: 'Infrastructure'
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      path: /var/lib/grafana/dashboards/infrastructure
EOF

# Create dashboard directories
mkdir -p /var/lib/grafana/dashboards/{system,applications,infrastructure}

# Set proper ownership
chown -R grafana:grafana /etc/grafana/provisioning /var/lib/grafana/dashboards

# Restart Grafana to load new configuration
rc-service grafana restart

echo "Grafana data sources configured! 🔗"

What this creates: Automatic data source configuration for seamless monitoring! ✅
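
Once Grafana has restarted, the provisioned data sources can be listed through its HTTP API to make sure provisioning worked (assumes jq is installed):

# List data sources as Grafana sees them (should include Prometheus and TestData)
curl -s -u admin:alpine_monitoring_2025 http://localhost:3000/api/datasources | jq -r '.[] | "\(.name)  \(.type)"'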

Create System Monitoring Dashboard

Let’s create a comprehensive system monitoring dashboard! 📊

What we’re doing: Creating a beautiful and functional dashboard for monitoring Alpine Linux system metrics.

# Create comprehensive system monitoring dashboard
cat > /var/lib/grafana/dashboards/system/alpine-system-overview.json << 'EOF'
{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": "-- Grafana --",
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "gnetId": null,
  "graphTooltip": 0,
  "id": null,
  "links": [],
  "panels": [
    {
      "datasource": "Prometheus",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "vis": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "percent"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 0
      },
      "id": 1,
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom"
        },
        "tooltip": {
          "mode": "single"
        }
      },
      "targets": [
        {
          "expr": "100 - (avg by(instance) (irate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)",
          "interval": "",
          "legendFormat": "CPU Usage",
          "refId": "A"
        }
      ],
      "title": "CPU Usage",
      "type": "timeseries"
    },
    {
      "datasource": "Prometheus",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "vis": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "percent"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 12,
        "y": 0
      },
      "id": 2,
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom"
        },
        "tooltip": {
          "mode": "single"
        }
      },
      "targets": [
        {
          "expr": "(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100",
          "interval": "",
          "legendFormat": "Memory Usage",
          "refId": "A"
        }
      ],
      "title": "Memory Usage",
      "type": "timeseries"
    },
    {
      "datasource": "Prometheus",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "yellow",
                "value": 70
              },
              {
                "color": "red",
                "value": 90
              }
            ]
          },
          "unit": "percent"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 6,
        "x": 0,
        "y": 8
      },
      "id": 3,
      "options": {
        "orientation": "auto",
        "reduceOptions": {
          "values": false,
          "calcs": [
            "lastNotNull"
          ],
          "fields": ""
        },
        "showThresholdLabels": false,
        "showThresholdMarkers": true,
        "text": {}
      },
      "pluginVersion": "8.0.0",
      "targets": [
        {
          "expr": "(node_filesystem_size_bytes{mountpoint=\"/\"} - node_filesystem_free_bytes{mountpoint=\"/\"}) / node_filesystem_size_bytes{mountpoint=\"/\"} * 100",
          "interval": "",
          "legendFormat": "Root Disk Usage",
          "refId": "A"
        }
      ],
      "title": "Disk Usage",
      "type": "gauge"
    },
    {
      "datasource": "Prometheus",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "yellow",
                "value": 1
              },
              {
                "color": "red",
                "value": 2
              }
            ]
          },
          "unit": "short"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 6,
        "x": 6,
        "y": 8
      },
      "id": 4,
      "options": {
        "orientation": "auto",
        "reduceOptions": {
          "values": false,
          "calcs": [
            "lastNotNull"
          ],
          "fields": ""
        },
        "showThresholdLabels": false,
        "showThresholdMarkers": true,
        "text": {}
      },
      "pluginVersion": "8.0.0",
      "targets": [
        {
          "expr": "node_load15",
          "interval": "",
          "legendFormat": "Load Average",
          "refId": "A"
        }
      ],
      "title": "System Load",
      "type": "gauge"
    },
    {
      "datasource": "Prometheus",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "vis": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "binBps"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 12,
        "y": 8
      },
      "id": 5,
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom"
        },
        "tooltip": {
          "mode": "single"
        }
      },
      "targets": [
        {
          "expr": "rate(node_network_receive_bytes_total[5m])",
          "interval": "",
          "legendFormat": "{{device}} - Receive",
          "refId": "A"
        },
        {
          "expr": "rate(node_network_transmit_bytes_total[5m])",
          "interval": "",
          "legendFormat": "{{device}} - Transmit",
          "refId": "B"
        }
      ],
      "title": "Network Traffic",
      "type": "timeseries"
    }
  ],
  "schemaVersion": 30,
  "style": "dark",
  "tags": ["alpine", "system", "monitoring"],
  "templating": {
    "list": []
  },
  "time": {
    "from": "now-1h",
    "to": "now"
  },
  "timepicker": {},
  "timezone": "",
  "title": "Alpine Linux System Overview",
  "uid": "alpine-system-overview",
  "version": 1
}
EOF

# Set proper ownership
chown -R grafana:grafana /var/lib/grafana/dashboards/

# Restart Grafana to load dashboards
rc-service grafana restart

echo "System monitoring dashboard created! 📊"

What this creates: Beautiful system monitoring dashboard with key Alpine Linux metrics! 🌟
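
To double-check that the dashboard was picked up by the provisioner, the Grafana search API can be queried (assumes jq):

# Search for the provisioned dashboard by title
curl -s -u admin:alpine_monitoring_2025 "http://localhost:3000/api/search?query=Alpine" | jq -r '.[].title'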

📊 Quick Monitoring Commands Table

Command → Purpose → Result
🔧 promtool query instant 'up' → Check target status → ✅ Service availability
🔍 curl localhost:9090/api/v1/targets → View Prometheus targets → ✅ Monitoring endpoints
🚀 grafana-cli admin reset-admin-password admin → Reset Grafana password → ✅ Access recovery
📋 curl localhost:9100/metrics | grep cpu → View Node Exporter CPU metrics → ✅ System metrics

🎮 Practice Time!

Let’s practice what you learned! Try these monitoring scenarios:

Example 1: Application Performance Monitoring 🟢

What we’re doing: Setting up comprehensive application performance monitoring with custom metrics and alerting.

# Create application monitoring setup
mkdir -p /opt/app-monitoring
cd /opt/app-monitoring

# Create sample application with metrics endpoint
cat > app-metrics-server.py << 'EOF'
#!/usr/bin/env python3
"""
Sample Application with Prometheus Metrics
"""
import time
import random
from http.server import HTTPServer, BaseHTTPRequestHandler
from urllib.parse import urlparse

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        path = urlparse(self.path).path
        
        if path == '/metrics':
            self.send_response(200)
            self.send_header('Content-type', 'text/plain')
            self.end_headers()
            
            # Generate sample metrics
            cpu_usage = random.uniform(10, 90)
            memory_usage = random.uniform(30, 80)
            request_count = random.randint(100, 1000)
            response_time = random.uniform(0.1, 2.0)
            
            metrics = f"""# HELP app_cpu_usage_percent Application CPU usage
# TYPE app_cpu_usage_percent gauge
app_cpu_usage_percent {cpu_usage:.2f}

# HELP app_memory_usage_percent Application memory usage
# TYPE app_memory_usage_percent gauge
app_memory_usage_percent {memory_usage:.2f}

# HELP app_requests_total Total application requests
# TYPE app_requests_total counter
app_requests_total {request_count}

# HELP app_response_time_seconds Application response time
# TYPE app_response_time_seconds histogram
app_response_time_seconds_bucket{{le="0.1"}} {random.randint(10, 50)}
app_response_time_seconds_bucket{{le="0.5"}} {random.randint(50, 150)}
app_response_time_seconds_bucket{{le="1.0"}} {random.randint(150, 300)}
app_response_time_seconds_bucket{{le="2.0"}} {random.randint(300, 500)}
app_response_time_seconds_bucket{{le="+Inf"}} {random.randint(500, 600)}
app_response_time_seconds_sum {response_time * request_count:.2f}
app_response_time_seconds_count {request_count}

# HELP app_errors_total Total application errors
# TYPE app_errors_total counter
app_errors_total {random.randint(0, 50)}

# HELP app_uptime_seconds Application uptime
# TYPE app_uptime_seconds gauge
app_uptime_seconds {time.time()}
"""
            self.wfile.write(metrics.encode())
            
        elif path == '/health':
            self.send_response(200)
            self.send_header('Content-type', 'text/plain')
            self.end_headers()
            self.wfile.write(b'OK')
            
        else:
            self.send_response(404)
            self.end_headers()
            self.wfile.write(b'Not Found')

if __name__ == '__main__':
    server = HTTPServer(('localhost', 8080), MetricsHandler)
    print("Application metrics server running on http://localhost:8080/metrics")
    server.serve_forever()
EOF

# Install Python if not available
apk add python3

# Make the script executable
chmod +x app-metrics-server.py

# Start the application in background
python3 app-metrics-server.py &
APP_PID=$!

# Update Prometheus to scrape this application
cat >> /etc/prometheus/prometheus.yml << 'EOF'

  # Sample application monitoring
  - job_name: 'sample-application'
    static_configs:
      - targets: ['localhost:8080']
    scrape_interval: 5s
    metrics_path: /metrics
EOF

# Reload Prometheus configuration
curl -X POST http://localhost:9090/-/reload

# Create application dashboard
cat > /var/lib/grafana/dashboards/applications/application-performance.json << 'EOF'
{
  "dashboard": {
    "title": "Application Performance Monitoring",
    "panels": [
      {
        "title": "Application CPU Usage",
        "type": "stat",
        "targets": [
          {
            "expr": "app_cpu_usage_percent",
            "legendFormat": "CPU Usage %"
          }
        ]
      },
      {
        "title": "Application Memory Usage",
        "type": "stat",
        "targets": [
          {
            "expr": "app_memory_usage_percent",
            "legendFormat": "Memory Usage %"
          }
        ]
      },
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(app_requests_total[5m])",
            "legendFormat": "Requests/sec"
          }
        ]
      },
      {
        "title": "Response Time",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, rate(app_response_time_seconds_bucket[5m]))",
            "legendFormat": "95th percentile"
          }
        ]
      }
    ]
  }
}
EOF

echo "Application performance monitoring configured! 🎯"
echo "Check metrics at: http://localhost:8080/metrics"
echo "Application PID: $APP_PID (kill with: kill $APP_PID)"

What this does: Shows you how to monitor application performance with custom metrics! 🎯
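
Once Prometheus has scraped the sample application a few times, its metrics can be queried like any other series. A quick check (assumes jq):

# Ask Prometheus for the sample application's request rate over the last 5 minutes
curl -s -G http://localhost:9090/api/v1/query \
  --data-urlencode 'query=rate(app_requests_total[5m])' | jq '.data.result'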

Example 2: Infrastructure Alerting System 🟡

What we’re doing: Creating a comprehensive alerting system with multiple notification channels.

# Create advanced alerting configuration
mkdir -p /opt/alerting-system
cd /opt/alerting-system

# Install Alertmanager
apk add prometheus-alertmanager

# Create Alertmanager configuration
cat > /etc/prometheus/alertmanager.yml << 'EOF'
# Alertmanager Configuration for Infrastructure Monitoring
global:
  smtp_smarthost: 'localhost:587'
  smtp_from: '[email protected]'
  smtp_auth_username: 'alertmanager'
  smtp_auth_password: 'password'

templates:
  - '/etc/prometheus/templates/*.tmpl'

route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'default'
  routes:
    # Critical alerts go to immediate notification
    - match:
        severity: critical
      receiver: 'critical-alerts'
      group_wait: 5s
      repeat_interval: 5m
    
    # Warning alerts go to standard notification
    - match:
        severity: warning
      receiver: 'warning-alerts'
      repeat_interval: 30m
    
    # System alerts
    - match:
        service: system
      receiver: 'system-alerts'

receivers:
  - name: 'default'
    webhook_configs:
      # The custom webhook receiver below listens on 9095 so it does not clash with Alertmanager itself (port 9093)
      - url: 'http://localhost:9095/webhook'
        send_resolved: true

  - name: 'critical-alerts'
    email_configs:
      - to: '[email protected]'
        subject: '🚨 CRITICAL: {{ .GroupLabels.alertname }}'
        body: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Severity: {{ .Labels.severity }}
          Instance: {{ .Labels.instance }}
          Time: {{ .StartsAt }}
          {{ end }}
    webhook_configs:
      - url: 'http://localhost:9095/webhook/critical'
        send_resolved: true

  - name: 'warning-alerts'
    email_configs:
      - to: '[email protected]'
        subject: '⚠️ WARNING: {{ .GroupLabels.alertname }}'
        body: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Instance: {{ .Labels.instance }}
          {{ end }}

  - name: 'system-alerts'
    webhook_configs:
      - url: 'http://localhost:9095/webhook/system'
        send_resolved: true

inhibit_rules:
  # Inhibit warning alerts if critical alerts are firing
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'instance']
EOF

# Create alert notification webhook receiver
cat > alert-webhook-receiver.py << 'EOF'
#!/usr/bin/env python3
"""
Alert Webhook Receiver for Custom Notifications
"""
import json
import time
from http.server import HTTPServer, BaseHTTPRequestHandler
from urllib.parse import urlparse

class AlertWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        path = urlparse(self.path).path
        content_length = int(self.headers['Content-Length'])
        post_data = self.rfile.read(content_length)
        
        try:
            alert_data = json.loads(post_data.decode('utf-8'))
            self.process_alert(alert_data, path)
            
            self.send_response(200)
            self.send_header('Content-type', 'application/json')
            self.end_headers()
            self.wfile.write(b'{"status": "ok"}')
            
        except Exception as e:
            print(f"Error processing alert: {e}")
            self.send_response(500)
            self.end_headers()
    
    def process_alert(self, alert_data, path):
        timestamp = time.strftime('%Y-%m-%d %H:%M:%S')
        
        print(f"\n{'='*50}")
        print(f"ALERT RECEIVED - {timestamp}")
        print(f"Webhook Path: {path}")
        print(f"{'='*50}")
        
        for alert in alert_data.get('alerts', []):
            status = alert.get('status', 'unknown')
            labels = alert.get('labels', {})
            annotations = alert.get('annotations', {})
            
            print(f"Status: {status}")
            print(f"Alert: {labels.get('alertname', 'Unknown')}")
            print(f"Severity: {labels.get('severity', 'Unknown')}")
            print(f"Instance: {labels.get('instance', 'Unknown')}")
            print(f"Summary: {annotations.get('summary', 'No summary')}")
            print(f"Description: {annotations.get('description', 'No description')}")
            
            if status == 'firing':
                print("🚨 ALERT IS FIRING!")
                self.log_to_file(alert, 'FIRING')
            elif status == 'resolved':
                print("✅ ALERT RESOLVED")
                self.log_to_file(alert, 'RESOLVED')
            
            print("-" * 30)
    
    def log_to_file(self, alert, status):
        timestamp = time.strftime('%Y-%m-%d %H:%M:%S')
        labels = alert.get('labels', {})
        annotations = alert.get('annotations', {})
        
        log_entry = {
            'timestamp': timestamp,
            'status': status,
            'alertname': labels.get('alertname'),
            'severity': labels.get('severity'),
            'instance': labels.get('instance'),
            'summary': annotations.get('summary'),
            'description': annotations.get('description')
        }
        
        with open('/var/log/alerts.log', 'a') as f:
            f.write(json.dumps(log_entry) + '\n')

if __name__ == '__main__':
    # Port 9095 avoids a conflict with Alertmanager, which listens on 9093
    server = HTTPServer(('localhost', 9095), AlertWebhookHandler)
    print("Alert webhook receiver running on http://localhost:9095/webhook")
    print("Logs will be written to /var/log/alerts.log")
    server.serve_forever()
EOF

chmod +x alert-webhook-receiver.py

# Start Alertmanager
rc-service prometheus-alertmanager start
rc-update add prometheus-alertmanager default

# Start alert webhook receiver in background
python3 alert-webhook-receiver.py &
WEBHOOK_PID=$!

# Create alert testing script
cat > test-alerts.sh << 'EOF'
#!/bin/sh
echo "🧪 Testing Alert System"

# Test firing an alert
echo "Sending test alert..."
curl -X POST http://localhost:9093/api/v2/alerts \
  -H "Content-Type: application/json" \
  -d '[
    {
      "labels": {
        "alertname": "TestAlert",
        "severity": "warning",
        "instance": "localhost:9090",
        "service": "test"
      },
      "annotations": {
        "summary": "Test alert for monitoring system",
        "description": "This is a test alert to verify the alerting system is working correctly"
      },
      "startsAt": "'$(date -Iseconds)'"
    }
  ]'

echo "Alert sent! Check webhook receiver output and /var/log/alerts.log"
EOF

chmod +x test-alerts.sh

echo "Advanced alerting system configured! 🚨"
echo "Webhook receiver PID: $WEBHOOK_PID (kill with: kill $WEBHOOK_PID)"
echo "Test alerts with: ./test-alerts.sh"

What this does: Demonstrates comprehensive alerting with custom notification handling! 🚨
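
After running test-alerts.sh, you can confirm the alert actually reached Alertmanager through its v2 API (assumes jq):

# List alerts currently held by Alertmanager
curl -s http://localhost:9093/api/v2/alerts | jq -r '.[].labels.alertname'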

🚨 Fix Common Problems

Problem 1: Prometheus targets down ❌

What happened: Prometheus cannot scrape metrics from targets. How to fix it: Check network connectivity and service configuration.

# Check Prometheus targets status
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health, lastError: .lastError}'

# Check service status
rc-service prometheus status
rc-service prometheus-node-exporter status

# Check network connectivity
netstat -tulpn | grep -E "(9090|9100)"

# Restart services if needed
rc-service prometheus restart
rc-service prometheus-node-exporter restart

Problem 2: Grafana dashboard not loading data ❌

What happened: Grafana cannot connect to Prometheus or display metrics. How to fix it: Verify data source configuration and queries.

# Test Prometheus connection from Grafana host
curl -s http://localhost:9090/api/v1/query?query=up

# Check Grafana logs
tail -f /var/log/grafana/grafana.log

# Restart Grafana service
rc-service grafana restart

# Test data source connectivity in Grafana UI
echo "Visit http://localhost:3000/datasources and test connections"

Don’t worry! Monitoring systems require fine-tuning - check connectivity and configurations systematically! 💪

💡 Simple Tips

  1. Start with basic metrics 📅 - Begin with CPU, memory, disk, and network monitoring
  2. Set meaningful alert thresholds 🌱 - Avoid alert fatigue with appropriate limits
  3. Create actionable dashboards 🤝 - Focus on metrics that help with decision making
  4. Regular maintenance 💪 - Monitor data retention and clean up old metrics (a quick check is sketched below)
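
For tip 4, a quick way to keep an eye on storage growth is the TSDB status endpoint together with a plain disk-usage check (assumes jq):

# Head block statistics (series and chunk counts) from the Prometheus status API
curl -s http://localhost:9090/api/v1/status/tsdb | jq '.data.headStats'

# On-disk size of the time series database
du -sh /var/lib/prometheus/data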

✅ Check Everything Works

Let’s verify your monitoring stack is working perfectly:

# Complete monitoring system verification
cat > /usr/local/bin/monitoring-stack-check.sh << 'EOF'
#!/bin/sh
echo "=== Monitoring Stack System Check ==="

echo "1. Prometheus Server:"
if curl -s http://localhost:9090/api/v1/query?query=up >/dev/null; then
    echo "✅ Prometheus is running and responding"
    prometheus_version=$(curl -s http://localhost:9090/api/v1/status/buildinfo | jq -r '.data.version')
    echo "Version: $prometheus_version"
    targets_up=$(curl -s http://localhost:9090/api/v1/query?query=up | jq '.data.result | length')
    echo "Active targets: $targets_up"
else
    echo "❌ Prometheus is not responding"
fi

echo -e "\n2. Node Exporter:"
if curl -s http://localhost:9100/metrics >/dev/null; then
    echo "✅ Node Exporter is running"
    metrics_count=$(curl -s http://localhost:9100/metrics | wc -l)
    echo "Metrics available: $metrics_count"
else
    echo "❌ Node Exporter is not responding"
fi

echo -e "\n3. Grafana:"
if curl -s http://localhost:3000/api/health >/dev/null; then
    echo "✅ Grafana is running"
    grafana_version=$(curl -s http://localhost:3000/api/health | jq -r '.version')
    echo "Version: $grafana_version"
else
    echo "❌ Grafana is not responding"
fi

echo -e "\n4. Alertmanager:"
if curl -s http://localhost:9093/api/v2/status >/dev/null; then
    echo "✅ Alertmanager is running"
    alertmanager_version=$(curl -s http://localhost:9093/api/v2/status | jq -r '.versionInfo.version')
    echo "Version: $alertmanager_version"
else
    echo "❌ Alertmanager is not responding"
fi

echo -e "\n5. Data Collection Test:"
echo "Testing metric collection..."

# Test CPU metric
cpu_metric=$(curl -s -G "http://localhost:9090/api/v1/query" --data-urlencode 'query=100-(avg(irate(node_cpu_seconds_total{mode="idle"}[5m]))*100)' | jq -r '.data.result[0].value[1]')
if [ "$cpu_metric" != "null" ]; then
    echo "✅ CPU metrics: ${cpu_metric}%"
else
    echo "❌ CPU metrics not available"
fi

# Test memory metric
memory_metric=$(curl -s "http://localhost:9090/api/v1/query?query=(node_memory_MemTotal_bytes-node_memory_MemAvailable_bytes)/node_memory_MemTotal_bytes*100" | jq -r '.data.result[0].value[1]')
if [ "$memory_metric" != "null" ]; then
    echo "✅ Memory metrics: ${memory_metric}%"
else
    echo "❌ Memory metrics not available"
fi

echo -e "\n6. Alert Rules:"
rules_count=$(curl -s http://localhost:9090/api/v1/rules | jq '.data.groups | length')
echo "Loaded rule groups: $rules_count"

echo -e "\n7. Dashboard Status:"
if [ -f "/var/lib/grafana/dashboards/system/alpine-system-overview.json" ]; then
    echo "✅ System dashboard available"
else
    echo "❌ System dashboard missing"
fi

echo -e "\n8. Access URLs:"
echo "🌐 Prometheus: http://localhost:9090"
echo "🌐 Grafana: http://localhost:3000 (admin/alpine_monitoring_2025)"
echo "🌐 Alertmanager: http://localhost:9093"
echo "🌐 Node Exporter: http://localhost:9100/metrics"

echo -e "\nMonitoring stack operational! ✅"
EOF

chmod +x /usr/local/bin/monitoring-stack-check.sh
/usr/local/bin/monitoring-stack-check.sh

Good output shows:

=== Monitoring Stack System Check ===
1. Prometheus Server:
✅ Prometheus is running and responding
Version: 2.45.0
Active targets: 3

2. Node Exporter:
✅ Node Exporter is running
Metrics available: 1247

3. Grafana:
✅ Grafana is running
Version: 9.5.2

Monitoring stack operational! ✅

🏆 What You Learned

Great job! Now you can:

  • ✅ Install and configure Prometheus for comprehensive metrics collection
  • ✅ Set up Node Exporter for detailed system monitoring
  • ✅ Configure Grafana for beautiful data visualization
  • ✅ Create custom dashboards and monitoring workflows
  • ✅ Implement intelligent alerting with Alertmanager
  • ✅ Set up service discovery and target management
  • ✅ Create application performance monitoring solutions
  • ✅ Build comprehensive infrastructure monitoring stacks
  • ✅ Troubleshoot common monitoring issues and optimize performance

🎯 What’s Next?

Now you can try:

  • 📚 Setting up distributed monitoring with multiple Prometheus instances
  • 🛠️ Implementing custom exporters for specific applications
  • 🤝 Integrating with external alerting systems (Slack, PagerDuty)
  • 🌟 Exploring advanced Grafana features like annotations and variables!

Remember: Effective monitoring is the foundation of reliable systems! You’re now building world-class observability on Alpine Linux! 🎉

Keep monitoring and you’ll master infrastructure observability! 💫