AlmaLinux Monitoring: Complete Prometheus & Grafana Guide for Real-Time Insights
Hey there, monitoring maestro! Ready to turn your AlmaLinux system into an all-seeing eye that catches problems before they become disasters? Today we're building a complete monitoring stack with Prometheus and Grafana that shows you what is happening across your infrastructure.
Whether you're monitoring a single server or an entire fleet, this guide walks you through a stack that delivers real-time metrics and clear visualizations.
Why is Monitoring Important?
Imagine driving a car with no dashboard: no speedometer, no fuel gauge, no warning lights! That's what running servers without monitoring is like. You're flying blind until something crashes!
Here's why Prometheus & Grafana on AlmaLinux are worth setting up:
- Real-Time Insights - See what's happening right now, not yesterday
- Proactive Alerting - Fix problems before users notice them
- Beautiful Dashboards - Visualize complex data at a glance
- Historical Analysis - Understand trends and patterns over time
- Performance Optimization - Identify bottlenecks and inefficiencies
- Capacity Planning - Know when you'll need more resources
- Security Monitoring - Detect suspicious activities immediately
- 24/7 Awareness - Get alerts on your phone when things go wrong
What You Need
Before we start building your monitoring empire, let's make sure you have everything ready:
- AlmaLinux 9.x system (with 2+ GB RAM)
- Root or sudo access for installation
- Internet connection for downloading packages
- Basic Linux knowledge (files, services, networking)
- Systems to monitor (we'll start with localhost)
- Web browser for accessing dashboards
- Coffee ready (this is going to be fun!)
- Excitement about data and visualizations!
Step 1: Install Prometheus Server
Let's start by setting up Prometheus to collect all those juicy metrics!
# Create prometheus user
sudo useradd --no-create-home --shell /bin/false prometheus
# Create directories for Prometheus
sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus
# Download Prometheus (check for latest version at prometheus.io)
cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.47.0/prometheus-2.47.0.linux-amd64.tar.gz
# Extract the archive
tar xvf prometheus-2.47.0.linux-amd64.tar.gz
cd prometheus-2.47.0.linux-amd64/
# Copy binaries to system locations
sudo cp prometheus /usr/local/bin/
sudo cp promtool /usr/local/bin/
# Copy configuration files
sudo cp -r consoles/ /etc/prometheus/
sudo cp -r console_libraries/ /etc/prometheus/
# Set ownership
sudo chown -R prometheus:prometheus /etc/prometheus
sudo chown prometheus:prometheus /usr/local/bin/prometheus
sudo chown prometheus:prometheus /usr/local/bin/promtool
# Verify installation
prometheus --version
Expected output:
prometheus, version 2.47.0 (branch: HEAD, revision: xxx)
build user: root@xxx
build date: 20XX-XX-XX
go version: go1.20.X
Great! Prometheus is installed!
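If you want to double-check the tarball you downloaded in this step, one option is to compare its SHA-256 digest with the checksum list published next to the release. Here is a minimal Python sketch of that comparison. It assumes you have also fetched the release's `sha256sums.txt` and that each line has the form `<hex digest>  <filename>`; the function names are illustrative and not part of any Prometheus tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum_file(tarball: Path, sums_text: str) -> bool:
    """Check the tarball against lines of the form '<hex digest>  <filename>'."""
    actual = sha256_of(tarball)
    for line in sums_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == tarball.name:
            return parts[0].lower() == actual
    return False  # no entry for this file name
```

A `False` result means either the file is not listed or the digest differs, and in both cases it is safer to download again before installing.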
Step 2: Configure Prometheus
Now let's configure Prometheus to start collecting metrics:
# Create Prometheus configuration file
sudo tee /etc/prometheus/prometheus.yml << 'EOF'
# Global configuration
global:
  scrape_interval: 15s      # How often to scrape targets
  evaluation_interval: 15s  # How often to evaluate rules
  scrape_timeout: 10s       # Timeout for scraping
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
# Load rules once and periodically evaluate them
rule_files:
  - "/etc/prometheus/rules/*.yml"
# Scrape configurations
scrape_configs:
  # Scrape Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
        labels:
          instance: 'prometheus-server'
  # Scrape node exporter for system metrics
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:9100']
        labels:
          instance: 'almalinux-host'
  # Scrape application metrics (port 8000 matches the example app in Quick Examples)
  - job_name: 'applications'
    static_configs:
      - targets: ['localhost:8000']
        labels:
          app: 'my-app'
          env: 'production'
  # Scrape Docker containers (if Docker is installed)
  - job_name: 'docker'
    static_configs:
      - targets: ['localhost:9323']
        labels:
          instance: 'docker-host'
EOF
# Create systemd service file for Prometheus
sudo tee /etc/systemd/system/prometheus.service << 'EOF'
[Unit]
Description=Prometheus Monitoring System
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
--config.file /etc/prometheus/prometheus.yml \
--storage.tsdb.path /var/lib/prometheus/ \
--storage.tsdb.retention.time=30d \
--web.console.templates=/etc/prometheus/consoles \
--web.console.libraries=/etc/prometheus/console_libraries \
--web.enable-lifecycle
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
EOF
# Reload systemd and start Prometheus
sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
# Check status
sudo systemctl status prometheus
Perfect! Prometheus is running on port 9090!
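One rule worth knowing when you tune the `global` block: Prometheus rejects a configuration whose `scrape_timeout` is greater than its `scrape_interval`. The sketch below illustrates that check in Python for simple single-unit durations such as `15s` or `1m`. The helper names are made up for illustration, and `promtool check config` remains the authoritative validator.

```python
import re

# Seconds per unit for simple, single-unit durations (e.g. "15s", "2m").
_UNIT_SECONDS = {
    "ms": 0.001, "s": 1, "m": 60, "h": 3600,
    "d": 86400, "w": 604800, "y": 31536000,
}


def parse_duration(text: str) -> float:
    """Convert a single-unit duration string such as '15s' to seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d|w|y)", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def timeout_fits_interval(scrape_interval: str, scrape_timeout: str) -> bool:
    """True when the scrape timeout does not exceed the scrape interval."""
    return parse_duration(scrape_timeout) <= parse_duration(scrape_interval)
```

With the values from this guide, `timeout_fits_interval("15s", "10s")` holds, so the configuration passes this particular rule.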
Step 3: Install Node Exporter for System Metrics
Let's add Node Exporter to collect detailed system metrics:
# Download Node Exporter
cd /tmp
wget https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz
# Extract and install
tar xvf node_exporter-1.6.1.linux-amd64.tar.gz
sudo cp node_exporter-1.6.1.linux-amd64/node_exporter /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/node_exporter
# Create systemd service for Node Exporter
sudo tee /etc/systemd/system/node_exporter.service << 'EOF'
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/node_exporter \
--collector.filesystem.mount-points-exclude="^/(dev|proc|sys|run)($|/)" \
--collector.netclass.ignored-devices="^(veth.*|br.*|docker.*|virbr.*)" \
--collector.systemd \
--collector.processes
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
EOF
# Start Node Exporter
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
# Verify it's working
curl http://localhost:9100/metrics | grep "node_"
Excellent! Node Exporter is collecting system metrics!
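The `/metrics` output that `curl` prints uses the Prometheus text exposition format: comment lines start with `#`, and each sample is a metric name, optional `{label="value"}` pairs, and a value. Here is a minimal Python sketch of reading it. It is a simplification that assumes label values contain no escaped quotes; for anything serious, the `prometheus_client` package ships a real parser.

```python
import re

# metric name, optional {labels}, then the sample value
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text: str):
    """Yield (name, labels, value) for each sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue  # ignore anything this simplified parser cannot read
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        yield name, labels, float(value)
```

Feeding it the text returned by `http://localhost:9100/metrics` gives you tuples you can filter, for example by the `mode` label of `node_cpu_seconds_total`.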
Step 4: Install and Configure Grafana
Now for the fun part: beautiful dashboards with Grafana!
# Add Grafana repository
sudo tee /etc/yum.repos.d/grafana.repo << 'EOF'
[grafana]
name=grafana
baseurl=https://rpm.grafana.com
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://rpm.grafana.com/gpg.key
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
EOF
# Install Grafana
sudo dnf install -y grafana
# Start and enable Grafana
sudo systemctl enable grafana-server
sudo systemctl start grafana-server
# Check status
sudo systemctl status grafana-server
# Configure firewall
sudo firewall-cmd --permanent --add-port=3000/tcp
sudo firewall-cmd --permanent --add-port=9090/tcp
sudo firewall-cmd --permanent --add-port=9100/tcp
sudo firewall-cmd --reload
Access Grafana at http://your-server:3000
(default login: admin/admin; Grafana asks you to set a new password on first login)
Now let's create some awesome dashboards:
# Create dashboard configuration
cat > /tmp/system-dashboard.json << 'EOF'
{
"dashboard": {
"title": "System Monitoring Dashboard",
"panels": [
{
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
"title": "CPU Usage",
"targets": [
{
"expr": "100 - (avg(irate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)",
"legendFormat": "CPU Usage %"
}
],
"type": "graph"
},
{
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 0},
"title": "Memory Usage",
"targets": [
{
"expr": "(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100",
"legendFormat": "Memory Usage %"
}
],
"type": "graph"
},
{
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 8},
"title": "Disk I/O",
"targets": [
{
"expr": "rate(node_disk_io_time_seconds_total[5m]) * 100",
"legendFormat": "{{device}}"
}
],
"type": "graph"
},
{
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 8},
"title": "Network Traffic",
"targets": [
{
"expr": "rate(node_network_receive_bytes_total[5m])",
"legendFormat": "RX {{device}}"
},
{
"expr": "rate(node_network_transmit_bytes_total[5m])",
"legendFormat": "TX {{device}}"
}
],
"type": "graph"
}
]
}
}
EOF
# Import dashboard via API (after setting up Grafana)
# You'll need to get an API key from Grafana UI first
Fantastic! Your dashboard definition is ready to import!
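The import step mentioned in the comments above is a `POST` to Grafana's `/api/dashboards/db` endpoint. Here is a hedged Python sketch that builds that request with the standard library. The token is a placeholder for one you create in the Grafana UI, `build_import_request` is an illustrative name, and the sketch assumes the payload already wraps the dashboard in a top-level `"dashboard"` key, as `/tmp/system-dashboard.json` does.

```python
import json
import urllib.request


def build_import_request(grafana_url: str, token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a dashboard import request for Grafana."""
    body = dict(payload)
    # Ask Grafana to replace an existing dashboard with the same title or uid.
    body.setdefault("overwrite", True)
    return urllib.request.Request(
        url=f"{grafana_url.rstrip('/')}/api/dashboards/db",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

To send it, load the JSON file with `json.load`, pass the result to this function, and hand the returned request to `urllib.request.urlopen`.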
Quick Examples
Example 1: Custom Application Metrics
# Create a Python app with Prometheus metrics
cat > app_with_metrics.py << 'EOF'
#!/usr/bin/env python3
from prometheus_client import start_http_server, Counter, Histogram, Gauge
import time
import random

# Define metrics
request_count = Counter('app_requests_total', 'Total requests', ['method', 'endpoint'])
request_duration = Histogram('app_request_duration_seconds', 'Request duration', ['method', 'endpoint'])
active_users = Gauge('app_active_users', 'Number of active users')
error_count = Counter('app_errors_total', 'Total errors', ['type'])


def process_request():
    """Simulate processing a request"""
    method = random.choice(['GET', 'POST', 'PUT', 'DELETE'])
    endpoint = random.choice(['/api/users', '/api/products', '/api/orders'])
    # Track request
    request_count.labels(method=method, endpoint=endpoint).inc()
    # Simulate processing time
    with request_duration.labels(method=method, endpoint=endpoint).time():
        time.sleep(random.uniform(0.1, 1.0))
    # Randomly generate errors (about 10% of requests)
    if random.random() < 0.1:
        error_count.labels(type='server_error').inc()
    # Update active users
    active_users.set(random.randint(10, 100))


if __name__ == '__main__':
    # Start Prometheus metrics server
    start_http_server(8000)
    print("Metrics server started on port 8000")
    # Continuously generate metrics
    while True:
        process_request()
        time.sleep(1)
EOF
# Install Python Prometheus client
pip3 install prometheus-client
# Run the application
python3 app_with_metrics.py &
This exposes custom application metrics at http://localhost:8000/metrics!
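Counters such as `app_requests_total` only ever go up, so you usually read them through `rate()` rather than as raw values. The Python sketch below shows the core idea on a list of `(timestamp, value)` samples, including how a drop in value is treated as a counter reset (for example after the app restarts). It is a simplified model with an illustrative name: the real PromQL `rate()` also extrapolates to the edges of the time window.

```python
def simple_rate(samples):
    """Per-second increase over (timestamp, value) counter samples, oldest first.

    A drop in value is treated as a counter reset, so the new value
    counts as the increase since the reset.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed
```

For instance, three scrapes 15 seconds apart with values 100, 130, and 160 work out to 2 requests per second.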
Example 2: Alert Rules Configuration
# Create alert rules
sudo mkdir -p /etc/prometheus/rules
sudo tee /etc/prometheus/rules/alerts.yml << 'EOF'
groups:
  - name: system_alerts
    interval: 30s
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage detected"
          description: "CPU usage is above 80% (current value: {{ $value }}%)"
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High memory usage detected"
          description: "Memory usage is above 90% (current value: {{ $value }}%)"
      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space"
          description: "Less than 10% disk space remaining on root partition"
      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service is down"
          description: "{{ $labels.job }} on {{ $labels.instance }} is down"
      - alert: HighNetworkTraffic
        expr: rate(node_network_receive_bytes_total[5m]) + rate(node_network_transmit_bytes_total[5m]) > 100000000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High network traffic"
          description: "Network traffic exceeds 100MB/s"
EOF
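# Reload Prometheus configuration (the unit file defines no ExecReload, so use the
# lifecycle endpoint enabled by --web.enable-lifecycle instead of "systemctl reload")
curl -X POST http://localhost:9090/-/reload
This sets up baseline alerting for CPU, memory, disk, service availability, and network traffic!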
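The `for:` clause is what keeps a brief spike from paging you: the expression has to stay true for the whole duration before the alert moves from pending to firing. Here is a simplified Python model of that state machine. It assumes one evaluation per list element at a fixed interval, the function name and inputs are illustrative, and real Prometheus tracks this separately for every label set.

```python
def alert_states(results, interval_s, for_s):
    """Map a series of boolean rule evaluations to alert states.

    results: outcome of the alert expression at each evaluation, oldest first.
    Returns 'inactive', 'pending', or 'firing' for each evaluation.
    """
    states = []
    active_since = None  # time at which the expression first became true
    for i, is_true in enumerate(results):
        now = i * interval_s
        if not is_true:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        states.append("firing" if now - active_since >= for_s else "pending")
    return states
```

Note how a single false evaluation sends the alert back to inactive, so the timer starts over the next time the condition is met.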
Example 3: Advanced Grafana Dashboard
# Create advanced monitoring script
cat > setup_advanced_dashboard.sh << 'EOF'
#!/bin/bash
# Setup advanced Grafana dashboard
# Set Grafana API credentials
GRAFANA_URL="http://localhost:3000"
GRAFANA_USER="admin"
GRAFANA_PASS="admin"
# Create data source for Prometheus
curl -X POST \
-H "Content-Type: application/json" \
-u "${GRAFANA_USER}:${GRAFANA_PASS}" \
-d '{
"name": "Prometheus",
"type": "prometheus",
"url": "http://localhost:9090",
"access": "proxy",
"isDefault": true
}' \
"${GRAFANA_URL}/api/datasources"
# Create folder for dashboards
curl -X POST \
-H "Content-Type: application/json" \
-u "${GRAFANA_USER}:${GRAFANA_PASS}" \
-d '{
"title": "System Monitoring"
}' \
"${GRAFANA_URL}/api/folders"
# Import comprehensive dashboard
cat > comprehensive_dashboard.json << 'JSON'
{
"dashboard": {
"title": "Comprehensive System Monitoring",
"tags": ["system", "monitoring"],
"timezone": "browser",
"panels": [
{
"title": "System Overview",
"type": "stat",
"targets": [
{"expr": "up", "legendFormat": "Services Up"}
]
},
{
"title": "CPU by Core",
"type": "graph",
"targets": [
{"expr": "irate(node_cpu_seconds_total[5m])", "legendFormat": "CPU {{cpu}} - {{mode}}"}
]
},
{
"title": "Memory Breakdown",
"type": "piechart",
"targets": [
{"expr": "node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes", "legendFormat": "Used"},
{"expr": "node_memory_MemAvailable_bytes", "legendFormat": "Available"}
]
},
{
"title": "Disk Usage by Mount",
"type": "bargauge",
"targets": [
{"expr": "(node_filesystem_size_bytes - node_filesystem_avail_bytes) / node_filesystem_size_bytes * 100", "legendFormat": "{{mountpoint}}"}
]
}
]
},
"overwrite": true
}
JSON
# Import the dashboard
curl -X POST \
-H "Content-Type: application/json" \
-u "${GRAFANA_USER}:${GRAFANA_PASS}" \
-d @comprehensive_dashboard.json \
"${GRAFANA_URL}/api/dashboards/db"
echo "Advanced dashboard created successfully!"
EOF
chmod +x setup_advanced_dashboard.sh
./setup_advanced_dashboard.sh
This creates professional-grade dashboards!
Fix Common Problems
Problem 1: Prometheus Can't Scrape Targets
Symptoms: Targets show as "DOWN" in Prometheus
# Check target status (the /targets path is the web UI; the API returns JSON)
curl -s http://localhost:9090/api/v1/targets
# Debug connectivity
curl http://localhost:9100/metrics # Node exporter
curl http://localhost:3000/metrics # Grafana
# Check firewall rules
sudo firewall-cmd --list-all
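# Rule out SELinux: switch to permissive mode temporarily (lasts until reboot or "setenforce 1")
sudo setenforce 0
# If that fixes it, re-enable enforcing mode and create a proper SELinux policy instead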
# Check Prometheus logs
sudo journalctl -u prometheus -n 50 --no-pager
# Verify configuration
promtool check config /etc/prometheus/prometheus.yml
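The JSON from the targets API is easier to scan once you filter it down to the unhealthy entries. Here is a small Python sketch that does this. It assumes the response has already been parsed with `json.loads` and follows the `data.activeTargets` layout of the `/api/v1/targets` endpoint; `down_targets` is an illustrative name.

```python
def down_targets(api_response: dict):
    """Return (job, scrape URL, last error) for every target that is not healthy."""
    targets = api_response.get("data", {}).get("activeTargets", [])
    return [
        (
            t.get("labels", {}).get("job", "?"),
            t.get("scrapeUrl", "?"),
            t.get("lastError", ""),
        )
        for t in targets
        if t.get("health") != "up"
    ]
```

The `lastError` text usually points at the cause, such as a refused connection for an exporter that is not running.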
Problem 2: High Memory Usage by Prometheus
Symptoms: Prometheus consuming too much RAM
# Check current memory usage
ps aux | grep prometheus
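# Limit Prometheus resources with a systemd drop-in (Environment= must live in [Service])
sudo mkdir -p /etc/systemd/system/prometheus.service.d
sudo tee /etc/systemd/system/prometheus.service.d/limits.conf << 'EOF'
[Service]
Environment="GOMAXPROCS=2"
Environment="GOMEMLIMIT=1GiB"
EOF
# Reduce retention time
sudo sed -i 's/retention.time=30d/retention.time=7d/' /etc/systemd/system/prometheus.service
sudo systemctl daemon-reload
# Clean up old data (WARNING: deletes all stored metrics; the TSDB lives directly in /var/lib/prometheus)
sudo systemctl stop prometheus
sudo rm -rf /var/lib/prometheus/*
sudo systemctl start prometheus
# Note: prometheus.yml has no compression setting. Prometheus already compresses samples on disk,
# and WAL compression is controlled by the --storage.tsdb.wal-compression flag.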
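To judge how much a shorter retention actually saves, the Prometheus storage documentation gives a rough capacity formula: retention time in seconds, times ingested samples per second, times bytes per sample, where a sample typically takes 1 to 2 bytes. Here is a back-of-the-envelope sketch in Python. The function name is illustrative and the series count in the usage note is a made-up example, so substitute your own numbers.

```python
def estimated_disk_bytes(retention_days: float, active_series: int,
                         scrape_interval_s: float = 15,
                         bytes_per_sample: float = 2.0) -> float:
    """Rough TSDB disk estimate: retention * ingestion rate * bytes per sample."""
    samples_per_second = active_series / scrape_interval_s
    return retention_days * 86400 * samples_per_second * bytes_per_sample
```

With a hypothetical 3,000 active series scraped every 15 seconds, 30 days comes to roughly 1 GB and 7 days to roughly 0.24 GB. Keep in mind this estimates disk, not RAM; memory use depends mainly on how many series are active at once.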
Problem 3: Grafana Dashboards Not Loading
Symptoms: Dashboards show "No Data" or errors
# Check Grafana data source configuration
curl -u admin:admin http://localhost:3000/api/datasources
# Test Prometheus connectivity through Grafana's data source proxy (assumes data source ID 1)
curl -s -u admin:admin \
"http://localhost:3000/api/datasources/proxy/1/api/v1/query?query=up"
# Check time synchronization
timedatectl status
sudo systemctl restart chronyd
# Restart services
sudo systemctl restart prometheus
sudo systemctl restart grafana-server
# Check logs
sudo journalctl -u grafana-server -n 50 --no-pager
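A frequent cause of "No Data" is a data source that points at the wrong URL or is not the default. The Python sketch below picks the Prometheus entries out of the list returned by `GET /api/datasources` so you can check them at a glance. It assumes the response is already parsed into a list of dicts, and `prometheus_datasources` is an illustrative name.

```python
def prometheus_datasources(datasources: list) -> list:
    """Pick out Prometheus data sources from a GET /api/datasources response."""
    return [
        {
            "id": ds.get("id"),
            "name": ds.get("name"),
            "url": ds.get("url"),
            "default": bool(ds.get("isDefault")),
        }
        for ds in datasources
        if ds.get("type") == "prometheus"
    ]
```

For the setup in this guide you would expect a single entry with the URL `http://localhost:9090` marked as default; an empty list means the data source was never created.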
Problem 4: Alerts Not Firing
Symptoms: Alert conditions met but no notifications
# Check alert rules syntax
promtool check rules /etc/prometheus/rules/alerts.yml
# Verify alerts are loaded
curl http://localhost:9090/api/v1/rules | jq .
# Check alert status
curl http://localhost:9090/api/v1/alerts | jq .
# Test Alertmanager (only if you have installed it; this guide does not set it up)
amtool alert add alertname=test severity=critical --alertmanager.url=http://localhost:9093
# Force reload rules
sudo kill -HUP $(pidof prometheus)
# Check Prometheus logs for rule evaluation
sudo journalctl -u prometheus | grep -i "rule\|alert"
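When the alerts API returns a long list, a quick tally by name and state shows where things stand. Here is a minimal Python sketch. It assumes the parsed JSON follows the `data.alerts` layout of `/api/v1/alerts`, and `alerts_by_state` is an illustrative name.

```python
from collections import Counter


def alerts_by_state(api_response: dict) -> Counter:
    """Count alerts per (alertname, state) from a /api/v1/alerts response."""
    alerts = api_response.get("data", {}).get("alerts", [])
    return Counter(
        (a.get("labels", {}).get("alertname", "?"), a.get("state", "?"))
        for a in alerts
    )
```

If an alert shows up here as `firing` but no notification arrives, the rule itself is working and the problem sits downstream, in Alertmanager or its receivers.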
Simple Commands Summary
Command | Purpose |
---|---|
sudo systemctl status prometheus | Check Prometheus status |
curl http://localhost:9090/metrics | View Prometheus metrics |
curl http://localhost:9100/metrics | View Node Exporter metrics |
promtool check config /etc/prometheus/prometheus.yml | Validate Prometheus config |
sudo systemctl restart grafana-server | Restart Grafana |
curl -u admin:admin http://localhost:3000/api/datasources | Check Grafana data sources |
sudo journalctl -u prometheus -f | View Prometheus logs |
curl http://localhost:9090/api/v1/targets | Check scrape targets |
promtool query instant http://localhost:9090 'up' | Query Prometheus |
sudo firewall-cmd --add-port=9090/tcp --permanent | Open Prometheus port |
Tips for Success
- Start Small: Begin with basic metrics before adding complexity
- Label Everything: Use descriptive labels for better organization
- Dashboard Design: Keep dashboards focused and avoid clutter
- Security First: Use authentication and HTTPS in production
- Performance Tune: Adjust scrape intervals based on your needs
- Document Queries: Save important PromQL queries for reuse
- Regular Maintenance: Clean up old data and optimize storage
- Alert Wisely: Avoid alert fatigue with well-tuned thresholds
What You Learned
Congratulations! You've successfully set up monitoring with Prometheus and Grafana on AlmaLinux!
- Installed Prometheus for metrics collection
- Configured Node Exporter for system metrics
- Set up Grafana for beautiful visualizations
- Created custom dashboards for your specific needs
- Implemented alerting rules for proactive monitoring
- Added application metrics with custom exporters
- Troubleshot common issues and optimized performance
- Built comprehensive monitoring infrastructure
Why This Matters
Monitoring is the foundation of reliable infrastructure! With your AlmaLinux monitoring stack, you now have:
- Complete visibility into system and application performance
- Early warning system for potential problems
- Data-driven insights for optimization decisions
- Professional dashboards that impress stakeholders
- Foundation for SRE practices and reliability engineering
You're now equipped to run production systems with more confidence, because you can spot issues before they impact users. These are the same monitoring skills that professional DevOps engineers and SREs rely on!
Keep monitoring, keep improving, and remember: what gets measured gets managed! You've got this!