📊 AlmaLinux Monitoring Tools: Complete System Oversight Guide

Want to know exactly what’s happening inside your AlmaLinux system? 🔍 System monitoring is the key to maintaining healthy, performant servers and workstations! This comprehensive guide takes you from basic resource monitoring to professional-grade observability platforms. Whether you’re tracking CPU usage or building complete monitoring dashboards, let’s master the art of system oversight! ⚡

🤔 Why Is System Monitoring Essential?

Monitoring transforms reactive firefighting into proactive management! 🌟 Here’s why it’s crucial:

🚨 Early Problem Detection: Spot issues before users notice
📈 Performance Optimization: Identify and fix bottlenecks
💰 Resource Planning: Know when to upgrade hardware
🛡️ Security Monitoring: Detect suspicious activities
📊 Capacity Planning: Predict future resource needs
🔧 Troubleshooting: Quickly diagnose system problems
📝 Compliance: Meet audit and reporting requirements
😌 Peace of Mind: Always know your system’s health

Well-monitored systems generally suffer far less unplanned downtime, because problems get caught early! 🏆

🎯 What You Need

Let’s prepare for monitoring mastery! ✅

✅ AlmaLinux system with root or sudo access
✅ Basic understanding of system resources
✅ Terminal access for command-line tools
✅ Network connectivity for remote monitoring
✅ 60 minutes to explore all monitoring tools
✅ Some test workloads to monitor
✅ Curiosity about system internals
✅ Excitement to see everything happening! 🎉

Let’s unlock complete system visibility! 🌍

📝 Step 1: Basic Command-Line Monitoring

Master essential monitoring commands! 🎯

System Load and Uptime:

# Check system uptime and load average:
uptime
# Output: 10:23:45 up 5 days, 3:15, 2 users, load average: 0.15, 0.12, 0.09
# Load average = 1-minute, 5-minute, 15-minute averages

# Detailed uptime information:
uptime -p    # Pretty format: up 5 days, 3 hours, 15 minutes
uptime -s    # System start time: 2025-09-12 07:08:30

# Understanding load average:
# Load counts tasks running or waiting for a CPU (on Linux, also tasks in uninterruptible I/O wait)
# Load of 1.0 ≈ one fully busy core; load of 4.0 ≈ a fully busy quad-core system
# Check CPU count:
nproc        # Number of processing units

# W command - who's logged in and what they're doing:
w
# Shows users, TTY, login time, idle time, JCPU, PCPU, and current command
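
To make the load-versus-cores rule concrete, here is a minimal sketch that divides the 1-minute load by the core count. The helper name load_ratio is made up for this example, and the live reading assumes a Linux /proc/loadavg. A result near or above 1.00 suggests the CPUs are saturated.

```shell
# load_ratio LOAD CORES: print load divided by core count, two decimals.
# (Hypothetical helper, not a standard command.)
load_ratio() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Live reading (Linux only): the first field of /proc/loadavg is the 1-minute load.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    read -r one_min _ < /proc/loadavg
    echo "1-minute load per core: $(load_ratio "$one_min" "$(nproc)")"
fi
```
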

Memory Usage Monitoring:

# Free memory display:
free -h      # Human-readable format
# Output:
#               total        used        free      shared  buff/cache   available
# Mem:           15Gi       3.2Gi       8.1Gi       245Mi       4.2Gi        11Gi
# Swap:         8.0Gi          0B       8.0Gi

# Continuous memory monitoring:
free -h -s 2  # Update every 2 seconds

# Detailed memory information:
cat /proc/meminfo | head -20

# Memory usage by process:
ps aux --sort=-%mem | head -10

# Show memory statistics:
vmstat 2 5    # Update every 2 seconds, 5 times
# Columns: r=running, b=blocked, swpd=swap used, free=free memory
#          buff=buffers, cache=cache, si=swap in, so=swap out

# Memory pressure information:
cat /proc/pressure/memory
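
The "available" column in free is taken from MemAvailable in /proc/meminfo, which is usually a better gauge than "free" because cache can be reclaimed. Here is a small sketch that turns it into a single percentage; mem_used_pct is a hypothetical helper name, and the optional file argument exists only so the function can be tried against a saved copy of meminfo.

```shell
# mem_used_pct [FILE]: percent of RAM that is NOT available,
# computed from MemTotal and MemAvailable (both in kB).
mem_used_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%.1f\n", (total - avail) * 100 / total }' \
        "${1:-/proc/meminfo}"
}

# Usage on a live system:
#   echo "Memory in use: $(mem_used_pct)%"
```
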

CPU Usage Tracking:

# Real-time CPU usage:
top          # Interactive process viewer
# Key commands in top:
# 1 - Show individual CPU cores
# M - Sort by memory usage
# P - Sort by CPU usage
# k - Kill process
# r - Renice process
# q - Quit

# CPU information:
lscpu        # Detailed CPU architecture information

# Per-core CPU usage:
mpstat -P ALL 2   # All CPUs, update every 2 seconds
# Install if missing: sudo dnf install sysstat

# Process CPU usage:
ps aux --sort=-%cpu | head -10

# CPU frequency monitoring:
watch -n 1 "grep MHz /proc/cpuinfo"

# CPU temperature (if sensors available):
sensors      # Install: sudo dnf install lm_sensors
sudo sensors-detect  # Configure sensors
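
Tools like top and mpstat derive their percentages from the cumulative counters in /proc/stat, so a usage figure always needs two samples. As a rough sketch (cpu_busy_pct is a name invented for this example), the busy share between two samples can be computed like this:

```shell
# cpu_busy_pct "LINE1" "LINE2": busy percentage between two aggregate "cpu"
# lines sampled from /proc/stat (hypothetical helper name).
cpu_busy_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # Sum user..steal (fields 2-9); guest time is already part of user.
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            idle = $5 + $6            # idle + iowait
            if (NR == 1) { t1 = total; i1 = idle } else { t2 = total; i2 = idle }
        }
        END {
            dt = t2 - t1
            if (dt > 0) printf "%.1f\n", (dt - (i2 - i1)) * 100 / dt
        }'
}

# Usage on a live system:
#   a=$(head -n 1 /proc/stat); sleep 1; b=$(head -n 1 /proc/stat)
#   echo "CPU busy: $(cpu_busy_pct "$a" "$b")%"
```
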

Disk Usage and I/O:

# Disk space usage:
df -h        # Human-readable disk usage
df -i        # Inode usage

# Directory disk usage:
du -sh /var/*    # Size of directories in /var
du -h --max-depth=1 / | sort -hr  # Sorted by size

# Disk I/O statistics:
iostat -x 2      # Extended stats, update every 2 seconds

# Real-time I/O monitoring:
iotop            # Interactive I/O monitor
# Install: sudo dnf install iotop

# Disk I/O by process:
sudo iotop -o    # Only show processes doing I/O

# Check disk health:
sudo smartctl -a /dev/sda   # Install: sudo dnf install smartmontools
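
For a quick "which filesystems are nearly full?" check, the df output can be filtered with awk. This is only a sketch: fs_over_threshold is a hypothetical helper, it expects the POSIX layout from df -P, and mount points containing spaces would need extra handling.

```shell
# fs_over_threshold PCT: read `df -P` output on stdin and print
# "mountpoint usage%" for every filesystem above PCT percent.
fs_over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)                 # strip the percent sign
        if (use + 0 > limit) print $6, use "%"
    }'
}

# Usage on a live system (skipping in-memory filesystems):
#   df -P -x tmpfs -x devtmpfs | fs_over_threshold 85
```
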

Perfect! 🎉 Basic monitoring commands mastered!

🔧 Step 2: Advanced Monitoring Tools

Deploy professional monitoring utilities! 📦

htop - Enhanced Process Monitor:

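# Install htop (htop, Glances, nmon, and atop are packaged in EPEL, so enable it first):
sudo dnf install epel-release
sudo dnf install htop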

# Launch htop:
htop

# htop features:
# - Color-coded resource bars
# - Tree view of processes (F5)
# - Search processes (F3)
# - Filter processes (F4)
# - Kill processes (F9)
# - Sort by columns (F6)
# - Setup/customize (F2)

# htop configuration (~/.config/htop/htoprc):
# Customize colors, meters, columns

# Useful htop shortcuts:
# H - Show/hide user threads
# K - Show/hide kernel threads
# F - Follow process
# Space - Tag process
# U - Untag all
# c - Tag process and its children

Glances - Comprehensive System Monitor:

# Install Glances:
sudo dnf install glances

# Basic usage:
glances

# Glances modes:
glances -w       # Web server mode (http://localhost:61208)
glances -1       # Show all CPU cores
glances -2       # Disable left sidebar
glances -3       # Disable quick look
glances -4       # Full quick look: show only quick look and load

# Export to file:
glances --export csv --export-csv-file /tmp/glances.csv
glances --export json --export-json-file /tmp/glances.json

# Monitor remote system:
# On server:
glances -s       # Server mode

# On client:
glances -c server_ip

# Glances with Docker monitoring:
glances --enable-plugin docker

# Configuration file (~/.config/glances/glances.conf):
mkdir -p ~/.config/glances
cat > ~/.config/glances/glances.conf << 'EOF'
[cpu]
user_careful=50
user_warning=70
user_critical=90

[mem]
careful=50
warning=70
critical=90

[network]
hide=lo,docker.*
EOF

nmon - Performance Monitor:

# Install nmon:
sudo dnf install nmon

# Launch nmon:
nmon

# nmon interactive commands:
# c - CPU statistics
# m - Memory statistics
# d - Disk statistics
# n - Network statistics
# t - Top processes
# h - Help menu

# Capture data for analysis:
nmon -f -t -s 5 -c 120
# -f: Spreadsheet output
# -t: Include top processes
# -s 5: 5-second intervals
# -c 120: 120 snapshots (10 minutes)

# Analyze with nmonchart:
# Creates HTML reports from nmon data

atop - Advanced System Monitor:

# Install atop:
sudo dnf install atop

# Start atop:
atop

# atop features:
# - Process accounting
# - Historical data
# - Disk I/O per process
# - Network activity per process

# View historical data:
atop -r /var/log/atop/atop_20250917

# atop shortcuts:
# g - Generic info
# m - Memory info
# d - Disk info
# n - Network info
# c - Full command lines
# v - Various process info

# Configure atop logging by setting these in /etc/sysconfig/atop, then restart the atop service:
# LOGINTERVAL=60      # Log every 60 seconds
# LOGGENERATIONS=28   # Keep 28 days

Amazing! 🌟 Advanced monitoring tools deployed!

🌟 Step 3: Network and Service Monitoring

Monitor network traffic and services! ⚡

Network Monitoring Tools:

# Install network monitoring tools (nethogs and iftop come from EPEL):
sudo dnf install epel-release
sudo dnf install net-tools iptraf-ng nethogs iftop tcpdump iperf3 sysstat

# Monitor network connections:
ss -tuln         # Show listening ports
ss -tan          # Show all TCP connections
netstat -tuln    # Legacy alternative

# Real-time bandwidth monitoring (replace eth0 with your interface name as listed by ip link, e.g. ens3):
sudo iftop -i eth0    # Monitor specific interface
# iftop commands:
# p - Toggle port display
# n - Toggle DNS resolution
# s/d - Toggle source/destination
# 1/2/3 - Sort by different columns

# Bandwidth usage by process:
sudo nethogs eth0
# Shows which processes are using bandwidth

# Detailed network statistics:
ip -s link       # Interface statistics
nstat            # Network statistics
sar -n DEV 2     # Network device statistics

# Monitor specific port:
sudo tcpdump -i eth0 port 80
sudo tcpdump -i any host 192.168.1.100

# Network performance testing:
iperf3 -s        # Server mode
iperf3 -c server_ip  # Client mode
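
A long ss listing is easier to read as a summary. As a small sketch, the first column of ss -tan (the connection state) can be tallied with awk; tcp_state_counts is a hypothetical helper name.

```shell
# tcp_state_counts: read `ss -tan` output on stdin and print
# one "STATE COUNT" line per TCP state, sorted by state name.
tcp_state_counts() {
    awk 'NR > 1 { n[$1]++ } END { for (s in n) print s, n[s] }' | sort
}

# Usage on a live system:
#   ss -tan | tcp_state_counts
```

A sudden pile of connections in one state (for example TIME-WAIT or SYN-RECV) is often a useful first hint when troubleshooting.
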

Service Monitoring:

# Systemd service monitoring:
systemctl status
systemctl list-units --failed
systemctl list-units --state=running

# Monitor service logs:
journalctl -u nginx -f      # Follow nginx logs
journalctl -p err -b        # Show errors since boot

# Service resource usage:
systemd-cgtop               # Top-like view for systemd services

# Monitor specific service:
systemctl show nginx --property=MainPID,MemoryCurrent,CPUUsageNSec

# Create service monitor script:
cat > ~/monitor-services.sh << 'EOF'
#!/bin/bash
SERVICES="nginx mysqld sshd firewalld"  # use mariadb instead of mysqld if you run MariaDB

echo "=== Service Status Check ==="
for service in $SERVICES; do
    if systemctl is-active --quiet $service; then
        echo "✅ $service: Running"
        echo "   Memory: $(systemctl show $service --property=MemoryCurrent --value | numfmt --to=iec)"
    else
        echo "❌ $service: Not running"
    fi
done
EOF

chmod +x ~/monitor-services.sh

Application Performance Monitoring:

# Monitor web server:
# Apache status module (mod_status ships with the httpd package; no separate install needed):
# Enable in Apache config:
# ExtendedStatus On
# <Location /server-status>
#     SetHandler server-status
#     Require local
# </Location>

# Nginx status:
# Add to nginx.conf:
# location /nginx_status {
#     stub_status;
#     allow 127.0.0.1;
#     deny all;
# }

# Monitor with curl:
curl http://localhost/server-status
curl http://localhost/nginx_status

# MySQL monitoring:
mysql -e "SHOW STATUS LIKE 'Threads_connected'"
mysql -e "SHOW PROCESSLIST"

# Redis monitoring:
redis-cli INFO
redis-cli MONITOR   # Real-time command monitoring

# Monitor application logs for 5xx responses (matches the status field right after the quoted request):
tail -f /var/log/nginx/access.log | grep -E '" 5[0-9]{2} '
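
Beyond watching errors scroll by, it can help to see how responses split across status classes. The sketch below assumes the common/combined access log format, where the status code is the ninth whitespace-separated field; status_class_counts is a hypothetical helper name.

```shell
# status_class_counts: read access log lines on stdin and print
# a count per status class (2xx, 3xx, 4xx, 5xx), sorted by class.
status_class_counts() {
    awk '$9 ~ /^[1-5][0-9][0-9]$/ { n[substr($9, 1, 1) "xx"]++ }
         END { for (c in n) print c, n[c] }' | sort
}

# Usage on a live system:
#   tail -n 1000 /var/log/nginx/access.log | status_class_counts
```
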

Excellent! ⚡ Network and service monitoring ready!

✅ Step 4: Enterprise Monitoring Solutions

Deploy production-grade monitoring systems! 🔧

Prometheus and Grafana Setup:

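# Install Prometheus:
sudo useradd --no-create-home --shell /bin/false prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.40.0/prometheus-2.40.0.linux-amd64.tar.gz
tar -xvf prometheus-2.40.0.linux-amd64.tar.gz
sudo mv prometheus-2.40.0.linux-amd64 /opt/prometheus
# Let the prometheus user write its TSDB data directory:
sudo mkdir -p /opt/prometheus/data
sudo chown -R prometheus:prometheus /opt/prometheus
# If SELinux blocks the service with "Permission denied", relabel the moved files:
sudo restorecon -Rv /opt/prometheus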

# Configure Prometheus:
sudo tee /opt/prometheus/prometheus.yml > /dev/null << 'EOF'
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
    - targets: ['localhost:9100']
EOF

# Create systemd service:
sudo tee /etc/systemd/system/prometheus.service > /dev/null << 'EOF'
[Unit]
Description=Prometheus
After=network.target

[Service]
User=prometheus
Type=simple
ExecStart=/opt/prometheus/prometheus \
    --config.file=/opt/prometheus/prometheus.yml \
    --storage.tsdb.path=/opt/prometheus/data

[Install]
WantedBy=multi-user.target
EOF

# Install Node Exporter:
wget https://github.com/prometheus/node_exporter/releases/download/v1.5.0/node_exporter-1.5.0.linux-amd64.tar.gz
tar -xvf node_exporter-1.5.0.linux-amd64.tar.gz
sudo mv node_exporter-1.5.0.linux-amd64/node_exporter /usr/local/bin/
# If SELinux blocks the service with "Permission denied", relabel the binary:
sudo restorecon -v /usr/local/bin/node_exporter

# Create Node Exporter service:
sudo tee /etc/systemd/system/node_exporter.service > /dev/null << 'EOF'
[Unit]
Description=Node Exporter
After=network.target

[Service]
User=prometheus
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
EOF

# Start services:
sudo systemctl daemon-reload
sudo systemctl enable --now prometheus node_exporter

# Install Grafana:
sudo dnf install grafana
sudo systemctl enable --now grafana-server

# Access:
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3000 (admin/admin)
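
Before building dashboards, it is worth confirming that Node Exporter is actually serving data. It exposes plain-text metrics on port 9100, and a tiny awk filter can pull out a single value. This is a sketch only: metric_value is a hypothetical helper, and it matches the full sample name exactly (including any labels).

```shell
# metric_value NAME: read Prometheus text-format metrics on stdin and
# print the value of the first sample whose name equals NAME.
metric_value() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# Usage (assumes Node Exporter is running on its default port 9100):
#   curl -s http://localhost:9100/metrics | metric_value node_load1
```
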

Zabbix Monitoring Platform:

# Install Zabbix repository:
sudo rpm -Uvh https://repo.zabbix.com/zabbix/6.0/rhel/9/x86_64/zabbix-release-6.0-4.el9.noarch.rpm

# Install Zabbix server, frontend, agent:
sudo dnf install zabbix-server-mysql zabbix-web-mysql zabbix-apache-conf zabbix-sql-scripts zabbix-agent

# Configure database:
mysql -u root -p << 'EOF'
CREATE DATABASE zabbix CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
CREATE USER 'zabbix'@'localhost' IDENTIFIED BY 'zabbix_password';
GRANT ALL PRIVILEGES ON zabbix.* TO 'zabbix'@'localhost';
FLUSH PRIVILEGES;
EXIT
EOF

# Import initial schema:
zcat /usr/share/zabbix-sql-scripts/mysql/server.sql.gz | mysql -u zabbix -p zabbix

# Configure Zabbix server:
sudo nano /etc/zabbix/zabbix_server.conf
# Set: DBPassword=zabbix_password

# Start services:
sudo systemctl restart zabbix-server zabbix-agent httpd php-fpm
sudo systemctl enable zabbix-server zabbix-agent httpd php-fpm

# Access: http://localhost/zabbix

Custom Monitoring Scripts:

# Create comprehensive monitoring script:
cat > ~/system-monitor.sh << 'EOF'
#!/bin/bash

# Configuration
LOG_DIR="/var/log/monitoring"
ALERT_EMAIL="admin@example.com"  # Placeholder address: replace with your own
CPU_THRESHOLD=80
MEM_THRESHOLD=90
DISK_THRESHOLD=85

mkdir -p "$LOG_DIR"

# Functions
check_cpu() {
    CPU_USAGE=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | cut -d'%' -f1)
    if (( $(echo "$CPU_USAGE > $CPU_THRESHOLD" | bc -l) )); then
        echo "⚠️ HIGH CPU: ${CPU_USAGE}%"
        echo "$(date): CPU Alert - ${CPU_USAGE}%" >> "$LOG_DIR/alerts.log"
    else
        echo "✅ CPU: ${CPU_USAGE}%"
    fi
}

check_memory() {
    MEM_USAGE=$(free | grep Mem | awk '{print ($3/$2) * 100.0}')
    if (( $(echo "$MEM_USAGE > $MEM_THRESHOLD" | bc -l) )); then
        echo "⚠️ HIGH MEMORY: ${MEM_USAGE}%"
        echo "$(date): Memory Alert - ${MEM_USAGE}%" >> "$LOG_DIR/alerts.log"
    else
        echo "✅ Memory: ${MEM_USAGE}%"
    fi
}

check_disk() {
    df -h | tail -n +2 | while read line; do
        USAGE=$(echo $line | awk '{print $5}' | sed 's/%//')
        MOUNT=$(echo $line | awk '{print $6}')
        if [ "$USAGE" -gt "$DISK_THRESHOLD" ]; then
            echo "⚠️ HIGH DISK: $MOUNT at ${USAGE}%"
            echo "$(date): Disk Alert - $MOUNT at ${USAGE}%" >> "$LOG_DIR/alerts.log"
        fi
    done
}

check_services() {
    SERVICES="nginx mysqld sshd"  # use mariadb instead of mysqld if you run MariaDB
    for service in $SERVICES; do
        if ! systemctl is-active --quiet $service; then
            echo "❌ Service Down: $service"
            echo "$(date): Service Alert - $service is down" >> "$LOG_DIR/alerts.log"
        else
            echo "✅ Service Up: $service"
        fi
    done
}

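# Main monitoring run
ALERTS_BEFORE=$(cat "$LOG_DIR/alerts.log" 2>/dev/null | wc -l)
echo "=== System Monitor Report - $(date) ==="
check_cpu
check_memory
check_disk
check_services

# Send alerts only if this run added new log entries
ALERTS_AFTER=$(cat "$LOG_DIR/alerts.log" 2>/dev/null | wc -l)
if [ "$ALERTS_AFTER" -gt "$ALERTS_BEFORE" ]; then
    tail -n "$((ALERTS_AFTER - ALERTS_BEFORE))" "$LOG_DIR/alerts.log" | mail -s "System Alerts" "$ALERT_EMAIL"
fi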
EOF

chmod +x ~/system-monitor.sh

# Schedule monitoring (the cron user needs write access to LOG_DIR):
(crontab -l 2>/dev/null; echo "*/5 * * * * $HOME/system-monitor.sh") | crontab -

Perfect! 🏆 Enterprise monitoring deployed!

🎮 Quick Examples

Real-world monitoring scenarios! 🎯

Example 1: Performance Troubleshooting Dashboard

#!/bin/bash
# Interactive performance troubleshooting dashboard

echo "Creating performance dashboard..."

# Create dashboard script
cat > ~/perf-dashboard.sh << 'EOF'
#!/bin/bash

# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

# Functions
show_header() {
    clear
    echo "═══════════════════════════════════════════════════════"
    echo "   AlmaLinux Performance Dashboard - $(date +%H:%M:%S)"
    echo "═══════════════════════════════════════════════════════"
}

show_cpu() {
    echo -e "\n${GREEN}▶ CPU Usage:${NC}"
    top -bn1 | head -5
    echo ""
    mpstat 1 1 | tail -2
}

show_memory() {
    echo -e "\n${GREEN}▶ Memory Usage:${NC}"
    free -h
    echo ""
    echo "Top Memory Consumers:"
    ps aux --sort=-%mem | head -5 | awk '{printf "  %-10s %6s %s\n", $1, $4"%", $11}'
}

show_disk() {
    echo -e "\n${GREEN}▶ Disk Usage:${NC}"
    df -h | grep -v tmpfs
    echo ""
    echo "Disk I/O:"
    iostat -x 1 2 | tail -n +4
}

show_network() {
    echo -e "\n${GREEN}▶ Network Activity:${NC}"
    ip -s link | awk '/^[0-9]/{print $2} /RX:/{getline; print "  RX: "$1" bytes"} /TX:/{getline; print "  TX: "$1" bytes"}'
    echo ""
    echo "Active Connections:"
    ss -tun | tail -5
}

show_processes() {
    echo -e "\n${GREEN}▶ Top Processes:${NC}"
    ps aux --sort=-%cpu | head -10 | awk '{printf "  %-10s %6s %6s %s\n", $1, $3"%", $4"%", $11}'
}

check_issues() {
    echo -e "\n${YELLOW}▶ Potential Issues:${NC}"

    # Check CPU
    CPU_IDLE=$(top -bn1 | grep "Cpu(s)" | awk '{print $8}' | cut -d'%' -f1)
    if (( $(echo "$CPU_IDLE < 20" | bc -l) )); then
        echo -e "  ${RED}⚠ High CPU usage detected${NC}"
    fi

    # Check Memory
    MEM_AVAIL=$(free | grep Mem | awk '{print $7/$2 * 100.0}')
    if (( $(echo "$MEM_AVAIL < 20" | bc -l) )); then
        echo -e "  ${RED}⚠ Low memory available${NC}"
    fi

    # Check Swap
    SWAP_USED=$(free | grep Swap | awk '{if($2>0) print $3/$2 * 100.0; else print 0}')
    if (( $(echo "$SWAP_USED > 50" | bc -l) )); then
        echo -e "  ${RED}⚠ High swap usage${NC}"
    fi

    # Check Load
    LOAD=$(uptime | awk -F'load average:' '{print $2}' | cut -d, -f1)
    CORES=$(nproc)
    if (( $(echo "$LOAD > $CORES" | bc -l) )); then
        echo -e "  ${RED}⚠ System load above CPU count${NC}"
    fi
}

# Main loop
while true; do
    show_header
    show_cpu
    show_memory
    show_disk
    show_network
    show_processes
    check_issues

    echo -e "\n${GREEN}Press [Enter] to refresh, [Q] to quit${NC}"
    read -t 5 -n 1 key
    if [[ $key = "q" ]] || [[ $key = "Q" ]]; then
        break
    fi
done
EOF

chmod +x ~/perf-dashboard.sh

echo "Dashboard created! Run with: ~/perf-dashboard.sh"

Example 2: Automated Alert System

#!/bin/bash
# Comprehensive alerting system

echo "Setting up automated alert system..."

# Create alert monitor
cat > ~/alert-monitor.sh << 'EOF'
#!/bin/bash

# Configuration
ALERT_LOG="/var/log/system-alerts.log"
WEBHOOK_URL="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
EMAIL="admin@example.com"  # Placeholder address: replace with your own

# Thresholds
CPU_CRITICAL=90
CPU_WARNING=70
MEM_CRITICAL=95
MEM_WARNING=85
DISK_CRITICAL=95
DISK_WARNING=85
LOAD_MULTIPLIER=2

# Functions
send_alert() {
    local severity=$1
    local component=$2
    local message=$3
    local value=$4

    # Log alert
    echo "$(date '+%Y-%m-%d %H:%M:%S') [$severity] $component: $message (Value: $value)" >> "$ALERT_LOG"

    # Send to Slack (if configured)
    if [ ! -z "$WEBHOOK_URL" ]; then
        curl -X POST -H 'Content-type: application/json' \
            --data "{\"text\":\"🚨 [$severity] $component: $message (Value: $value)\"}" \
            "$WEBHOOK_URL" 2>/dev/null
    fi

    # Send email for critical alerts
    if [ "$severity" = "CRITICAL" ]; then
        echo "$message (Value: $value)" | mail -s "[$severity] System Alert: $component" "$EMAIL"
    fi
}

check_cpu() {
    local cpu_usage=$(top -bn1 | grep "Cpu(s)" | awk '{print 100 - $8}' | cut -d'%' -f1)

    if (( $(echo "$cpu_usage > $CPU_CRITICAL" | bc -l) )); then
        send_alert "CRITICAL" "CPU" "CPU usage critically high" "${cpu_usage}%"
    elif (( $(echo "$cpu_usage > $CPU_WARNING" | bc -l) )); then
        send_alert "WARNING" "CPU" "CPU usage high" "${cpu_usage}%"
    fi
}

check_memory() {
    local mem_usage=$(free | grep Mem | awk '{print ($3/$2) * 100.0}')

    if (( $(echo "$mem_usage > $MEM_CRITICAL" | bc -l) )); then
        send_alert "CRITICAL" "MEMORY" "Memory usage critically high" "${mem_usage}%"
    elif (( $(echo "$mem_usage > $MEM_WARNING" | bc -l) )); then
        send_alert "WARNING" "MEMORY" "Memory usage high" "${mem_usage}%"
    fi
}

check_disk() {
    df -h | tail -n +2 | while read line; do
        local usage=$(echo $line | awk '{print $5}' | sed 's/%//')
        local mount=$(echo $line | awk '{print $6}')

        if [ "$usage" -gt "$DISK_CRITICAL" ]; then
            send_alert "CRITICAL" "DISK" "Disk space critically low on $mount" "${usage}%"
        elif [ "$usage" -gt "$DISK_WARNING" ]; then
            send_alert "WARNING" "DISK" "Disk space low on $mount" "${usage}%"
        fi
    done
}

check_load() {
    local load=$(uptime | awk -F'load average:' '{print $2}' | cut -d, -f1 | xargs)
    local cores=$(nproc)
    local threshold=$(echo "$cores * $LOAD_MULTIPLIER" | bc)

    if (( $(echo "$load > $threshold" | bc -l) )); then
        send_alert "WARNING" "LOAD" "System load high" "$load (${cores} cores)"
    fi
}

check_services() {
    local critical_services="sshd firewalld"
    local important_services="nginx mysqld redis"  # use mariadb instead of mysqld if you run MariaDB

    for service in $critical_services; do
        if ! systemctl is-active --quiet $service; then
            send_alert "CRITICAL" "SERVICE" "$service is down" "stopped"
        fi
    done

    for service in $important_services; do
        if ! systemctl is-active --quiet $service 2>/dev/null; then
            send_alert "WARNING" "SERVICE" "$service is down" "stopped"
        fi
    done
}

check_network() {
    # Check network interfaces
    for interface in $(ip link | grep "^[0-9]" | cut -d: -f2 | grep -v lo); do
        if ! ip link show $interface | grep -q "state UP"; then
            send_alert "WARNING" "NETWORK" "Interface $interface is down" "DOWN"
        fi
    done

    # Check connectivity
    if ! ping -c 1 8.8.8.8 &>/dev/null; then
        send_alert "CRITICAL" "NETWORK" "No internet connectivity" "FAILED"
    fi
}

# Main monitoring
check_cpu
check_memory
check_disk
check_load
check_services
check_network

# Cleanup old alerts (keep 30 days)
find /var/log -name "system-alerts.log.*" -mtime +30 -delete
EOF

chmod +x ~/alert-monitor.sh

# Schedule monitoring (append to the existing crontab instead of replacing it)
(crontab -l 2>/dev/null; echo "*/5 * * * * $HOME/alert-monitor.sh") | crontab -

echo "Alert system configured!"
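
Because the monitor runs every five minutes, a condition that persists will send the same alert again on each run. One common remedy is a cooldown per alert type. The sketch below is an assumption-laden illustration, not part of the script above: should_alert and the /tmp/alert-state directory are hypothetical names, and the function simply remembers the last send time in a small file.

```shell
# should_alert KEY COOLDOWN_SECONDS [STATE_DIR]
# Succeeds (exit 0) if no alert for KEY was recorded within the cooldown,
# and records the current time; fails (exit 1) while the cooldown is active.
should_alert() {
    local key=$1 cooldown=$2 dir=${3:-/tmp/alert-state}
    local stamp="$dir/$key.last" now last=0
    mkdir -p "$dir"
    now=$(date +%s)
    [ -f "$stamp" ] && last=$(cat "$stamp")
    if [ $((now - last)) -ge "$cooldown" ]; then
        echo "$now" > "$stamp"
        return 0
    fi
    return 1
}

# Usage idea inside a check function (one alert per hour at most):
#   if should_alert "CPU" 3600; then send_alert "WARNING" "CPU" "CPU usage high" "$cpu_usage%"; fi
```
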

Example 3: Container Monitoring Setup

#!/bin/bash
# Docker/Podman container monitoring

echo "Setting up container monitoring..."

# Create container monitor
cat > ~/container-monitor.sh << 'EOF'
#!/bin/bash

# Detect container runtime
if command -v docker &>/dev/null; then
    RUNTIME="docker"
elif command -v podman &>/dev/null; then
    RUNTIME="podman"
else
    echo "No container runtime found"
    exit 1
fi

# Container monitoring dashboard
show_containers() {
    echo "=== Container Status ==="
    $RUNTIME ps --format "table {{.Names}}\t{{.Status}}\t{{.Size}}"
}

show_container_stats() {
    echo -e "\n=== Container Resource Usage ==="
    $RUNTIME stats --no-stream --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}"
}

show_container_logs() {
    echo -e "\n=== Recent Container Logs ==="
    for container in $($RUNTIME ps -q); do
        name=$($RUNTIME inspect -f '{{.Name}}' $container | sed 's/^\/*//')
        echo "▶ $name:"
        $RUNTIME logs --tail 5 $container 2>&1 | sed 's/^/  /'
        echo ""
    done
}

check_container_health() {
    echo -e "\n=== Container Health Checks ==="
    for container in $($RUNTIME ps -q); do
        name=$($RUNTIME inspect -f '{{.Name}}' $container | sed 's/^\/*//')
        health=$($RUNTIME inspect -f '{{.State.Health.Status}}' $container 2>/dev/null)

        if [ "$health" = "healthy" ]; then
            echo "✅ $name: Healthy"
        elif [ "$health" = "unhealthy" ]; then
            echo "❌ $name: Unhealthy"
        else
            echo "⚠️ $name: No health check"
        fi
    done
}

monitor_container_resources() {
    echo -e "\n=== Detailed Resource Monitoring ==="
    for container in $($RUNTIME ps -q); do
        name=$($RUNTIME inspect -f '{{.Name}}' $container | sed 's/^\/*//')
        echo "Container: $name"

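        # Get cgroup stats (tries cgroup v2 paths first, then legacy v1 paths)
        if [ "$RUNTIME" = "docker" ]; then
            # CPU usage: "usage_usec N" on cgroup v2, plain nanoseconds on v1
            cpu_usage=$($RUNTIME exec $container sh -c 'cat /sys/fs/cgroup/cpu.stat 2>/dev/null || cat /sys/fs/cgroup/cpuacct/cpuacct.usage' 2>/dev/null | head -1)
            echo "  CPU usage: $cpu_usage"

            # Memory usage
            mem_usage=$($RUNTIME exec $container sh -c 'cat /sys/fs/cgroup/memory.current 2>/dev/null || cat /sys/fs/cgroup/memory/memory.usage_in_bytes' 2>/dev/null)
            echo "  Memory bytes: $mem_usage"
        fi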

        # Process count
        proc_count=$($RUNTIME exec $container ps aux 2>/dev/null | wc -l)
        echo "  Processes: $proc_count"

        # Network connections
        net_conn=$($RUNTIME exec $container netstat -tan 2>/dev/null | grep ESTABLISHED | wc -l)
        echo "  Network connections: $net_conn"
        echo ""
    done
}

# Prometheus metrics exporter
export_metrics() {
    echo -e "\n=== Prometheus Metrics ==="
    for container in $($RUNTIME ps -q); do
        name=$($RUNTIME inspect -f '{{.Name}}' $container | sed 's/^\/*//')
        stats=$($RUNTIME stats --no-stream --format "{{json .}}" $container)

        cpu=$(echo $stats | jq -r '.CPUPerc' | sed 's/%//')
        mem=$(echo $stats | jq -r '.MemPerc' | sed 's/%//')

        echo "container_cpu_usage_percent{name=\"$name\"} $cpu"
        echo "container_memory_usage_percent{name=\"$name\"} $mem"
    done
}

# Main monitoring loop
while true; do
    clear
    echo "Container Monitoring Dashboard - $(date)"
    echo "========================================"

    show_containers
    show_container_stats
    check_container_health
    monitor_container_resources

    if [ "$1" = "--export" ]; then
        export_metrics > /tmp/container_metrics.prom
    fi

    echo -e "\nPress Ctrl+C to exit"
    sleep 5
done
EOF

chmod +x ~/container-monitor.sh

# Create container alert script
cat > ~/container-alerts.sh << 'EOF'
#!/bin/bash

RUNTIME=$(command -v docker || command -v podman)
ALERT_LOG="/var/log/container-alerts.log"

# Check for stopped containers
for container in $($RUNTIME ps -a -q); do
    status=$($RUNTIME inspect -f '{{.State.Status}}' $container)
    name=$($RUNTIME inspect -f '{{.Name}}' $container | sed 's/^\/*//')

    if [ "$status" != "running" ]; then
        echo "$(date): Container $name is $status" >> "$ALERT_LOG"

        # Try to restart
        $RUNTIME start $container

        if [ $? -eq 0 ]; then
            echo "$(date): Successfully restarted $name" >> "$ALERT_LOG"
        else
            echo "$(date): Failed to restart $name" >> "$ALERT_LOG"
        fi
    fi
done

# Check container resource usage
$RUNTIME stats --no-stream --format "{{json .}}" | while read stats; do
    name=$(echo $stats | jq -r '.Name')
    cpu=$(echo $stats | jq -r '.CPUPerc' | sed 's/%//')
    mem=$(echo $stats | jq -r '.MemPerc' | sed 's/%//')

    if (( $(echo "$cpu > 80" | bc -l) )); then
        echo "$(date): High CPU usage in $name: ${cpu}%" >> "$ALERT_LOG"
    fi

    if (( $(echo "$mem > 80" | bc -l) )); then
        echo "$(date): High memory usage in $name: ${mem}%" >> "$ALERT_LOG"
    fi
done
EOF

chmod +x ~/container-alerts.sh

echo "Container monitoring configured!"
echo "Run dashboard: ~/container-monitor.sh"
echo "Schedule alerts: */5 * * * * ~/container-alerts.sh"

Example 4: Database Performance Monitoring

#!/bin/bash
# Database monitoring for MySQL/MariaDB and PostgreSQL

echo "Setting up database monitoring..."

# MySQL/MariaDB monitoring
cat > ~/mysql-monitor.sh << 'EOF'
#!/bin/bash

DB_USER="monitor"
DB_PASS="monitor_password"
ALERT_LOG="/var/log/mysql-monitor.log"

# One-time setup: create the monitoring user manually as a MySQL admin
# (kept out of the script body so scheduled runs never prompt for a password):
#   mysql -u root -p -e "CREATE USER IF NOT EXISTS 'monitor'@'localhost' IDENTIFIED BY 'monitor_password';
#     GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'monitor'@'localhost';"

# Monitor function
monitor_mysql() {
    echo "=== MySQL Monitoring Report - $(date) ==="

    # Connection statistics
    echo -e "\n▶ Connection Statistics:"
    mysql -u $DB_USER -p$DB_PASS -e "SHOW STATUS WHERE Variable_name IN ('Threads_connected','Max_used_connections','Aborted_connects');" 2>/dev/null

    # Query performance
    echo -e "\n▶ Query Performance:"
    mysql -u $DB_USER -p$DB_PASS -e "SHOW STATUS WHERE Variable_name IN ('Slow_queries','Questions','Queries');" 2>/dev/null

    # InnoDB statistics
    echo -e "\n▶ InnoDB Buffer Pool:"
    mysql -u $DB_USER -p$DB_PASS -e "SHOW STATUS WHERE Variable_name LIKE 'Innodb_buffer_pool%';" 2>/dev/null | head -10

    # Current processes
    echo -e "\n▶ Active Processes:"
    mysql -u $DB_USER -p$DB_PASS -e "SHOW PROCESSLIST;" 2>/dev/null

    # Table sizes
    echo -e "\n▶ Largest Tables:"
    mysql -u $DB_USER -p$DB_PASS << SQL 2>/dev/null
    SELECT
        table_schema AS 'Database',
        table_name AS 'Table',
        ROUND((data_length + index_length) / 1024 / 1024, 2) AS 'Size (MB)'
    FROM information_schema.TABLES
    ORDER BY (data_length + index_length) DESC
    LIMIT 10;
SQL

    # Slow query log
    echo -e "\n▶ Recent Slow Queries:"
    if [ -f /var/log/mysql/slow.log ]; then
        tail -10 /var/log/mysql/slow.log
    fi
}

# Performance metrics
check_performance() {
    # Check slow queries
    SLOW_QUERIES=$(mysql -u $DB_USER -p$DB_PASS -se "SHOW STATUS LIKE 'Slow_queries';" | awk '{print $2}')
    if [ "$SLOW_QUERIES" -gt 100 ]; then
        echo "$(date): WARNING - High number of slow queries: $SLOW_QUERIES" >> "$ALERT_LOG"
    fi

    # Check connections
    CURRENT_CONN=$(mysql -u $DB_USER -p$DB_PASS -se "SHOW STATUS LIKE 'Threads_connected';" | awk '{print $2}')
    MAX_CONN=$(mysql -u $DB_USER -p$DB_PASS -se "SHOW VARIABLES LIKE 'max_connections';" | awk '{print $2}')
    USAGE=$(echo "scale=2; $CURRENT_CONN * 100 / $MAX_CONN" | bc)

    if (( $(echo "$USAGE > 80" | bc -l) )); then
        echo "$(date): WARNING - High connection usage: ${USAGE}%" >> "$ALERT_LOG"
    fi
}

# Run monitoring
monitor_mysql
check_performance
EOF

chmod +x ~/mysql-monitor.sh

# PostgreSQL monitoring
cat > ~/postgres-monitor.sh << 'EOF'
#!/bin/bash

export PGUSER="postgres"
ALERT_LOG="/var/log/postgres-monitor.log"

monitor_postgres() {
    echo "=== PostgreSQL Monitoring Report - $(date) ==="

    # Connection statistics
    echo -e "\n▶ Connection Statistics:"
    psql -c "SELECT count(*) as connections, state FROM pg_stat_activity GROUP BY state;"

    # Database sizes
    echo -e "\n▶ Database Sizes:"
    psql -c "SELECT datname, pg_size_pretty(pg_database_size(datname)) as size FROM pg_database ORDER BY pg_database_size(datname) DESC;"

    # Long running queries
    echo -e "\n▶ Long Running Queries:"
    psql << SQL
    SELECT pid, now() - pg_stat_activity.query_start AS duration, query
    FROM pg_stat_activity
    WHERE (now() - pg_stat_activity.query_start) > interval '5 minutes'
    AND state != 'idle';
SQL

    # Cache hit ratio
    echo -e "\n▶ Cache Hit Ratio:"
    psql << SQL
    SELECT
        sum(heap_blks_read) as heap_read,
        sum(heap_blks_hit) as heap_hit,
        sum(heap_blks_hit) / NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0) as ratio
    FROM pg_statio_user_tables;
SQL

    # Table bloat
    echo -e "\n▶ Table Bloat:"
    psql << SQL
    SELECT
        schemaname,
        relname AS tablename,
        pg_size_pretty(pg_total_relation_size(relid)) AS size,
        n_dead_tup
    FROM pg_stat_user_tables
    WHERE n_dead_tup > 1000
    ORDER BY n_dead_tup DESC
    LIMIT 10;
SQL
}

# Run monitoring
monitor_postgres
EOF

chmod +x ~/postgres-monitor.sh

echo "Database monitoring scripts created!"
echo "MySQL: ~/mysql-monitor.sh"
echo "PostgreSQL: ~/postgres-monitor.sh"

🚨 Fix Common Problems

Monitoring troubleshooting guide! 🔧

Problem 1: High CPU Usage

Solution:

# Identify CPU consumers:
# 1. Find top CPU processes
top -bn1 | head -20
ps aux --sort=-%cpu | head -10

# 2. Check for runaway processes
ps aux | awk '$3 > 50 {print $0}'

# 3. Analyze specific process
PID=12345  # Replace with actual PID
strace -p $PID -c  # Count system calls
lsof -p $PID       # Open files
pmap $PID          # Memory map

# 4. Check for CPU intensive services
systemd-cgtop

# 5. Temporary fixes
# Limit CPU usage with nice/renice:
renice +10 -p $PID   # Lower priority
# Or use cpulimit:
sudo dnf install cpulimit
cpulimit -p $PID -l 50  # Limit to 50%

# 6. Investigate cause
journalctl -u service_name --since "1 hour ago"
dmesg | tail -50

# 7. Permanent solutions
# Update software
sudo dnf update
# Optimize configuration
# Add more CPU cores

Problem 2: Memory Leaks

Solution:

# Diagnose memory issues:
# 1. Find memory hogs
ps aux --sort=-%mem | head -10
smem -r -k | head -10  # Install: sudo dnf install smem

# 2. Track memory growth (RSS and VSZ in KB, sampled every 60 seconds)
while true; do
    ps -C process_name -o pid,rss,vsz,cmd
    sleep 60
done

# 3. Analyze memory usage
cat /proc/$PID/status | grep -E "VmSize|VmRSS|VmSwap"
pmap -x $PID | tail -1

# 4. Memory leak detection
# Install valgrind
sudo dnf install valgrind
valgrind --leak-check=full ./program

# 5. Clear the page cache (frees cached data only; it does not reclaim leaked memory)
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches

# 6. Configure swap
# Check swap usage
free -h
swapon --show

# Add swap file if needed
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# 7. Set memory limits
# Add to /etc/security/limits.conf (values in KB).
# "as" caps a process's address space; "memlock" would only cap locked memory.
username soft as 1048576
username hard as 1048576
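
A leak shows up as resident memory that keeps growing between samples. The sketch below assumes a Linux `/proc` filesystem; `rss_kb` and `rss_growth_kb` are hypothetical helper names, and `$PID` is a placeholder for the process under investigation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the resident set size (kB) of a PID,
# taken from the VmRSS line of /proc/<pid>/status.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Hypothetical helper: read one RSS sample (kB) per line on stdin and
# print the difference between the last and the first sample.
rss_growth_kb() {
    awk 'NR == 1 { first = $1 } { last = $1 } END { if (NR) print last - first }'
}

# Example: five samples, one minute apart
#   for i in 1 2 3 4 5; do rss_kb "$PID"; sleep 60; done | rss_growth_kb
```

Growth over a handful of samples is not proof of a leak on its own, since caches and warm-up also raise RSS; a steady climb over hours is the stronger signal.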

Problem 3: Disk I/O Bottlenecks

Solution:

# Identify I/O issues:
# 1. Check I/O statistics
iostat -x 2
iotop -o

# 2. Find heavy I/O processes
pidstat -d 2

# 3. Check disk health
sudo smartctl -H /dev/sda
sudo smartctl -a /dev/sda | grep -E "Reallocated|Pending|Uncorrectable"

# 4. Analyze specific process I/O
cat /proc/$PID/io

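# 5. Optimize I/O
# Show available schedulers (the active one is in brackets)
cat /sys/block/sda/queue/scheduler
# Change I/O scheduler (AlmaLinux 8/9 use the multi-queue schedulers)
echo none | sudo tee /sys/block/sda/queue/scheduler
# Options: none, mq-deadline, kyber, bfq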

# 6. Mount options optimization
# Edit /etc/fstab
# Add: noatime (this already implies nodiratime)

# 7. Move high I/O to different disk
# Find the largest top-level directories as candidates to relocate
sudo du -sh /* 2>/dev/null | sort -hr | head -10

# Move to SSD or separate disk
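
Before changing a scheduler it helps to record which one is active. On kernels that use the multi-queue block layer, the `scheduler` file lists every available scheduler and puts the active one in square brackets. The sketch below extracts that entry; `active_scheduler` is a hypothetical name, and `/dev/sda` is the same example device used above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read a scheduler line such as
# "[mq-deadline] kyber bfq none" on stdin and print the bracketed entry.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Usage on a live system:
#   active_scheduler < /sys/block/sda/queue/scheduler
```

Logging this value before and after a change makes it easier to tie any later I/O latency shift back to the scheduler switch.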

Problem 4: Network Performance Issues

Solution:

# Diagnose network problems:
# 1. Check interface statistics
ip -s link show
netstat -i

# 2. Monitor bandwidth (replace eth0 with your interface name from "ip link")
sudo iftop -i eth0
sudo nethogs

# 3. Check for packet loss
ping -c 100 google.com | grep loss
mtr google.com

# 4. TCP tuning
# Check current settings
sysctl net.ipv4.tcp_congestion_control
sysctl net.core.rmem_max

# Optimize settings
sudo tee /etc/sysctl.d/99-network.conf << EOF
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.ipv4.tcp_congestion_control = bbr
EOF

sudo sysctl -p /etc/sysctl.d/99-network.conf

# 5. DNS issues
# Test DNS performance
dig google.com | grep "Query time"

# Use faster DNS (this overwrites the whole file, and NetworkManager may regenerate it later)
echo "nameserver 1.1.1.1" | sudo tee /etc/resolv.conf

# 6. Check for network errors
ethtool -S eth0 | grep -E "errors|drops"
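
The packet-loss check in step 3 prints a summary line that is easy to turn into a pass/fail test. This is a sketch that assumes the summary format printed by the iputils `ping` shipped with AlmaLinux ("N% packet loss"); `loss_pct` and `loss_exceeds` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read ping output on stdin and print only the
# packet-loss percentage from the summary line.
loss_pct() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Hypothetical helper: succeed (exit 0) when the loss read from stdin
# is strictly greater than the given limit.
loss_exceeds() {
    local limit=$1 pct
    pct=$(loss_pct)
    [ -n "$pct" ] && awk -v p="$pct" -v l="$limit" 'BEGIN { exit !(p > l) }'
}

# Usage on a live system:
#   ping -c 100 google.com | loss_exceeds 1 && echo "Packet loss above 1%"
```

Because both helpers read stdin, they can be checked against a saved ping summary before being wired into a cron job or alert script.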

📋 Monitoring Tools Quick Reference

Tool	Purpose	Command
top	Process monitor	`top`
htop	Enhanced process monitor	`htop`
free	Memory usage	`free -h`
df	Disk usage	`df -h`
iostat	I/O statistics	`iostat -x 2`
iftop	Network bandwidth	`sudo iftop -i eth0`
glances	System overview	`glances`
nmon	Performance monitor	`nmon`
sar	System activity	`sar -u 2`
vmstat	Virtual memory	`vmstat 2`

💡 Tips for Success

Master system monitoring like a professional! 🌟

📊 Baseline Normal: Know what normal looks like
🔄 Regular Monitoring: Check systems daily
📝 Document Patterns: Record recurring issues
🎯 Set Thresholds: Define alert trigger points
📈 Track Trends: Monitor long-term patterns
🛠️ Automate Alerts: Don’t rely on manual checks
💾 Store Metrics: Keep historical data
🔍 Correlate Events: Look for related issues
📱 Mobile Access: Monitor from anywhere
🤝 Share Dashboards: Keep team informed
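
The "Set Thresholds" tip can start as small as a single comparison. The sketch below uses a hypothetical `check_threshold` helper, and the limit you pass in is your own assumption about what "too high" means for a given metric.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print ALERT when a metric exceeds its limit and
# OK otherwise. awk handles the decimal comparison that [ ] cannot.
check_threshold() {
    local name=$1 value=$2 limit=$3
    awk -v n="$name" -v v="$value" -v l="$limit" 'BEGIN {
        if (v + 0 > l + 0) printf "ALERT %s=%s (limit %s)\n", n, v, l
        else               printf "OK %s=%s\n", n, v
    }'
}

# Example: compare the 1-minute load average with the CPU count
#   check_threshold load1 "$(cut -d" " -f1 /proc/loadavg)" "$(nproc)"
```

Starting from a measured baseline rather than a guessed limit keeps a check like this from firing on normal daily peaks.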

🏆 What You Learned

Congratulations! You’re now a monitoring expert! 🎉

✅ Mastered essential command-line monitoring tools
✅ Deployed advanced monitoring utilities
✅ Configured network and service monitoring
✅ Implemented enterprise monitoring solutions
✅ Built custom monitoring scripts and dashboards
✅ Created automated alerting systems
✅ Solved common performance problems
✅ Gained professional system observability skills

🎯 Why This Matters

Your monitoring expertise ensures system reliability! 🚀

🚨 Proactive Management: Fix issues before they escalate
💰 Cost Savings: Optimize resource usage
📈 Performance: Maintain peak system efficiency
🛡️ Security: Detect anomalies and threats
💼 Professional Value: Essential operations skill
🎯 Rapid Response: Quickly identify root causes
📊 Data-Driven: Make informed decisions
🌟 System Health: Ensure continuous availability

You now have complete visibility into your Linux systems! 🏆

Monitor everything, miss nothing! 🙌
