📊 AlmaLinux Monitoring Tools: Complete System Oversight Guide

Published Sep 17, 2025

Master AlmaLinux system monitoring! Complete guide to performance monitoring, resource tracking, and alerting with top, htop, Glances, and professional tools. Perfect for maintaining healthy Linux systems.

30 min read

Want to know exactly what's happening inside your AlmaLinux system? 🔍 System monitoring is the key to maintaining healthy, performant servers and workstations! This comprehensive guide takes you from basic resource monitoring to professional-grade observability platforms. Whether you're tracking CPU usage or building complete monitoring dashboards, let's master the art of system oversight! ⚡

🤔 Why Is System Monitoring Essential?

Monitoring transforms reactive firefighting into proactive management! 🌟 Here's why it's crucial:

  • 🚨 Early Problem Detection: Spot issues before users notice
  • 📈 Performance Optimization: Identify and fix bottlenecks
  • 💰 Resource Planning: Know when to upgrade hardware
  • 🛡️ Security Monitoring: Detect suspicious activities
  • 📊 Capacity Planning: Predict future resource needs
  • 🔧 Troubleshooting: Quickly diagnose system problems
  • 📝 Compliance: Meet audit and reporting requirements
  • 😌 Peace of Mind: Always know your system's health

Problems caught early get fixed before they turn into downtime! 🏆

🎯 What You Need

Let's prepare for monitoring mastery! ✅

  • ✅ AlmaLinux system with root or sudo access
  • ✅ Basic understanding of system resources
  • ✅ Terminal access for command-line tools
  • ✅ Network connectivity for remote monitoring
  • ✅ 60 minutes to explore all monitoring tools
  • ✅ Some test workloads to monitor
  • ✅ Curiosity about system internals
  • ✅ Excitement to see everything happening! 🎉

Let's unlock complete system visibility! 🌐

📝 Step 1: Basic Command-Line Monitoring

Master essential monitoring commands! 🎯

System Load and Uptime:

# Check system uptime and load average:
uptime
# Output: 10:23:45 up 5 days, 3:15, 2 users, load average: 0.15, 0.12, 0.09
# Load average = 1-minute, 5-minute, 15-minute averages

# Detailed uptime information:
uptime -p    # Pretty format: up 5 days, 3 hours, 15 minutes
uptime -s    # System start time: 2025-09-12 07:08:30

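# Understanding load average:
# Load = tasks running or waiting for a CPU (Linux also counts tasks blocked on disk I/O)
# Load of 1.0 = one core fully busy, which saturates a single-core system
# Load of 4.0 = all four cores of a quad-core system fully busy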
# Check CPU count:
nproc        # Number of processing units

# w command - who's logged in and what they're doing:
w
# Shows users, TTY, login time, idle time, JCPU, PCPU, and current command
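
Since the load average only makes sense relative to the number of CPUs, a tiny helper can normalise it. This is a minimal sketch, assuming the five-field `/proc/loadavg` layout (the three load averages, runnable/total tasks, last PID); the function name `load_per_core` is our own, not a standard tool:

```shell
#!/bin/bash
# load_per_core LOADAVG_LINE CORES
# Divides the 1-, 5- and 15-minute load averages by the core count.
# 1.00 means that, on average, every core had one runnable (or I/O-blocked) task.
load_per_core() {
    local loadavg_line=$1 cores=$2
    echo "$loadavg_line" | awk -v c="$cores" '{ printf "%.2f %.2f %.2f\n", $1 / c, $2 / c, $3 / c }'
}
```

On a live system, call it as `load_per_core "$(cat /proc/loadavg)" "$(nproc)"`. Values that stay above 1.00 mean more work is queued than the CPUs can absorb.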

Memory Usage Monitoring:

# Free memory display:
free -h      # Human-readable format
# Output:
#               total        used        free      shared  buff/cache   available
# Mem:           15Gi       3.2Gi       8.1Gi       245Mi       4.2Gi        11Gi
# Swap:         8.0Gi          0B       8.0Gi

# Continuous memory monitoring:
free -h -s 2  # Update every 2 seconds

# Detailed memory information:
cat /proc/meminfo | head -20

# Memory usage by process:
ps aux --sort=-%mem | head -10

# Show memory statistics:
vmstat 2 5    # Update every 2 seconds, 5 times
# Columns: r=running, b=blocked, swpd=swap used, free=free memory
#          buff=buffers, cache=cache, si=swap in, so=swap out

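# Memory pressure information (PSI). If the file does not exist, PSI is probably
# disabled in this kernel; it can be enabled with the psi=1 boot parameter:
cat /proc/pressure/memory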
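
For scripts, reading `/proc/meminfo` directly avoids depending on the column layout of `free`. A minimal sketch that reports the share of RAM not available to new workloads, based on the kernel's `MemAvailable` estimate (kernel 3.14 and later); the function name `mem_used_pct` and its optional file argument are illustrative:

```shell
#!/bin/bash
# mem_used_pct [MEMINFO_FILE]
# Prints 100 * (MemTotal - MemAvailable) / MemTotal with one decimal.
# MemAvailable already counts reclaimable page cache as available.
mem_used_pct() {
    local file=${1:-/proc/meminfo}
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%.1f\n", 100 * (total - avail) / total }' "$file"
}
```

With no argument it reads the live `/proc/meminfo`, so `mem_used_pct` alone gives the current figure.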

CPU Usage Tracking:

# Real-time CPU usage:
top          # Interactive process viewer
# Key commands in top:
# 1 - Show individual CPU cores
# M - Sort by memory usage
# P - Sort by CPU usage
# k - Kill process
# r - Renice process
# q - Quit

# CPU information:
lscpu        # Detailed CPU architecture information

# Per-core CPU usage:
mpstat -P ALL 2   # All CPUs, update every 2 seconds
# Install if missing: sudo dnf install sysstat

# Process CPU usage:
ps aux --sort=-%cpu | head -10

# CPU frequency monitoring:
watch -n 1 "grep MHz /proc/cpuinfo"

# CPU temperature (if sensors available):
sensors      # Install: sudo dnf install lm_sensors
sudo sensors-detect  # Configure sensors
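
Scripts that parse `top -bn1` can be misled, because top's first batch frame tends to report CPU percentages averaged since boot. A more direct approach is to sample the aggregate `cpu` line of `/proc/stat` twice and compare the deltas. A minimal sketch; the function name `cpu_busy_pct` is hypothetical:

```shell
#!/bin/bash
# cpu_busy_pct SAMPLE1 SAMPLE2
# Each sample is the aggregate "cpu" line from /proc/stat:
#   cpu user nice system idle iowait irq softirq steal guest guest_nice
# Busy % = 100 * (1 - delta(idle + iowait) / delta(user .. steal)).
# guest/guest_nice are already included in user/nice, so they are not summed again.
cpu_busy_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x); split(b, y)                                  # x[1] == y[1] == "cpu"
        for (i = 2; i <= 9; i++) { t1 += x[i]; t2 += y[i] }       # user .. steal
        i1 = x[5] + x[6]; i2 = y[5] + y[6]                        # idle + iowait
        dt = t2 - t1
        if (dt > 0) printf "%.1f\n", 100 * (1 - (i2 - i1) / dt)
        else print "0.0"
    }'
}
```

On a live system: `s1=$(grep '^cpu ' /proc/stat); sleep 1; s2=$(grep '^cpu ' /proc/stat); cpu_busy_pct "$s1" "$s2"`.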

Disk Usage and I/O:

# Disk space usage:
df -h        # Human-readable disk usage
df -i        # Inode usage

# Directory disk usage:
du -sh /var/*    # Size of directories in /var
sudo du -xh --max-depth=1 / 2>/dev/null | sort -hr  # Sorted by size; -x stays on the root filesystem

# Disk I/O statistics:
iostat -x 2      # Extended stats, update every 2 seconds

# Real-time I/O monitoring:
sudo iotop       # Interactive I/O monitor (needs root)
# Install: sudo dnf install iotop

# Disk I/O by process:
sudo iotop -o    # Only show processes doing I/O

# Check disk health:
sudo smartctl -a /dev/sda   # Install: sudo dnf install smartmontools
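
For alerting, `df -P` (POSIX output) is easier to parse than `df -h`, because each filesystem stays on a single line. A minimal sketch of a filter that lists mounts above a usage threshold; the function name `mounts_over` is hypothetical, and mount points containing spaces are not handled:

```shell
#!/bin/bash
# mounts_over THRESHOLD
# Reads `df -P` output on stdin and prints "MOUNT USE%" for every filesystem
# whose Capacity column is strictly above THRESHOLD percent.
mounts_over() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 > limit + 0) print $6, $5
    }'
}
```

Typical use: `df -P -x tmpfs -x devtmpfs | mounts_over 85`.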

Perfect! 🎉 Basic monitoring commands mastered!

🔧 Step 2: Advanced Monitoring Tools

Deploy professional monitoring utilities! 📦

htop - Enhanced Process Monitor:

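# Install htop (htop, Glances, nmon and atop are packaged in EPEL, so enable it first):
sudo dnf install epel-release
sudo dnf install htop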

# Launch htop:
htop

# htop features:
# - Color-coded resource bars
# - Tree view of processes (F5)
# - Search processes (F3)
# - Filter processes (F4)
# - Kill processes (F9)
# - Sort by columns (F6)
# - Setup/customize (F2)

# htop configuration (~/.config/htop/htoprc):
# Customize colors, meters, columns

# Useful htop shortcuts:
# H - Show/hide user threads
# K - Show/hide kernel threads
# F - Follow process
# Space - Tag process
# U - Untag all
# c - Tag process and its children

Glances - Comprehensive System Monitor:

# Install Glances:
sudo dnf install glances

# Basic usage:
glances

# Glances modes:
glances -w       # Web server mode (http://localhost:61208)
glances -1       # Show all CPU cores
glances -2       # Disable left sidebar
glances -3       # Disable quick look
glances --disable-plugin processlist   # Hide the top processes list

# Export to file:
glances --export csv --export-csv-file /tmp/glances.csv
glances --export json --export-json-file /tmp/glances.json

# Monitor remote system:
# On server:
glances -s       # Server mode

# On client:
glances -c server_ip

# Glances with Docker monitoring:
glances --enable-plugin docker

# Configuration file (~/.config/glances/glances.conf):
mkdir -p ~/.config/glances
cat > ~/.config/glances/glances.conf << 'EOF'
[cpu]
user_careful=50
user_warning=70
user_critical=90

[mem]
careful=50
warning=70
critical=90

[network]
hide=lo,docker.*
EOF

nmon - Performance Monitor:

# Install nmon:
sudo dnf install nmon

# Launch nmon:
nmon

# nmon interactive commands:
# c - CPU statistics
# m - Memory statistics
# d - Disk statistics
# n - Network statistics
# t - Top processes
# h - Help menu

# Capture data for analysis:
nmon -f -t -s 5 -c 120
# -f: Spreadsheet output
# -t: Include top processes
# -s 5: 5-second intervals
# -c 120: 120 snapshots (10 minutes)

# Analyze with nmonchart:
# Creates HTML reports from nmon data

atop - Advanced System Monitor:

# Install atop:
sudo dnf install atop

# Start atop:
atop

# atop features:
# - Process accounting
# - Historical data
# - Disk I/O per process
# - Network activity per process

# View historical data:
atop -r /var/log/atop/atop_20250917

# atop shortcuts:
# g - Generic info
# m - Memory info
# d - Disk info
# n - Network info
# c - Full command lines
# v - Various process info

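# Configure atop logging in /etc/sysconfig/atop (settings in that file, not commands):
#   LOGINTERVAL=60      # Log every 60 seconds
#   LOGGENERATIONS=28   # Keep 28 days of logs
# Then enable the logging service:
sudo systemctl enable --now atop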

Amazing! 🌟 Advanced monitoring tools deployed!

🌟 Step 3: Network and Service Monitoring

Monitor network traffic and services! ⚡

Network Monitoring Tools:

# Install network monitoring tools (nethogs and iftop come from EPEL):
sudo dnf install epel-release
sudo dnf install net-tools iptraf-ng nethogs iftop tcpdump iperf3 sysstat

# Monitor network connections:
ss -tuln         # Show listening ports
ss -tan          # Show all TCP connections
netstat -tuln    # Legacy alternative

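# Real-time bandwidth monitoring:
# Replace eth0 below with your interface name (list them with: ip -br link);
# AlmaLinux often uses names such as ens3 or enp0s3
sudo iftop -i eth0    # Monitor specific interface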
# iftop commands:
# p - Toggle port display
# n - Toggle DNS resolution
# s/d - Toggle source/destination
# 1/2/3 - Sort by different columns

# Bandwidth usage by process:
sudo nethogs eth0
# Shows which processes are using bandwidth

# Detailed network statistics:
ip -s link       # Interface statistics
nstat            # Network statistics
sar -n DEV 2     # Network device statistics

# Monitor specific port:
sudo tcpdump -i eth0 port 80
sudo tcpdump -i any host 192.168.1.100

# Network performance testing:
iperf3 -s        # Server mode
iperf3 -c server_ip  # Client mode
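
When iftop or nethogs are not installed, the kernel's own per-interface byte counters under `/sys/class/net/IFACE/statistics/` are enough for a throughput check. A minimal sketch that samples them twice; the function name `iface_rate` is hypothetical, the interval must be a whole number of seconds, and the third argument exists only so the function can be pointed at a test directory:

```shell
#!/bin/bash
# iface_rate IFACE [INTERVAL] [SYSFS_ROOT]
# Reads rx_bytes/tx_bytes, waits INTERVAL seconds, reads them again and
# prints the average receive and transmit rate in bytes per second.
iface_rate() {
    local iface=$1 interval=${2:-1} root=${3:-/sys/class/net}
    local dir="$root/$iface/statistics"
    local rx1 tx1 rx2 tx2
    rx1=$(<"$dir/rx_bytes"); tx1=$(<"$dir/tx_bytes")
    sleep "$interval"
    rx2=$(<"$dir/rx_bytes"); tx2=$(<"$dir/tx_bytes")
    echo "$iface rx=$(( (rx2 - rx1) / interval ))B/s tx=$(( (tx2 - tx1) / interval ))B/s"
}
```

For example, `iface_rate eth0 5` averages over five seconds.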

Service Monitoring:

# Systemd service monitoring:
systemctl status
systemctl list-units --failed
systemctl list-units --state=running

# Monitor service logs:
journalctl -u nginx -f      # Follow nginx logs
journalctl -p err -b        # Show errors since boot

# Service resource usage:
systemd-cgtop               # Top-like view for systemd services

# Monitor specific service:
systemctl show nginx --property=MainPID,MemoryCurrent,CPUUsageNSec

# Create service monitor script:
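cat > ~/monitor-services.sh << 'EOF'
#!/bin/bash
# mysqld is the MySQL unit name on AlmaLinux; use mariadb for MariaDB
SERVICES="nginx mysqld sshd firewalld"

echo "=== Service Status Check ==="
for service in $SERVICES; do
    if systemctl is-active --quiet "$service"; then
        echo "✅ $service: Running"
        mem=$(systemctl show "$service" --property=MemoryCurrent --value)
        # MemoryCurrent is "[not set]" when memory accounting is off for the unit
        if [[ $mem =~ ^[0-9]+$ ]]; then mem=$(numfmt --to=iec "$mem"); fi
        echo "   Memory: $mem"
    else
        echo "❌ $service: Not running"
    fi
done
EOF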

chmod +x ~/monitor-services.sh
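
The raw `systemctl show` output is in bytes and nanoseconds, which is hard to read at a glance. A minimal sketch of a filter that converts the two properties used above and passes everything else through; the function name `show_to_human` is hypothetical:

```shell
#!/bin/bash
# show_to_human
# Reads "Key=Value" lines from `systemctl show UNIT --property=...` on stdin.
# MemoryCurrent (bytes) becomes IEC units, CPUUsageNSec (ns) becomes whole seconds.
# Non-numeric values such as "[not set]" are printed unchanged.
show_to_human() {
    local key value
    while IFS='=' read -r key value; do
        case "$key" in
            MemoryCurrent)
                if [[ $value =~ ^[0-9]+$ ]]; then value=$(numfmt --to=iec "$value"); fi ;;
            CPUUsageNSec)
                if [[ $value =~ ^[0-9]+$ ]]; then value="$(( value / 1000000000 ))s"; fi ;;
        esac
        echo "$key: $value"
    done
}
```

Usage: `systemctl show nginx --property=MainPID,MemoryCurrent,CPUUsageNSec | show_to_human`.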

Application Performance Monitoring:

# Monitor web server:
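# Apache status module:
# mod_status ships with the httpd package (no separate install) and is
# normally loaded via /etc/httpd/conf.modules.d/00-base.conf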
# Enable in Apache config:
# ExtendedStatus On
# <Location /server-status>
#     SetHandler server-status
#     Require local
# </Location>

# Nginx status:
# Add to nginx.conf:
# location /nginx_status {
#     stub_status;
#     allow 127.0.0.1;
#     deny all;
# }

# Monitor with curl:
curl http://localhost/server-status
curl http://localhost/nginx_status

# MySQL monitoring:
mysql -e "SHOW STATUS LIKE 'Threads_connected'"
mysql -e "SHOW PROCESSLIST"

# Redis monitoring:
redis-cli INFO
redis-cli MONITOR   # Real-time command monitoring (heavy overhead on busy servers; use briefly)

# Monitor application logs:
tail -f /var/log/nginx/access.log | awk '$9 ~ /^5[0-9][0-9]$/'  # 5xx responses ($9 = status in the default combined log format)
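
The nginx `stub_status` page is a short fixed-format text block, so it is easy to turn into named values. Per the nginx documentation, `handled` normally equals `accepts` unless a resource limit such as `worker_connections` was hit, so the difference is a useful "dropped connections" signal. A minimal sketch; the function name `parse_stub_status` is hypothetical:

```shell
#!/bin/bash
# parse_stub_status
# Reads stub_status output on stdin and prints "name value" pairs:
# active, accepts, handled, dropped (accepts - handled), requests, reading, writing, waiting.
parse_stub_status() {
    awk '
        /^Active connections:/ { print "active", $3 }
        /^ *[0-9]+ +[0-9]+ +[0-9]+/ {
            print "accepts", $1; print "handled", $2
            print "dropped", $1 - $2; print "requests", $3
        }
        /^Reading:/ { print "reading", $2; print "writing", $4; print "waiting", $6 }
    '
}
```

Usage: `curl -s http://localhost/nginx_status | parse_stub_status`.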

Excellent! ⚡ Network and service monitoring ready!

✅ Step 4: Enterprise Monitoring Solutions

Deploy production-grade monitoring systems! 🔧

Prometheus and Grafana Setup:

# Install Prometheus:
sudo useradd --no-create-home --shell /bin/false prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.40.0/prometheus-2.40.0.linux-amd64.tar.gz
tar -xvf prometheus-2.40.0.linux-amd64.tar.gz
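sudo mv prometheus-2.40.0.linux-amd64 /opt/prometheus
sudo chown -R prometheus:prometheus /opt/prometheus   # Prometheus writes its TSDB under /opt/prometheus/data
sudo restorecon -Rv /opt/prometheus                   # mv keeps the home-directory SELinux label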

# Configure Prometheus:
sudo tee /opt/prometheus/prometheus.yml > /dev/null << 'EOF'
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
    - targets: ['localhost:9100']
EOF

# Create systemd service:
sudo tee /etc/systemd/system/prometheus.service > /dev/null << 'EOF'
[Unit]
Description=Prometheus
After=network.target

[Service]
User=prometheus
Type=simple
ExecStart=/opt/prometheus/prometheus \
    --config.file=/opt/prometheus/prometheus.yml \
    --storage.tsdb.path=/opt/prometheus/data

[Install]
WantedBy=multi-user.target
EOF

# Install Node Exporter:
wget https://github.com/prometheus/node_exporter/releases/download/v1.5.0/node_exporter-1.5.0.linux-amd64.tar.gz
tar -xvf node_exporter-1.5.0.linux-amd64.tar.gz
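sudo mv node_exporter-1.5.0.linux-amd64/node_exporter /usr/local/bin/
sudo restorecon -v /usr/local/bin/node_exporter   # mv keeps the home-directory SELinux label, which can stop systemd from executing it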

# Create Node Exporter service:
sudo tee /etc/systemd/system/node_exporter.service > /dev/null << 'EOF'
[Unit]
Description=Node Exporter
After=network.target

[Service]
User=prometheus
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
EOF

# Start services:
sudo systemctl daemon-reload
sudo systemctl enable --now prometheus node_exporter

# Install Grafana:
sudo dnf install grafana
sudo systemctl enable --now grafana-server

# Access:
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3000 (admin/admin)
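
Custom shell checks can feed Prometheus through node_exporter's textfile collector, which reads `*.prom` files from the directory passed with `--collector.textfile.directory`. A minimal sketch of a writer, assuming a directory such as `/var/lib/node_exporter/textfile` (an example path, not created by the steps above); the function name `write_prom_metric` is hypothetical:

```shell
#!/bin/bash
# write_prom_metric DIR NAME VALUE [HELP]
# Writes one gauge in Prometheus text exposition format to DIR/NAME.prom.
# The file is written under a hidden temp name first and then renamed, so the
# collector never reads a half-written file (rename is atomic within one filesystem).
write_prom_metric() {
    local dir=$1 name=$2 value=$3 help=${4:-"Custom metric written by a shell script"}
    local tmp
    tmp=$(mktemp "$dir/.$name.XXXXXX") || return 1
    {
        echo "# HELP $name $help"
        echo "# TYPE $name gauge"
        echo "$name $value"
    } > "$tmp"
    chmod 644 "$tmp"   # mktemp creates 0600 files; the exporter's user must be able to read it
    mv "$tmp" "$dir/$name.prom"
}
```

To use it, add `--collector.textfile.directory=/var/lib/node_exporter/textfile` to the node_exporter `ExecStart` line and call, for example, `write_prom_metric /var/lib/node_exporter/textfile backup_last_success_timestamp "$(date +%s)"` from a cron job.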

Zabbix Monitoring Platform:

# Install Zabbix repository:
sudo rpm -Uvh https://repo.zabbix.com/zabbix/6.0/rhel/9/x86_64/zabbix-release-6.0-4.el9.noarch.rpm

# Install Zabbix server, frontend, agent:
sudo dnf install zabbix-server-mysql zabbix-web-mysql zabbix-apache-conf zabbix-sql-scripts zabbix-agent

# Configure database:
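mysql -u root -p << 'EOF'
CREATE DATABASE zabbix CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
CREATE USER 'zabbix'@'localhost' IDENTIFIED BY 'zabbix_password';
GRANT ALL PRIVILEGES ON zabbix.* TO 'zabbix'@'localhost';
-- Needed for the schema import when binary logging is enabled; switched off again below
SET GLOBAL log_bin_trust_function_creators = 1;
FLUSH PRIVILEGES;
EXIT
EOF

# Import initial schema:
zcat /usr/share/zabbix-sql-scripts/mysql/server.sql.gz | mysql --default-character-set=utf8mb4 -u zabbix -p zabbix

# Turn the option back off once the import has finished:
mysql -u root -p -e "SET GLOBAL log_bin_trust_function_creators = 0;"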

# Configure Zabbix server:
sudo nano /etc/zabbix/zabbix_server.conf
# Set: DBPassword=zabbix_password

# Start services:
sudo systemctl restart zabbix-server zabbix-agent httpd php-fpm
sudo systemctl enable zabbix-server zabbix-agent httpd php-fpm

# Access: http://localhost/zabbix

Custom Monitoring Scripts:

# Create comprehensive monitoring script:
cat > ~/system-monitor.sh << 'EOF'
#!/bin/bash
# Needs bc (sudo dnf install bc) and a mail command (s-nail on AlmaLinux 9, mailx on 8).
# Run it as root: it writes to /var/log.

# Configuration
LOG_DIR="/var/log/monitoring"
ALERT_EMAIL="admin@example.com"   # Placeholder: use your own address
CPU_THRESHOLD=80
MEM_THRESHOLD=90
DISK_THRESHOLD=85
ALERTS=""                         # Alerts raised during this run only

mkdir -p "$LOG_DIR"

# Functions
# Record an alert in the history log and in this run's alert list
raise_alert() {
    echo "$(date): $1" >> "$LOG_DIR/alerts.log"
    ALERTS+="$1"$'\n'
}

check_cpu() {
    # Use top's second frame: the first one reflects averages since boot
    local idle
    idle=$(top -bn2 -d 1 | grep '^%Cpu(s)' | tail -1 | sed 's/.*, *\([0-9.]*\) id,.*/\1/')
    CPU_USAGE=$(echo "100 - $idle" | bc)
    if (( $(echo "$CPU_USAGE > $CPU_THRESHOLD" | bc -l) )); then
        echo "⚠️ HIGH CPU: ${CPU_USAGE}%"
        raise_alert "CPU Alert - ${CPU_USAGE}%"
    else
        echo "✅ CPU: ${CPU_USAGE}%"
    fi
}

check_memory() {
    MEM_USAGE=$(free | awk '/^Mem:/ {printf "%.1f", $3 / $2 * 100}')
    if (( $(echo "$MEM_USAGE > $MEM_THRESHOLD" | bc -l) )); then
        echo "⚠️ HIGH MEMORY: ${MEM_USAGE}%"
        raise_alert "Memory Alert - ${MEM_USAGE}%"
    else
        echo "✅ Memory: ${MEM_USAGE}%"
    fi
}

check_disk() {
    # df -P keeps one filesystem per line. Reading through process substitution
    # (not a pipe) keeps the loop in this shell, so raise_alert can update ALERTS.
    local use mount
    while read -r _ _ _ _ use mount; do
        use=${use%\%}
        if [ "$use" -gt "$DISK_THRESHOLD" ]; then
            echo "⚠️ HIGH DISK: $mount at ${use}%"
            raise_alert "Disk Alert - $mount at ${use}%"
        fi
    done < <(df -P -x tmpfs -x devtmpfs | tail -n +2)
}

check_services() {
    SERVICES="nginx mysqld sshd"   # mysqld = MySQL; use mariadb for MariaDB
    for service in $SERVICES; do
        if ! systemctl is-active --quiet "$service"; then
            echo "❌ Service Down: $service"
            raise_alert "Service Alert - $service is down"
        else
            echo "✅ Service Up: $service"
        fi
    done
}

# Main monitoring run
echo "=== System Monitor Report - $(date) ==="
check_cpu
check_memory
check_disk
check_services

# Email only the alerts raised in this run; alerts.log keeps the full history
if [ -n "$ALERTS" ]; then
    printf '%s' "$ALERTS" | mail -s "System Alerts on $(hostname)" "$ALERT_EMAIL"
fi
EOF

chmod +x ~/system-monitor.sh

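# Schedule monitoring every 5 minutes from root's crontab (the script writes to /var/log):
sudo install -m 755 ~/system-monitor.sh /usr/local/bin/system-monitor.sh
(sudo crontab -l 2>/dev/null; echo "*/5 * * * * /usr/local/bin/system-monitor.sh") | sudo crontab -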
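
A check that runs every five minutes will re-send the same alert every five minutes while a problem lasts. One common remedy is a per-alert cooldown stored as a timestamp file. A minimal sketch; the function name `should_alert` and the state directory in the usage line are hypothetical:

```shell
#!/bin/bash
# should_alert KEY COOLDOWN_SECONDS STATE_DIR
# Returns 0 (send the alert) if KEY has not fired within the last COOLDOWN_SECONDS,
# and records the current time; returns 1 (stay quiet) otherwise.
should_alert() {
    local key=$1 cooldown=$2 state_dir=$3
    local stamp="$state_dir/$key.last" now last
    now=$(date +%s)
    mkdir -p "$state_dir"
    if [ -f "$stamp" ]; then
        last=$(<"$stamp")
        if (( now - last < cooldown )); then
            return 1
        fi
    fi
    echo "$now" > "$stamp"
    return 0
}
```

Inside a monitoring script, wrap the notification: `if should_alert cpu_high 3600 /var/lib/system-monitor; then echo "CPU high" | mail -s "CPU alert" "$ALERT_EMAIL"; fi` sends at most one CPU mail per hour.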

Perfect! 🏆 Enterprise monitoring deployed!

🎮 Quick Examples

Real-world monitoring scenarios! 🎯

Example 1: Performance Troubleshooting Dashboard

#!/bin/bash
# Interactive performance troubleshooting dashboard

echo "Creating performance dashboard..."

# Create dashboard script
cat > ~/perf-dashboard.sh << 'EOF'
#!/bin/bash

# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

# Functions
show_header() {
    clear
    echo "═══════════════════════════════════════════════════════"
    echo "   AlmaLinux Performance Dashboard - $(date +%H:%M:%S)"
    echo "═══════════════════════════════════════════════════════"
}

show_cpu() {
    echo -e "\n${GREEN}▶ CPU Usage:${NC}"
    top -bn1 | head -5
    echo ""
    mpstat 1 1 | tail -2
}

show_memory() {
    echo -e "\n${GREEN}▶ Memory Usage:${NC}"
    free -h
    echo ""
    echo "Top Memory Consumers:"
    ps aux --sort=-%mem | head -5 | awk '{printf "  %-10s %6s %s\n", $1, $4"%", $11}'
}

show_disk() {
    echo -e "\n${GREEN}▶ Disk Usage:${NC}"
    df -h | grep -v tmpfs
    echo ""
    echo "Disk I/O:"
    iostat -x 1 2 | tail -n +4
}

show_network() {
    echo -e "\n${GREEN}▶ Network Activity:${NC}"
    ip -s link | awk '/^[0-9]/{print $2} /RX:/{getline; print "  RX: "$1" bytes"} /TX:/{getline; print "  TX: "$1" bytes"}'
    echo ""
    echo "Active Connections:"
    ss -tun | tail -5
}

show_processes() {
    echo -e "\n${GREEN}▶ Top Processes:${NC}"
    ps aux --sort=-%cpu | head -10 | awk '{printf "  %-10s %6s %6s %s\n", $1, $3"%", $4"%", $11}'
}

check_issues() {
    echo -e "\n${YELLOW}▶ Potential Issues:${NC}"

    # Check CPU (top's second frame; the first one reflects averages since boot)
    CPU_IDLE=$(top -bn2 -d 0.5 | grep '^%Cpu(s)' | tail -1 | sed 's/.*, *\([0-9.]*\) id,.*/\1/')
    if (( $(echo "$CPU_IDLE < 20" | bc -l) )); then
        echo -e "  ${RED}⚠ High CPU usage detected${NC}"
    fi

    # Check Memory
    MEM_AVAIL=$(free | grep Mem | awk '{print $7/$2 * 100.0}')
    if (( $(echo "$MEM_AVAIL < 20" | bc -l) )); then
        echo -e "  ${RED}⚠ Low memory available${NC}"
    fi

    # Check Swap
    SWAP_USED=$(free | grep Swap | awk '{if($2>0) print $3/$2 * 100.0; else print 0}')
    if (( $(echo "$SWAP_USED > 50" | bc -l) )); then
        echo -e "  ${RED}⚠ High swap usage${NC}"
    fi

    # Check Load
    LOAD=$(uptime | awk -F'load average:' '{print $2}' | cut -d, -f1)
    CORES=$(nproc)
    if (( $(echo "$LOAD > $CORES" | bc -l) )); then
        echo -e "  ${RED}⚠ System load above CPU count${NC}"
    fi
}

# Main loop
while true; do
    show_header
    show_cpu
    show_memory
    show_disk
    show_network
    show_processes
    check_issues

    echo -e "\n${GREEN}Refreshing every 5 seconds: press any key to refresh now, [Q] to quit${NC}"
    read -t 5 -n 1 key
    if [[ $key = "q" ]] || [[ $key = "Q" ]]; then
        break
    fi
done
EOF

chmod +x ~/perf-dashboard.sh

echo "Dashboard created! Run with: ~/perf-dashboard.sh"

Example 2: Automated Alert System

#!/bin/bash
# Comprehensive alerting system

echo "Setting up automated alert system..."

# Create alert monitor
cat > ~/alert-monitor.sh << 'EOF'
#!/bin/bash

# Configuration
ALERT_LOG="/var/log/system-alerts.log"
WEBHOOK_URL=""                 # Set to your Slack incoming-webhook URL to enable Slack alerts
EMAIL="admin@example.com"      # Placeholder: use your own address

# Thresholds
CPU_CRITICAL=90
CPU_WARNING=70
MEM_CRITICAL=95
MEM_WARNING=85
DISK_CRITICAL=95
DISK_WARNING=85
LOAD_MULTIPLIER=2

# Functions
send_alert() {
    local severity=$1
    local component=$2
    local message=$3
    local value=$4

    # Log alert
    echo "$(date '+%Y-%m-%d %H:%M:%S') [$severity] $component: $message (Value: $value)" >> "$ALERT_LOG"

    # Send to Slack (if configured)
    if [ ! -z "$WEBHOOK_URL" ]; then
        curl -X POST -H 'Content-type: application/json' \
            --data "{\"text\":\"🚨 [$severity] $component: $message (Value: $value)\"}" \
            "$WEBHOOK_URL" 2>/dev/null
    fi

    # Send email for critical alerts
    if [ "$severity" = "CRITICAL" ]; then
        echo "$message (Value: $value)" | mail -s "[$severity] System Alert: $component" "$EMAIL"
    fi
}

check_cpu() {
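    # Busy % = 100 - idle, from top's second frame (the first reflects averages since boot)
    local cpu_usage=$(top -bn2 -d 1 | grep '^%Cpu(s)' | tail -1 | sed 's/.*, *\([0-9.]*\) id,.*/\1/' | awk '{print 100 - $1}')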

    if (( $(echo "$cpu_usage > $CPU_CRITICAL" | bc -l) )); then
        send_alert "CRITICAL" "CPU" "CPU usage critically high" "${cpu_usage}%"
    elif (( $(echo "$cpu_usage > $CPU_WARNING" | bc -l) )); then
        send_alert "WARNING" "CPU" "CPU usage high" "${cpu_usage}%"
    fi
}

check_memory() {
    local mem_usage=$(free | grep Mem | awk '{print ($3/$2) * 100.0}')

    if (( $(echo "$mem_usage > $MEM_CRITICAL" | bc -l) )); then
        send_alert "CRITICAL" "MEMORY" "Memory usage critically high" "${mem_usage}%"
    elif (( $(echo "$mem_usage > $MEM_WARNING" | bc -l) )); then
        send_alert "WARNING" "MEMORY" "Memory usage high" "${mem_usage}%"
    fi
}

check_disk() {
    df -h | tail -n +2 | while read line; do
        local usage=$(echo $line | awk '{print $5}' | sed 's/%//')
        local mount=$(echo $line | awk '{print $6}')

        if [ "$usage" -gt "$DISK_CRITICAL" ]; then
            send_alert "CRITICAL" "DISK" "Disk space critically low on $mount" "${usage}%"
        elif [ "$usage" -gt "$DISK_WARNING" ]; then
            send_alert "WARNING" "DISK" "Disk space low on $mount" "${usage}%"
        fi
    done
}

check_load() {
    local load=$(uptime | awk -F'load average:' '{print $2}' | cut -d, -f1 | xargs)
    local cores=$(nproc)
    local threshold=$(echo "$cores * $LOAD_MULTIPLIER" | bc)

    if (( $(echo "$load > $threshold" | bc -l) )); then
        send_alert "WARNING" "LOAD" "System load high" "$load (${cores} cores)"
    fi
}

check_services() {
    local critical_services="sshd firewalld"
    local important_services="nginx mysql redis"

    for service in $critical_services; do
        if ! systemctl is-active --quiet $service; then
            send_alert "CRITICAL" "SERVICE" "$service is down" "stopped"
        fi
    done

    for service in $important_services; do
        if ! systemctl is-active --quiet $service 2>/dev/null; then
            send_alert "WARNING" "SERVICE" "$service is down" "stopped"
        fi
    done
}

check_network() {
    # Check network interfaces (all except loopback; "@peer" suffixes of veth names are stripped)
    for interface in $(ip -o link show | awk -F': ' '$2 != "lo" {sub(/@.*/, "", $2); print $2}'); do
        if ! ip link show $interface | grep -q "state UP"; then
            send_alert "WARNING" "NETWORK" "Interface $interface is down" "DOWN"
        fi
    done

    # Check connectivity
    if ! ping -c 1 8.8.8.8 &>/dev/null; then
        send_alert "CRITICAL" "NETWORK" "No internet connectivity" "FAILED"
    fi
}

# Main monitoring
check_cpu
check_memory
check_disk
check_load
check_services
check_network

# Remove rotated alert logs older than 30 days (only has an effect if logrotate rotates system-alerts.log)
find /var/log -maxdepth 1 -name "system-alerts.log.*" -mtime +30 -delete
EOF

chmod +x ~/alert-monitor.sh

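# Schedule monitoring: append to root's crontab instead of replacing it (the script writes to /var/log)
sudo install -m 755 ~/alert-monitor.sh /usr/local/bin/alert-monitor.sh
(sudo crontab -l 2>/dev/null; echo "*/5 * * * * /usr/local/bin/alert-monitor.sh") | sudo crontab -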

echo "Alert system configured!"
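
Both the dashboard and the alert monitor compare decimal values with `echo ... | bc -l`, so they fail if `bc` is missing, which can happen on minimal installs. awk can do the same floating-point comparison. A minimal sketch; the function name `threshold_level` is hypothetical and uses the same strict "greater than" test as the scripts above:

```shell
#!/bin/bash
# threshold_level VALUE WARNING CRITICAL
# Prints OK, WARNING or CRITICAL using awk for the floating-point comparison.
threshold_level() {
    awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
        if (v + 0 > c + 0)      print "CRITICAL"
        else if (v + 0 > w + 0) print "WARNING"
        else                    print "OK"
    }'
}
```

In the alert monitor, `level=$(threshold_level "$cpu_usage" "$CPU_WARNING" "$CPU_CRITICAL")` could replace the two `bc` tests, calling `send_alert "$level" ...` unless the result is `OK`.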

Example 3: Container Monitoring Setup

#!/bin/bash
# Docker/Podman container monitoring

echo "Setting up container monitoring..."

# Create container monitor
cat > ~/container-monitor.sh << 'EOF'
#!/bin/bash

# Detect container runtime
if command -v docker &>/dev/null; then
    RUNTIME="docker"
elif command -v podman &>/dev/null; then
    RUNTIME="podman"
else
    echo "No container runtime found"
    exit 1
fi

# Container monitoring dashboard
show_containers() {
    echo "=== Container Status ==="
    $RUNTIME ps --format "table {{.Names}}\t{{.Status}}\t{{.Size}}"
}

show_container_stats() {
    echo -e "\n=== Container Resource Usage ==="
    $RUNTIME stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}"
}

show_container_logs() {
    echo -e "\n=== Recent Container Logs ==="
    for container in $($RUNTIME ps -q); do
        name=$($RUNTIME inspect -f '{{.Name}}' $container | sed 's/^\/*//')
        echo "▶ $name:"
        $RUNTIME logs --tail 5 $container 2>&1 | sed 's/^/  /'
        echo ""
    done
}

check_container_health() {
    echo -e "\n=== Container Health Checks ==="
    for container in $($RUNTIME ps -q); do
        name=$($RUNTIME inspect -f '{{.Name}}' $container | sed 's/^\/*//')
        health=$($RUNTIME inspect -f '{{.State.Health.Status}}' $container 2>/dev/null)

        if [ "$health" = "healthy" ]; then
            echo "✅ $name: Healthy"
        elif [ "$health" = "unhealthy" ]; then
            echo "❌ $name: Unhealthy"
        else
            echo "⚠️ $name: No health check"
        fi
    done
}

monitor_container_resources() {
    echo -e "\n=== Detailed Resource Monitoring ==="
    for container in $($RUNTIME ps -q); do
        name=$($RUNTIME inspect -f '{{.Name}}' $container | sed 's/^\/*//')
        echo "Container: $name"

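        # Read cgroup stats inside the container: cgroup v2 files first
        # (the default on AlmaLinux 9), then the cgroup v1 paths
        cpu_usec=$($RUNTIME exec $container cat /sys/fs/cgroup/cpu.stat 2>/dev/null | awk '/^usage_usec/ {print $2}')
        if [ -n "$cpu_usec" ]; then
            echo "  CPU time (microseconds): $cpu_usec"
            echo "  Memory bytes: $($RUNTIME exec $container cat /sys/fs/cgroup/memory.current 2>/dev/null)"
        else
            echo "  CPU nanoseconds: $($RUNTIME exec $container cat /sys/fs/cgroup/cpuacct/cpuacct.usage 2>/dev/null)"
            echo "  Memory bytes: $($RUNTIME exec $container cat /sys/fs/cgroup/memory/memory.usage_in_bytes 2>/dev/null)"
        fi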

        # Process count
        proc_count=$($RUNTIME exec $container ps aux 2>/dev/null | tail -n +2 | wc -l)  # Skip the ps header line
        echo "  Processes: $proc_count"

        # Network connections
        net_conn=$($RUNTIME exec $container netstat -tan 2>/dev/null | grep ESTABLISHED | wc -l)
        echo "  Network connections: $net_conn"
        echo ""
    done
}

# Prometheus metrics exporter (needs jq: sudo dnf install jq)
export_metrics() {
    echo -e "\n=== Prometheus Metrics ==="
    for container in $($RUNTIME ps -q); do
        name=$($RUNTIME inspect -f '{{.Name}}' $container | sed 's/^\/*//')
        stats=$($RUNTIME stats --no-stream --format "{{json .}}" $container)

        cpu=$(echo $stats | jq -r '.CPUPerc' | sed 's/%//')
        mem=$(echo $stats | jq -r '.MemPerc' | sed 's/%//')

        echo "container_cpu_usage_percent{name=\"$name\"} $cpu"
        echo "container_memory_usage_percent{name=\"$name\"} $mem"
    done
}

# Main monitoring loop
while true; do
    clear
    echo "Container Monitoring Dashboard - $(date)"
    echo "========================================"

    show_containers
    show_container_stats
    check_container_health
    monitor_container_resources

    if [ "$1" = "--export" ]; then
        export_metrics > /tmp/container_metrics.prom
    fi

    echo -e "\nPress Ctrl+C to exit"
    sleep 5
done
EOF

chmod +x ~/container-monitor.sh

# Create container alert script
cat > ~/container-alerts.sh << 'EOF'
#!/bin/bash

RUNTIME=$(command -v docker || command -v podman)
ALERT_LOG="/var/log/container-alerts.log"

# Check for stopped containers
# Caution: this restarts every non-running container, including ones stopped on purpose;
# a restart policy (for example --restart unless-stopped) is usually the better tool
for container in $($RUNTIME ps -a -q); do
    status=$($RUNTIME inspect -f '{{.State.Status}}' $container)
    name=$($RUNTIME inspect -f '{{.Name}}' $container | sed 's/^\/*//')

    if [ "$status" != "running" ]; then
        echo "$(date): Container $name is $status" >> "$ALERT_LOG"

        # Try to restart, logging the outcome
        if $RUNTIME start "$container" >/dev/null; then
            echo "$(date): Successfully restarted $name" >> "$ALERT_LOG"
        else
            echo "$(date): Failed to restart $name" >> "$ALERT_LOG"
        fi
    fi
done

# Check container resource usage
$RUNTIME stats --no-stream --format "{{json .}}" | while read stats; do
    name=$(echo $stats | jq -r '.Name')
    cpu=$(echo $stats | jq -r '.CPUPerc' | sed 's/%//')
    mem=$(echo $stats | jq -r '.MemPerc' | sed 's/%//')

    if (( $(echo "$cpu > 80" | bc -l) )); then
        echo "$(date): High CPU usage in $name: ${cpu}%" >> "$ALERT_LOG"
    fi

    if (( $(echo "$mem > 80" | bc -l) )); then
        echo "$(date): High memory usage in $name: ${mem}%" >> "$ALERT_LOG"
    fi
done
EOF

chmod +x ~/container-alerts.sh

echo "Container monitoring configured!"
echo "Run dashboard: ~/container-monitor.sh"
echo "Schedule alerts: */5 * * * * ~/container-alerts.sh"
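
Both container scripts pipe `stats` JSON through `jq`. If jq is not available, asking the runtime for plain space-separated fields works too. A minimal sketch, assuming Docker's `{{.Name}}`, `{{.CPUPerc}}` and `{{.MemPerc}}` template fields (check `podman stats --help` before relying on the same names with Podman); the function name `stats_to_prom` is hypothetical:

```shell
#!/bin/bash
# stats_to_prom
# Reads lines of "NAME CPU% MEM%" on stdin and prints Prometheus exposition
# lines with the % signs stripped.
stats_to_prom() {
    awk '{
        cpu = $2; mem = $3
        gsub(/%/, "", cpu); gsub(/%/, "", mem)
        printf "container_cpu_usage_percent{name=\"%s\"} %s\n", $1, cpu
        printf "container_memory_usage_percent{name=\"%s\"} %s\n", $1, mem
    }'
}
```

Usage: `docker stats --no-stream --format '{{.Name}} {{.CPUPerc}} {{.MemPerc}}' | stats_to_prom > /tmp/container_metrics.prom`.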

Example 4: Database Performance Monitoring

#!/bin/bash
# Database monitoring for MySQL/MariaDB and PostgreSQL

echo "Setting up database monitoring..."

# MySQL/MariaDB monitoring
cat > ~/mysql-monitor.sh << 'EOF'
#!/bin/bash

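DB_USER="monitor"
DB_PASS="monitor_password"   # Example only: passwords on the command line are visible in ps; prefer a ~/.my.cnf file
ALERT_LOG="/var/log/mysql-monitor.log"

# One-time setup, run by hand as root (doing it inside this script would prompt
# for the root password on every run and break cron):
#   mysql -u root -p -e "CREATE USER IF NOT EXISTS 'monitor'@'localhost' IDENTIFIED BY 'monitor_password';
#                        GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'monitor'@'localhost';"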

# Monitor function
monitor_mysql() {
    echo "=== MySQL Monitoring Report - $(date) ==="

    # Connection statistics
    echo -e "\n▶ Connection Statistics:"
    mysql -u $DB_USER -p$DB_PASS -e "SHOW GLOBAL STATUS WHERE Variable_name IN ('Threads_connected','Max_used_connections','Aborted_connects');" 2>/dev/null

    # Query performance (GLOBAL: plain SHOW STATUS returns this session's counters for Slow_queries and Questions)
    echo -e "\n▶ Query Performance:"
    mysql -u $DB_USER -p$DB_PASS -e "SHOW GLOBAL STATUS WHERE Variable_name IN ('Slow_queries','Questions','Queries');" 2>/dev/null

    # InnoDB statistics
    echo -e "\n▶ InnoDB Buffer Pool:"
    mysql -u $DB_USER -p$DB_PASS -e "SHOW GLOBAL STATUS WHERE Variable_name LIKE 'Innodb_buffer_pool%';" 2>/dev/null | head -10

    # Current processes
    echo -e "\n▶ Active Processes:"
    mysql -u $DB_USER -p$DB_PASS -e "SHOW PROCESSLIST;" 2>/dev/null

    # Table sizes
    echo -e "\n▶ Largest Tables:"
    mysql -u $DB_USER -p$DB_PASS << SQL 2>/dev/null
    SELECT
        table_schema AS 'Database',
        table_name AS 'Table',
        ROUND((data_length + index_length) / 1024 / 1024, 2) AS 'Size (MB)'
    FROM information_schema.TABLES
    ORDER BY (data_length + index_length) DESC
    LIMIT 10;
SQL

    # Slow query log (the path is set by slow_query_log_file; the default is <datadir>/<hostname>-slow.log)
    echo -e "\n▶ Recent Slow Queries:"
    if [ -f /var/log/mysql/slow.log ]; then
        tail -10 /var/log/mysql/slow.log
    fi
}

# Performance metrics
check_performance() {
    # Check slow queries (GLOBAL counter, cumulative since the server started)
    SLOW_QUERIES=$(mysql -u $DB_USER -p$DB_PASS -se "SHOW GLOBAL STATUS LIKE 'Slow_queries';" | awk '{print $2}')
    if [ "$SLOW_QUERIES" -gt 100 ]; then
        echo "$(date): WARNING - High number of slow queries: $SLOW_QUERIES" >> "$ALERT_LOG"
    fi

    # Check connections
    CURRENT_CONN=$(mysql -u $DB_USER -p$DB_PASS -se "SHOW STATUS LIKE 'Threads_connected';" | awk '{print $2}')
    MAX_CONN=$(mysql -u $DB_USER -p$DB_PASS -se "SHOW VARIABLES LIKE 'max_connections';" | awk '{print $2}')
    USAGE=$(echo "scale=2; $CURRENT_CONN / $MAX_CONN * 100" | bc)

    if (( $(echo "$USAGE > 80" | bc -l) )); then
        echo "$(date): WARNING - High connection usage: ${USAGE}%" >> "$ALERT_LOG"
    fi
}

# Run monitoring
monitor_mysql
check_performance
EOF

chmod +x ~/mysql-monitor.sh

# PostgreSQL monitoring
cat > ~/postgres-monitor.sh << 'EOF'
#!/bin/bash

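# Connects as the postgres role; with the default peer authentication the script must run
# as the postgres OS user (for example: sudo -u postgres /usr/local/bin/postgres-monitor.sh)
export PGUSER="postgres"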
ALERT_LOG="/var/log/postgres-monitor.log"

monitor_postgres() {
    echo "=== PostgreSQL Monitoring Report - $(date) ==="

    # Connection statistics
    echo -e "\n▶ Connection Statistics:"
    psql -c "SELECT count(*) as connections, state FROM pg_stat_activity GROUP BY state;"

    # Database sizes
    echo -e "\n▶ Database Sizes:"
    psql -c "SELECT datname, pg_size_pretty(pg_database_size(datname)) as size FROM pg_database ORDER BY pg_database_size(datname) DESC;"

    # Long running queries
    echo -e "\n▶ Long Running Queries:"
    psql << SQL
    SELECT pid, now() - pg_stat_activity.query_start AS duration, query
    FROM pg_stat_activity
    WHERE (now() - pg_stat_activity.query_start) > interval '5 minutes'
    AND state != 'idle';
SQL

    # Cache hit ratio (NULLIF avoids a division by zero on a fresh database)
    echo -e "\n▶ Cache Hit Ratio:"
    psql << SQL
    SELECT
        sum(heap_blks_read) as heap_read,
        sum(heap_blks_hit) as heap_hit,
        round(sum(heap_blks_hit) / NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0), 4) as ratio
    FROM pg_statio_user_tables;
SQL

    # Dead tuples, a bloat indicator (pg_stat_user_tables names the table column relname)
    echo -e "\n▶ Dead Tuples (bloat indicator):"
    psql << SQL
    SELECT
        schemaname,
        relname AS tablename,
        pg_size_pretty(pg_total_relation_size(relid)) AS size,
        n_dead_tup
    FROM pg_stat_user_tables
    WHERE n_dead_tup > 1000
    ORDER BY n_dead_tup DESC
    LIMIT 10;
SQL
}

# Run monitoring
monitor_postgres
EOF

chmod +x ~/postgres-monitor.sh

echo "Database monitoring scripts created!"
echo "MySQL: ~/mysql-monitor.sh"
echo "PostgreSQL: ~/postgres-monitor.sh"
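The MySQL script relies on `bc` for the connection-usage percentage. If `bc` is not installed, the same threshold check can be done with Bash integer arithmetic, as long as you multiply by 100 before dividing. This is a minimal sketch: `usage_pct` and `check_usage` are hypothetical helper names, and the 80% default simply mirrors the threshold used in the MySQL script.

```shell
#!/usr/bin/env bash
# Sketch: bc-free connection-usage check (hypothetical helpers).
# Bash arithmetic is integer-only, so multiply by 100 before dividing
# to avoid truncating the ratio to 0.
usage_pct() {
    local current="${1:-0}" max="${2:-0}"
    # Guard against division by zero if max_connections could not be read
    if (( max <= 0 )); then
        echo 0
        return
    fi
    echo $(( current * 100 / max ))
}

check_usage() {
    local current="$1" max="$2" threshold="${3:-80}" pct
    pct=$(usage_pct "$current" "$max")
    if (( pct > threshold )); then
        echo "WARNING - High connection usage: ${pct}%"
    else
        echo "OK - Connection usage: ${pct}%"
    fi
}

# Example with hypothetical values: 140 of 151 connections in use
check_usage 140 151
```

Inside the MySQL script you would call it as `check_usage "$CURRENT_CONN" "$MAX_CONN"`; the result is rounded down to a whole percent, which is usually precise enough for alerting.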

🚨 Fix Common Problems

Monitoring troubleshooting guide! 🔧

Problem 1: High CPU Usage

Solution:

# Identify CPU consumers:
# 1. Find top CPU processes
top -bn1 | head -20
ps aux --sort=-%cpu | head -10

# 2. Check for runaway processes
ps aux | awk '$3 > 50 {print $0}'

# 3. Analyze specific process
PID=12345  # Replace with actual PID
sudo strace -c -p "$PID"  # Count system calls (press Ctrl+C to print the summary)
sudo lsof -p "$PID"       # Open files
pmap "$PID"               # Memory map

# 4. Check for CPU intensive services
systemd-cgtop

# 5. Temporary fixes
# Limit CPU usage with nice/renice:
renice +10 -p $PID   # Lower priority
# Or use cpulimit (packaged in EPEL):
sudo dnf install epel-release
sudo dnf install cpulimit
cpulimit -p $PID -l 50  # Limit to 50% of one CPU

# 6. Investigate cause
journalctl -u service_name --since "1 hour ago"
dmesg | tail -50

# 7. Permanent solutions
# Update software
sudo dnf update
# Optimize configuration
# Add more CPU cores
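top, htop, and ps all derive CPU percentages from the kernel's cumulative counters in /proc/stat. To see the arithmetic behind those numbers, here is a sketch that samples the aggregate `cpu` line twice and computes the busy share of the elapsed time, treating idle + iowait as idle. The function name `cpu_usage` is hypothetical, and individual tools account for idle time slightly differently, so expect small discrepancies with top.

```shell
#!/usr/bin/env bash
# Sketch: overall CPU utilisation from two samples of /proc/stat (Linux only).
# The first line of /proc/stat holds cumulative jiffies:
#   cpu  user nice system idle iowait irq softirq steal guest guest_nice
cpu_usage() {
    local interval="${1:-1}"
    local user nice system idle iowait irq softirq steal rest
    local idle1 total1 idle2 total2

    read -r _ user nice system idle iowait irq softirq steal rest < /proc/stat
    idle1=$(( idle + iowait ))
    total1=$(( user + nice + system + idle + iowait + irq + softirq + steal ))

    sleep "$interval"

    read -r _ user nice system idle iowait irq softirq steal rest < /proc/stat
    idle2=$(( idle + iowait ))
    total2=$(( user + nice + system + idle + iowait + irq + softirq + steal ))

    # Busy share of the elapsed time = 1 - (idle delta / total delta)
    local dt=$(( total2 - total1 )) di=$(( idle2 - idle1 ))
    if (( dt <= 0 )); then
        echo 0
        return
    fi
    echo $(( 100 * (dt - di) / dt ))
}

echo "CPU usage: $(cpu_usage 1)%"
```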

Problem 2: Memory Leaks

Solution:

# Diagnose memory issues:
# 1. Find memory hogs
ps aux --sort=-%mem | head -10
smem -r -k | head -10  # Install: sudo dnf install smem

# 2. Track memory growth (RSS in KiB, once a minute)
while true; do
    echo "$(date '+%F %T') $(ps -C process_name -o pid=,rss=,cmd=)"
    sleep 60
done

# 3. Analyze memory usage
grep -E "VmSize|VmRSS|VmSwap" /proc/$PID/status
pmap -x $PID | tail -1

# 4. Memory leak detection
# Install valgrind
sudo dnf install valgrind
valgrind --leak-check=full ./program

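# 5. Clear caches (diagnostic only: this drops the page cache, which the
#    kernel reclaims on its own; it does not free memory held by a leaking process)
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches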

# 6. Configure swap
# Check swap usage
free -h
swapon --show

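# Add swap file if needed
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# Make it persistent across reboots
echo '/swapfile none swap defaults 0 0' | sudo tee -a /etc/fstab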

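# 7. Set memory limits
# Per-user address-space limit in KiB, in /etc/security/limits.conf
# (memlock would only cap locked, non-swappable memory):
username soft as 1048576
username hard as 1048576
# For a systemd service, a cgroup limit is usually the better tool:
sudo systemctl set-property service_name MemoryMax=1G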
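The `while true` loop in step 2 prints raw snapshots; to turn them into a single growth figure, a sketch like the one below samples VmRSS from /proc/PID/status at a fixed interval and reports the difference between the first and last sample. `track_rss` and its argument order (PID, number of samples, interval in seconds) are hypothetical. Steady growth over many samples suggests a leak but does not prove one, since caches inside the application can grow legitimately.

```shell
#!/usr/bin/env bash
# Sketch: sample a process's resident set size (VmRSS, in kB) at a fixed
# interval and report growth between the first and last sample.
track_rss() {
    local pid="$1" samples="${2:-10}" interval="${3:-60}"
    local first="" last="" rss i

    for (( i = 1; i <= samples; i++ )); do
        # The status line looks like "VmRSS:     12345 kB"
        rss=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status" 2>/dev/null)
        if [[ -z "$rss" ]]; then
            echo "process $pid not found" >&2
            return 1
        fi
        echo "$(date '+%F %T') pid=$pid rss_kb=$rss"
        if [[ -z "$first" ]]; then
            first=$rss
        fi
        last=$rss
        if (( i < samples )); then
            sleep "$interval"
        fi
    done
    echo "growth_kb=$(( last - first ))"
}

# Usage (hypothetical PID): one hour of samples, once a minute
# track_rss 12345 60 60
```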

Problem 3: Disk I/O Bottlenecks

Solution:

# Identify I/O issues:
# 1. Check I/O statistics
iostat -x 2
sudo iotop -o  # Only show processes doing I/O (requires root)

# 2. Find heavy I/O processes
pidstat -d 2

# 3. Check disk health
sudo smartctl -H /dev/sda
sudo smartctl -a /dev/sda | grep -E "Reallocated|Pending|Uncorrectable"

# 4. Analyze specific process I/O
cat /proc/$PID/io

# 5. Optimize I/O
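# Check available schedulers (the active one is shown in [brackets])
cat /sys/block/sda/queue/scheduler
# Change I/O scheduler (takes effect immediately, lost on reboot)
echo mq-deadline | sudo tee /sys/block/sda/queue/scheduler
# Options on AlmaLinux 8/9 (blk-mq): none, mq-deadline, kyber, bfq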

# 6. Mount options optimization
# Edit /etc/fstab
# Add: noatime (it also implies nodiratime)

# 7. Move high I/O to different disk
# Identify heavy directories
sudo du -xh --max-depth=1 / 2>/dev/null | sort -hr | head -10

# Move to SSD or separate disk
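Before changing schedulers, it helps to see what each disk is using now. This sketch reads `/sys/block/*/queue/scheduler`, where the kernel lists the available schedulers and brackets the active one; the helper name `active_scheduler` is hypothetical. NVMe devices commonly report `[none]` by default.

```shell
#!/usr/bin/env bash
# Sketch: print the active I/O scheduler for every block device.
# The kernel marks the active scheduler with brackets, e.g.
# "none [mq-deadline] kyber bfq".
active_scheduler() {
    local -a words
    local word
    # read -a avoids unquoted word splitting, which would glob-expand "[...]"
    read -ra words <<< "$1"
    for word in "${words[@]}"; do
        if [[ "$word" == \[*\] ]]; then
            word="${word#\[}"
            echo "${word%\]}"
            return 0
        fi
    done
    return 1
}

for f in /sys/block/*/queue/scheduler; do
    [[ -r "$f" ]] || continue          # skip if the glob matched nothing
    dev="${f#/sys/block/}"
    dev="${dev%%/*}"
    printf '%-12s %s\n' "$dev" "$(active_scheduler "$(< "$f")")"
done
```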

Problem 4: Network Performance Issues

Solution:

# Diagnose network problems:
# 1. Check interface statistics
ip -s link show
netstat -i  # Requires the net-tools package

# 2. Monitor bandwidth
sudo iftop -i eth0  # Replace eth0 with your interface (see: ip link)
sudo nethogs

# 3. Check for packet loss
ping -c 100 google.com | grep loss
mtr google.com

# 4. TCP tuning
# Check current settings
sysctl net.ipv4.tcp_congestion_control
sysctl net.core.rmem_max

# Optimize settings
sudo tee /etc/sysctl.d/99-network.conf > /dev/null << 'EOF'
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.ipv4.tcp_congestion_control = bbr
EOF

sudo sysctl -p /etc/sysctl.d/99-network.conf

# 5. DNS issues
# Test DNS performance
dig google.com | grep "Query time"

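# Use faster DNS (NetworkManager rewrites /etc/resolv.conf, so set it on the
# connection; replace eth0 with the name shown by: nmcli connection show)
sudo nmcli connection modify eth0 ipv4.dns "1.1.1.1" ipv4.ignore-auto-dns yes
sudo nmcli connection up eth0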

# 6. Check for network errors
ethtool -S eth0 | grep -Ei "err|drop"  # Counter names vary by driver
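iftop and nethogs need root and extra packages. For a quick, dependency-free throughput reading, a sketch like this samples the cumulative byte counters the kernel exposes under /sys/class/net/<iface>/statistics. `iface_rate` is a hypothetical name, and `lo` is used only so the example runs on any machine; substitute your real interface.

```shell
#!/usr/bin/env bash
# Sketch: average receive/transmit throughput of one interface, computed
# from the kernel's cumulative byte counters in /sys/class/net.
iface_rate() {
    local iface="$1" interval="${2:-1}"   # interval: whole seconds
    local stats="/sys/class/net/$iface/statistics"
    local rx1 tx1 rx2 tx2

    if [[ ! -r "$stats/rx_bytes" ]]; then
        echo "no such interface: $iface" >&2
        return 1
    fi
    if (( interval < 1 )); then
        echo "interval must be a whole number of seconds >= 1" >&2
        return 1
    fi

    rx1=$(< "$stats/rx_bytes"); tx1=$(< "$stats/tx_bytes")
    sleep "$interval"
    rx2=$(< "$stats/rx_bytes"); tx2=$(< "$stats/tx_bytes")

    # Bytes per second, averaged over the interval
    echo "rx_Bps=$(( (rx2 - rx1) / interval )) tx_Bps=$(( (tx2 - tx1) / interval ))"
}

iface_rate lo 1
```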

📋 Monitoring Tools Quick Reference

| Tool | Purpose | Command |
|---|---|---|
| top | Process monitor | top |
| htop | Enhanced process monitor | htop |
| free | Memory usage | free -h |
| df | Disk usage | df -h |
| iostat | I/O statistics | iostat -x 2 |
| iftop | Network bandwidth | sudo iftop -i eth0 |
| glances | System overview | glances |
| nmon | Performance monitor | nmon |
| sar | System activity | sar -u 2 |
| vmstat | Virtual memory | vmstat 2 |

💡 Tips for Success

Master system monitoring like a professional! 🌟

  • 📊 Baseline Normal: Know what normal looks like
  • 🔄 Regular Monitoring: Check systems daily
  • 📝 Document Patterns: Record recurring issues
  • 🎯 Set Thresholds: Define alert trigger points
  • 📈 Track Trends: Monitor long-term patterns
  • 🛠️ Automate Alerts: Don’t rely on manual checks
  • 💾 Store Metrics: Keep historical data
  • 🔍 Correlate Events: Look for related issues
  • 📱 Mobile Access: Monitor from anywhere
  • 🤝 Share Dashboards: Keep team informed
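The "Set Thresholds" and "Automate Alerts" tips can be combined in a few lines. Here is one way to do it for disk space, assuming GNU `df` (the default on AlmaLinux); `check_disk` and the 80% default are illustrative choices rather than settings from any particular tool, and mount points containing spaces are not handled.

```shell
#!/usr/bin/env bash
# Sketch: flag any mounted filesystem whose usage meets a threshold.
# Suitable for running from cron or a systemd timer.
check_disk() {
    local threshold="${1:-80}"
    # -P: one line per filesystem in POSIX format; skip RAM-backed tmpfs
    df -P -x tmpfs -x devtmpfs 2>/dev/null |
        awk -v t="$threshold" 'NR > 1 {
            pct = $5
            sub(/%/, "", pct)
            if (pct + 0 >= t) printf "WARNING %s at %s%%\n", $6, pct
        }'
}

check_disk 80
```

Redirecting the output to a log file or piping it to a mail command turns this into an automated alert.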

๐Ÿ† What You Learned

Congratulations! Youโ€™re now a monitoring expert! ๐ŸŽ‰

  • โœ… Mastered essential command-line monitoring tools
  • โœ… Deployed advanced monitoring utilities
  • โœ… Configured network and service monitoring
  • โœ… Implemented enterprise monitoring solutions
  • โœ… Built custom monitoring scripts and dashboards
  • โœ… Created automated alerting systems
  • โœ… Solved common performance problems
  • โœ… Gained professional system observability skills

🎯 Why This Matters

Your monitoring expertise ensures system reliability! 🚀

  • 🚨 Proactive Management: Fix issues before they escalate
  • 💰 Cost Savings: Optimize resource usage
  • 📈 Performance: Maintain peak system efficiency
  • 🛡️ Security: Detect anomalies and threats
  • 💼 Professional Value: Essential operations skill
  • 🎯 Rapid Response: Quickly identify root causes
  • 📊 Data-Driven: Make informed decisions
  • 🌟 System Health: Ensure continuous availability

You now have complete visibility into your Linux systems! 🏆

Monitor everything, miss nothing! 🙌