📊 Thanos Metrics Setup on AlmaLinux 9: Complete Long-term Prometheus Storage Guide

Welcome to the amazing world of long-term metrics storage! 🎉 Today we’re going to learn how to set up Thanos on AlmaLinux 9, the incredible tool that extends Prometheus to give you unlimited retention, global queries, and downsampling. Think of Thanos as your time machine for metrics data! ⏰✨

🤔 Why is Thanos Important?

Prometheus is fantastic for metrics collection, but it has limitations when it comes to long-term storage and high availability. Here’s why Thanos is a game-changer:

📈 Unlimited retention - Store metrics for years in cheap object storage
🌍 Global view - Query metrics from multiple Prometheus instances as one
⚡ Downsampling - Lower-resolution copies (5m and 1h) of older data keep long-range queries fast
🛡️ High availability - No more single point of failure for your metrics
💰 Cost effective - Use S3, GCS, or Azure blob storage instead of expensive local disks
🔍 Deduplication - Remove duplicate metrics automatically across replicas

🎯 What You Need

Before we start our Thanos adventure, let’s make sure you have everything ready:

✅ AlmaLinux 9 system (fresh installation recommended)
✅ Root or sudo access for installing packages
✅ At least 8GB RAM (16GB recommended for production)
✅ 20GB free disk space for local storage and caching
✅ Internet connection for downloading packages
✅ Basic terminal knowledge (don’t worry, we’ll explain everything!)
✅ Existing Prometheus setup (we’ll create one if you don’t have it)
✅ Object storage access (MinIO, S3, or similar - we’ll set up MinIO)

📝 Step 1: Update Your AlmaLinux System

Let’s start by making sure your system is up to date! 🚀

# Update all packages to latest versions
sudo dnf update -y

# Install essential development tools
sudo dnf groupinstall "Development Tools" -y

# Install helpful utilities we'll need
sudo dnf install -y curl wget git vim htop jq unzip

Perfect! Your system is now ready for Thanos installation! ✨

🔧 Step 2: Install Docker and Docker Compose

Thanos works great with containers! Let’s set up Docker:

# Add the official Docker repository (config-manager comes from dnf-plugins-core)
sudo dnf install -y dnf-plugins-core
sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo

# Install Docker Engine
sudo dnf install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Start and enable Docker service
sudo systemctl start docker
sudo systemctl enable docker

# Add your user to docker group (no more sudo needed!)
sudo usermod -aG docker $USER

# Apply group changes (or logout/login)
newgrp docker

# Test Docker installation
docker --version
docker compose version

Great! Docker is ready for our Thanos deployment! 🐳

🌟 Step 3: Set Up MinIO Object Storage

Thanos needs object storage for long-term data. Let’s set up MinIO as our S3-compatible storage:

# Create directory for our Thanos setup
mkdir -p ~/thanos-setup
cd ~/thanos-setup

# Create MinIO docker-compose configuration
cat > docker-compose-minio.yml << 'EOF'
version: '3.8'

services:
  minio:
    image: minio/minio:latest
    container_name: thanos-minio
    ports:
      - "9000:9000"
      - "9001:9001"
    environment:
      - MINIO_ROOT_USER=thanos
      - MINIO_ROOT_PASSWORD=thanospassword123
    volumes:
      - minio-data:/data
    command: server /data --console-address ":9001"
    networks:
      - thanos-net
    restart: unless-stopped

volumes:
  minio-data:

networks:
  thanos-net:
    # Fixed name so other compose files can join it as an external network
    name: thanos-net
    driver: bridge
EOF

# Start MinIO
docker compose -f docker-compose-minio.yml up -d

# Wait for MinIO to start
sleep 10

# Install MinIO client
curl -O https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc
sudo mv mc /usr/local/bin/

# Configure MinIO client
mc alias set local http://localhost:9000 thanos thanospassword123

# Create bucket for Thanos (it stays private; Thanos authenticates with the access keys)
mc mb local/thanos-bucket

Awesome! MinIO is running and ready for Thanos data! 🪣
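The fixed sleep 10 above is only a guess at how long MinIO takes to start. A minimal sketch of polling instead, assuming bash: `retry` is a hypothetical helper name, and the URL in the usage comment is the MinIO liveness endpoint that the troubleshooting section of this guide also uses.

```shell
#!/usr/bin/env bash
# retry ATTEMPTS DELAY COMMAND...
# Runs COMMAND until it succeeds, at most ATTEMPTS times, sleeping DELAY
# seconds between tries. Returns 0 on success, 1 if every attempt failed.
# (Hypothetical helper, not part of MinIO or Thanos.)
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # No point sleeping after the final failed attempt
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Example: wait up to about a minute for MinIO before creating the bucket
# retry 30 2 curl -fsS -o /dev/null http://localhost:9000/minio/health/live
```

The same helper can stand in for the other fixed sleeps in this guide, since it accepts any command as its check.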

✅ Step 4: Deploy Prometheus with Thanos Sidecar

Let’s set up Prometheus with the Thanos sidecar for seamless integration:

# Create Prometheus configuration
mkdir -p ~/thanos-setup/prometheus-config
cd ~/thanos-setup

# Create Prometheus configuration file
cat > prometheus-config/prometheus.yml << 'EOF'
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'almalinux-cluster'
    region: 'us-east-1'
    replica: 'prometheus-1'

rule_files:
  - "*.rules.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'thanos-sidecar'
    static_configs:
      - targets: ['thanos-sidecar:10902']

  - job_name: 'thanos-query'
    static_configs:
      - targets: ['thanos-query:9090']
EOF

# Create Thanos bucket configuration
cat > bucket-config.yml << 'EOF'
type: S3
config:
  bucket: "thanos-bucket"
  endpoint: "minio:9000"
  access_key: "thanos"
  secret_key: "thanospassword123"
  insecure: true
  signature_version2: false
  put_user_metadata: {}
  http_config:
    idle_conn_timeout: 1m30s
    response_header_timeout: 2m
  trace:
    enable: false
  part_size: 134217728
EOF

# Create main Thanos docker-compose file
cat > docker-compose.yml << 'EOF'
version: '3.8'

networks:
  thanos-net:
    external: true

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: thanos-prometheus
    ports:
      - "9090:9090"
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      # Keep at least 3x the block duration locally so the sidecar has time to upload
      - '--storage.tsdb.retention.time=6h'
      - '--storage.tsdb.min-block-duration=2h'
      - '--storage.tsdb.max-block-duration=2h'
      - '--web.enable-lifecycle'
      - '--storage.tsdb.no-lockfile'
    volumes:
      - ./prometheus-config:/etc/prometheus
      - prometheus-data:/prometheus
    networks:
      - thanos-net
    restart: unless-stopped

  thanos-sidecar:
    image: thanosio/thanos:latest
    container_name: thanos-sidecar
    ports:
      - "10902:10902"
    command:
      - sidecar
      - --tsdb.path=/prometheus
      - --prometheus.url=http://prometheus:9090
      - --grpc-address=0.0.0.0:10901
      - --http-address=0.0.0.0:10902
      - --objstore.config-file=/bucket-config.yml
    volumes:
      - prometheus-data:/prometheus
      - ./bucket-config.yml:/bucket-config.yml
    networks:
      - thanos-net
    depends_on:
      - prometheus
    restart: unless-stopped

  thanos-query:
    image: thanosio/thanos:latest
    container_name: thanos-query
    ports:
      - "9091:9090"
    command:
      - query
      - --grpc-address=0.0.0.0:10901
      - --http-address=0.0.0.0:9090
      - --endpoint=thanos-sidecar:10901
      - --endpoint=thanos-store:10901
      - --query.replica-label=replica
    networks:
      - thanos-net
    depends_on:
      - thanos-sidecar
    restart: unless-stopped

  thanos-store:
    image: thanosio/thanos:latest
    container_name: thanos-store
    ports:
      - "10903:10902"
    command:
      - store
      - --grpc-address=0.0.0.0:10901
      - --http-address=0.0.0.0:10902
      - --data-dir=/tmp/thanos/store
      - --objstore.config-file=/bucket-config.yml
    volumes:
      - ./bucket-config.yml:/bucket-config.yml
      - thanos-store-data:/tmp/thanos/store
    networks:
      - thanos-net
    restart: unless-stopped

  thanos-compactor:
    image: thanosio/thanos:latest
    container_name: thanos-compactor
    command:
      - compact
      - --data-dir=/tmp/thanos/compact
      - --objstore.config-file=/bucket-config.yml
      - --http-address=0.0.0.0:10902
      - --wait
    volumes:
      - ./bucket-config.yml:/bucket-config.yml
      - thanos-compactor-data:/tmp/thanos/compact
    networks:
      - thanos-net
    restart: unless-stopped

  node-exporter:
    image: prom/node-exporter:latest
    container_name: thanos-node-exporter
    ports:
      - "9100:9100"
    networks:
      - thanos-net
    restart: unless-stopped

volumes:
  prometheus-data:
  thanos-store-data:
  thanos-compactor-data:
EOF

# Deploy the complete Thanos stack
docker compose up -d

# Check if all services are running
docker compose ps

Amazing! Your complete Thanos stack is now running! 🚀
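The bucket-config.yml above hardcodes the MinIO credentials. A minimal sketch of generating the same file from environment variables instead, so the secrets stay out of anything you might commit. `render_bucket_config` and the `THANOS_S3_*` variable names are assumptions made for this example; the YAML keys are the ones already used in this guide.

```shell
#!/usr/bin/env bash
# Print a Thanos S3 objstore config built from environment variables.
# The THANOS_S3_* names are assumptions for this sketch, not Thanos settings.
render_bucket_config() {
  # ${VAR:?msg} aborts with an error if the variable is unset or empty
  : "${THANOS_S3_ACCESS_KEY:?set THANOS_S3_ACCESS_KEY}"
  : "${THANOS_S3_SECRET_KEY:?set THANOS_S3_SECRET_KEY}"
  cat <<EOF
type: S3
config:
  bucket: "${THANOS_S3_BUCKET:-thanos-bucket}"
  endpoint: "${THANOS_S3_ENDPOINT:-minio:9000}"
  access_key: "${THANOS_S3_ACCESS_KEY}"
  secret_key: "${THANOS_S3_SECRET_KEY}"
  insecure: true
EOF
}
```

With both key variables exported, `render_bucket_config > bucket-config.yml` writes a file the sidecar, store, and compactor can mount exactly as before.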

🔧 Step 5: Configure Thanos Ruler (Optional)

For advanced alerting and recording rules, let’s add Thanos Ruler:

# Create alerting rules
mkdir -p ~/thanos-setup/rules
cat > rules/example.rules.yml << 'EOF'
groups:
  - name: example
    rules:
      - alert: HighErrorRate
        expr: rate(prometheus_http_requests_total{code="500"}[5m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value }} errors per second"
      
      - record: instance:cpu_usage:rate5m
        expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
EOF

# Define Thanos Ruler in its own compose file
# (appending to docker-compose.yml would land after its volumes: section and break the YAML)
cat > docker-compose-ruler.yml << 'EOF'
version: '3.8'

networks:
  thanos-net:
    external: true

services:
  thanos-ruler:
    image: thanosio/thanos:latest
    container_name: thanos-ruler
    ports:
      - "10904:10902"
    command:
      - rule
      - --grpc-address=0.0.0.0:10901
      - --http-address=0.0.0.0:10902
      - --rule-file=/rules/*.rules.yml
      - --data-dir=/tmp/thanos/ruler
      - --eval-interval=15s
      - --label=ruler_cluster="almalinux-cluster"
      - --objstore.config-file=/bucket-config.yml
      - --query=thanos-query:9090
    volumes:
      - ./rules:/rules
      - ./bucket-config.yml:/bucket-config.yml
      - thanos-ruler-data:/tmp/thanos/ruler
    networks:
      - thanos-net
    restart: unless-stopped

volumes:
  thanos-ruler-data:
EOF

# Start the ruler alongside the running stack
docker compose -f docker-compose-ruler.yml up -d

Great! Now you have alerting and recording rules with Thanos Ruler! 📊
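A typo in a rules file only shows up in the ruler's logs after it starts. A minimal sketch of validating the files first with promtool, which ships in the prom/prometheus image; it assumes Thanos Ruler accepts the same rule format as Prometheus, which holds for the example rules above. `check_rules` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Validate every *.rules.yml file in a directory with promtool, run from
# the prom/prometheus image so nothing extra has to be installed.
# (check_rules is a hypothetical helper name.)
check_rules() {
  local dir
  # Resolve to an absolute path; docker -v needs one for bind mounts
  dir=$(cd "${1:-./rules}" && pwd) || return 1
  # A shell entrypoint is used so the *.rules.yml glob expands in the container
  docker run --rm \
    -v "$dir:/rules:ro" \
    --entrypoint sh \
    prom/prometheus:latest \
    -c 'promtool check rules /rules/*.rules.yml'
}
```

Run `check_rules ~/thanos-setup/rules` before restarting the ruler; a non-zero exit status means at least one file failed validation.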

✅ Step 6: Verify Thanos Installation

Let’s make sure everything is working perfectly:

# Check all containers are running
docker compose ps

# Check Prometheus is accessible (quote the URL so the shell leaves the ? alone)
curl -s "http://localhost:9090/api/v1/query?query=up" | jq

# Check Thanos Query is working
curl -s "http://localhost:9091/api/v1/query?query=up" | jq

# Check Thanos Sidecar metrics
curl -s http://localhost:10902/metrics | head -n 10

# Verify MinIO has data
mc ls local/thanos-bucket

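# Check the Store Gateway is ready
curl -s http://localhost:10903/-/ready

You should see all services running! 🎯 The bucket stays empty until Prometheus cuts its first 2-hour block, so expect mc ls to show data only after the stack has been up for two to three hours.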

Open your browser to access:

Prometheus: http://your-server-ip:9090
Thanos Query: http://your-server-ip:9091
MinIO Console: http://your-server-ip:9001
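The individual checks above can be folded into one pass. A minimal sketch, assuming the port mappings from this guide and the /-/ready endpoint that Prometheus and the Thanos components expose; `check_ready` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Probe the /-/ready endpoint of each component and report its state.
# Returns 1 if any component is not ready, 0 otherwise.
check_ready() {
  local failed=0 entry name url
  for entry in \
    "prometheus=http://localhost:9090" \
    "thanos-query=http://localhost:9091" \
    "thanos-sidecar=http://localhost:10902" \
    "thanos-store=http://localhost:10903"; do
    name=${entry%%=*}   # text before the first =
    url=${entry#*=}     # text after the first =
    if curl -fsS -o /dev/null --max-time 5 "$url/-/ready"; then
      echo "OK   $name"
    else
      echo "FAIL $name"
      failed=1
    fi
  done
  return "$failed"
}
```

Because the function returns non-zero when anything fails, it can also gate later steps, for example `check_ready && mc ls local/thanos-bucket`.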

🎮 Quick Examples

Let’s try some practical examples to see Thanos in action! 🚀

Example 1: Query Historical Data

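# Generate some sample metrics by hitting Prometheus
for i in {1..100}; do
  curl -s "http://localhost:9090/api/v1/query?query=up" > /dev/null
  sleep 5
done

# Blocks reach object storage only after Prometheus cuts one (every 2 hours),
# so the first upload can take two to three hours. Until then, Thanos Query
# serves recent data live from the sidecar.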

# Query recent data through Thanos Query
curl -s "http://localhost:9091/api/v1/query?query=prometheus_http_requests_total" | jq '.data.result[0].value'

# Query historical data with time range
curl -s "http://localhost:9091/api/v1/query_range?query=up&start=$(date -d '1 hour ago' +%s)&end=$(date +%s)&step=300" | jq

See how Thanos seamlessly provides both recent and historical data! ⏰
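Hand-built query strings get awkward once the PromQL contains braces, brackets, or quotes. A minimal sketch of a range-query wrapper that lets curl do the URL encoding and also passes the Thanos max_source_resolution parameter, which selects raw (0s), 5m, or 1h data. `thanos_range` and `THANOS_URL` are hypothetical names, and the date call assumes GNU date as shipped with AlmaLinux.

```shell
#!/usr/bin/env bash
# Run a range query against Thanos Query and print the raw JSON response.
# Usage: thanos_range QUERY LOOKBACK STEP [MAX_SOURCE_RESOLUTION]
#   LOOKBACK is any GNU date offset, e.g. "1 hour ago" or "30 days ago".
#   MAX_SOURCE_RESOLUTION defaults to 0s (raw data only).
# (thanos_range and THANOS_URL are hypothetical names for this sketch.)
thanos_range() {
  local query=$1 lookback=$2 step=$3 resolution=${4:-0s}
  local base=${THANOS_URL:-http://localhost:9091}
  # -G sends the --data-urlencode pairs as a URL-encoded query string,
  # so PromQL characters such as {}, [] and " need no manual escaping.
  curl -sG "$base/api/v1/query_range" \
    --data-urlencode "query=$query" \
    --data-urlencode "start=$(date -d "$lookback" +%s)" \
    --data-urlencode "end=$(date +%s)" \
    --data-urlencode "step=$step" \
    --data-urlencode "max_source_resolution=$resolution"
}
```

For example, `thanos_range 'rate(prometheus_http_requests_total[5m])' '1 hour ago' 60 | jq` queries raw data. Passing 5m or 1h as the fourth argument only changes the result once the compactor has produced downsampled blocks, which takes days of collected data.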

Example 2: Multi-Cluster Global View

# Simulate second Prometheus cluster
cat > docker-compose-cluster2.yml << 'EOF'
version: '3.8'

networks:
  thanos-net:
    external: true

services:
  prometheus-cluster2:
    image: prom/prometheus:latest
    container_name: prometheus-cluster2
    ports:
      - "9092:9090"
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=6h'
      - '--storage.tsdb.min-block-duration=2h'
      - '--storage.tsdb.max-block-duration=2h'
    volumes:
      - ./prometheus-config-cluster2:/etc/prometheus
      - prometheus-cluster2-data:/prometheus
    networks:
      - thanos-net
    restart: unless-stopped

  thanos-sidecar-cluster2:
    image: thanosio/thanos:latest
    container_name: thanos-sidecar-cluster2
    ports:
      - "10905:10902"
    command:
      - sidecar
      - --tsdb.path=/prometheus
      - --prometheus.url=http://prometheus-cluster2:9090
      - --grpc-address=0.0.0.0:10901
      - --http-address=0.0.0.0:10902
      - --objstore.config-file=/bucket-config.yml
    volumes:
      - prometheus-cluster2-data:/prometheus
      - ./bucket-config.yml:/bucket-config.yml
    networks:
      - thanos-net
    depends_on:
      - prometheus-cluster2
    restart: unless-stopped

volumes:
  prometheus-cluster2-data:
EOF

# Create config for second cluster
mkdir -p prometheus-config-cluster2
cat > prometheus-config-cluster2/prometheus.yml << 'EOF'
global:
  external_labels:
    cluster: 'almalinux-cluster-2'
    region: 'us-west-1'
    replica: 'prometheus-1'

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
EOF

# Start second cluster
docker compose -f docker-compose-cluster2.yml up -d

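# Add the second sidecar to Thanos Query: duplicate the thanos-store
# endpoint line with the new address, then recreate the container
sed -i -E 's|^( *- --[a-z]+=)thanos-store:10901$|&\n\1thanos-sidecar-cluster2:10901|' docker-compose.yml
docker compose up -d thanos-query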

Now you can query across both clusters from a single Thanos Query interface! 🌍
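A quick way to confirm the global view is to list which cluster external labels come back from a single query. A minimal sketch: `clusters_seen` is a hypothetical helper that text-matches the compact JSON the query API returns, which is fine for a spot check (jq is the sturdier choice for anything more involved). The commented usage relies on the dedup parameter of the Thanos Query API.

```shell
#!/usr/bin/env bash
# Read a Prometheus-style JSON query response on stdin and print each
# distinct value of the "cluster" label, one per line.
# (clusters_seen is a hypothetical helper name.)
clusters_seen() {
  # grep -o emits one "cluster":"value" match per line; field 4 of a
  # line split on double quotes is the value itself
  grep -o '"cluster":"[^"]*"' | cut -d'"' -f4 | sort -u
}

# Example (needs both clusters from this guide to be running):
# curl -sG http://localhost:9091/api/v1/query \
#   --data-urlencode 'query=up' --data-urlencode 'dedup=true' | clusters_seen
```

With both sidecars registered, the output should list almalinux-cluster and almalinux-cluster-2.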

Example 3: Grafana Dashboard Integration

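# Define Grafana in its own compose file
# (appending to docker-compose.yml would land after its volumes: section and break the YAML)
cat > docker-compose-grafana.yml << 'EOF'
version: '3.8'

networks:
  thanos-net:
    external: true

services:
  grafana:
    image: grafana/grafana:latest
    container_name: thanos-grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=thanosadmin
    volumes:
      - grafana-data:/var/lib/grafana
    networks:
      - thanos-net
    restart: unless-stopped

volumes:
  grafana-data:
EOF

# Start Grafana alongside the running stack
docker compose -f docker-compose-grafana.yml up -d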

# Wait for Grafana to start
sleep 30

# Configure Grafana datasource via API
curl -X POST \
  http://admin:thanosadmin@localhost:3000/api/datasources \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "Thanos",
    "type": "prometheus",
    "url": "http://thanos-query:9090",
    "access": "proxy",
    "isDefault": true
  }'

Access Grafana at http://localhost:3000 (admin/thanosadmin) and create dashboards with your Thanos data! 📈

🚨 Fix Common Problems

Here are solutions to the most common Thanos issues you might encounter:

Problem 1: Object Storage Connection Failed 🪣

Symptoms: Thanos components can’t connect to MinIO

Solutions:

# Check MinIO is accessible
curl -v http://localhost:9000/minio/health/live

# Verify bucket exists and permissions
mc ls local/thanos-bucket
mc anonymous get local/thanos-bucket

# Test bucket configuration
docker compose exec thanos-sidecar \
  /bin/sh -c "thanos tools bucket verify --objstore.config-file=/bucket-config.yml"

# Check network connectivity between containers
docker compose exec thanos-sidecar ping minio

Problem 2: No Data in Thanos Query 📊

Symptoms: Thanos Query shows no metrics or incomplete data

Solutions:

# Check Thanos Sidecar is uploading blocks
curl http://localhost:10902/metrics | grep thanos_objstore

# Verify Prometheus has external labels configured
curl -s http://localhost:9090/api/v1/status/config | grep external_labels

# Check Thanos Store is loading blocks
docker compose logs thanos-store | grep -i "loaded"

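# Prometheus cuts a block only every 2 hours; a restart does not force one.
# Check how many blocks the sidecar has uploaded so far
curl -s http://localhost:10902/metrics | grep thanos_shipper_uploads_total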
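Grepping /metrics output works, but a number is easier to compare or alert on. A minimal sketch of pulling one metric's total out of the Prometheus text format with awk. `metric_sum` is a hypothetical helper name; the metric in the usage note is the sidecar's upload counter.

```shell
#!/usr/bin/env bash
# Sum every sample of one metric from Prometheus text-format output on stdin.
# Lines look like:  name{label="x"} 12   or   name 12
# (metric_sum is a hypothetical helper name.)
metric_sum() {
  awk -v name="$1" '
    /^#/ { next }                      # skip HELP/TYPE comments
    {
      metric = $1
      sub(/\{.*/, "", metric)          # drop the {labels} part
      if (metric == name) total += $NF # the value is the last field
    }
    END { print total + 0 }            # prints 0 if the metric is absent
  '
}
```

For example, `curl -s http://localhost:10902/metrics | metric_sum thanos_shipper_uploads_total` prints 0 until the first block has been shipped to MinIO.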

Problem 3: High Memory Usage 💾

Symptoms: Thanos components using excessive memory

Solutions:

# Limit memory in docker-compose.yml
services:
  thanos-query:
    mem_limit: 1g
    mem_reservation: 512m

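# Cap how much data is kept: add these flags to the thanos-compactor command
# in docker-compose.yml, then run "docker compose up -d thanos-compactor"
# (do not start a second compactor against the same bucket)
      - --retention.resolution-raw=7d
      - --retention.resolution-5m=30d
      - --retention.resolution-1h=1y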

# Monitor memory usage
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"

Problem 4: Slow Query Performance 🐌

Symptoms: Queries taking too long to complete

Solutions:

# Tune Thanos Query: add these flags to the thanos-query command in
# docker-compose.yml, then run "docker compose up -d thanos-query"
      - --query.partial-response
      - --query.max-concurrent=20

# Add more Store Gateway replicas
# Scale thanos-store service in docker-compose.yml

# Check whether the compactor has halted on an error (1 means halted)
docker compose exec -T thanos-compactor wget -qO- http://localhost:10902/metrics | grep thanos_compact_halted

# Check MinIO server health and disk usage
mc admin info local

📋 Simple Commands Summary

Here’s your quick reference guide for managing Thanos:

Task	Command	Description
Start Thanos	`docker compose up -d`	Launch complete Thanos stack
Stop Thanos	`docker compose down`	Stop all Thanos services
View logs	`docker compose logs [service]`	Check specific component logs
Check status	`docker compose ps`	See all running containers
Restart service	`docker compose restart [service]`	Restart specific component
Query metrics	`curl "http://localhost:9091/api/v1/query?query=up"`	Query via Thanos Query
Check MinIO	`mc ls local/thanos-bucket`	List stored blocks
Health check	`curl http://localhost:10902/-/healthy`	Check sidecar health
Compaction logs	`docker compose logs thanos-compactor`	Follow compaction progress
Update images	`docker compose pull && docker compose up -d`	Update to latest versions

💡 Tips for Success

Here are some pro tips to get the most out of Thanos! 🌟

🎯 Plan Your Retention Strategy: Configure different retention periods for raw data (7d), 5m downsamples (30d), and 1h downsamples (1y) to optimize storage costs.

⚡ Use External Labels Wisely: Set meaningful external labels (cluster, region, environment) to enable proper deduplication and querying across multiple Prometheus instances.

📊 Monitor Compaction: Keep an eye on the compaction process - it’s crucial for query performance and storage efficiency. Set up alerts for compaction failures.

🔍 Enable Partial Responses: Use --query.partial-response flag on Thanos Query to get results even if some stores are down.

💾 Optimize Object Storage: Use lifecycle policies on your S3 bucket to move older data to a cheaper storage class such as Infrequent Access. Avoid archive tiers like Glacier, because Thanos needs to read blocks on demand.

🚀 Scale Store Gateways: Add multiple Store Gateway instances and distribute blocks across them for better query performance.

🔗 Integrate with Service Discovery: Use DNS-based (dns+ or dnssrv+ address prefixes) or file-based service discovery so Thanos Query finds stores and sidecars automatically.

🛡️ Secure Your Setup: Enable HTTPS, authentication, and network policies for production deployments.
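To make the retention tip concrete, here is a rough sketch of how many points a single series holds over one year at each resolution. `samples_per_series` is a hypothetical helper, and the 15s figure assumes the scrape interval configured earlier in this guide.

```shell
#!/usr/bin/env bash
# Number of samples one series holds over PERIOD_DAYS at a given
# resolution (seconds between samples). Integer arithmetic only.
samples_per_series() {
  local period_days=$1 resolution_seconds=$2
  echo $(( period_days * 86400 / resolution_seconds ))
}

samples_per_series 365 15     # raw data at a 15s scrape interval -> 2102400
samples_per_series 365 300    # 5m downsampled                    -> 105120
samples_per_series 365 3600   # 1h downsampled                    -> 8760
```

A year-long graph over raw data touches about two million points per series, while the 1h resolution needs fewer than nine thousand. Downsampled blocks are stored in addition to raw blocks and carry several aggregates per point, so the saving is in query work; storage only shrinks once the retention flags expire the raw data.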

🏆 What You Learned

Congratulations! You’ve successfully mastered Thanos metrics setup! 🎉 Here’s everything you accomplished:

✅ Installed complete Thanos stack on AlmaLinux 9
✅ Set up MinIO object storage for long-term metrics retention
✅ Configured Prometheus with Thanos sidecar for seamless integration
✅ Deployed Thanos Query for global metrics querying
✅ Set up Thanos Store Gateway for historical data access
✅ Configured Thanos Compactor for data optimization
✅ Created Thanos Ruler for distributed alerting
✅ Integrated with Grafana for beautiful dashboards
✅ Learned troubleshooting and performance optimization
✅ Mastered production deployment best practices

🎯 Why This Matters

Thanos transforms your metrics infrastructure from limited to limitless! 🚀 You can now:

📈 Store Years of Metrics: Keep historical data for compliance, trend analysis, and capacity planning without breaking the bank
🌍 Global Observability: Query metrics from multiple data centers, regions, and clusters as if they were one system
⚡ Improved Performance: Automatic downsampling and compaction keep queries fast even with massive datasets
🛡️ High Availability: No more single points of failure in your monitoring infrastructure
💰 Cost Optimization: Use cheap object storage instead of expensive local SSDs for long-term retention
📊 Better Analytics: Long-term trends, seasonal patterns, and year-over-year comparisons become possible

You now have enterprise-grade metrics storage that scales infinitely and provides incredible insights into your infrastructure. This makes you invaluable for SRE, DevOps, and platform engineering roles where observability is critical! ⭐

Keep monitoring, keep optimizing, and remember - with Thanos, your metrics have no limits! 🙌✨
