Thanos Metrics Setup on AlmaLinux 9: Complete Long-term Prometheus Storage Guide
Welcome to the world of long-term metrics storage! Today we're going to learn how to set up Thanos on AlmaLinux 9. Thanos extends Prometheus with practically unlimited retention, global queries, and downsampling. Think of Thanos as a time machine for your metrics data!
Why is Thanos Important?
Prometheus is fantastic for metrics collection, but it has limitations when it comes to long-term storage and high availability. Here's why Thanos is a game-changer:
- Unlimited retention - Store metrics for years in cheap object storage
- Global view - Query metrics from multiple Prometheus instances as one
- Downsampling - Keep 5-minute and 1-hour resolution copies of old data so long-range queries stay fast
- High availability - No more single point of failure for your metrics
- Cost effective - Use S3, GCS, or Azure Blob Storage instead of expensive local disks
- Deduplication - Remove duplicate metrics automatically across replicas
What You Need
Before we start our Thanos adventure, let's make sure you have everything ready:
- AlmaLinux 9 system (fresh installation recommended)
- Root or sudo access for installing packages
- At least 8GB RAM (16GB recommended for production)
- 20GB free disk space for local storage and caching
- Internet connection for downloading packages
- Basic terminal knowledge (don't worry, we'll explain everything!)
- Existing Prometheus setup (we'll create one if you don't have it)
- Object storage access (MinIO, S3, or similar - we'll set up MinIO)
Step 1: Update Your AlmaLinux System
Let's start by making sure your system is up to date!
# Update all packages to latest versions
sudo dnf update -y
# Install essential development tools
sudo dnf groupinstall "Development Tools" -y
# Install helpful utilities we'll need
sudo dnf install -y curl wget git vim htop jq unzip
Perfect! Your system is now ready for Thanos installation!
Step 2: Install Docker and Docker Compose
Thanos works great with containers! Let's set up Docker:
# Install the dnf plugin that provides config-manager, then add the official Docker repository
sudo dnf install -y dnf-plugins-core
sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
# Install Docker Engine
sudo dnf install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
# Start and enable Docker service
sudo systemctl start docker
sudo systemctl enable docker
# Add your user to docker group (no more sudo needed!)
sudo usermod -aG docker $USER
# Apply group changes (or logout/login)
newgrp docker
# Test Docker installation
docker --version
docker compose version
Great! Docker is ready for our Thanos deployment!
Step 3: Set Up MinIO Object Storage
Thanos needs object storage for long-term data. Let's set up MinIO as our S3-compatible storage:
# Create directory for our Thanos setup
mkdir -p ~/thanos-setup
cd ~/thanos-setup
# Create MinIO docker-compose configuration
cat > docker-compose-minio.yml << 'EOF'
version: '3.8'
services:
  minio:
    image: minio/minio:latest
    container_name: thanos-minio
    ports:
      - "9000:9000"
      - "9001:9001"
    environment:
      - MINIO_ROOT_USER=thanos
      - MINIO_ROOT_PASSWORD=thanospassword123
    volumes:
      - minio-data:/data
    command: server /data --console-address ":9001"
    networks:
      - thanos-net
    restart: unless-stopped
volumes:
  minio-data:
networks:
  thanos-net:
    # Fixed name so the other compose files can join it as an external network
    name: thanos-net
    driver: bridge
EOF
# Start MinIO
docker compose -f docker-compose-minio.yml up -d
# Wait for MinIO to start
sleep 10
# Install MinIO client
curl -O https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc
sudo mv mc /usr/local/bin/
# Configure MinIO client
mc alias set local http://localhost:9000 thanos thanospassword123
# Create bucket for Thanos (Thanos authenticates with the access key, so the bucket can stay private)
mc mb local/thanos-bucket
Awesome! MinIO is running and ready for Thanos data!
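A fixed sleep can be too short on a slow host. As a minimal sketch, the helper below retries a command until it succeeds. The name retry_until is hypothetical (it is not part of MinIO or Docker), and the health URL in the usage comment assumes the MinIO container from this step is published on localhost:9000.

```shell
#!/bin/sh
# retry_until MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_TRIES attempts have failed.
retry_until() {
  max_tries=$1
  delay=$2
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_tries" ]; then
      echo "gave up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# Usage (assumption: MinIO's liveness endpoint is reachable on localhost:9000):
# retry_until 30 2 curl -sf http://localhost:9000/minio/health/live
```

Calling it in place of the sleep lets the script continue as soon as MinIO answers, and fail loudly if it never does.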
Step 4: Deploy Prometheus with Thanos Sidecar
Let's set up Prometheus with the Thanos sidecar for seamless integration:
# Create Prometheus configuration
mkdir -p ~/thanos-setup/prometheus-config
cd ~/thanos-setup
# Create Prometheus configuration file
cat > prometheus-config/prometheus.yml << 'EOF'
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'almalinux-cluster'
    region: 'us-east-1'
    replica: 'prometheus-1'
rule_files:
  - "*.rules.yml"
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']
  # The sidecar runs in its own container, so scrape it by service name
  - job_name: 'thanos-sidecar'
    static_configs:
      - targets: ['thanos-sidecar:10902']
  - job_name: 'thanos-query'
    static_configs:
      - targets: ['thanos-query:9090']
EOF
# Create Thanos bucket configuration
cat > bucket-config.yml << 'EOF'
type: S3
config:
  bucket: "thanos-bucket"
  endpoint: "minio:9000"
  access_key: "thanos"
  secret_key: "thanospassword123"
  insecure: true
  signature_version2: false
  put_user_metadata: {}
  http_config:
    idle_conn_timeout: 1m30s
    response_header_timeout: 2m
  trace:
    enable: false
  part_size: 134217728
EOF
# Create main Thanos docker-compose file
# (MinIO lives in docker-compose-minio.yml, so no service here declares depends_on for it)
cat > docker-compose.yml << 'EOF'
version: '3.8'
networks:
  thanos-net:
    external: true
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: thanos-prometheus
    ports:
      - "9090:9090"
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      # Keep at least three block durations locally so the sidecar has time to upload each block
      - '--storage.tsdb.retention.time=6h'
      - '--storage.tsdb.min-block-duration=2h'
      - '--storage.tsdb.max-block-duration=2h'
      - '--web.enable-lifecycle'
      - '--storage.tsdb.no-lockfile'
    volumes:
      - ./prometheus-config:/etc/prometheus
      - prometheus-data:/prometheus
    networks:
      - thanos-net
    restart: unless-stopped
  thanos-sidecar:
    image: thanosio/thanos:latest
    container_name: thanos-sidecar
    ports:
      - "10902:10902"
    command:
      - sidecar
      - --tsdb.path=/prometheus
      - --prometheus.url=http://prometheus:9090
      - --grpc-address=0.0.0.0:10901
      - --http-address=0.0.0.0:10902
      - --objstore.config-file=/bucket-config.yml
    volumes:
      - prometheus-data:/prometheus
      - ./bucket-config.yml:/bucket-config.yml
    networks:
      - thanos-net
    depends_on:
      - prometheus
    restart: unless-stopped
  thanos-query:
    image: thanosio/thanos:latest
    container_name: thanos-query
    ports:
      - "9091:9090"
    command:
      - query
      - --grpc-address=0.0.0.0:10901
      - --http-address=0.0.0.0:9090
      # --endpoint replaces the deprecated --store flag
      - --endpoint=thanos-sidecar:10901
      - --endpoint=thanos-store:10901
      # Deduplicate series that differ only in the "replica" external label
      - --query.replica-label=replica
    networks:
      - thanos-net
    depends_on:
      - thanos-sidecar
    restart: unless-stopped
  thanos-store:
    image: thanosio/thanos:latest
    container_name: thanos-store
    ports:
      - "10903:10902"
    command:
      - store
      - --grpc-address=0.0.0.0:10901
      - --http-address=0.0.0.0:10902
      - --data-dir=/tmp/thanos/store
      - --objstore.config-file=/bucket-config.yml
    volumes:
      - ./bucket-config.yml:/bucket-config.yml
      - thanos-store-data:/tmp/thanos/store
    networks:
      - thanos-net
    restart: unless-stopped
  thanos-compactor:
    image: thanosio/thanos:latest
    container_name: thanos-compactor
    command:
      - compact
      - --data-dir=/tmp/thanos/compact
      - --objstore.config-file=/bucket-config.yml
      - --http-address=0.0.0.0:10902
      - --wait
    volumes:
      - ./bucket-config.yml:/bucket-config.yml
      - thanos-compactor-data:/tmp/thanos/compact
    networks:
      - thanos-net
    restart: unless-stopped
  node-exporter:
    image: prom/node-exporter:latest
    container_name: thanos-node-exporter
    ports:
      - "9100:9100"
    networks:
      - thanos-net
    restart: unless-stopped
volumes:
  prometheus-data:
  thanos-store-data:
  thanos-compactor-data:
EOF
# Deploy the complete Thanos stack
docker compose up -d
# Check if all services are running
docker compose ps
Amazing! Your complete Thanos stack is now running!
Step 5: Configure Thanos Ruler (Optional)
For advanced alerting and recording rules, let's add Thanos Ruler:
# Create alerting rules
mkdir -p ~/thanos-setup/rules
cat > rules/example.rules.yml << 'EOF'
groups:
  - name: example
    rules:
      - alert: HighErrorRate
        expr: rate(prometheus_http_requests_total{code="500"}[5m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value }} errors per second"
      - record: instance:cpu_usage:rate5m
        expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
EOF
# Define Thanos Ruler in a separate compose file that joins the existing network
cat > docker-compose-ruler.yml << 'EOF'
version: '3.8'
networks:
  thanos-net:
    external: true
services:
  thanos-ruler:
    image: thanosio/thanos:latest
    container_name: thanos-ruler
    ports:
      - "10904:10902"
    command:
      - rule
      - --grpc-address=0.0.0.0:10901
      - --http-address=0.0.0.0:10902
      - --rule-file=/rules/*.rules.yml
      - --data-dir=/tmp/thanos/ruler
      - --eval-interval=15s
      - --objstore.config-file=/bucket-config.yml
      - --query=thanos-query:9090
      # External label that identifies the blocks and alerts produced by this ruler
      - --label=ruler_cluster="almalinux-cluster"
    volumes:
      - ./rules:/rules
      - ./bucket-config.yml:/bucket-config.yml
      - thanos-ruler-data:/tmp/thanos/ruler
    networks:
      - thanos-net
    restart: unless-stopped
volumes:
  thanos-ruler-data:
EOF
# Start the ruler alongside the running stack
docker compose -f docker-compose-ruler.yml up -d
Great! Now you have alerting and recording rules with Thanos Ruler!
Step 6: Verify Thanos Installation
Let's make sure everything is working:
# Check all containers are running
docker compose ps
# Check Prometheus is accessible
curl -s "http://localhost:9090/api/v1/query?query=up" | jq .
# Check Thanos Query is working
curl -s "http://localhost:9091/api/v1/query?query=up" | jq .
# Check Thanos Sidecar metrics
curl -s http://localhost:10902/metrics | head -n 10
# List uploaded blocks in MinIO (empty until Prometheus completes its first 2h block)
mc ls local/thanos-bucket
# Check that the Store Gateway is ready to serve blocks from object storage
curl -s http://localhost:10903/-/ready
You should see all services running. Blocks appear in MinIO once Prometheus has completed its first 2-hour block and the sidecar has uploaded it.
Open your browser to access:
- Prometheus: http://your-server-ip:9090
- Thanos Query: http://your-server-ip:9091
- MinIO Console: http://your-server-ip:9001
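The individual checks above can be gathered into one loop. The sketch below assumes the host ports published in this guide and relies on the /-/ready endpoint that Prometheus and the Thanos components expose. The function names are hypothetical helpers, not Thanos commands.

```shell
#!/bin/sh
# Component name and host port, one pair per line, as published in this guide.
ready_targets() {
  cat <<'EOF'
prometheus 9090
thanos-query 9091
thanos-sidecar 10902
thanos-store 10903
EOF
}

# Print "name url" for each component's readiness endpoint on the given host.
ready_urls() {
  host=${1:-localhost}
  ready_targets | while read -r name port; do
    printf '%s http://%s:%s/-/ready\n' "$name" "$host" "$port"
  done
}

# Probe every URL with curl; exit status is non-zero if any component is not ready.
check_ready() {
  ready_urls "${1:-localhost}" | (
    rc=0
    while read -r name url; do
      if curl -sf -o /dev/null "$url"; then
        echo "OK    $name"
      else
        echo "FAIL  $name ($url)"
        rc=1
      fi
    done
    exit "$rc"
  )
}

# Show what would be probed; run check_ready once the stack is up.
ready_urls
```

Run check_ready (optionally with a hostname) after docker compose up -d to get a one-line status per component.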
Quick Examples
Let's try some practical examples to see Thanos in action!
Example 1: Query Historical Data
# Generate some request metrics by hitting the Prometheus API
for i in {1..100}; do
  curl -s "http://localhost:9090/api/v1/query?query=up" > /dev/null
  sleep 5
done
# Recent data is served live through the sidecar, so it is queryable right away
curl -s "http://localhost:9091/api/v1/query?query=prometheus_http_requests_total" | jq '.data.result[0].value'
# Query a time range. Data older than the local retention comes from object storage,
# which the sidecar fills only after each 2h block is complete, so allow a few hours
curl -s "http://localhost:9091/api/v1/query_range?query=up&start=$(date -d '1 hour ago' +%s)&end=$(date +%s)&step=300" | jq .
See how Thanos serves both recent and historical data through one endpoint!
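The queries above work unencoded because up contains no special characters. For PromQL with braces, quotes, or brackets, the query value has to be percent-encoded. Here is a minimal sketch in plain sh, assuming ASCII input and the Thanos Query port 9091 used in this guide. The names urlencode and range_url are hypothetical helpers.

```shell
#!/bin/sh
# Percent-encode a string for use as a URL query value (ASCII input assumed).
urlencode() (
  LC_ALL=C
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    s=$rest
    case $c in
      [a-zA-Z0-9.~_-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
  done
  printf '%s\n' "$out"
)

# Build a query_range URL for Thanos Query.
# Arguments: PromQL expression, start (Unix seconds), end (Unix seconds), step (seconds).
range_url() {
  printf 'http://localhost:9091/api/v1/query_range?query=%s&start=%s&end=%s&step=%s\n' \
    "$(urlencode "$1")" "$2" "$3" "$4"
}

# Example with fixed timestamps; pass the result to curl -s "..." | jq .
range_url 'up{job="prometheus"}' 1700000000 1700003600 300
```

curl can also do the encoding itself with -G and --data-urlencode. The helper is useful when the URL is needed as a string, for example in a dashboard link.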
Example 2: Multi-Cluster Global View
# Simulate second Prometheus cluster
cat > docker-compose-cluster2.yml << 'EOF'
version: '3.8'
networks:
  thanos-net:
    external: true
services:
  prometheus-cluster2:
    image: prom/prometheus:latest
    container_name: prometheus-cluster2
    ports:
      - "9092:9090"
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      # Keep at least three block durations locally so the sidecar has time to upload each block
      - '--storage.tsdb.retention.time=6h'
      - '--storage.tsdb.min-block-duration=2h'
      - '--storage.tsdb.max-block-duration=2h'
    volumes:
      - ./prometheus-config-cluster2:/etc/prometheus
      - prometheus-cluster2-data:/prometheus
    networks:
      - thanos-net
    restart: unless-stopped
  thanos-sidecar-cluster2:
    image: thanosio/thanos:latest
    container_name: thanos-sidecar-cluster2
    ports:
      - "10905:10902"
    command:
      - sidecar
      - --tsdb.path=/prometheus
      - --prometheus.url=http://prometheus-cluster2:9090
      - --grpc-address=0.0.0.0:10901
      - --http-address=0.0.0.0:10902
      - --objstore.config-file=/bucket-config.yml
    volumes:
      - prometheus-cluster2-data:/prometheus
      - ./bucket-config.yml:/bucket-config.yml
    networks:
      - thanos-net
    depends_on:
      - prometheus-cluster2
    restart: unless-stopped
volumes:
  prometheus-cluster2-data:
EOF
# Create config for second cluster
mkdir -p prometheus-config-cluster2
cat > prometheus-config-cluster2/prometheus.yml << 'EOF'
global:
  external_labels:
    cluster: 'almalinux-cluster-2'
    region: 'us-west-1'
    replica: 'prometheus-1'
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
EOF
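# Start second cluster
docker compose -f docker-compose-cluster2.yml up -d
# Register the second sidecar with Thanos Query. Statically configured Store API
# addresses are not reloaded by a signal, so add this line to the thanos-query
# command list in docker-compose.yml, next to the existing Store API addresses:
#       - --endpoint=thanos-sidecar-cluster2:10901
# Then recreate the query container so it picks up the new flag
docker compose up -d thanos-query
Now you can query across both clusters from a single Thanos Query interface!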
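As more sidecars and stores join, the list of Store API addresses on Thanos Query grows. The sketch below prints the command entries for a list of hosts so they can be pasted into the thanos-query service. It assumes the --endpoint flag of current Thanos releases (older releases used --store) and the gRPC port 10901 used throughout this guide. endpoint_flags is a hypothetical helper name.

```shell
#!/bin/sh
# Print one YAML list entry per Store API host for the thanos-query command list.
# GRPC_PORT may be set to override the default gRPC port (10901 in this guide).
endpoint_flags() {
  port=${GRPC_PORT:-10901}
  for host in "$@"; do
    printf '      - --endpoint=%s:%s\n' "$host" "$port"
  done
}

# Hosts taken from the compose files in this guide
endpoint_flags thanos-sidecar thanos-store thanos-sidecar-cluster2
```

After pasting the entries into docker-compose.yml, docker compose up -d thanos-query recreates the container with the new list.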
Example 3: Grafana Dashboard Integration
# Define Grafana in a separate compose file that joins the existing network
cat > docker-compose-grafana.yml << 'EOF'
version: '3.8'
networks:
  thanos-net:
    external: true
services:
  grafana:
    image: grafana/grafana:latest
    container_name: thanos-grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=thanosadmin
    volumes:
      - grafana-data:/var/lib/grafana
    networks:
      - thanos-net
    restart: unless-stopped
volumes:
  grafana-data:
EOF
# Start Grafana alongside the running stack
docker compose -f docker-compose-grafana.yml up -d
# Wait for Grafana to start
sleep 30
# Configure Grafana datasource via API
curl -X POST \
  http://admin:thanosadmin@localhost:3000/api/datasources \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "Thanos",
    "type": "prometheus",
    "url": "http://thanos-query:9090",
    "access": "proxy",
    "isDefault": true
  }'
Access Grafana at http://localhost:3000 (admin/thanosadmin) and create dashboards with your Thanos data!
Fix Common Problems
Here are solutions to the most common Thanos issues you might encounter:
Problem 1: Object Storage Connection Failed
Symptoms: Thanos components can't connect to MinIO
Solutions:
# Check MinIO is accessible
curl -v http://localhost:9000/minio/health/live
# Verify the bucket exists and check its anonymous-access policy
mc ls local/thanos-bucket
mc anonymous get local/thanos-bucket
# Test bucket configuration
docker compose exec thanos-sidecar \
  /bin/sh -c "thanos tools bucket verify --objstore.config-file=/bucket-config.yml"
# Check network connectivity between containers
docker compose exec thanos-sidecar ping -c 3 minio
Problem 2: No Data in Thanos Query
Symptoms: Thanos Query shows no metrics or incomplete data
Solutions:
# Check Thanos Sidecar is uploading blocks
curl -s http://localhost:10902/metrics | grep thanos_objstore
# Verify Prometheus has external labels configured
curl -s http://localhost:9090/api/v1/status/config | grep external_labels
# Check Thanos Store is loading blocks
docker compose logs thanos-store | grep -i "loaded"
# Check that Prometheus has written blocks. Block directories appear about every
# 2 hours, and a restart does not create them sooner
docker compose exec prometheus ls /prometheus
Problem 3: High Memory Usage
Symptoms: Thanos components using excessive memory
Solutions:
# Limit memory in docker-compose.yml
services:
  thanos-query:
    mem_limit: 1g
    mem_reservation: 512m
# Keep less data per resolution: add these flags to the thanos-compactor
# command list in docker-compose.yml
#       - --retention.resolution-raw=7d
#       - --retention.resolution-5m=30d
#       - --retention.resolution-1h=1y
# Then recreate the compactor so the flags take effect
docker compose up -d thanos-compactor
# Monitor memory usage
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
Problem 4: Slow Query Performance
Symptoms: Queries taking too long to complete
Solutions:
# Allow partial responses and more concurrent queries: add these flags to the
# thanos-query command list in docker-compose.yml
#       - --query.partial-response
#       - --query.max-concurrent=20
# Then recreate the query container
docker compose up -d thanos-query
# Add more Store Gateway replicas
# (define additional thanos-store services in docker-compose.yml and register each one with Thanos Query)
# Check compaction progress in the compactor logs
docker compose logs thanos-compactor | grep -i compact
# Check MinIO server status and disk usage
mc admin info local
Simple Commands Summary
Here's your quick reference guide for managing Thanos:
Task | Command | Description |
---|---|---|
Start Thanos | docker compose up -d | Launch complete Thanos stack |
Stop Thanos | docker compose down | Stop all Thanos services |
View logs | docker compose logs [service] | Check specific component logs |
Check status | docker compose ps | See all running containers |
Restart service | docker compose restart [service] | Restart specific component |
Query metrics | curl "http://localhost:9091/api/v1/query?query=up" | Query via Thanos Query |
Check MinIO | mc ls local/thanos-bucket | List stored blocks |
Health check | curl http://localhost:10902/-/healthy | Check sidecar health |
Check compaction | docker compose logs thanos-compactor | Review compactor progress and errors |
Update images | docker compose pull && docker compose up -d | Update to latest versions |
Tips for Success
Here are some pro tips to get the most out of Thanos!
- Plan Your Retention Strategy: Configure different retention periods for raw data (7d), 5m downsamples (30d), and 1h downsamples (1y) to optimize storage costs.
- Use External Labels Wisely: Set meaningful external labels (cluster, region, environment) to enable proper deduplication and querying across multiple Prometheus instances.
- Monitor Compaction: Keep an eye on the compaction process - it's crucial for query performance and storage efficiency. Set up alerts for compaction failures.
- Enable Partial Responses: Use the --query.partial-response flag on Thanos Query to get results even if some stores are down.
- Optimize Object Storage: Use lifecycle policies on your S3 bucket to move older data to cheaper storage classes (IA, Glacier).
- Scale Store Gateways: Add multiple Store Gateway instances and distribute blocks across them for better query performance.
- Integrate with Service Discovery: Use DNS-based or file-based service discovery so Thanos Query finds new stores and sidecars automatically.
- Secure Your Setup: Enable HTTPS, authentication, and network policies for production deployments.
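The retention tip above has one catch: if a resolution is deleted before the compactor has downsampled it, the coarser data never gets created. The sketch below checks a plan against the block ages at which, as far as the Thanos compactor documentation describes, downsampling starts (40 hours for 5m data, 10 days for 1h data). Treat those thresholds as assumptions to confirm for your Thanos version. The helper names are hypothetical.

```shell
#!/bin/sh
# Convert a duration such as 40h, 7d, 2w or 1y to whole hours.
to_hours() {
  n=${1%?}
  case $1 in
    *h) echo "$n" ;;
    *d) echo $(($n * 24)) ;;
    *w) echo $(($n * 24 * 7)) ;;
    *y) echo $(($n * 24 * 365)) ;;
    *) echo "unsupported duration: $1" >&2; return 1 ;;
  esac
}

# check_retention RAW FIVE_MIN
# Returns non-zero when a retention is shorter than the assumed downsampling age.
check_retention() {
  raw=$(to_hours "$1") || return 2
  five=$(to_hours "$2") || return 2
  rc=0
  if [ "$raw" -lt 40 ]; then
    echo "raw retention $1 is below 40h: 5m downsampling may never run"
    rc=1
  fi
  if [ "$five" -lt 240 ]; then
    echo "5m retention $2 is below 10d: 1h downsampling may never run"
    rc=1
  fi
  return $rc
}

# Print the compactor flags for a retention plan (raw, 5m, 1h).
retention_flags() {
  printf '%s\n' \
    "--retention.resolution-raw=$1" \
    "--retention.resolution-5m=$2" \
    "--retention.resolution-1h=$3"
}

# The plan suggested in the tip above
check_retention 7d 30d && retention_flags 7d 30d 1y
```

The printed flags belong in the thanos-compactor command list. The compactor then applies the retention on its next pass.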
What You Learned
Congratulations! You've successfully set up Thanos metrics storage! Here's everything you accomplished:
- Installed complete Thanos stack on AlmaLinux 9
- Set up MinIO object storage for long-term metrics retention
- Configured Prometheus with Thanos sidecar for seamless integration
- Deployed Thanos Query for global metrics querying
- Set up Thanos Store Gateway for historical data access
- Configured Thanos Compactor for data optimization
- Created Thanos Ruler for distributed alerting
- Integrated with Grafana for beautiful dashboards
- Learned troubleshooting and performance optimization
- Picked up tips for hardening and scaling a production deployment
Why This Matters
Thanos removes the retention and single-server limits of a plain Prometheus setup! You can now:
- Store Years of Metrics: Keep historical data for compliance, trend analysis, and capacity planning without breaking the bank
- Global Observability: Query metrics from multiple data centers, regions, and clusters as if they were one system
- Improved Performance: Automatic downsampling and compaction keep long-range queries fast even with massive datasets
- High Availability: No more single points of failure in your monitoring infrastructure
- Cost Optimization: Use cheap object storage instead of expensive local SSDs for long-term retention
- Better Analytics: Long-term trends, seasonal patterns, and year-over-year comparisons become possible
You now have metrics storage that grows with your object storage and gives you a long-term view of your infrastructure. That experience is valuable in SRE, DevOps, and platform engineering roles where observability is critical!
Keep monitoring, keep optimizing, and remember - with Thanos, your metrics history is limited only by your object storage!