Monitoring with Zabbix on AlmaLinux: See Everything, Miss Nothing
Server down at 3 AM? Not anymore! I used to get angry calls about outages I didn't even know about. Then Zabbix changed everything - now I know about problems before users do! Last month, Zabbix predicted a disk failure 3 days early. Saved us from disaster! Today I'm showing you how to build an all-seeing monitoring system with Zabbix on AlmaLinux. Never be surprised again!
Why Zabbix is the Monitoring King
Zabbix isn't just monitoring - it's omniscience! Here's why it rules:
- Monitor everything - Servers, network, cloud, IoT, anything!
- Beautiful dashboards - Real-time visualizations
- Smart alerting - Problems, not noise
- Predictive analysis - See problems before they happen
- Massive scale - Monitor 100,000+ devices
- 100% free - No license limits ever
True story: We replaced $30,000/year monitoring tools with Zabbix. Better features, zero cost, happier team!
What You Need
Before we monitor everything, ensure you have:
- AlmaLinux server (4GB+ RAM for the Zabbix server)
- MySQL/PostgreSQL database
- Systems to monitor
- Root or sudo access
- 60 minutes to see everything
- Coffee (monitoring needs alertness!)
Step 1: Install Zabbix Server
Let's build your monitoring command center!
Install Database (MariaDB)
# Install MariaDB
sudo dnf install -y mariadb-server mariadb
# Start and enable MariaDB
sudo systemctl enable --now mariadb
# Secure MariaDB installation
sudo mysql_secure_installation
# Set root password: ZabbixDB123!
# Remove anonymous users: Y
# Disallow root login remotely: Y
# Remove test database: Y
# Reload privileges: Y
# Create Zabbix database
mysql -u root -p << EOF
CREATE DATABASE zabbix CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
CREATE USER 'zabbix'@'localhost' IDENTIFIED BY 'ZabbixPass123!';
GRANT ALL PRIVILEGES ON zabbix.* TO 'zabbix'@'localhost';
SET GLOBAL log_bin_trust_function_creators = 1;
FLUSH PRIVILEGES;
EOF
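Before moving on, it's worth confirming the database and user were actually created (a quick sanity check; adjust the password if you chose a different one than above):
# Verify the zabbix database is visible to the zabbix user
mysql -u zabbix -p'ZabbixPass123!' -e "SHOW DATABASES LIKE 'zabbix';"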
Install Zabbix Server
# Install Zabbix repository
sudo rpm -Uvh https://repo.zabbix.com/zabbix/6.4/rhel/8/x86_64/zabbix-release-6.4-1.el8.noarch.rpm
sudo dnf clean all
# Install Zabbix server, frontend, agent
sudo dnf install -y zabbix-server-mysql zabbix-web-mysql zabbix-nginx-conf zabbix-sql-scripts zabbix-selinux-policy zabbix-agent
# Import initial schema
sudo zcat /usr/share/zabbix-sql-scripts/mysql/server.sql.gz | mysql -u zabbix -p'ZabbixPass123!' zabbix
# Disable log_bin_trust_function_creators
mysql -u root -p -e "SET GLOBAL log_bin_trust_function_creators = 0;"
# Configure Zabbix server
sudo nano /etc/zabbix/zabbix_server.conf
# Essential settings:
DBHost=localhost
DBName=zabbix
DBUser=zabbix
DBPassword=ZabbixPass123!
StartPollers=10
StartPollersUnreachable=5
StartPingers=5
StartDiscoverers=5
StartHTTPPollers=5
StartTimers=5
StartEscalators=2
CacheSize=256M
HistoryCacheSize=128M
TrendCacheSize=128M
ValueCacheSize=128M
Timeout=30
LogSlowQueries=3000
Configure Nginx and PHP
# Configure Nginx for Zabbix
sudo nano /etc/nginx/conf.d/zabbix.conf
server {
    listen 80;
    server_name zabbix.example.com;
    root /usr/share/zabbix;
    index index.php;

    location = /favicon.ico {
        log_not_found off;
    }

    location / {
        try_files $uri $uri/ =404;
    }

    location /assets {
        access_log off;
        expires 10d;
    }

    location ~ /\.ht {
        deny all;
    }

    location ~ /(api\/|conf[^\.]|include|locale) {
        deny all;
        return 404;
    }

    location ~ [^/]\.php(/|$) {
        fastcgi_pass unix:/run/php-fpm/zabbix.sock;
        fastcgi_split_path_info ^(.+\.php)(/.+)$;
        fastcgi_index index.php;
        fastcgi_param DOCUMENT_ROOT /usr/share/zabbix;
        fastcgi_param SCRIPT_FILENAME /usr/share/zabbix$fastcgi_script_name;
        fastcgi_param PATH_TRANSLATED /usr/share/zabbix$fastcgi_script_name;
        include fastcgi_params;
        fastcgi_param QUERY_STRING $query_string;
        fastcgi_param REQUEST_METHOD $request_method;
        fastcgi_param CONTENT_TYPE $content_type;
        fastcgi_param CONTENT_LENGTH $content_length;
        fastcgi_intercept_errors on;
        fastcgi_ignore_client_abort off;
        fastcgi_connect_timeout 60;
        fastcgi_send_timeout 180;
        fastcgi_read_timeout 180;
        fastcgi_buffer_size 128k;
        fastcgi_buffers 4 256k;
        fastcgi_busy_buffers_size 256k;
        fastcgi_temp_file_write_size 256k;
    }
}
# Configure PHP
sudo nano /etc/php-fpm.d/zabbix.conf
user = apache
group = apache
listen = /run/php-fpm/zabbix.sock
listen.acl_users = apache,nginx
listen.allowed_clients = 127.0.0.1
pm = dynamic
pm.max_children = 50
pm.start_servers = 5
pm.min_spare_servers = 5
pm.max_spare_servers = 35
pm.max_requests = 200
php_value[memory_limit] = 128M
php_value[post_max_size] = 16M
php_value[upload_max_filesize] = 2M
php_value[max_execution_time] = 300
php_value[max_input_time] = 300
php_value[max_input_vars] = 10000
php_value[date.timezone] = America/New_York
# Start services
sudo systemctl restart zabbix-server zabbix-agent nginx php-fpm
sudo systemctl enable zabbix-server zabbix-agent nginx php-fpm
# Configure firewall
sudo firewall-cmd --permanent --add-service=http
sudo firewall-cmd --permanent --add-port=10051/tcp
sudo firewall-cmd --permanent --add-port=10050/tcp
sudo firewall-cmd --reload
# Access web interface
# http://your-server-ip
# Default login: Admin / zabbix
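If the frontend doesn't come up, a quick health check usually narrows things down (this assumes the default package layout and log location):
# Confirm the services are running and the server is listening on its ports
sudo systemctl status zabbix-server nginx php-fpm --no-pager
sudo ss -tlnp | grep -E ':(80|10051)\b'
sudo tail -n 20 /var/log/zabbix/zabbix_server.log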
Step 2: Configure Zabbix Agents
Monitor all your systems!
Install Zabbix Agent on Linux
# On each Linux system to monitor
sudo rpm -Uvh https://repo.zabbix.com/zabbix/6.4/rhel/8/x86_64/zabbix-release-6.4-1.el8.noarch.rpm
sudo dnf install -y zabbix-agent
# Configure agent
sudo nano /etc/zabbix/zabbix_agentd.conf
# Essential settings:
Server=zabbix-server-ip
ServerActive=zabbix-server-ip
Hostname=client-hostname
EnableRemoteCommands=1
LogRemoteCommands=1
# For active checks:
RefreshActiveChecks=60
BufferSend=5
BufferSize=100
MaxLinesPerSecond=20
# Custom monitoring scripts directory
Include=/etc/zabbix/zabbix_agentd.d/*.conf
# Start agent
sudo systemctl enable --now zabbix-agent
# Open firewall
sudo firewall-cmd --permanent --add-port=10050/tcp
sudo firewall-cmd --reload
# Test connection
zabbix_agentd -t 'system.cpu.load[all,avg1]'
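You can also poll the agent from the Zabbix server itself, which exercises the network path and firewall in one go (assumes the zabbix-get package from the same repository is installed on the server):
# On the Zabbix server: query the agent remotely
sudo dnf install -y zabbix-get
zabbix_get -s agent-ip -k 'system.cpu.load[all,avg1]'
zabbix_get -s agent-ip -k agent.version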
Windows Agent Installation
# Download Zabbix agent for Windows
# https://www.zabbix.com/download_agents
# Install as Administrator
msiexec /i zabbix_agent-6.4.0-windows-amd64-openssl.msi ^
SERVER=zabbix-server-ip ^
SERVERACTIVE=zabbix-server-ip ^
HOSTNAME=windows-host
# Or configure manually
# Edit: C:\Program Files\Zabbix Agent\zabbix_agentd.conf
# Start service
net start "Zabbix Agent"
# Windows Firewall rule
netsh advfirewall firewall add rule name="Zabbix Agent" ^
dir=in action=allow protocol=TCP localport=10050
Docker Container Monitoring
# Deploy Zabbix agent as container
docker run --name zabbix-agent \
--network host \
--privileged \
-e ZBX_HOSTNAME="docker-host" \
-e ZBX_SERVER_HOST="zabbix-server-ip" \
-e ZBX_SERVER_PORT="10051" \
-v /var/run/docker.sock:/var/run/docker.sock:ro \
-v /sys/fs/cgroup:/sys/fs/cgroup:ro \
-d zabbix/zabbix-agent:alpine-6.4-latest
# For Docker monitoring
cat > /etc/zabbix/zabbix_agentd.d/docker.conf << 'EOF'
UserParameter=docker.discovery,/usr/local/bin/docker-discovery.sh
UserParameter=docker.stats[*],docker stats --no-stream --format "{{json .}}" $1
UserParameter=docker.inspect[*],docker inspect $1
EOF
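The docker.discovery key above points at a helper script that isn't shipped with Zabbix. Here's a minimal sketch of what /usr/local/bin/docker-discovery.sh could look like, emitting low-level discovery JSON with a {#CONTAINER} macro (hypothetical macro name - align it with whatever your template's item prototypes expect):
cat > /usr/local/bin/docker-discovery.sh << 'EOF'
#!/bin/bash
# Emit Zabbix low-level discovery JSON listing running containers
first=1
printf '{"data":['
for name in $(docker ps --format '{{.Names}}'); do
    [ $first -eq 0 ] && printf ','
    printf '{"{#CONTAINER}":"%s"}' "$name"
    first=0
done
printf ']}\n'
EOF
chmod +x /usr/local/bin/docker-discovery.sh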
Step 3: Create Monitoring Templates
Build comprehensive monitoring!
Custom Application Template
# Create custom monitoring items
cat > /etc/zabbix/zabbix_agentd.d/custom.conf << 'EOF'
# Web application monitoring
UserParameter=webapp.users,curl -s http://localhost/api/users/count
UserParameter=webapp.response_time,curl -o /dev/null -s -w '%{time_total}' http://localhost
UserParameter=webapp.status,curl -s -o /dev/null -w "%{http_code}" http://localhost
# Database monitoring
UserParameter=mysql.connections,mysql -u monitor -p'MonitorPass' -e "SHOW STATUS LIKE 'Threads_connected';" | tail -1 | awk '{print $2}'
UserParameter=mysql.queries,mysql -u monitor -p'MonitorPass' -e "SHOW STATUS LIKE 'Questions';" | tail -1 | awk '{print $2}'
UserParameter=mysql.slow_queries,mysql -u monitor -p'MonitorPass' -e "SHOW STATUS LIKE 'Slow_queries';" | tail -1 | awk '{print $2}'
# Service monitoring
UserParameter=service.status[*],systemctl is-active $1 | grep -c '^active$'
UserParameter=service.memory[*],systemctl show $1 --property=MemoryCurrent | cut -d= -f2
# Log monitoring
UserParameter=log.errors[*],grep -c ERROR /var/log/$1 2>/dev/null || echo 0
UserParameter=log.warnings[*],grep -c WARNING /var/log/$1 2>/dev/null || echo 0
# Security monitoring
UserParameter=security.failed_logins,grep "Failed password" /var/log/secure | wc -l
UserParameter=security.ssh_sessions,who | wc -l
EOF
# Restart agent
sudo systemctl restart zabbix-agent
Advanced Monitoring Scripts
# Advanced monitoring collector
cat > /usr/local/bin/zabbix-collector.py << 'EOF'
#!/usr/bin/env python3
import json
import psutil
import subprocess
import sys

def get_disk_io():
    """Get disk I/O statistics"""
    io = psutil.disk_io_counters()
    return {
        "read_bytes": io.read_bytes,
        "write_bytes": io.write_bytes,
        "read_time": io.read_time,
        "write_time": io.write_time
    }

def get_network_connections():
    """Get network connection stats"""
    connections = psutil.net_connections()
    stats = {
        "ESTABLISHED": 0,
        "TIME_WAIT": 0,
        "CLOSE_WAIT": 0,
        "LISTEN": 0
    }
    for conn in connections:
        if conn.status in stats:
            stats[conn.status] += 1
    return stats

def get_process_info():
    """Get top processes by CPU and memory"""
    processes = []
    for proc in psutil.process_iter(['pid', 'name', 'cpu_percent', 'memory_percent']):
        try:
            processes.append(proc.info)
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            pass
    # Sort by CPU usage
    top_cpu = sorted(processes, key=lambda x: x['cpu_percent'] or 0, reverse=True)[:5]
    # Sort by memory usage
    top_mem = sorted(processes, key=lambda x: x['memory_percent'] or 0, reverse=True)[:5]
    return {
        "top_cpu": top_cpu,
        "top_memory": top_mem
    }

def discover_services():
    """Discover running systemd services for low-level discovery"""
    services = []
    result = subprocess.run(
        ['systemctl', 'list-units', '--type=service', '--state=running', '--no-pager', '--no-legend'],
        capture_output=True, text=True)
    for line in result.stdout.strip().split('\n'):
        if line:
            service = line.split()[0]
            services.append({"{#SERVICE}": service})
    return {"data": services}

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: zabbix-collector.py [disk_io|network|processes|discover_services]")
        sys.exit(1)
    command = sys.argv[1]
    if command == "disk_io":
        print(json.dumps(get_disk_io()))
    elif command == "network":
        print(json.dumps(get_network_connections()))
    elif command == "processes":
        print(json.dumps(get_process_info()))
    elif command == "discover_services":
        print(json.dumps(discover_services()))
    else:
        print(f"Unknown command: {command}")
        sys.exit(1)
EOF
chmod +x /usr/local/bin/zabbix-collector.py
# Add to Zabbix agent config
echo "UserParameter=custom.disk_io,/usr/local/bin/zabbix-collector.py disk_io" >> /etc/zabbix/zabbix_agentd.d/custom.conf
echo "UserParameter=custom.network_conn,/usr/local/bin/zabbix-collector.py network" >> /etc/zabbix/zabbix_agentd.d/custom.conf
echo "UserParameter=custom.top_processes,/usr/local/bin/zabbix-collector.py processes" >> /etc/zabbix/zabbix_agentd.d/custom.conf
echo "UserParameter=service.discovery,/usr/local/bin/zabbix-collector.py discover_services" >> /etc/zabbix/zabbix_agentd.d/custom.conf
Step 4: Alerting and Automation
Never miss critical issues!
Configure Email Alerts
# Install mail utilities
sudo dnf install -y mailx postfix
# Configure Postfix
sudo systemctl enable --now postfix
# In Zabbix Web UI:
# Administration -> Media types -> Email
# SMTP server: localhost
# SMTP port: 25
# From: [email protected]
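It helps to confirm Postfix can actually deliver mail before blaming Zabbix - a one-line test (swap in a mailbox you can actually check):
# Quick local mail delivery test
echo "Zabbix mail test $(date)" | mail -s "Zabbix mail test" [email protected]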
# Create alert script
cat > /usr/lib/zabbix/alertscripts/custom-alert.sh << 'EOF'
#!/bin/bash
TO=$1
SUBJECT=$2
MESSAGE=$3
# Send email
echo "$MESSAGE" | mail -s "$SUBJECT" "$TO"
# Send to Slack
curl -X POST -H 'Content-type: application/json' \
--data "{\"text\":\"$SUBJECT\n$MESSAGE\"}" \
YOUR_SLACK_WEBHOOK_URL
# Send to Telegram
curl -X POST "https://api.telegram.org/botYOUR_BOT_TOKEN/sendMessage" \
-d "chat_id=YOUR_CHAT_ID" \
-d "text=$SUBJECT%0A$MESSAGE"
# Log alert
echo "$(date): Alert sent to $TO - $SUBJECT" >> /var/log/zabbix/alerts.log
EOF
chmod +x /usr/lib/zabbix/alertscripts/custom-alert.sh
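Run the script by hand once before pointing Zabbix at it - the Slack/Telegram placeholders will fail until you fill them in, but the email and log paths should work. Running it as the zabbix user also catches permission problems early:
# Manual test of the alert script
sudo -u zabbix /usr/lib/zabbix/alertscripts/custom-alert.sh [email protected] "Test alert" "This is only a test"
tail -1 /var/log/zabbix/alerts.log
Then register it in the web UI under Administration -> Media types as a Script media type, passing {ALERT.SENDTO}, {ALERT.SUBJECT} and {ALERT.MESSAGE} as the three script parameters.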
Auto-remediation Scripts
# Automatic problem resolution
cat > /usr/lib/zabbix/alertscripts/auto-fix.sh << 'EOF'
#!/bin/bash
PROBLEM=$1
HOST=$2
ITEM=$3
VALUE=$4
case "$PROBLEM" in
"High disk usage")
echo "Cleaning up disk on $HOST..."
ssh $HOST "find /tmp -type f -mtime +7 -delete"
ssh $HOST "journalctl --vacuum-time=7d"
ssh $HOST "apt-get clean || yum clean all"
;;
"Service down")
SERVICE=$(echo $ITEM | cut -d'[' -f2 | cut -d']' -f1)
echo "Restarting $SERVICE on $HOST..."
ssh $HOST "systemctl restart $SERVICE"
sleep 10
ssh $HOST "systemctl status $SERVICE"
;;
"High memory usage")
echo "Clearing memory cache on $HOST..."
ssh $HOST "sync && echo 3 > /proc/sys/vm/drop_caches"
ssh $HOST "systemctl restart php-fpm nginx"
;;
"Too many connections")
echo "Optimizing connections on $HOST..."
ssh $HOST "netstat -ant | grep TIME_WAIT | wc -l"
ssh $HOST "sysctl -w net.ipv4.tcp_fin_timeout=30"
;;
"Backup failed")
echo "Retrying backup on $HOST..."
ssh $HOST "/usr/local/bin/backup-script.sh"
;;
*)
echo "No auto-fix available for: $PROBLEM"
exit 1
;;
esac
echo "Auto-fix completed for $PROBLEM on $HOST"
EOF
chmod +x /usr/lib/zabbix/alertscripts/auto-fix.sh
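One way to wire this in (a sketch - the exact problem names depend on how your triggers are titled) is an action operation that runs the script with Zabbix macros as its positional arguments:
# Action operation command (Configuration -> Actions -> Operations)
/usr/lib/zabbix/alertscripts/auto-fix.sh "{EVENT.NAME}" "{HOST.HOST}" "{ITEM.KEY}" "{ITEM.VALUE}"
Note the script ssh'es into the target host, so the zabbix user on the server needs key-based SSH access to the monitored machines for the fixes to actually run.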
Quick Examples
Example 1: Complete Infrastructure Dashboard
// Zabbix dashboard layout (illustrative; the API script below creates the dashboard)
const dashboardConfig = {
    name: "Infrastructure Overview",
    widgets: [
        {
            type: "graph",
            name: "CPU Usage",
            x: 0, y: 0,
            width: 6, height: 4,
            fields: [{
                type: "graph",
                value: "CPU utilization"
            }]
        },
        {
            type: "graph",
            name: "Memory Usage",
            x: 6, y: 0,
            width: 6, height: 4,
            fields: [{
                type: "graph",
                value: "Memory utilization"
            }]
        },
        {
            type: "problems",
            name: "Current Problems",
            x: 0, y: 4,
            width: 12, height: 4,
            fields: [{
                type: "severities",
                value: [3, 4, 5] // Average and above
            }]
        },
        {
            type: "map",
            name: "Network Map",
            x: 0, y: 8,
            width: 6, height: 6,
            fields: [{
                type: "sysmapid",
                value: 1
            }]
        },
        {
            type: "plain_text",
            name: "Top Processes",
            x: 6, y: 8,
            width: 6, height: 6,
            fields: [{
                type: "items",
                value: ["custom.top_processes"]
            }]
        }
    ]
};
# API script to create the dashboard
cat > /usr/local/bin/create-dashboard.py << 'EOF'
#!/usr/bin/env python3
import requests
import json

ZABBIX_URL = "http://localhost/api_jsonrpc.php"
USERNAME = "Admin"
PASSWORD = "zabbix"

# Authenticate (user.login takes "username" in current Zabbix API versions)
auth_data = {
    "jsonrpc": "2.0",
    "method": "user.login",
    "params": {
        "username": USERNAME,
        "password": PASSWORD
    },
    "id": 1
}
response = requests.post(ZABBIX_URL, json=auth_data)
auth_token = response.json()["result"]

# Create dashboard
dashboard_data = {
    "jsonrpc": "2.0",
    "method": "dashboard.create",
    "params": {
        "name": "Infrastructure Overview",
        "pages": [{
            "widgets": [
                {
                    "type": "systemstatus",
                    "x": 0,
                    "y": 0,
                    "width": 12,
                    "height": 5
                }
            ]
        }]
    },
    "auth": auth_token,
    "id": 2
}
response = requests.post(ZABBIX_URL, json=dashboard_data)
print("Dashboard created:", response.json())
EOF
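Run it once and check the response for an "error" key (assumes the default Admin/zabbix credentials above are still valid and python3-requests is installed):
sudo dnf install -y python3-requests
chmod +x /usr/local/bin/create-dashboard.py
python3 /usr/local/bin/create-dashboard.py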
Example 2: Application Performance Monitoring
# Application monitoring template
cat > /etc/zabbix/zabbix_agentd.d/app-monitor.conf << 'EOF'
# Response time monitoring
UserParameter=app.response_time[*],curl -o /dev/null -s -w '%{time_total}' http://$1$2
# API endpoint monitoring
UserParameter=app.api.status[*],curl -s http://$1/api/health | jq -r '.status'
UserParameter=app.api.response[*],curl -o /dev/null -s -w '%{http_code}' http://$1/api/$2
# Database query performance
UserParameter=app.db.slow_queries,mysql -u monitor -p'pass' -e "SELECT COUNT(*) FROM performance_schema.events_statements_summary_by_digest WHERE AVG_TIMER_WAIT > 1000000000" | tail -1
# Redis monitoring
UserParameter=redis.connected_clients,redis-cli info clients | grep connected_clients | cut -d: -f2
UserParameter=redis.used_memory,redis-cli info memory | grep used_memory_human | cut -d: -f2
UserParameter=redis.ops_per_sec,redis-cli info stats | grep instantaneous_ops_per_sec | cut -d: -f2
# Queue monitoring
UserParameter=queue.size[*],redis-cli llen $1
UserParameter=queue.processing_time[*],redis-cli get queue:$1:avg_time
# Error rate monitoring
UserParameter=app.error_rate,tail -1000 /var/log/app/error.log | grep -c ERROR
UserParameter=app.warning_rate,tail -1000 /var/log/app/app.log | grep -c WARNING
# User session monitoring
UserParameter=app.active_sessions,redis-cli keys "session:*" | wc -l
UserParameter=app.new_users_today,mysql -u monitor -p'pass' -e "SELECT COUNT(*) FROM users WHERE DATE(created_at) = CURDATE()" | tail -1
EOF
# Create performance test script
cat > /usr/local/bin/app-performance-test.sh << 'EOF'
#!/bin/bash
URL="http://localhost"
ITERATIONS=100
echo "Running performance test..."
total_time=0
min_time=999999
max_time=0
for i in $(seq 1 $ITERATIONS); do
    response_time=$(curl -o /dev/null -s -w '%{time_total}' $URL)
    response_ms=$(echo "$response_time * 1000" | bc)
    total_time=$(echo "$total_time + $response_ms" | bc)
    if (( $(echo "$response_ms < $min_time" | bc -l) )); then
        min_time=$response_ms
    fi
    if (( $(echo "$response_ms > $max_time" | bc -l) )); then
        max_time=$response_ms
    fi
done
avg_time=$(echo "scale=2; $total_time / $ITERATIONS" | bc)
echo "Average: ${avg_time}ms"
echo "Min: ${min_time}ms"
echo "Max: ${max_time}ms"
# Send to Zabbix
zabbix_sender -z localhost -s "$(hostname)" -k app.perf.avg -o $avg_time
zabbix_sender -z localhost -s "$(hostname)" -k app.perf.min -o $min_time
zabbix_sender -z localhost -s "$(hostname)" -k app.perf.max -o $max_time
EOF
chmod +x /usr/local/bin/app-performance-test.sh
# Schedule performance tests
echo "*/5 * * * * root /usr/local/bin/app-performance-test.sh" >> /etc/crontab
Example 3: Predictive Analytics
# Predictive monitoring with machine learning
cat > /usr/local/bin/zabbix-predict.py << 'EOF'
#!/usr/bin/env python3
import numpy as np
from sklearn.linear_model import LinearRegression
import pymysql
import json
from datetime import datetime, timedelta

class ZabbixPredictor:
    def __init__(self):
        self.conn = pymysql.connect(
            host='localhost',
            user='zabbix',
            password='ZabbixPass123!',
            database='zabbix'
        )

    def get_historical_data(self, itemid, hours=168):
        """Get historical data for analysis"""
        cursor = self.conn.cursor()
        timestamp = int((datetime.now() - timedelta(hours=hours)).timestamp())
        query = """
            SELECT clock, value
            FROM history
            WHERE itemid = %s AND clock > %s
            ORDER BY clock
        """
        cursor.execute(query, (itemid, timestamp))
        return cursor.fetchall()

    def predict_trend(self, data, future_hours=24):
        """Predict future values using linear regression"""
        if len(data) < 10:
            return None
        # Prepare data
        X = np.array([i for i in range(len(data))]).reshape(-1, 1)
        y = np.array([float(d[1]) for d in data])
        # Train model
        model = LinearRegression()
        model.fit(X, y)
        # Predict future
        future_X = np.array([len(data) + i for i in range(future_hours)]).reshape(-1, 1)
        predictions = model.predict(future_X)
        return {
            'current': float(y[-1]),
            'predicted': float(predictions[-1]),
            'trend': 'increasing' if model.coef_[0] > 0 else 'decreasing',
            'rate': float(model.coef_[0])
        }

    def check_disk_space(self):
        """Predict when disks will be full"""
        cursor = self.conn.cursor()
        # Get disk usage items
        query = """
            SELECT itemid, name, key_
            FROM items
            WHERE key_ LIKE 'vfs.fs.size[%%,pused]'
        """
        cursor.execute(query)
        items = cursor.fetchall()
        alerts = []
        for itemid, name, key in items:
            data = self.get_historical_data(itemid)
            prediction = self.predict_trend(data)
            if prediction and prediction['predicted'] > 90 and prediction['rate'] > 0:
                days_until_full = (100 - prediction['current']) / (prediction['rate'] * 24)
                if days_until_full < 7:
                    alerts.append({
                        'item': name,
                        'current_usage': prediction['current'],
                        'days_until_full': days_until_full,
                        'severity': 'critical' if days_until_full < 3 else 'warning'
                    })
        return alerts

    def detect_anomalies(self, itemid, threshold=3):
        """Detect anomalies using standard deviation"""
        data = self.get_historical_data(itemid, hours=24)
        if len(data) < 10:
            return []
        values = [float(d[1]) for d in data]
        mean = np.mean(values)
        std = np.std(values)
        if std == 0:
            return []
        anomalies = []
        for clock, value in data:
            z_score = abs((float(value) - mean) / std)
            if z_score > threshold:
                anomalies.append({
                    'timestamp': datetime.fromtimestamp(clock),
                    'value': value,
                    'z_score': z_score,
                    'severity': 'high' if z_score > 4 else 'medium'
                })
        return anomalies

    def capacity_planning(self):
        """Predict resource needs"""
        predictions = {}
        # CPU prediction
        cpu_items = self.conn.cursor()
        cpu_items.execute("SELECT itemid FROM items WHERE key_ = 'system.cpu.util'")
        for (itemid,) in cpu_items.fetchall():
            data = self.get_historical_data(itemid, hours=720)  # 30 days
            pred = self.predict_trend(data, future_hours=720)  # 30 days ahead
            if pred:
                predictions['cpu'] = {
                    'current': pred['current'],
                    '30_days': pred['predicted'],
                    'recommendation': 'Upgrade needed' if pred['predicted'] > 80 else 'Adequate'
                }
        return predictions

if __name__ == "__main__":
    predictor = ZabbixPredictor()
    # Check disk space predictions
    disk_alerts = predictor.check_disk_space()
    for alert in disk_alerts:
        print(f"WARNING: {alert['item']} will be full in {alert['days_until_full']:.1f} days!")
    # Capacity planning
    capacity = predictor.capacity_planning()
    print(f"Capacity Planning: {json.dumps(capacity, indent=2)}")
EOF
chmod +x /usr/local/bin/zabbix-predict.py
# Schedule predictions
echo "0 6 * * * root /usr/local/bin/zabbix-predict.py | mail -s 'Zabbix Predictions' [email protected]" >> /etc/crontab
Fix Common Problems
Problem 1: Agent Unreachable
Can't connect to agent?
# Check agent status
sudo systemctl status zabbix-agent
# Test connectivity
telnet agent-ip 10050
# Check firewall
sudo firewall-cmd --list-ports
# Verify configuration
grep ^Server /etc/zabbix/zabbix_agentd.conf
# Check logs
tail -f /var/log/zabbix/zabbix_agentd.log
Problem 2: Database Growing Too Fast
Running out of disk space?
# Check database size
mysql -u root -p -e "SELECT table_schema, SUM(data_length + index_length) / 1024 / 1024 AS 'Size (MB)' FROM information_schema.tables WHERE table_schema = 'zabbix' GROUP BY table_schema;"
# Housekeeping settings in Zabbix
# Administration -> Housekeeping
# Enable override for items
# History: 7 days
# Trends: 365 days
# Manual cleanup
mysql -u zabbix -p zabbix << EOF
DELETE FROM history WHERE clock < UNIX_TIMESTAMP(NOW() - INTERVAL 30 DAY);
DELETE FROM history_uint WHERE clock < UNIX_TIMESTAMP(NOW() - INTERVAL 30 DAY);
OPTIMIZE TABLE history;
OPTIMIZE TABLE history_uint;
EOF
Problem 3: False Alerts
Too many unnecessary alerts?
# Tune trigger thresholds
# In Zabbix Web UI:
# Configuration -> Hosts -> Triggers
# Add dependencies
# Trigger depends on: Host availability
# Use hysteresis
# Problem: CPU > 80%
# Recovery: CPU < 70%
# Maintenance windows
# Configuration -> Maintenance
# Add maintenance periods for planned work
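For the hysteresis example above, the pattern in current trigger expression syntax looks roughly like this (a sketch assuming a host named web01 and the standard system.cpu.util item):
# Problem expression (trigger fires)
avg(/web01/system.cpu.util,5m)>80
# Recovery expression (trigger closes only once load drops well below the threshold)
avg(/web01/system.cpu.util,5m)<70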
Problem 4: Slow Web Interface
Dashboard loading slowly?
# Increase PHP memory
sudo nano /etc/php-fpm.d/zabbix.conf
php_value[memory_limit] = 256M
# Optimize MySQL
sudo nano /etc/my.cnf
[mysqld]
innodb_buffer_pool_size = 2G
innodb_log_file_size = 256M
# Increase Zabbix cache
sudo nano /etc/zabbix/zabbix_server.conf
CacheSize=512M
HistoryCacheSize=256M
sudo systemctl restart zabbix-server php-fpm mariadb
Simple Commands Summary
Task | Command |
---|---|
Check server | systemctl status zabbix-server |
Check agent | systemctl status zabbix-agent |
Test item | zabbix_agentd -t item.key |
Send value | zabbix_sender -z server -s host -k key -o value |
Server log | tail -f /var/log/zabbix/zabbix_server.log |
Agent log | tail -f /var/log/zabbix/zabbix_agentd.log |
Restart all | systemctl restart zabbix-server zabbix-agent |
Web UI | http://server-ip |
Tips for Success
- Start Small - Monitor critical items first
- Use Templates - Don't reinvent the wheel
- Baseline First - Know what's normal
- Tune Alerts - Quality over quantity
- Document Triggers - Why each alert matters
- Test Recovery - Alerts must be actionable
Pro tip: Create a "noise" dashboard for alerts that fire too often. Review weekly and tune thresholds!
What You Learned
You're now a monitoring master! You can:
- Install and configure Zabbix
- Deploy agents everywhere
- Create custom monitoring
- Build dashboards
- Configure smart alerts
- Implement auto-remediation
- Predict problems
Why This Matters
Proper monitoring provides:
- Complete visibility
- Early warning system
- Performance insights
- Predictive capabilities
- Cost optimization
- Better sleep
Last Christmas, our e-commerce site stayed up during 10x normal traffic. Zabbix-triggered automation scaled our infrastructure before we even noticed the spike. Sales record broken, zero downtime!
Remember: If you're not monitoring it, it's already broken!
Happy monitoring! May your alerts be meaningful and your dashboards green!