PostgreSQL Database Setup and Optimization on AlmaLinux: Enterprise Power
MySQL too basic for your needs? I thought so too until our startup hit complex queries that MySQL couldn't handle! We switched to PostgreSQL and suddenly our 30-second reports ran in 200ms! Advanced features like JSON support, full-text search, and real ACID compliance changed everything. Today I'm showing you how to unleash PostgreSQL's full power on AlmaLinux. Welcome to the big leagues!
Why PostgreSQL Rules the Enterprise
PostgreSQL isn't just another database - it's a powerhouse! Here's why giants use it:
- True ACID compliance - your data is SAFE
- Advanced data types - JSON, arrays, ranges, UUID
- Full-text search - built-in search engine
- GIS support - maps and location data
- MVCC - no read locks ever!
- Extensions - PostGIS, TimescaleDB, pg_stat_statements
True story: Instagram stores 500+ TB in PostgreSQL. If it scales for them with billions of users...
What You Need
Before we database like pros, ensure you have:
- AlmaLinux server with 4GB+ RAM
- Root or sudo access
- 20GB+ free disk space
- Basic SQL knowledge (helpful)
- 45 minutes to master PostgreSQL
- Coffee (database tuning needs focus!)
Step 1: Install PostgreSQL
Let's get the elephant in the room!
Install PostgreSQL 15
# Install PostgreSQL 15 repository
sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm
# (On AlmaLinux 9, use the EL-9-x86_64 path instead)
# Disable built-in PostgreSQL module
sudo dnf -qy module disable postgresql
# Install PostgreSQL 15
sudo dnf install -y postgresql15-server postgresql15 postgresql15-contrib
# Install additional tools
sudo dnf install -y postgresql15-devel postgresql15-libs
# Optional GUI: pgAdmin 4 is distributed from its own repository (see pgadmin.org) and can be installed separately
# Check version
/usr/pgsql-15/bin/postgres --version
# Initialize database
sudo /usr/pgsql-15/bin/postgresql-15-setup initdb
# Enable and start PostgreSQL
sudo systemctl enable --now postgresql-15
sudo systemctl status postgresql-15
# Configure firewall
sudo firewall-cmd --permanent --add-service=postgresql
sudo firewall-cmd --reload
Initial Configuration
# Switch to postgres user
sudo -i -u postgres
# Access PostgreSQL prompt
psql
-- Set postgres user password
ALTER USER postgres PASSWORD 'StrongPassword123!';
-- Create a database
CREATE DATABASE production;
-- Create a user
CREATE USER appuser WITH ENCRYPTED PASSWORD 'AppPassword123!';
-- Grant privileges
GRANT ALL PRIVILEGES ON DATABASE production TO appuser;
-- Enable extensions
\c production
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
CREATE EXTENSION IF NOT EXISTS pgcrypto;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
-- Exit
\q
exit
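A quick sanity check is worth doing before moving on. The commands below are a minimal sketch: PostgreSQL 15 no longer gives every role CREATE rights on the public schema, so the application role usually needs an explicit schema grant, and connecting over TCP as appuser confirms the password works.
# Grant schema rights to the application role (needed since PostgreSQL 15)
sudo -u postgres psql -d production -c "GRANT ALL ON SCHEMA public TO appuser;"
# Verify the role can log in over TCP and list installed extensions
psql -h 127.0.0.1 -U appuser -d production -c "SELECT current_user;"
sudo -u postgres psql -d production -c "\dx"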
Step 2: Configure PostgreSQL
Time to tune for performance!
Edit PostgreSQL Configuration
# Main configuration file
sudo nano /var/lib/pgsql/15/data/postgresql.conf
# Performance settings (adjust based on your RAM)
# For 8GB RAM server:
# Connection Settings
listen_addresses = '*' # Listen on all interfaces
max_connections = 200 # Maximum concurrent connections
superuser_reserved_connections = 3 # Reserved for superuser
# Memory Settings
shared_buffers = 2GB # 25% of RAM
effective_cache_size = 6GB # 75% of RAM
maintenance_work_mem = 512MB # For VACUUM, CREATE INDEX
work_mem = 10MB # Per sort/hash operation
wal_buffers = 16MB # Write-ahead log buffer
# Checkpoint Settings
checkpoint_completion_target = 0.9
checkpoint_timeout = 10min
max_wal_size = 4GB
min_wal_size = 1GB
# Query Planner
random_page_cost = 1.1 # For SSD (4.0 for HDD)
effective_io_concurrency = 200 # For SSD (1 for HDD)
default_statistics_target = 100
# Logging
logging_collector = on
log_directory = 'log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_rotation_age = 1d
log_rotation_size = 100MB
log_line_prefix = '%m [%p] %q%u@%d '
log_statement = 'all' # Log all statements (very verbose; consider 'ddl' or 'mod' in production)
log_duration = on # Log statement duration
log_min_duration_statement = 100 # Log slow queries (>100ms)
# Statistics
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all
pg_stat_statements.max = 10000
track_io_timing = on
track_functions = all
# Autovacuum
autovacuum = on
autovacuum_max_workers = 4
autovacuum_naptime = 30s
autovacuum_vacuum_threshold = 50
autovacuum_vacuum_scale_factor = 0.1
autovacuum_analyze_threshold = 50
autovacuum_analyze_scale_factor = 0.05
# Restart PostgreSQL
sudo systemctl restart postgresql-15
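Before touching authentication, it helps to confirm the restart actually picked up the new values; this is a quick check, not a benchmark.
# Confirm the tuned settings are live and pg_stat_statements was preloaded
sudo -u postgres psql -c "SHOW shared_buffers;"
sudo -u postgres psql -c "SHOW shared_preload_libraries;"
# Anything still waiting for a restart shows up here
sudo -u postgres psql -c "SELECT name, setting FROM pg_settings WHERE pending_restart;"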
Configure Authentication
# Edit pg_hba.conf
sudo nano /var/lib/pgsql/15/data/pg_hba.conf
# TYPE DATABASE USER ADDRESS METHOD
# Local connections
local all all peer
# IPv4 local connections
host all all 127.0.0.1/32 scram-sha-256
# IPv4 remote connections (adjust network as needed)
host all all 192.168.1.0/24 scram-sha-256
host all all 10.0.0.0/8 scram-sha-256
# IPv6 local connections
host all all ::1/128 scram-sha-256
# Replication connections
host replication replicator 192.168.1.0/24 scram-sha-256
# Restart to apply changes
sudo systemctl restart postgresql-15
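With pg_hba.conf in place, test a connection from a client on one of the allowed networks; the host address below is a placeholder for your server's IP. If it fails, the pg_hba.conf rule order and the firewall are the usual suspects.
# From a client machine on 192.168.1.0/24 (replace the host with your server's address)
psql "host=192.168.1.10 port=5432 dbname=production user=appuser"
# On the server, watch authentication attempts in the PostgreSQL log
sudo tail -f /var/lib/pgsql/15/data/log/postgresql-*.log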
Step 3: Performance Optimization
Let's make it fly!
Create Optimized Tables
-- Connect to database
sudo -u postgres psql -d production
-- Create optimized table with indexes
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
username VARCHAR(50) UNIQUE NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
password_hash TEXT NOT NULL,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
metadata JSONB,
is_active BOOLEAN DEFAULT true,
last_login TIMESTAMP WITH TIME ZONE
);
-- Create indexes for common queries
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_username ON users(username);
CREATE INDEX idx_users_created_at ON users(created_at DESC);
CREATE INDEX idx_users_active ON users(is_active) WHERE is_active = true;
CREATE INDEX idx_users_metadata ON users USING GIN(metadata);
-- Create partitioned table for time-series data
CREATE TABLE events (
id BIGSERIAL,
event_time TIMESTAMP WITH TIME ZONE NOT NULL,
event_type VARCHAR(50) NOT NULL,
user_id UUID REFERENCES users(id),
event_data JSONB,
ip_address INET,
user_agent TEXT
) PARTITION BY RANGE (event_time);
-- Create monthly partitions
CREATE TABLE events_2024_01 PARTITION OF events
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE events_2024_02 PARTITION OF events
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
-- Add indexes to partitions
CREATE INDEX idx_events_2024_01_time ON events_2024_01(event_time);
CREATE INDEX idx_events_2024_01_type ON events_2024_01(event_type);
CREATE INDEX idx_events_2024_01_user ON events_2024_01(user_id);
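-- Optional sanity check: with partition pruning, a query bounded to January
-- should touch only the events_2024_01 partition in the plan (hypothetical
-- data, so the count itself does not matter)
EXPLAIN (COSTS OFF)
SELECT count(*) FROM events
WHERE event_time >= '2024-01-01' AND event_time < '2024-02-01';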
Query Optimization Script
#!/bin/bash
# PostgreSQL optimization helper
cat > /usr/local/bin/pg-optimize.sh << 'EOF'
#!/bin/bash
DB_NAME="production"
PG_USER="postgres"
optimize_tables() {
echo "Running VACUUM ANALYZE..."
psql -U $PG_USER -d $DB_NAME -c "VACUUM ANALYZE;"
echo "Reindexing database..."
psql -U $PG_USER -d $DB_NAME -c "REINDEX DATABASE $DB_NAME;"
echo "Updating statistics..."
psql -U $PG_USER -d $DB_NAME -c "ANALYZE;"
}
check_bloat() {
echo "Checking table bloat..."
psql -U $PG_USER -d $DB_NAME << SQL
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size,
ROUND(100 * pg_total_relation_size(schemaname||'.'||tablename) /
NULLIF(SUM(pg_total_relation_size(schemaname||'.'||tablename))
OVER (), 0), 2) AS percent
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 10;
SQL
}
find_slow_queries() {
echo "Finding slow queries..."
psql -U $PG_USER -d $DB_NAME << SQL
SELECT
mean_exec_time,
calls,
total_exec_time,
query
FROM pg_stat_statements
WHERE query NOT LIKE '%pg_stat_statements%'
ORDER BY mean_exec_time DESC
LIMIT 10;
SQL
}
check_indexes() {
echo "Checking unused indexes..."
psql -U $PG_USER -d $DB_NAME << SQL
SELECT
schemaname,
tablename,
indexname,
idx_scan,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan < 50
ORDER BY pg_relation_size(indexrelid) DESC;
SQL
}
suggest_indexes() {
echo "Suggesting missing indexes..."
psql -U $PG_USER -d $DB_NAME << SQL
SELECT
schemaname,
tablename,
attname,
n_distinct,
correlation
FROM pg_stats
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
AND n_distinct > 100
AND correlation < 0.1
ORDER BY n_distinct DESC
LIMIT 10;
SQL
}
# Menu
case "$1" in
optimize)
optimize_tables
;;
bloat)
check_bloat
;;
slow)
find_slow_queries
;;
indexes)
check_indexes
;;
suggest)
suggest_indexes
;;
all)
optimize_tables
check_bloat
find_slow_queries
check_indexes
suggest_indexes
;;
*)
echo "Usage: $0 {optimize|bloat|slow|indexes|suggest|all}"
;;
esac
EOF
chmod +x /usr/local/bin/pg-optimize.sh
# Run optimization (as the postgres user so peer authentication works)
sudo -u postgres /usr/local/bin/pg-optimize.sh all
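Running the helper once is nice; running it on a schedule is what keeps the database healthy. A weekly entry in the postgres user's crontab is a reasonable sketch (the day and time are arbitrary, and the reindex step takes locks, so pick a quiet window):
# Run weekly maintenance on Sunday at 03:00 as the postgres user, keeping existing cron entries
sudo -u postgres bash -c '(crontab -l 2>/dev/null; echo "0 3 * * 0 /usr/local/bin/pg-optimize.sh optimize >> /var/lib/pgsql/pg-optimize.log 2>&1") | crontab -'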
Step 4: Replication and Backup
Never lose data again!
Set Up Streaming Replication
# On PRIMARY server
sudo -u postgres psql
-- Create replication user
CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'ReplPassword123!';
-- Exit
\q
# Configure primary for replication
sudo nano /var/lib/pgsql/15/data/postgresql.conf
wal_level = replica
max_wal_senders = 3
wal_keep_size = 1GB
hot_standby = on
archive_mode = on
archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'
# Create archive directory
sudo mkdir -p /archive
sudo chown postgres:postgres /archive
# Restart primary
sudo systemctl restart postgresql-15
# On REPLICA server
# Stop PostgreSQL
sudo systemctl stop postgresql-15
# Remove old data
sudo rm -rf /var/lib/pgsql/15/data/*
# Take base backup from primary
sudo -u postgres pg_basebackup \
-h primary-server-ip \
-D /var/lib/pgsql/15/data \
-U replicator \
-P -v -R -X stream -C -S replica1
# Note: pg_basebackup -R already creates standby.signal and writes
# primary_conninfo to postgresql.auto.conf; create it manually only if you skipped -R
sudo -u postgres touch /var/lib/pgsql/15/data/standby.signal
# Configure replica
sudo nano /var/lib/pgsql/15/data/postgresql.conf
primary_conninfo = 'host=primary-server-ip port=5432 user=replicator password=ReplPassword123!'
primary_slot_name = 'replica1'
hot_standby = on
# Start replica
sudo systemctl start postgresql-15
# Verify replication
sudo -u postgres psql -c "SELECT * FROM pg_stat_replication;"
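On the replica itself you can confirm it is actually replaying WAL and see how far behind it is; a lag of a second or two on a quiet link is normal.
# On the REPLICA: confirm recovery mode and measure replay lag
sudo -u postgres psql -c "SELECT pg_is_in_recovery();"
sudo -u postgres psql -c "SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;"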
Automated Backup Script
#!/bin/bash
# PostgreSQL backup script
cat > /usr/local/bin/pg-backup.sh << 'EOF'
#!/bin/bash
# Configuration
BACKUP_DIR="/backup/postgresql"
DB_NAME="production"
DB_USER="postgres"
S3_BUCKET="s3://my-backups/postgresql"
RETENTION_DAYS=30
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
# Create backup directory
mkdir -p $BACKUP_DIR
backup_database() {
echo "Starting backup of $DB_NAME..."
# Full backup with custom format
pg_dump -U $DB_USER -d $DB_NAME -F c -b -v \
-f "$BACKUP_DIR/${DB_NAME}_${TIMESTAMP}.backup"
# SQL format backup
pg_dump -U $DB_USER -d $DB_NAME -F p --column-inserts \
-f "$BACKUP_DIR/${DB_NAME}_${TIMESTAMP}.sql"
# Compress SQL backup
gzip "$BACKUP_DIR/${DB_NAME}_${TIMESTAMP}.sql"
echo "Backup completed!"
}
backup_globals() {
echo "Backing up global objects..."
pg_dumpall -U $DB_USER --globals-only \
-f "$BACKUP_DIR/globals_${TIMESTAMP}.sql"
gzip "$BACKUP_DIR/globals_${TIMESTAMP}.sql"
}
backup_config() {
echo "Backing up configuration..."
tar -czf "$BACKUP_DIR/config_${TIMESTAMP}.tar.gz" \
/var/lib/pgsql/15/data/*.conf
}
verify_backup() {
echo "Verifying backup..."
pg_restore -U $DB_USER --list \
"$BACKUP_DIR/${DB_NAME}_${TIMESTAMP}.backup" > /dev/null
if [ $? -eq 0 ]; then
echo "Backup verified successfully!"
else
echo "Backup verification failed!"
exit 1
fi
}
upload_to_s3() {
echo "Uploading to S3..."
aws s3 cp "$BACKUP_DIR/${DB_NAME}_${TIMESTAMP}.backup" \
"$S3_BUCKET/${DB_NAME}_${TIMESTAMP}.backup"
}
cleanup_old_backups() {
echo "Cleaning old backups..."
find $BACKUP_DIR -name "*.backup" -mtime +$RETENTION_DAYS -delete
find $BACKUP_DIR -name "*.sql.gz" -mtime +$RETENTION_DAYS -delete
}
# Main execution
backup_database
backup_globals
backup_config
verify_backup
# upload_to_s3 # Uncomment if using S3
cleanup_old_backups
echo "Backup process completed!"
echo "Backup location: $BACKUP_DIR"
echo "Backup size: $(du -sh $BACKUP_DIR/${DB_NAME}_${TIMESTAMP}.backup | cut -f1)"
EOF
chmod +x /usr/local/bin/pg-backup.sh
# Add to the postgres user's crontab (appending, so existing entries are preserved)
sudo -u postgres bash -c '(crontab -l 2>/dev/null; echo "0 2 * * * /usr/local/bin/pg-backup.sh") | crontab -'
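An untested backup is just a hope. A minimal restore drill takes a few minutes: restore the latest custom-format dump into a scratch database, spot-check it, and drop it. The filename below is whatever the script produced, and restore_test is a throwaway name.
# Restore the newest dump into a scratch database and verify it
sudo -u postgres createdb restore_test
sudo -u postgres pg_restore -d restore_test -j 2 /backup/postgresql/production_YYYYMMDD_HHMMSS.backup
sudo -u postgres psql -d restore_test -c "\dt"
sudo -u postgres dropdb restore_test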
Quick Examples
Example 1: High-Performance Application Database
-- Create high-performance schema
CREATE SCHEMA app;
-- User table with optimal settings
CREATE TABLE app.users (
id BIGSERIAL PRIMARY KEY,
uuid UUID DEFAULT uuid_generate_v4() UNIQUE,
username VARCHAR(30) UNIQUE NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
password_hash CHAR(60) NOT NULL,
profile JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
) WITH (fillfactor = 90);
-- Session table with automatic expiry
CREATE TABLE app.sessions (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
user_id BIGINT REFERENCES app.users(id) ON DELETE CASCADE,
token TEXT UNIQUE NOT NULL,
ip_address INET,
user_agent TEXT,
expires_at TIMESTAMPTZ NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Create composite index for session lookups
-- (NOW() cannot be used in a partial index predicate because it is not IMMUTABLE,
-- so filter expired sessions at query time instead)
CREATE INDEX idx_sessions_user_expires
ON app.sessions(user_id, expires_at);
-- Products with full-text search
CREATE TABLE app.products (
id BIGSERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
description TEXT,
price DECIMAL(10,2) NOT NULL,
tags TEXT[],
metadata JSONB DEFAULT '{}',
search_vector tsvector,
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Full-text search index
CREATE INDEX idx_products_search ON app.products USING GIN(search_vector);
-- Trigger to update search vector
CREATE OR REPLACE FUNCTION app.update_search_vector()
RETURNS TRIGGER AS $$
BEGIN
NEW.search_vector :=
setweight(to_tsvector('english', NEW.name), 'A') ||
setweight(to_tsvector('english', COALESCE(NEW.description, '')), 'B');
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trigger_update_search
BEFORE INSERT OR UPDATE ON app.products
FOR EACH ROW
EXECUTE FUNCTION app.update_search_vector();
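-- Quick check that the trigger and GIN index work (hypothetical product row):
INSERT INTO app.products (name, description, price)
VALUES ('Ergonomic keyboard', 'Mechanical keyboard with a padded wrist rest', 89.99);
SELECT name, ts_rank(search_vector, query) AS rank
FROM app.products, to_tsquery('english', 'keyboard & wrist') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;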
-- Orders with automatic archiving
CREATE TABLE app.orders (
id BIGSERIAL,
user_id BIGINT REFERENCES app.users(id),
order_data JSONB NOT NULL,
total DECIMAL(10,2) NOT NULL,
status VARCHAR(20) DEFAULT 'pending',
created_at TIMESTAMPTZ DEFAULT NOW(),
-- A primary key on a partitioned table must include the partition key
PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);
-- Create monthly partitions
CREATE TABLE app.orders_2024_01 PARTITION OF app.orders
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
-- Function to automatically create partitions
CREATE OR REPLACE FUNCTION app.create_monthly_partition()
RETURNS void AS $$
DECLARE
start_date DATE;
end_date DATE;
partition_name TEXT;
BEGIN
start_date := DATE_TRUNC('month', CURRENT_DATE + INTERVAL '1 month');
end_date := start_date + INTERVAL '1 month';
partition_name := 'orders_' || TO_CHAR(start_date, 'YYYY_MM');
EXECUTE format('CREATE TABLE IF NOT EXISTS app.%I PARTITION OF app.orders
FOR VALUES FROM (%L) TO (%L)',
partition_name, start_date, end_date);
END;
$$ LANGUAGE plpgsql;
-- Schedule partition creation (requires the pg_cron extension to be installed
-- and added to shared_preload_libraries)
SELECT cron.schedule('create-partitions', '0 0 25 * *',
'SELECT app.create_monthly_partition()');
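-- You can also call the partition function by hand and confirm the new
-- partition exists (useful before relying on the scheduled job):
SELECT app.create_monthly_partition();
SELECT inhrelid::regclass AS partition
FROM pg_inherits
WHERE inhparent = 'app.orders'::regclass;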
Example 2: Real-time Analytics Dashboard
#!/usr/bin/env python3
# pg_analytics.py - Real-time PostgreSQL analytics
import psycopg2
import psycopg2.extras
from datetime import datetime, timedelta
import json
from flask import Flask, jsonify, render_template_string
app = Flask(__name__)
# Database connection
def get_db():
return psycopg2.connect(
host="localhost",
database="production",
user="appuser",
password="AppPassword123!",
cursor_factory=psycopg2.extras.RealDictCursor
)
@app.route('/api/metrics')
def get_metrics():
conn = get_db()
cur = conn.cursor()
metrics = {}
# Database size
cur.execute("""
SELECT pg_database_size('production') as size,
pg_size_pretty(pg_database_size('production')) as pretty_size
""")
metrics['database'] = cur.fetchone()
# Active connections
cur.execute("""
SELECT count(*) as connections,
max(query_start) as latest_query
FROM pg_stat_activity
WHERE state = 'active'
""")
metrics['connections'] = cur.fetchone()
# Slow queries
cur.execute("""
SELECT query,
calls,
mean_exec_time,
total_exec_time
FROM pg_stat_statements
WHERE mean_exec_time > 100
ORDER BY mean_exec_time DESC
LIMIT 5
""")
metrics['slow_queries'] = cur.fetchall()
# Cache hit ratio
cur.execute("""
SELECT
sum(heap_blks_hit) * 100.0 / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) as cache_hit_ratio
FROM pg_statio_user_tables
""")
metrics['cache_hit_ratio'] = cur.fetchone()['cache_hit_ratio']
# Table statistics
cur.execute("""
SELECT
schemaname,
tablename,
n_live_tup as rows,
n_dead_tup as dead_rows,
last_vacuum,
last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC
LIMIT 10
""")
metrics['tables'] = cur.fetchall()
cur.close()
conn.close()
return jsonify(metrics)
@app.route('/api/realtime')
def realtime_activity():
conn = get_db()
cur = conn.cursor()
# Current queries
cur.execute("""
SELECT
pid,
usename,
application_name,
client_addr,
backend_start,
state,
query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY backend_start DESC
""")
activity = cur.fetchall()
cur.close()
conn.close()
return jsonify(activity)
@app.route('/')
def dashboard():
html = '''
<!DOCTYPE html>
<html>
<head>
<title>PostgreSQL Dashboard</title>
<style>
body { font-family: Arial; padding: 20px; background: #f0f0f0; }
.metric {
background: white;
padding: 20px;
margin: 10px;
border-radius: 8px;
box-shadow: 0 2px 4px rgba(0,0,0,0.1);
}
.value { font-size: 36px; color: #2196F3; font-weight: bold; }
.label { color: #666; margin-top: 5px; }
table { width: 100%; border-collapse: collapse; }
th, td { padding: 10px; text-align: left; border-bottom: 1px solid #ddd; }
.slow { color: #ff5722; }
.healthy { color: #4caf50; }
</style>
</head>
<body>
<h1>PostgreSQL Real-time Dashboard</h1>
<div id="metrics"></div>
<script>
function updateDashboard() {
fetch('/api/metrics')
.then(response => response.json())
.then(data => {
let html = '<div class="metric">';
html += '<div class="value">' + data.database.pretty_size + '</div>';
html += '<div class="label">Database Size</div>';
html += '</div>';
html += '<div class="metric">';
html += '<div class="value">' + data.connections.connections + '</div>';
html += '<div class="label">Active Connections</div>';
html += '</div>';
html += '<div class="metric">';
let ratio = parseFloat(data.cache_hit_ratio).toFixed(2);
let colorClass = ratio > 95 ? 'healthy' : 'slow';
html += '<div class="value ' + colorClass + '">' + ratio + '%</div>';
html += '<div class="label">Cache Hit Ratio</div>';
html += '</div>';
html += '<div class="metric"><h3>Top Tables</h3><table>';
html += '<tr><th>Table</th><th>Rows</th><th>Dead Rows</th></tr>';
data.tables.forEach(table => {
html += '<tr>';
html += '<td>' + table.tablename + '</td>';
html += '<td>' + table.rows + '</td>';
html += '<td>' + table.dead_rows + '</td>';
html += '</tr>';
});
html += '</table></div>';
document.getElementById('metrics').innerHTML = html;
});
}
updateDashboard();
setInterval(updateDashboard, 5000);
</script>
</body>
</html>
'''
return render_template_string(html)
if __name__ == '__main__':
app.run(host='0.0.0.0', port=5000)  # not 5432, which PostgreSQL is already using
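To try the dashboard, install the two Python dependencies and start the app; the package names below are the usual ones, so adjust them to your environment. It listens on port 5000 and will not collide with PostgreSQL on 5432.
# Install dependencies and run the dashboard (Flask's development server, not for production)
pip3 install --user flask psycopg2-binary
python3 pg_analytics.py
# Then open http://your-server:5000/ in a browser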
Example 3: Automatic Performance Tuner
#!/bin/bash
# PostgreSQL auto-tuner
cat > /usr/local/bin/pg-autotune.py << 'EOF'
#!/usr/bin/env python3
import psycopg2
import os
import subprocess
class PostgreSQLTuner:
def __init__(self):
self.conn = psycopg2.connect(
host="localhost",
database="postgres",
user="postgres",
password="StrongPassword123!"
)
self.cur = self.conn.cursor()
self.total_ram = self.get_system_ram()
def get_system_ram(self):
"""Get total system RAM in MB"""
with open('/proc/meminfo') as f:
for line in f:
if line.startswith('MemTotal:'):
return int(line.split()[1]) // 1024
return 4096 # Default 4GB
def calculate_settings(self):
"""Calculate optimal settings based on system resources"""
settings = {}
# Memory settings
settings['shared_buffers'] = f"{self.total_ram // 4}MB"
settings['effective_cache_size'] = f"{self.total_ram * 3 // 4}MB"
settings['maintenance_work_mem'] = f"{min(self.total_ram // 16, 2048)}MB"
settings['work_mem'] = f"{self.total_ram // 100}MB"
# Connection settings
cpu_count = os.cpu_count()
settings['max_connections'] = min(cpu_count * 25, 300)
settings['max_worker_processes'] = cpu_count
settings['max_parallel_workers'] = cpu_count
settings['max_parallel_workers_per_gather'] = max(cpu_count // 2, 2)
# WAL settings
settings['wal_buffers'] = '16MB'
settings['checkpoint_completion_target'] = '0.9'
settings['max_wal_size'] = '4GB'
settings['min_wal_size'] = '1GB'
# Storage type detection (crude heuristic based on the root device name)
root_device = subprocess.check_output(['df', '/']).decode().split('\n')[1].split()[0]
if 'nvme' in root_device or 'ssd' in root_device:
settings['random_page_cost'] = '1.1'
settings['effective_io_concurrency'] = '200'
else:
settings['random_page_cost'] = '4.0'
settings['effective_io_concurrency'] = '2'
return settings
def apply_settings(self, settings):
"""Apply settings to PostgreSQL"""
config_file = '/var/lib/pgsql/15/data/postgresql.auto.conf'
with open(config_file, 'w') as f:
f.write("# Auto-generated by pg-autotune\n")
for key, value in settings.items():
f.write(f"{key} = {value}\n")
print("Settings written to postgresql.auto.conf")
print("Restart PostgreSQL to apply: sudo systemctl restart postgresql-15")
def analyze_workload(self):
"""Analyze current workload and suggest optimizations"""
suggestions = []
# Check cache hit ratio
self.cur.execute("""
SELECT
sum(heap_blks_hit) / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) * 100 as ratio
FROM pg_statio_user_tables
""")
cache_ratio = self.cur.fetchone()[0]
if cache_ratio and cache_ratio < 90:
suggestions.append(f"Cache hit ratio is {cache_ratio:.2f}%. Consider increasing shared_buffers.")
# Check for missing indexes
self.cur.execute("""
SELECT
schemaname,
tablename,
seq_scan,
seq_tup_read,
idx_scan,
idx_tup_fetch
FROM pg_stat_user_tables
WHERE seq_scan > idx_scan
AND n_live_tup > 10000
ORDER BY seq_scan - idx_scan DESC
LIMIT 5
""")
tables_need_index = self.cur.fetchall()
if tables_need_index:
suggestions.append("These tables might benefit from indexes:")
for table in tables_need_index:
suggestions.append(f" - {table[0]}.{table[1]} (seq_scans: {table[2]})")
# Check for bloat
self.cur.execute("""
SELECT
schemaname,
tablename,
n_dead_tup,
n_live_tup,
round(n_dead_tup * 100.0 / nullif(n_live_tup, 0)) as dead_percent
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC
LIMIT 5
""")
bloated_tables = self.cur.fetchall()
if bloated_tables:
suggestions.append("These tables need VACUUM:")
for table in bloated_tables:
suggestions.append(f" - {table[0]}.{table[1]} ({table[4]}% dead tuples)")
return suggestions
def run(self):
"""Run the auto-tuner"""
print("PostgreSQL Auto-Tuner")
print("=" * 50)
print(f"System RAM: {self.total_ram}MB")
print(f"CPU Cores: {os.cpu_count()}")
print()
print("Calculating optimal settings...")
settings = self.calculate_settings()
print("\nRecommended Settings:")
for key, value in settings.items():
print(f" {key} = {value}")
print("\nAnalyzing workload...")
suggestions = self.analyze_workload()
if suggestions:
print("\nOptimization Suggestions:")
for suggestion in suggestions:
print(suggestion)
print("\n")
response = input("Apply settings? (y/n): ")
if response.lower() == 'y':
self.apply_settings(settings)
self.cur.close()
self.conn.close()
if __name__ == '__main__':
tuner = PostgreSQLTuner()
tuner.run()
EOF
chmod +x /usr/local/bin/pg-autotune.py
# Needs the psycopg2 driver (e.g. pip3 install psycopg2-binary) and write access
# to the data directory, so run it with sudo
sudo python3 /usr/local/bin/pg-autotune.py
Fix Common Problems
Problem 1: Slow Queries
Queries taking forever?
-- Find slow queries
SELECT
query,
calls,
mean_exec_time,
total_exec_time
FROM pg_stat_statements
WHERE mean_exec_time > 1000
ORDER BY mean_exec_time DESC;
-- Explain query plan
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM large_table WHERE my_column = 'value';
-- Create missing index (CONCURRENTLY avoids blocking writes)
CREATE INDEX CONCURRENTLY idx_large_table_my_column ON large_table(my_column);
Problem 2: Connection Refused
Can't connect to PostgreSQL?
# Check if running
sudo systemctl status postgresql-15
# Check listen address
grep listen_addresses /var/lib/pgsql/15/data/postgresql.conf
# Check pg_hba.conf
sudo cat /var/lib/pgsql/15/data/pg_hba.conf
# Test connection
psql -h localhost -U postgres -d postgres
Problem 3: Out of Memory
PostgreSQL using too much RAM?
# Check current memory usage
sudo -u postgres psql -c "SHOW shared_buffers;"
sudo -u postgres psql -c "SHOW work_mem;"
# Reduce memory settings
sudo -u postgres psql -c "ALTER SYSTEM SET shared_buffers = '1GB';"
sudo -u postgres psql -c "ALTER SYSTEM SET work_mem = '4MB';"
# Restart
sudo systemctl restart postgresql-15
Problem 4: Disk Full
Running out of space?
-- Check database sizes
SELECT
datname,
pg_size_pretty(pg_database_size(datname))
FROM pg_database
ORDER BY pg_database_size(datname) DESC;
-- Find large tables
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename))
FROM pg_tables
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 10;
-- Clean up (VACUUM FULL rewrites tables and takes exclusive locks, so schedule a maintenance window)
VACUUM FULL;
REINDEX DATABASE production;
Simple Commands Summary
| Task | Command |
|---|---|
| Connect | psql -U postgres -d database |
| List databases | \l |
| List tables | \dt |
| Describe table | \d tablename |
| Backup | pg_dump -d database > backup.sql |
| Restore | psql -d database < backup.sql |
| Activity | SELECT * FROM pg_stat_activity; |
| Vacuum | VACUUM ANALYZE; |
Tips for Success
- Monitor Everything - pg_stat_statements is gold
- Index Wisely - too many indexes slow down writes
- Partition Large Tables - keep them manageable
- Use Connection Pooling - PgBouncer helps (see the sketch after this list)
- Regular Maintenance - VACUUM and ANALYZE
- Test Backups - an untested backup is no backup
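The connection pooling tip deserves a concrete starting point. The sketch below installs PgBouncer (available from the PGDG repository added earlier) and uses transaction-level pooling; the pool sizes, listen address, and the plaintext userlist entry are illustrative placeholders, not tuned or hardened values.
# Minimal PgBouncer sketch (illustrative values, adjust before production)
sudo dnf install -y pgbouncer
sudo tee /etc/pgbouncer/pgbouncer.ini > /dev/null << 'EOF'
[databases]
production = host=127.0.0.1 port=5432 dbname=production
[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 500
default_pool_size = 20
EOF
echo '"appuser" "AppPassword123!"' | sudo tee /etc/pgbouncer/userlist.txt > /dev/null
sudo systemctl enable --now pgbouncer
# Applications now point at port 6432 instead of 5432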
Pro tip: Enable log_min_duration_statement = 100 to automatically log all queries slower than 100ms. Found so many issues this way!
What You Learned
You're now a PostgreSQL master! You can:
- Install and configure PostgreSQL
- Optimize for performance
- Set up replication
- Implement backup strategies
- Monitor and tune queries
- Use advanced features
- Troubleshoot issues
Why This Matters
PostgreSQL provides:
- Enterprise performance
- Rock-solid reliability
- Advanced features
- Geospatial support
- ACID compliance
- Extensibility
Our startup migrated from MySQL to PostgreSQL when we needed JSON support and window functions. Query performance improved 10x, and we could finally do complex analytics in SQL instead of application code. Best decision ever!
Remember: PostgreSQL isn't just a database, it's a data platform! Master it!
Happy querying! May your JOINs be fast and your data be consistent!