Deploying Distributed Storage with Ceph on AlmaLinux
almalinux ceph distributed-storage

Published Jul 15, 2025

Build a scalable, fault-tolerant distributed storage system using Ceph on AlmaLinux. Learn cluster deployment, configuration, management, and best practices for production environments.

22 min read

As data continues to grow exponentially, organizations need scalable, reliable storage solutions that can handle petabytes of information while maintaining high availability. Ceph, the leading open-source distributed storage platform, provides unified object, block, and file storage in a single system. This comprehensive guide will walk you through deploying a production-ready Ceph cluster on AlmaLinux.

Understanding Ceph Architecture

Ceph provides a unified storage platform with several key components:

  • RADOS: Reliable Autonomous Distributed Object Store (foundation layer)
  • RBD: RADOS Block Device (block storage)
  • RGW: RADOS Gateway (object storage, S3/Swift compatible)
  • CephFS: Distributed file system
  • Monitors (MON): Maintain cluster state
  • OSDs: Object Storage Daemons (store data)
  • Managers (MGR): Provide additional monitoring and interfaces
  • MDS: Metadata Servers (for CephFS)
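
Once a cluster is running, each of these components appears as a daemon managed by the orchestrator. As a quick illustration (assuming the cephadm-based deployment built later in this guide), you can list them by type:

# List daemons by type
sudo ceph orch ps --daemon-type mon
sudo ceph orch ps --daemon-type mgr
sudo ceph orch ps --daemon-type osd

# Summarize overall cluster state
sudo ceph -s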

Prerequisites and Planning

Hardware Requirements

For a production cluster, you’ll need:

  • Minimum 3 nodes (5+ recommended for production)
  • Per node specifications:
    • 8+ CPU cores
    • 32GB+ RAM (64GB recommended)
    • 10GbE network (dedicated cluster and public networks)
    • OS disk: 100GB SSD
    • OSD disks: 4+ drives per node (SSD or NVMe preferred)

Network Architecture

Plan your network topology:

  • Public Network: Client-facing traffic (10.0.0.0/24)
  • Cluster Network: OSD replication traffic (10.0.1.0/24)
  • Management Network: Optional, for administration

Storage Planning

Calculate your storage needs:

  • Replication Factor: Usually 3 (3x raw capacity needed)
  • Erasure Coding: More efficient but complex (1.5x-2x raw capacity)
  • CRUSH Map: Plan failure domains (rack, row, datacenter)
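
As a rough worked example (figures are illustrative, not requirements): five nodes with four 4TB OSDs each provide 80TB raw; 3x replication leaves roughly 26TB usable, while a k=4, m=2 erasure-coded pool leaves roughly 53TB, before reserving the free-space headroom discussed under best practices.

# Illustrative capacity math for 5 nodes x 4 OSDs x 4TB each
RAW_TB=$((5 * 4 * 4))                                          # 80 TB raw
echo "Replicated (size=3): $((RAW_TB / 3)) TB usable"          # ~26 TB
echo "Erasure coded (k=4, m=2): $((RAW_TB * 4 / 6)) TB usable" # ~53 TB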

Preparing AlmaLinux Nodes

Initial System Setup

Perform on all nodes:

# Set hostnames
sudo hostnamectl set-hostname ceph-node1  # Adjust for each node

# Update system
sudo dnf update -y

# Install essential packages
sudo dnf install -y epel-release
sudo dnf install -y vim wget curl net-tools python3 lvm2

# Configure hosts file
cat << EOF | sudo tee -a /etc/hosts
10.0.0.11 ceph-node1
10.0.0.12 ceph-node2
10.0.0.13 ceph-node3
10.0.0.14 ceph-node4
10.0.0.15 ceph-node5
EOF

# Set SELinux to permissive (or configure SELinux policies for Ceph)
sudo setenforce 0
sudo sed -i 's/SELINUX=enforcing/SELINUX=permissive/g' /etc/selinux/config

# Configure firewall
sudo firewall-cmd --permanent --add-service=ceph-mon
sudo firewall-cmd --permanent --add-service=ceph
sudo firewall-cmd --permanent --add-port=6789/tcp  # Monitor
sudo firewall-cmd --permanent --add-port=6800-7300/tcp  # OSDs
sudo firewall-cmd --permanent --add-port=8080/tcp  # Dashboard
sudo firewall-cmd --permanent --add-port=8443/tcp  # Dashboard SSL
sudo firewall-cmd --permanent --add-port=3000/tcp  # Grafana
sudo firewall-cmd --permanent --add-port=9093/tcp  # Alertmanager
sudo firewall-cmd --permanent --add-port=9095/tcp  # Prometheus
sudo firewall-cmd --reload

Network Configuration

Configure network interfaces:

# Public network interface
sudo nmcli connection modify ens160 ipv4.addresses 10.0.0.11/24
sudo nmcli connection modify ens160 ipv4.gateway 10.0.0.1
sudo nmcli connection modify ens160 ipv4.dns 8.8.8.8
sudo nmcli connection modify ens160 ipv4.method manual
sudo nmcli connection up ens160

# Cluster network interface (dedicated for Ceph traffic)
sudo nmcli connection add type ethernet con-name cluster ifname ens192
sudo nmcli connection modify cluster ipv4.addresses 10.0.1.11/24
sudo nmcli connection modify cluster ipv4.method manual
sudo nmcli connection up cluster

# Optimize network settings
cat << EOF | sudo tee /etc/sysctl.d/ceph-network.conf
# Increase Linux autotuning TCP buffer limits
net.core.rmem_default = 134217728
net.core.rmem_max = 134217728
net.core.wmem_default = 134217728
net.core.wmem_max = 134217728
net.core.netdev_max_backlog = 5000
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.ipv4.tcp_congestion_control = bbr
net.core.default_qdisc = fq
EOF

sudo sysctl -p /etc/sysctl.d/ceph-network.conf

Storage Preparation

Prepare OSD disks:

# List available disks
lsblk

# Clean disks (WARNING: This will destroy all data!)
# Repeat for each OSD disk
sudo sgdisk --zap-all /dev/sdb
sudo sgdisk --zap-all /dev/sdc
sudo sgdisk --zap-all /dev/sdd
sudo sgdisk --zap-all /dev/sde

# Remove any existing LVM metadata
sudo pvremove /dev/sdb /dev/sdc /dev/sdd /dev/sde -ff -y
sudo vgremove -ff ceph-*

# Clean partition table
sudo dd if=/dev/zero of=/dev/sdb bs=1M count=100
sudo dd if=/dev/zero of=/dev/sdc bs=1M count=100
sudo dd if=/dev/zero of=/dev/sdd bs=1M count=100
sudo dd if=/dev/zero of=/dev/sde bs=1M count=100

Installing Ceph

Installing Cephadm

On the first node (admin node):

# Install cephadm
sudo dnf install -y cephadm

# Install ceph-common
sudo dnf install -y ceph-common

# Verify installation
cephadm version

Bootstrapping the Cluster

# Generate SSH key (if not exists)
ssh-keygen -t rsa -b 4096 -N '' -f ~/.ssh/id_rsa

# Copy SSH key to all nodes
for i in {1..5}; do
    ssh-copy-id root@ceph-node$i
done

# Bootstrap Ceph cluster
sudo cephadm bootstrap \
    --mon-ip 10.0.0.11 \
    --cluster-network 10.0.1.0/24 \
    --initial-dashboard-user admin \
    --initial-dashboard-password 'SecurePass123!' \
    --dashboard-password-noupdate

# Save the credentials shown in output!

Adding Nodes to Cluster

# Get the cluster public key
sudo ceph cephadm get-pub-key > ~/ceph.pub

# Copy admin keyring
sudo cp /etc/ceph/ceph.client.admin.keyring ~/

# Add other nodes
for i in {2..5}; do
    ssh-copy-id -f -i ~/ceph.pub root@ceph-node$i
    sudo ceph orch host add ceph-node$i 10.0.0.1$i
done

# Verify hosts
sudo ceph orch host ls

# Label hosts by role
sudo ceph orch host label add ceph-node1 _admin
sudo ceph orch host label add ceph-node2 mon
sudo ceph orch host label add ceph-node3 mon

Configuring Storage

Adding OSDs

# List available devices
sudo ceph orch device ls

# Add all available devices as OSDs
sudo ceph orch apply osd --all-available-devices

# Or add specific devices
sudo ceph orch daemon add osd ceph-node1:/dev/sdb
sudo ceph orch daemon add osd ceph-node1:/dev/sdc
sudo ceph orch daemon add osd ceph-node1:/dev/sdd
sudo ceph orch daemon add osd ceph-node1:/dev/sde

# Monitor OSD creation
sudo ceph -s
watch sudo ceph osd tree
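
As an alternative to adding devices one at a time, cephadm also accepts a declarative OSD service specification. A minimal sketch is shown below; the host pattern and device filters are assumptions to adapt to your hardware:

# osd-spec.yaml: apply OSDs declaratively via the orchestrator
cat << EOF > osd-spec.yaml
service_type: osd
service_id: default_osds
placement:
  host_pattern: 'ceph-node*'
spec:
  data_devices:
    rotational: 0      # only non-rotational (SSD/NVMe) devices
    size: '500G:'      # only devices of 500GB or larger
EOF

sudo ceph orch apply -i osd-spec.yaml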

Creating Storage Pools

# Create replicated pool for RBD
sudo ceph osd pool create rbd_pool 128 128 replicated
sudo ceph osd pool application enable rbd_pool rbd

# Create pool for CephFS metadata
sudo ceph osd pool create cephfs_metadata 32 32 replicated
sudo ceph osd pool create cephfs_data 128 128 replicated

# Create erasure coded pool for object storage
sudo ceph osd erasure-code-profile set ec-profile k=4 m=2
sudo ceph osd pool create ec_pool 128 128 erasure ec-profile
sudo ceph osd pool application enable ec_pool rgw

# Set pool quotas
sudo ceph osd pool set-quota rbd_pool max_bytes 10737418240000  # 10TB
sudo ceph osd pool set-quota rbd_pool max_objects 1000000
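
The PG counts above (128/32) are reasonable starting points for a small cluster; alternatively, the built-in autoscaler can size placement groups for you. A brief sketch:

# Let the autoscaler manage pg_num for these pools
sudo ceph mgr module enable pg_autoscaler
sudo ceph osd pool set rbd_pool pg_autoscale_mode on
sudo ceph osd pool set cephfs_data pg_autoscale_mode on

# Review current and suggested PG counts
sudo ceph osd pool autoscale-status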

CRUSH Map Configuration

# View current CRUSH map
sudo ceph osd crush tree

# Create custom CRUSH rules
sudo ceph osd crush rule create-replicated rack_rule default rack
sudo ceph osd pool set rbd_pool crush_rule rack_rule

# Add rack-level failure domains
sudo ceph osd crush add-bucket rack1 rack
sudo ceph osd crush add-bucket rack2 rack
sudo ceph osd crush move rack1 root=default
sudo ceph osd crush move rack2 root=default

# Move hosts to racks
sudo ceph osd crush move ceph-node1 rack=rack1
sudo ceph osd crush move ceph-node2 rack=rack1
sudo ceph osd crush move ceph-node3 rack=rack2
sudo ceph osd crush move ceph-node4 rack=rack2
sudo ceph osd crush move ceph-node5 rack=rack2

Configuring Ceph Services

Setting Up RBD (Block Storage)

# Initialize RBD pool
sudo rbd pool init rbd_pool

# Create a block device
sudo rbd create --size 10G rbd_pool/test-image

# List images
sudo rbd ls rbd_pool

# Map RBD device (on client)
sudo modprobe rbd
sudo rbd map rbd_pool/test-image

# Create filesystem
sudo mkfs.ext4 /dev/rbd0
sudo mkdir /mnt/ceph-block
sudo mount /dev/rbd0 /mnt/ceph-block

# Enable RBD mirroring (for DR)
sudo rbd mirror pool enable rbd_pool image
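
To make the mapping and mount survive client reboots, one option is the rbdmap service shipped with ceph-common. A minimal sketch, assuming the admin keyring is available on the client:

# Map the image at boot via /etc/ceph/rbdmap
echo "rbd_pool/test-image id=admin,keyring=/etc/ceph/ceph.client.admin.keyring" | sudo tee -a /etc/ceph/rbdmap
sudo systemctl enable --now rbdmap

# Mount via fstab; noauto + automount avoids boot hangs if the cluster is unreachable
echo "/dev/rbd/rbd_pool/test-image /mnt/ceph-block ext4 noauto,x-systemd.automount,_netdev 0 0" | sudo tee -a /etc/fstab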

Setting Up CephFS (File System)

# Create CephFS
sudo ceph fs new cephfs cephfs_metadata cephfs_data

# Deploy MDS daemons
sudo ceph orch apply mds cephfs --placement="3 ceph-node1 ceph-node2 ceph-node3"

# Verify CephFS status
sudo ceph fs status
sudo ceph mds stat

# Create CephFS user
sudo ceph auth get-or-create client.cephfs mon 'allow r' mds 'allow r, allow rw path=/' osd 'allow rw pool=cephfs_data' -o /etc/ceph/ceph.client.cephfs.keyring

# Mount CephFS (on client)
sudo mkdir /mnt/cephfs
sudo mount -t ceph ceph-node1:6789:/ /mnt/cephfs -o name=cephfs,secretfile=/etc/ceph/cephfs.secret

# Or list multiple monitors and pass the secret inline
sudo mount -t ceph ceph-node1,ceph-node2,ceph-node3:/ /mnt/cephfs -o name=cephfs,secret=AQBSdFhm8jCwBhAAFL0Y0II6vbfeEqEEuITEew==
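
The secretfile referenced above is not created automatically; one way to produce it from the client.cephfs credentials created earlier:

# Extract the bare key for the mount's secretfile option
sudo ceph auth get-key client.cephfs | sudo tee /etc/ceph/cephfs.secret
sudo chmod 600 /etc/ceph/cephfs.secret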

Setting Up RGW (Object Storage)

# Deploy RGW
sudo ceph orch apply rgw myrgw --placement="2 ceph-node4 ceph-node5" --port=8080

# Create RGW user
sudo radosgw-admin user create --uid=testuser --display-name="Test User" --email=testuser@example.com

# Create S3 credentials
sudo radosgw-admin key create --uid=testuser --key-type=s3 --access-key=ACCESS_KEY --secret-key=SECRET_KEY

# Test S3 access
cat << EOF > ~/.s3cfg
[default]
access_key = ACCESS_KEY
secret_key = SECRET_KEY
host_base = ceph-node4:8080
host_bucket = %(bucket)s.ceph-node4:8080
use_https = False
EOF

s3cmd mb s3://test-bucket
s3cmd put /etc/hosts s3://test-bucket/

Monitoring and Management

Ceph Dashboard Configuration

# Enable dashboard module
sudo ceph mgr module enable dashboard

# Configure SSL
sudo ceph dashboard create-self-signed-cert

# Set dashboard URL
sudo ceph config set mgr mgr/dashboard/server_addr 0.0.0.0
sudo ceph config set mgr mgr/dashboard/server_port 8443
sudo ceph config set mgr mgr/dashboard/ssl true

# Create dashboard user (the password is read from a file)
echo -n 'SecurePass123!' > /tmp/dashboard-password
sudo ceph dashboard ac-user-create admin -i /tmp/dashboard-password administrator

# Enable additional features
sudo ceph dashboard feature enable rbd
sudo ceph dashboard feature enable cephfs
sudo ceph dashboard feature enable rgw

# Access dashboard at https://ceph-node1:8443

Prometheus and Grafana Integration

# Enable Prometheus module
sudo ceph mgr module enable prometheus

# Deploy monitoring stack
sudo ceph orch apply prometheus --placement="1 ceph-node1"
sudo ceph orch apply grafana --placement="1 ceph-node1"
sudo ceph orch apply alertmanager --placement="1 ceph-node1"

# Configure the Grafana connection for the dashboard
sudo ceph dashboard set-grafana-api-url https://ceph-node1:3000
sudo ceph dashboard set-grafana-api-username admin
echo -n 'admin' > /tmp/grafana-password
sudo ceph dashboard set-grafana-api-password -i /tmp/grafana-password

# Import Ceph dashboards
curl -X POST http://admin:admin@ceph-node1:3000/api/dashboards/import \
    -H "Content-Type: application/json" \
    -d @ceph-dashboards.json

Setting Up Alerts

# /etc/ceph/ceph-alerts.yml
groups:
  - name: ceph_alerts
    rules:
      - alert: CephHealthError
        expr: ceph_health_status == 2
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Ceph cluster health is in ERROR state"
          description: "Ceph cluster has been in ERROR state for more than 5 minutes"
      
      - alert: CephOSDDown
        expr: ceph_osd_up == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Ceph OSD {{ $labels.osd }} is down"
          description: "OSD {{ $labels.osd }} has been down for more than 5 minutes"
      
      - alert: CephPoolFull
        expr: ceph_pool_bytes_used / ceph_pool_max_avail > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Ceph pool {{ $labels.pool }} is nearly full"
          description: "Pool {{ $labels.pool }} is {{ $value | humanizePercentage }} full"

Performance Tuning

OSD Optimization

# Set OSD memory target
sudo ceph config set osd osd_memory_target 4294967296  # 4GB

# Configure bluestore cache
sudo ceph config set osd bluestore_cache_size_hdd 1073741824  # 1GB
sudo ceph config set osd bluestore_cache_size_ssd 3221225472  # 3GB

# Set recovery and backfill settings
sudo ceph config set osd osd_recovery_max_active 3
sudo ceph config set osd osd_recovery_max_single_start 1
sudo ceph config set osd osd_recovery_sleep 0.1

# Configure scrubbing
sudo ceph config set osd osd_scrub_begin_hour 23
sudo ceph config set osd osd_scrub_end_hour 6
sudo ceph config set osd osd_scrub_load_threshold 0.5

Network Optimization

# Enable jumbo frames
sudo ip link set dev ens192 mtu 9000

# Make persistent
sudo nmcli connection modify cluster 802-3-ethernet.mtu 9000

# Enable the messenger v2 protocol
sudo ceph mon enable-msgr2
sudo ceph config set global ms_bind_msgr2 true

# Set network priorities
sudo ceph config set osd osd_client_op_priority 63
sudo ceph config set osd osd_recovery_op_priority 3

Client-Side Optimization

# RBD cache settings
sudo ceph config set client rbd_cache true
sudo ceph config set client rbd_cache_size 33554432  # 32MB
sudo ceph config set client rbd_cache_max_dirty 25165824  # 24MB

# CephFS cache settings
cat << EOF | sudo tee -a /etc/ceph/ceph.conf
[client]
client_cache_size = 16384
client_cache_readahead_max_periods = 4
client_oc_size = 209715200
EOF

Maintenance Operations

Cluster Health Monitoring

# Check cluster health
sudo ceph -s
sudo ceph health detail

# Monitor real-time activity
sudo ceph -w

# Check OSD statistics
sudo ceph osd df tree
sudo ceph osd pool stats

# Monitor I/O
sudo ceph osd pool stats rbd_pool
sudo rados -p rbd_pool bench 60 write --no-cleanup
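
Because of --no-cleanup, the bench run above leaves its test objects in the pool; remove them afterwards so they do not skew capacity figures:

# Remove objects written by 'rados bench'
sudo rados -p rbd_pool cleanup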

Backup and Recovery

# Backup Ceph configuration
sudo tar -czf ceph-config-backup.tar.gz /etc/ceph/

# Export CRUSH map
sudo ceph osd getcrushmap -o crushmap.bin
sudo crushtool -d crushmap.bin -o crushmap.txt

# Backup monitor data (on cephadm deployments, the unit name and data path include the cluster fsid)
sudo systemctl stop ceph-mon@$(hostname)
sudo tar -czf mon-backup.tar.gz /var/lib/ceph/mon/
sudo systemctl start ceph-mon@$(hostname)

# Create pool snapshots
sudo rbd snap create rbd_pool/test-image@backup-$(date +%Y%m%d)
sudo rbd snap ls rbd_pool/test-image
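
If you later need to restore or hand-edit the exported CRUSH map, the reverse path is to compile it and inject it back:

# Edit crushmap.txt as needed, then recompile and inject it
sudo crushtool -c crushmap.txt -o crushmap-new.bin
sudo ceph osd setcrushmap -i crushmap-new.bin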

Scaling Operations

# Add new OSD
sudo ceph orch device ls --refresh
sudo ceph orch daemon add osd ceph-node6:/dev/sdb

# Remove OSD safely
sudo ceph osd out osd.10
sudo ceph osd safe-to-destroy osd.10
sudo ceph osd destroy osd.10 --yes-i-really-mean-it

# Add new monitor
sudo ceph orch daemon add mon ceph-node6

# Rebalance data
sudo ceph osd reweight 5 0.9
sudo ceph osd pool set rbd_pool pg_num 256
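
Instead of hand-tuning reweights, recent Ceph releases can keep data balanced automatically via the balancer module. A brief sketch:

# Enable automatic rebalancing in upmap mode
sudo ceph balancer mode upmap
sudo ceph balancer on
sudo ceph balancer status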

Disaster Recovery

Setting Up RBD Mirroring

# On primary site
sudo rbd mirror pool enable rbd_pool pool
sudo rbd mirror pool peer add rbd_pool client.remote@remote_cluster

# On secondary site
sudo rbd mirror pool enable rbd_pool pool
sudo rbd mirror pool peer add rbd_pool client.primary@primary_cluster

# Create mirroring daemon
sudo ceph orch apply rbd-mirror --placement="1 ceph-node1"

# Enable mirroring for specific image
sudo rbd mirror image enable rbd_pool/test-image
sudo rbd mirror image status rbd_pool/test-image

Multi-Site RGW Configuration

# Create realm
sudo radosgw-admin realm create --rgw-realm=multisite --default

# Create master zone group
sudo radosgw-admin zonegroup create --rgw-zonegroup=us --endpoints=http://ceph-node4:8080 --rgw-realm=multisite --master --default

# Create master zone
sudo radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-east --master --default --endpoints=http://ceph-node4:8080

# Create system user
sudo radosgw-admin user create --uid="sync-user" --display-name="Synchronization User" --system

# On secondary site
sudo radosgw-admin realm pull --url=http://ceph-node4:8080 --access-key=ACCESS_KEY --secret-key=SECRET_KEY
sudo radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-west --endpoints=http://secondary:8080 --access-key=ACCESS_KEY --secret-key=SECRET_KEY
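
Zone and zonegroup changes only take effect once the period is committed and the gateways are restarted; roughly, on each site (the service name follows the rgw deployment above):

# Commit the realm/zone changes and restart the gateways
sudo radosgw-admin period update --commit
sudo ceph orch restart rgw.myrgw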

Security Best Practices

Encryption Configuration

# Enable encryption at rest
sudo ceph config set osd osd_dmcrypt_type luks

# Configure OSD encryption
sudo ceph-volume lvm create --dmcrypt --data /dev/sdb

# Enable in-flight encryption (msgr2 secure mode)
sudo ceph config set global ms_cluster_mode secure
sudo ceph config set global ms_service_mode secure
sudo ceph config set global ms_client_mode secure

Access Control

# Create restricted user
sudo ceph auth get-or-create client.restricted mon 'allow r' osd 'allow rw pool=restricted_pool'

# Set pool permissions
sudo ceph osd pool application enable restricted_pool rbd
sudo ceph osd pool set restricted_pool allow_ec_overwrites true

# Configure RGW bucket policies
cat << EOF > bucket-policy.json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": ["arn:aws:iam:::user/testuser"]},
    "Action": ["s3:GetObject"],
    "Resource": ["arn:aws:s3:::test-bucket/*"]
  }]
}
EOF
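
The policy file does nothing by itself; it has to be attached to the bucket. With the s3cmd configuration created earlier, that would look roughly like:

# Attach the policy to the bucket created earlier
s3cmd setpolicy bucket-policy.json s3://test-bucket
s3cmd info s3://test-bucket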

Troubleshooting

Common Issues and Solutions

# OSD won't start
sudo journalctl -u ceph-osd@0 -f
sudo ceph-volume lvm list

# Monitor issues
sudo ceph mon stat
sudo ceph daemon mon.$(hostname) mon_status

# Slow operations
sudo ceph daemon osd.0 dump_ops_in_flight
sudo ceph health detail | grep -i slow

# PG issues
sudo ceph pg dump_stuck
sudo ceph pg repair 1.2f

# Network issues
sudo ceph osd perf  # per-OSD commit/apply latency
sudo iperf3 -s  # On one node
sudo iperf3 -c ceph-node2  # On another node

Recovery Procedures

# Recover from OSD failure
sudo ceph osd in osd.5
sudo systemctl restart ceph-osd@5

# Fix inconsistent PGs
sudo ceph pg repair 2.1a
sudo ceph osd scrub osd.10

# Recover from mon failure
sudo ceph-mon -i $(hostname) --extract-monmap /tmp/monmap
sudo ceph-mon -i $(hostname) --inject-monmap /tmp/monmap

Best Practices

  1. Capacity Planning

    • Keep cluster utilization below 70%
    • Plan for failure domains
    • Monitor growth trends
  2. Performance

    • Use SSDs for journals/metadata
    • Separate cluster and public networks
    • Enable jumbo frames
  3. Reliability

    • Regular health checks
    • Automated monitoring
    • Tested backup procedures
  4. Security

    • Enable encryption
    • Implement proper access controls
    • Regular security audits

Conclusion

You’ve successfully deployed a production-ready Ceph distributed storage cluster on AlmaLinux. This scalable storage solution provides unified object, block, and file storage with high availability and fault tolerance. Remember to continuously monitor cluster health, perform regular maintenance, and plan for capacity growth to ensure your Ceph cluster remains performant and reliable.

Ceph’s flexibility and scalability make it an ideal choice for modern storage needs, from small deployments to exascale installations. Continue exploring advanced features like erasure coding, cache tiering, and multi-site replication to fully leverage Ceph’s capabilities.