
Deploy Rook Ceph Storage Cluster on Kubernetes Cluster

Specification: Kubernetes, Rook, Ceph

Lab Topology

You can check the Kubernetes cluster installation in my previous documentation: Kubernetes Cluster Multi-Master High Availability (rjhaikal.my.id)

Storage Node Disks

I use 3 additional disks (vdb, vdc, vdd) on each storage node, for a total of 9 disks in the Rook cluster.

What is Rook / Ceph?

Rook is a free, powerful, cloud-native, open source storage orchestrator for Kubernetes. It provides support for a diverse set of storage solutions that natively integrate with cloud-native environments. More details about the storage solutions currently supported by Rook are captured in the project status section.

Ceph is a distributed storage system that provides file, block, and object storage and is deployed in large-scale production clusters. Rook enables us to automate deployment, bootstrapping, configuration, scaling, and upgrading of a Ceph cluster within a Kubernetes environment. Ceph is widely used in in-house infrastructure, where a managed storage solution is rarely an option.

Deploy Rook Ceph Storage on Kubernetes Cluster

These are the minimal setup requirements for deploying Rook and Ceph storage on a Kubernetes cluster.

  • A cluster with a minimum of three nodes
  • Available raw disk devices (with no partitions or formatted filesystems); a quick check is shown after this list
  • Or raw partitions (with no formatted filesystem)
  • Or persistent volumes available from a storage class in block mode
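
To confirm the devices are actually raw, you can inspect them on each storage node before deploying anything. A minimal sketch, assuming the vdb/vdc/vdd layout from the lab topology above:

# Run on every storage node; the disks should show no FSTYPE, no partitions, and no mountpoints
lsblk -f /dev/vdb /dev/vdc /dev/vdd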

If you need a simple persistent volume storage for your cluster check out my NFS article:

Kubernetes Storage - Dynamic NFS Client Provisioner
This article is more like a note to myself, but this may help if you are looking at How to setup Kubernetes Storage using Dynamic NFS Provisioning. Kubernetes NFS Subdir External Provisioner is an automatic provisioner that use your existing and already configured NFS server to support dynamic prov…

Deploy Rook Storage Orchestrator

1. Clone the rook project from Github using git command.

cd ~/
git clone --single-branch --branch release-1.11 https://github.com/rook/rook.git

cd rook/deploy/examples/

2. Deploy the Rook Operator

kubectl create -f crds.yaml
kubectl create -f common.yaml
kubectl create -f operator.yaml

kubectl get all -n rook-ceph
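
Before moving on, it helps to wait until the operator pod is actually Ready. A minimal sketch, assuming the default app=rook-ceph-operator label used by the example manifests:

kubectl -n rook-ceph wait --for=condition=Ready pod -l app=rook-ceph-operator --timeout=300s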

Create Ceph Cluster on Kubernetes

1. Set default namespace to rook-ceph

kubectl config set-context --current --namespace rook-ceph

2. Create the Ceph cluster from cluster.yaml

kubectl create -f cluster.yaml

Note: Since Rook can discover raw devices and partitions by itself, it is fine to use the default cluster deployment manifest without any modifications.

3. View all the resources created. The deployment takes a few minutes; make sure all jobs complete and the cluster reaches a healthy state.

kubectl get -n rook-ceph jobs.batch
kubectl -n rook-ceph get cephcluster
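
The cluster is usable once the CephCluster resource reports a Ready phase and Ceph reports HEALTH_OK. A sketch for checking both fields, assuming the default cluster name rook-ceph and the status fields exposed by recent Rook releases:

kubectl -n rook-ceph get cephcluster rook-ceph -o jsonpath='{.status.phase}{"\n"}'
kubectl -n rook-ceph get cephcluster rook-ceph -o jsonpath='{.status.ceph.health}{"\n"}'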

Deploy Rook Ceph toolbox in Kubernetes

The Rook Ceph toolbox is a container with common tools used for rook debugging and testing.

1. Apply toolbox.yaml

cd ~
cd rook/deploy/examples
kubectl apply -f toolbox.yaml

2. Connect to the pod using kubectl command with exec option

kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash

3. Check Ceph & OSD current status

ceph -s
ceph osd status

4. Check raw storage and pools

ceph df
rados df
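
You can also run one-off Ceph commands through the toolbox deployment without keeping an interactive shell open, for example:

kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph health detail
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd tree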

Working with Ceph Storage Modes

Rook exposes three types of storage:

  • Shared Filesystem: Create a filesystem to be shared across multiple pods (RWX)
  • Block: Create block storage to be consumed by a pod (RWO)
  • Object: Create an object store that is accessible inside or outside the Kubernetes cluster

All the necessary files for each storage mode are available in the rook/deploy/examples/ directory.

cd ~/
cd rook/deploy/examples

Ceph Storage - Cephfs

CephFS is used to enable a shared filesystem that can be mounted with read/write permissions from multiple pods.

1. Update the filesystem.yaml file by setting the data pool name, replication size, etc. In this example the filesystem is named rook-cephfs.

vim filesystem.yaml
---
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: rook-cephfs
  namespace: rook-ceph # namespace:cluster

kubectl create -f filesystem.yaml

2. Access Rook toolbox pod and check if metadata and data pools are created.

kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
ceph fs ls
ceph osd lspools

3. Create a StorageClass for CephFS

sudo nano csi/cephfs/storageclass.yaml
---
parameters:
  # clusterID is the namespace where the rook cluster is running
  # If you change this namespace, also change the namespace below where the secret namespaces are defined
  clusterID: rook-ceph # namespace:cluster

  # CephFS filesystem name into which the volume shall be created
  fsName: rook-cephfs

  # Ceph pool into which the volume shall be created
  # Required for provisionVolume: "true"
  pool: rook-cephfs-replicated

kubectl create -f csi/cephfs/storageclass.yaml
kubectl get sc

4. Create Cephfs PVC

cat<<EOF > cephfs-pvc.yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-pvc-fio
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: rook-cephfs
EOF
kubectl create -f cephfs-pvc.yaml
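
Before creating the benchmark pod, confirm the claim has been bound by the CephFS provisioner (the STATUS column should show Bound with the rook-cephfs storage class):

kubectl get pvc cephfs-pvc-fio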

5. Create a Fio pod for benchmarking, mounted to cephfs-pvc-fio

cat <<EOF > cephfs-fio-pod.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-cephfs-fio
spec:
  containers: 
    - name: pod-cephfs-fio
      image: vineethac/fio_image
      command: [ "/bin/bash", "-c", "--" ]
      args: [ "while true; do sleep 30; done;" ]
      imagePullPolicy: IfNotPresent
      volumeMounts:
        - name: cephfs-pvc-fio
          mountPath: /cephfs
  volumes:
    - name: cephfs-pvc-fio
      persistentVolumeClaim:
        claimName: cephfs-pvc-fio
        readOnly: false
EOF
kubectl create -f cephfs-fio-pod.yaml
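
Wait for the pod to become Ready before starting the benchmark; a minimal sketch:

kubectl wait --for=condition=Ready pod/pod-cephfs-fio --timeout=300s
kubectl get pod pod-cephfs-fio -o wide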

6. Run the Fio Benchmark

Exec into the pod and change into the mounted directory:

kubectl exec -it pod-cephfs-fio -- bash
cd /cephfs

for i in {1..3}; do fio --randrepeat=0 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=1G --readwrite=randrw --rwmixread=75 --numjobs=1 --time_based --runtime=90s --ramp_time=2s --invalidate=1 --verify=0 --verify_fatal=0 --group_reporting; done

for i in {1..3}; do fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=fiotest --filename=testfio --bs=4k --iodepth=64 --size=1G --readwrite=randread; done

for i in {1..3}; do fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=fiotest --filename=fiotest --bs=4k --iodepth=64 --size=1G --readwrite=randwrite; done
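
Once you are done benchmarking, the test files created above (test, testfio, fiotest) stay on the volume, so it is worth removing them to free space in the pool:

rm -f /cephfs/test /cephfs/testfio /cephfs/fiotest
exit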

7. Benchmark Result

Ceph Storage - RBD

Block storage allows a single pod to mount storage (RWO mode).

1. Before Rook can provision block storage, a StorageClass and a CephBlockPool need to be created. The example manifest csi/rbd/storageclass.yaml creates both: a CephBlockPool named replicapool and a StorageClass named rook-ceph-block.

kubectl create -f csi/rbd/storageclass.yaml
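
You can verify that both objects exist before creating a PVC:

kubectl -n rook-ceph get cephblockpool replicapool
kubectl get sc rook-ceph-block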

2. Create RBD PVC

cat<<EOF > rbd-pvc.yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-pvc-fio
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: rook-ceph-block
EOF
kubectl create -f rbd-pvc.yaml

3. Create a Fio pod for benchmarking, mounted to rbd-pvc-fio

cat<<EOF > rbd-fio-pod.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-rbd-fio
spec:
  containers: 
    - name: pod-rbd-fio
      image: vineethac/fio_image
      command: [ "/bin/bash", "-c", "--" ]
      args: [ "while true; do sleep 30; done;" ]
      imagePullPolicy: IfNotPresent
      volumeMounts:
        - name: rbd-pvc-fio
          mountPath: /rbd
  volumes:
    - name: rbd-pvc-fio
      persistentVolumeClaim:
        claimName: rbd-pvc-fio
        readOnly: false
EOF
kubectl create -f rbd-fio-pod.yaml
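
As with the CephFS pod, confirm the claim is Bound and the pod is Ready before benchmarking:

kubectl get pvc rbd-pvc-fio
kubectl wait --for=condition=Ready pod/pod-rbd-fio --timeout=300s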

4. Run the Fio Benchmark

Exec into the pod and change into the mounted directory:

kubectl exec -it pod-rbd-fio -- bash
cd /rbd

for i in {1..3}; do fio --randrepeat=0 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=1G --readwrite=randrw --rwmixread=75 --numjobs=1 --time_based --runtime=90s --ramp_time=2s --invalidate=1 --verify=0 --verify_fatal=0 --group_reporting; done

for i in {1..3}; do fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=fiotest --filename=testfio --bs=4k --iodepth=64 --size=1G --readwrite=randread; done

for i in {1..3}; do fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=fiotest --filename=fiotest --bs=4k --iodepth=64 --size=1G --readwrite=randwrite; done

5. Benchmark Result

Tearing Down the Ceph Cluster (Reference only)

If you want to tear down the cluster and bring up a new one, be aware of the following resources that will need to be cleaned up:

  • rook-ceph namespace: The Rook operator and cluster created by operator.yaml and cluster.yaml (the cluster CRD)
  • /var/lib/rook: Path on each host in the cluster where configuration is cached by the ceph mons and osds

List all CRDs in the cluster:

root@rj-k8s-master1:~# kubectl get crds
NAME                                                  CREATED AT
bgpconfigurations.crd.projectcalico.org               2023-06-23T09:32:45Z
bgpfilters.crd.projectcalico.org                      2023-06-23T09:32:45Z
bgppeers.crd.projectcalico.org                        2023-06-23T09:32:45Z
blockaffinities.crd.projectcalico.org                 2023-06-23T09:32:45Z
caliconodestatuses.crd.projectcalico.org              2023-06-23T09:32:45Z
cephblockpoolradosnamespaces.ceph.rook.io             2023-06-26T03:22:19Z
cephblockpools.ceph.rook.io                           2023-06-26T03:22:19Z
cephbucketnotifications.ceph.rook.io                  2023-06-26T03:22:19Z
cephbuckettopics.ceph.rook.io                         2023-06-26T03:22:19Z
cephclients.ceph.rook.io                              2023-06-26T03:22:19Z
cephclusters.ceph.rook.io                             2023-06-26T03:22:19Z
cephfilesystemmirrors.ceph.rook.io                    2023-06-26T03:22:19Z
cephfilesystems.ceph.rook.io                          2023-06-26T03:22:19Z
cephfilesystemsubvolumegroups.ceph.rook.io            2023-06-26T03:22:19Z
cephnfses.ceph.rook.io                                2023-06-26T03:22:19Z
cephobjectrealms.ceph.rook.io                         2023-06-26T03:22:19Z
cephobjectstores.ceph.rook.io                         2023-06-26T03:22:20Z
cephobjectstoreusers.ceph.rook.io                     2023-06-26T03:22:20Z
cephobjectzonegroups.ceph.rook.io                     2023-06-26T03:22:20Z
cephobjectzones.ceph.rook.io                          2023-06-26T03:22:20Z
cephrbdmirrors.ceph.rook.io                           2023-06-26T03:22:20Z
clusterinformations.crd.projectcalico.org             2023-06-23T09:32:45Z
felixconfigurations.crd.projectcalico.org             2023-06-23T09:32:45Z
globalnetworkpolicies.crd.projectcalico.org           2023-06-23T09:32:45Z
globalnetworksets.crd.projectcalico.org               2023-06-23T09:32:45Z
hostendpoints.crd.projectcalico.org                   2023-06-23T09:32:45Z
ipamblocks.crd.projectcalico.org                      2023-06-23T09:32:45Z
ipamconfigs.crd.projectcalico.org                     2023-06-23T09:32:45Z
ipamhandles.crd.projectcalico.org                     2023-06-23T09:32:45Z
ippools.crd.projectcalico.org                         2023-06-23T09:32:45Z
ipreservations.crd.projectcalico.org                  2023-06-23T09:32:46Z
kubecontrollersconfigurations.crd.projectcalico.org   2023-06-23T09:32:46Z
networkpolicies.crd.projectcalico.org                 2023-06-23T09:32:46Z
networksets.crd.projectcalico.org                     2023-06-23T09:32:46Z
objectbucketclaims.objectbucket.io                    2023-06-26T03:22:20Z
objectbuckets.objectbucket.io                         2023-06-26T03:22:20Z

Edit the CephCluster and add the cleanupPolicy:

kubectl -n rook-ceph patch cephcluster rook-ceph --type merge -p '{"spec":{"cleanupPolicy":{"confirmation":"yes-really-destroy-data"}}}'
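
You can confirm the policy was applied before deleting anything; a sketch, assuming the default cluster name rook-ceph:

kubectl -n rook-ceph get cephcluster rook-ceph -o jsonpath='{.spec.cleanupPolicy.confirmation}{"\n"}'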

Delete block storage and file storage:

cd ~/
cd rook/deploy/examples
kubectl delete -n rook-ceph cephblockpool replicapool
kubectl delete -f csi/rbd/storageclass.yaml
kubectl delete -f filesystem.yaml
kubectl delete -f csi/cephfs/storageclass.yaml

Delete the CephCluster Custom Resource:

[root@k8s-bastion ~]# kubectl -n rook-ceph delete cephcluster rook-ceph
cephcluster.ceph.rook.io "rook-ceph" deleted

Verify that the cluster CR has been deleted before continuing to the next step.

kubectl -n rook-ceph get cephcluster

Delete the Operator and related Resources

kubectl delete -f operator.yaml
kubectl delete -f common.yaml
kubectl delete -f crds.yaml

Zapping Devices

# Set the raw disk / raw partition path
DISKS="/dev/vdb /dev/vdc /dev/vdd"

# Zap the disk to a fresh, usable state (zap-all is important, b/c MBR has to be clean)
# Install gdisk first if needed: yum install gdisk -y (or apt install gdisk)
for DISK in $DISKS; do
  sgdisk --zap-all $DISK
done

# Clean hdds with dd
for DISK in $DISKS; do
  dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync
done

# Clean disks such as ssd with blkdiscard instead of dd
for DISK in $DISKS; do
  blkdiscard $DISK
done

# Remove mapped devices from ceph-volume setup
ls /dev/mapper/ceph-* | xargs -I% -- dmsetup remove %

# Remove ceph-<UUID> directories
rm -rf /dev/ceph-*
rm -rf /dev/mapper/ceph--*

# Inform the OS of partition table changes
for DISK in $DISKS; do
  partprobe $DISK
done
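
After zapping, a quick check that no Ceph LVM volumes or filesystem signatures remain on the disks:

lsblk -f /dev/vdb /dev/vdc /dev/vdd
ls /dev/mapper/ | grep ceph || echo "no ceph device-mapper entries left"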

Removing the Cluster CRD Finalizer:

for CRD in $(kubectl get crd -n rook-ceph | awk '/ceph.rook.io/ {print $1}'); do
    kubectl get -n rook-ceph "$CRD" -o name | \
    xargs -I {} kubectl patch -n rook-ceph {} --type merge -p '{"metadata":{"finalizers": [null]}}'
done

If the namespace is still stuck in the Terminating state, as seen in the output below:

$ kubectl get ns rook-ceph
NAME        STATUS        AGE
rook-ceph   Terminating   5h

You can check which resources are holding up the deletion, remove their finalizers, and delete those resources manually.

kubectl api-resources --verbs=list --namespaced -o name | xargs -n 1 kubectl get --show-kind --ignore-not-found -n rook-ceph

In my output, the remaining resource is a ConfigMap named rook-ceph-mon-endpoints:

NAME                                DATA   AGE
configmap/rook-ceph-mon-endpoints   4      23h

Delete the resource manually:

# kubectl delete configmap/rook-ceph-mon-endpoints -n rook-ceph
configmap "rook-ceph-mon-endpoints" deleted

Reference

Rook Ceph Documentation (rook.io)