Goal

This document guides you through recreating a Ceph monitor on a Kubernetes cluster when one of the Ceph monitors is out of quorum.

Step-by-step Method

  1. Check the Ceph cluster status. In this example one Ceph monitor, mon.a, is out of quorum. (A note on the kubectl ceph command follows the output.)

    $ kubectl ceph -s
      cluster:
        id:     4ecc59d7-0173-4c2d-9802-ae0c9c9aea6c
        health: HEALTH_WARN
     
      services:
        mon: 3 daemons, quorum b,c, out of quorum: a
        mgr: a(active)
        osd: 9 osds: 9 up, 9 in
     
      data:
        pools:   1 pools, 128 pgs
        objects: 1.49 k objects, 4.4 GiB
        usage:   21 GiB used, 210 GiB / 231 GiB avail
        pgs:     128 active+clean
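
    Note that "kubectl ceph" is not a built-in kubectl command; this guide assumes
    it is a local alias or plugin that runs ceph commands inside the Rook toolbox
    pod. An equivalent direct invocation, assuming the toolbox deployment is
    named rook-ceph-tools, would look like this:

    # Run ceph status through the Rook toolbox (deployment name assumed)
    $ kubectl -n rook exec -it deploy/rook-ceph-tools -- ceph -s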
    
  2. Check the Ceph health detail.

    $ kubectl ceph health detail
    ...
    MON_DOWN 1/3 mons down, quorum b,c
        mon.a (rank 0) addr 10.43.116.150:6789/0 is down (out of quorum)
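
    To double-check which monitors are currently in quorum, Ceph's quorum_status
    command can also be used (a sketch, run through the same alias):

    # "quorum_names" lists the monitors currently in quorum, e.g. ["b","c"] here
    $ kubectl ceph quorum_status -f json-pretty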
    
  3. Check the Ceph monitor pod status; in this example all the monitor pods are Running. (This is not required for the next step, it is only recorded here; a log check on the failing monitor is sketched below.)

    $ kubectl -n rook get pod | grep mon
    rook-ceph-mon-a-7fdc8559b6-vfqgx         1/1     Running     1          11d
    rook-ceph-mon-b-8597ccdd76-q9qsr         1/1     Running     1          11d
    rook-ceph-mon-c-7c7c7fbdff-gbxwx         1/1     Running     1          11d
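
    If you want to see why mon.a fell out of quorum before removing it, its pod
    log is a good first stop (a sketch using the deployment name from the
    listing above):

    # Inspect the failing monitor's recent log output
    $ kubectl -n rook logs deploy/rook-ceph-mon-a --tail=50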
    
  4. Check the Kubernetes cluster status and make sure no node is down.

    $ kubectl get nodes
    NAME         STATUS   ROLES                      AGE   VERSION
    test-k8s-1   Ready    controlplane,etcd,worker   11d   v1.21.9
    test-k8s-2   Ready    controlplane,etcd,worker   11d   v1.21.9
    test-k8s-3   Ready    worker                     11d   v1.21.9
    
  5. Scale the deployment rook-ceph-mon-a down to 0 so that the failed monitor pod is not recreated (a quick verification follows the commands).

    $ kubectl -n rook get deploy | grep mon
    rook-ceph-mon-a   1/1     1            1           11d
    rook-ceph-mon-b   1/1     1            1           11d
    rook-ceph-mon-c   1/1     1            1           11d
    
    $ kubectl -n rook scale deploy rook-ceph-mon-a --replicas 0
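
    Before moving on, it is worth confirming that the scale-down took effect and
    the mon.a pod is gone:

    # READY should now show 0/0 and no rook-ceph-mon-a pod should be listed
    $ kubectl -n rook get deploy rook-ceph-mon-a
    $ kubectl -n rook get pod | grep mon-a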
    
  6. Edit the Rook configmap rook-ceph-mon-endpoints to remove the inactive Ceph monitor (a safer, backup-first variant is sketched after the example).

    $ kubectl -n rook edit cm rook-ceph-mon-endpoints
    
    # Before
    apiVersion: v1
    data:
      csi-cluster-config-json: '[{"clusterID":"rook","monitors":["10.43.1.72:6789","10.43.206.183:6789","10.43.116.150:6789"]}]'
      data: b=10.43.1.72:6789,c=10.43.206.183:6789,a=10.43.116.150:6789
      mapping: '{"node":{"a":{"Name":"test-k8s-2","Hostname":"test-k8s-2","Address":"192.168.1.12"},"b":{"Name":"test-k8s-3","Hostname":"test-k8s-3","Address":"192.168.1.13"},"c":{"Name":"test-k8s-1","Hostname":"test-k8s-1","Address":"192.168.1.11"}}}'
      maxMonId: "2"
    
    # After, remove mon.a
    apiVersion: v1
    data:
      csi-cluster-config-json: '[{"clusterID":"rook","monitors":["10.43.1.72:6789","10.43.206.183:6789"]}]'
      data: b=10.43.1.72:6789,c=10.43.206.183:6789
      mapping: '{"node":{"b":{"Name":"test-k8s-3","Hostname":"test-k8s-3","Address":"192.168.1.13"},"c":{"Name":"test-k8s-1","Hostname":"test-k8s-1","Address":"192.168.1.11"}}}'
      maxMonId: "2"
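
    Editing a live configmap by hand is easy to get wrong. A safer variant is to
    back up the configmap first and apply an edited copy, sketched below with
    standard kubectl commands:

    # Back up the current configmap before touching it
    $ kubectl -n rook get cm rook-ceph-mon-endpoints -o yaml > mon-endpoints-backup.yaml
    # Copy it, remove the mon.a entries from csi-cluster-config-json, data
    # and mapping as shown above, then apply the edited copy
    $ cp mon-endpoints-backup.yaml mon-endpoints.yaml
    $ kubectl -n rook apply -f mon-endpoints.yaml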
    
  7. Remove mon.a from the Ceph monitor map.

    $ kubectl ceph mon remove a
    removing mon.a at 10.43.116.150:6789/0, there will be 2 monitors
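
    The removal can be verified with the monitor map; mon.a should no longer
    appear in the output:

    # Dump the monmap and confirm only mon.b and mon.c remain
    $ kubectl ceph mon dump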
    
  8. Check the Ceph cluster status again to confirm that mon.a is no longer in the monitor quorum.

    $ kubectl ceph -s
      cluster:
        id:     4ecc59d7-0173-4c2d-9802-ae0c9c9aea6c
        health: HEALTH_OK
     
      services:
        mon: 2 daemons, quorum b,c
        mgr: a(active)
        osd: 9 osds: 9 up, 9 in
     
      data:
        pools:   1 pools, 128 pgs
        objects: 1.49 k objects, 4.4 GiB
        usage:   21 GiB used, 210 GiB / 231 GiB avail
        pgs:     128 active+clean
     
      io:
        client:   7.7 KiB/s wr, 0 op/s rd, 0 op/s wr
    
  9. Restart rook-ceph-operator so that Rook recreates the missing Ceph monitor automatically (an equivalent one-liner is shown after the commands).

    # Stop rook-ceph-operator
    $ kubectl -n rook-system scale deploy rook-ceph-operator --replicas 0
    
    # Restart rook-ceph-operator
    $ kubectl -n rook-system scale deploy rook-ceph-operator --replicas 1
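
    On kubectl v1.15 or later, the two scale commands can be replaced by a single
    rollout restart, and following the operator log shows the monitor being
    recreated:

    # Equivalent one-liner
    $ kubectl -n rook-system rollout restart deploy rook-ceph-operator

    # Watch the operator reconcile the monitor count
    $ kubectl -n rook-system logs -f deploy/rook-ceph-operator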
    
  10. Check the Rook pods to confirm that the new Ceph monitor has been created. Here mon.d has been created on node test-k8s-1.

    $ kubectl -n rook get pod -o wide | grep mon
    rook-ceph-mon-c-7fdc8559b6-vfqgx         1/1     Running     1          11d   192.168.1.12    test-k8s-2   <none>           <none>
    rook-ceph-mon-b-8597ccdd76-q9qsr         1/1     Running     1          11d   192.168.1.13    test-k8s-3   <none>           <none>
    rook-ceph-mon-d-9fdc2df9b6-b1d45         1/1     Running     1          60s   192.168.1.11    test-k8s-1   <none>           <none>
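
    Instead of grep, the monitor pods can also be selected by label (assuming
    Rook's standard app=rook-ceph-mon label):

    $ kubectl -n rook get pod -l app=rook-ceph-mon -o wide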
    
  11. Delete the unused Ceph monitor deployment rook-ceph-mon-a.

    $ kubectl -n rook delete deploy rook-ceph-mon-a
    deployment.extensions "rook-ceph-mon-a" deleted
    
    $ kubectl -n rook get deploy | grep mon
    rook-ceph-mon-b   1/1     1            1           11d
    rook-ceph-mon-c   1/1     1            1           11d
    rook-ceph-mon-d   1/1     1            1           2m
    
  12. Connect to the node on which the new Ceph monitor was created; its local files are stored under /var/lib/rook/<mon-id>/data/. The result will look like this:

    # On the node on which the new Ceph monitor was created
    $ ll /var/lib/rook/mon-d/data/
    total 20
    drwxr-xr-x 3  167  167 4096 Sep  1 20:24 ./
    drwxr-xr-x 3 root root 4096 Sep  1 20:21 ../
    -rw------- 1  167  167   77 Sep  1 20:24 keyring
    -rw-r--r-- 1  167  167    8 Sep  1 20:24 kv_backend
    drwxr-xr-x 2  167  167 4096 Sep 13 16:31 store.db/
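
    The 167 owner in the listing is the UID/GID of the ceph user inside the
    Rook/Ceph container images (an assumption worth verifying for your image
    version). A quick permission check on the keyring, which must stay readable
    only by that user:

    # keyring should be mode 600 and owned by 167:167 (the in-container ceph user)
    $ stat -c '%a %u:%g %n' /var/lib/rook/mon-d/data/keyring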
    
  13. Check the Ceph cluster status and unset any flags set by Rook during the operation; here noscrub and nodeep-scrub are still set (a final check is sketched after the commands).

    $ kubectl ceph -s
      cluster:
        id:     4ecc59d7-0173-4c2d-9802-ae0c9c9aea6c
        health: HEALTH_WARN
                noscrub,nodeep-scrub flag(s) set
     
      services:
        mon: 3 daemons, quorum b,d,c
        mgr: a(active)
        osd: 9 osds: 9 up, 9 in
     
      data:
        pools:   1 pools, 128 pgs
        objects: 1.49 k objects, 4.4 GiB
        usage:   21 GiB used, 210 GiB / 231 GiB avail
        pgs:     128 active+clean
     
      io:
        client:   9.3 KiB/s wr, 0 op/s rd, 0 op/s wr
    
    $ kubectl ceph osd unset noscrub
    noscrub is unset
    
    $ kubectl ceph osd unset nodeep-scrub
    nodeep-scrub is unset
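
    As a final verification, the OSD map should no longer list either flag:

    # The flags line should not contain noscrub or nodeep-scrub any more
    $ kubectl ceph osd dump | grep flags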
    

This completes the procedure.
