This document guides you through recreating a Ceph monitor on a Kubernetes cluster when one of the Ceph monitors is out of quorum.
Check the Ceph cluster status; one Ceph monitor, mon.a, is out of quorum. (The ceph commands in this document are run from the Rook toolbox pod; see the sketch after the output below.)
$ ceph -s
cluster:
id: 4ecc59d7-0173-4c2d-9802-ae0c9c9aea6c
health: HEALTH_WARN
services:
mon: 3 daemons, quorum b,c, out of quorum: a
mgr: a(active)
osd: 9 osds: 9 up, 9 in
data:
pools: 1 pools, 128 pgs
objects: 1.49 k objects, 4.4 GiB
usage: 21 GiB used, 210 GiB / 231 GiB avail
pgs: 128 active+clean
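One way to open a shell in the Rook toolbox pod, from which the ceph commands are run (the rook-ceph-tools deployment name is an assumption; adjust it to your cluster):
# Open a shell in the Rook toolbox pod
$ kubectl -n rook exec -it deploy/rook-ceph-tools -- bash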
Check the Ceph health detail.
$ ceph health detail
...
MON_DOWN 1/3 mons down, quorum b,c
mon.a (rank 0) addr 10.43.116.150:6789/0 is down (out of quorum)
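To see the full monitor map, including each monitor's rank and address, it can be dumped from the toolbox pod, for example:
# Dump the monitor map; mon.a should be listed with rank 0 and address 10.43.116.150:6789
$ ceph mon dump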
Check the Ceph monitor pod status; all the Ceph monitor pods are running. (It is not necessary for all of the pods to be running before the next step; this is just for the record.)
$ kubectl -n rook get pod | grep mon
rook-ceph-mon-a-7fdc8559b6-vfqgx 1/1 Running 1 11d
rook-ceph-mon-b-8597ccdd76-q9qsr 1/1 Running 1 11d
rook-ceph-mon-c-7c7c7fbdff-gbxwx 1/1 Running 1 11d
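Before removing the monitor, it can be worth glancing at the logs of the out-of-quorum mon pod (pod name taken from the listing above), for example:
# Inspect the recent logs of the out-of-quorum monitor pod
$ kubectl -n rook logs rook-ceph-mon-a-7fdc8559b6-vfqgx --tail=50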
Check the Kubernetes cluster status and make sure no node is down.
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
test-k8s-1 Ready controlplane,etcd,worker 11d v1.21.9
test-k8s-2 Ready controlplane,etcd,worker 11d v1.21.9
test-k8s-3 Ready worker 11d v1.21.9
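It can also help to note which node the mon.a pod is scheduled on, for example:
# Show the node and IP of the mon.a pod
$ kubectl -n rook get pod -o wide | grep rook-ceph-mon-a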
Scale the rook-ceph-mon-a deployment to 0 replicas to stop the mon.a pod and prevent it from being recreated.
$ kubectl -n rook get deploy | grep mon
rook-ceph-mon-a 1/1 1 1 11d
rook-ceph-mon-b 1/1 1 1 11d
rook-ceph-mon-c 1/1 1 1 11d
$ kubectl -n rook scale deploy rook-ceph-mon-a --replicas 0
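A quick way to confirm the deployment was scaled down and the pod is gone, for example:
# The mon.a deployment should show 0/0 and its pod should no longer be listed
$ kubectl -n rook get deploy rook-ceph-mon-a
$ kubectl -n rook get pod | grep rook-ceph-mon-a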
Edit the Rook ConfigMap rook-ceph-mon-endpoints to remove the inactive Ceph monitor.
$ kubectl -n rook edit cm rook-ceph-mon-endpoints
# Before
apiVersion: v1
data:
csi-cluster-config-json: '[{"clusterID":"rook","monitors":["10.43.1.72:6789","10.43.206.183:6789","10.43.116.150:6789"]}]'
data: b=10.43.1.72:6789,c=10.43.206.183:6789,a=10.43.116.150:6789
mapping: '{"node":{"a":{"Name":"test-k8s-2","Hostname":"test-k8s-2","Address":"192.168.1.12"},"b":{"Name":"test-k8s-3","Hostname":"test-k8s-3","Address":"192.168.1.13"},"c":{"Name":"test-k8s-1","Hostname":"test-k8s-1","Address":"192.168.1.11"}}}'
maxMonId: "2"
# After: mon.a removed
apiVersion: v1
data:
csi-cluster-config-json: '[{"clusterID":"rook","monitors":["10.43.1.72:6789","10.43.206.183:6789"]}]'
data: b=10.43.1.72:6789,c=10.43.206.183:6789
mapping: '{"node":{"b":{"Name":"test-k8s-3","Hostname":"test-k8s-3","Address":"192.168.1.13"},"c":{"Name":"test-k8s-1","Hostname":"test-k8s-1","Address":"192.168.1.11"}}}'
maxMonId: "2"
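To double-check the edit without opening the editor again, the data field can be printed directly, for example:
# Verify that mon.a is no longer listed in the endpoints ConfigMap
# expected: b=10.43.1.72:6789,c=10.43.206.183:6789
$ kubectl -n rook get cm rook-ceph-mon-endpoints -o jsonpath='{.data.data}'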
Remove mon.a from the Ceph monitor map.
$ ceph mon remove a
removing mon.a at 10.43.116.150:6789/0, there will be 2 monitors
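The monitor map can be checked right away from the toolbox pod, for example:
# The quorum should now contain only b and c
$ ceph mon stat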
Check the Ceph cluster status again to confirm that mon.a is no longer in the Ceph monitor quorum.
$ ceph -s
cluster:
id: 4ecc59d7-0173-4c2d-9802-ae0c9c9aea6c
health: HEALTH_OK
services:
mon: 2 daemons, quorum b,c
mgr: a(active)
osd: 9 osds: 9 up, 9 in
data:
pools: 1 pools, 128 pgs
objects: 1.49 k objects, 4.4 GiB
usage: 21 GiB used, 210 GiB / 231 GiB avail
pgs: 128 active+clean
io:
client: 7.7 KiB/s wr, 0 op/s rd, 0 op/s wr
Restart rook-ceph-operator so that Rook recreates the new Ceph monitor automatically.
# Stop rook-ceph-operator
$ kubectl -n rook-system scale deploy rook-ceph-operator --replicas 0
# Restart rook-ceph-operator
$ kubectl -n rook-system scale deploy rook-ceph-operator --replicas 1
Check the Rook deployments to see whether the new Ceph monitor has been created. We can see that mon.d is created on node test-k8s-1.
$ kubectl -n rook get pod -o wide | grep mon
rook-ceph-mon-c-7c7c7fbdff-gbxwx 1/1 Running 1 11d 192.168.1.12 test-k8s-2 <none> <none>
rook-ceph-mon-b-8597ccdd76-q9qsr 1/1 Running 1 11d 192.168.1.13 test-k8s-3 <none> <none>
rook-ceph-mon-d-9fdc2df9b6-b1d45 1/1 Running 1 60s 192.168.1.11 test-k8s-1 <none> <none>
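From the toolbox pod, you can also confirm that mon.d has joined the quorum before cleaning up, for example:
# mon.d should appear in the quorum together with b and c
$ ceph quorum_status --format json-pretty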
Delete the unused Ceph monitor deployment rook-ceph-mon-a.
$ kubectl -n rook delete deploy rook-ceph-mon-a
deployment.extensions "rook-ceph-mon-a" deleted
$ kubectl -n rook get deploy | grep mon
rook-ceph-mon-b 1/1 1 1 11d
rook-ceph-mon-c 1/1 1 1 11d
rook-ceph-mon-d 1/1 1 1 2m
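If the data directory of the removed mon.a is still present on its old node, it can be cleaned up as well. This is a sketch that assumes Rook's default dataDirHostPath of /var/lib/rook:
# On the node that used to run mon.a, remove its leftover data directory
$ sudo rm -rf /var/lib/rook/mon-a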
Connect to the node on which the new Ceph monitor was created; its local data files are stored under /var/lib/rook/<mon-id>/data/. The result will look like this:
# On the node where the new Ceph monitor (mon.d) was created
$ ll /var/lib/rook/mon-d/data/
total 20
drwxr-xr-x 3 167 167 4096 Sep 1 20:24 ./
drwxr-xr-x 3 root root 4096 Sep 1 20:21 ../
-rw------- 1 167 167 77 Sep 1 20:24 keyring
-rw-r--r-- 1 167 167 8 Sep 1 20:24 kv_backend
drwxr-xr-x 2 167 167 4096 Sep 13 16:31 store.db/
Check the Ceph cluster status and unset any flags set by Rook during the operation.
$ ceph -s
cluster:
id: 4ecc59d7-0173-4c2d-9802-ae0c9c9aea6c
health: HEALTH_WARN
noscrub,nodeep-scrub flag(s) set
services:
mon: 3 daemons, quorum b,d,c
mgr: a(active)
osd: 9 osds: 9 up, 9 in
data:
pools: 1 pools, 128 pgs
objects: 1.49 k objects, 4.4 GiB
usage: 21 GiB used, 210 GiB / 231 GiB avail
pgs: 128 active+clean
io:
client: 9.3 KiB/s wr, 0 op/s rd, 0 op/s wr
$ ceph osd unset noscrub
noscrub is unset
$ ceph osd unset nodeep-scrub
nodeep-scrub is unset
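As a final check, the OSD map flags can be inspected to confirm nothing is left set, for example:
# The flags field should no longer contain noscrub or nodeep-scrub
$ ceph osd dump | grep flags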
The procedure is now complete.