<aside> ⚠️ By following this document, you will format and destroy data on a disk, proceed with extreme care!
</aside>
This article provides a fast and clean way to recreate a Ceph OSD when an OSD goes down.
Prerequisites: a Kubernetes cluster with at least three nodes, including at least one worker node, and with Rook Ceph installed.
<aside> 💡 Replace the names below with the real names from your environment before running the commands.
</aside>
The environment-specific values used in this operation are listed below:
| Item | Description | Name in this doc |
|---|---|---|
| OSD ID | The ID of the OSD; a natural number. | 2 |
| Block device name | The OS-assigned device name of the disk that stores the OSD data. | sdb |
| Host | The host on which the OSD is stored. | test-k8s-1 |
| Ceph LV name | The LV created by Rook to store the OSD data. | ceph--eba2294c--dac1--4c30--98c7--d02cd5c9830e-osd--data--16c28b6c--b9da--4280--a6a1--4da469fdd3a4 |
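The kubectl ceph ... commands used throughout this doc are assumed to be shorthand for running the ceph CLI inside the Rook toolbox pod (for example via a kubectl plugin). If you do not have such a shorthand, the minimal sketch below sets one up; the rook namespace and the app=rook-ceph-tools label are assumptions, so adjust them to match your toolbox deployment.
#!/usr/bin/env bash
# Hypothetical "kubectl-ceph" plugin: save as an executable file named
# "kubectl-ceph" on your $PATH so that "kubectl ceph <args>" forwards <args>
# to the ceph CLI inside the Rook toolbox pod.
set -euo pipefail
TOOLS_POD=$(kubectl -n rook get pod -l app=rook-ceph-tools \
  -o jsonpath='{.items[0].metadata.name}')
exec kubectl -n rook exec -it "$TOOLS_POD" -- ceph "$@"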
1. Check Ceph status
$ kubectl ceph -s
  cluster:
    id:     4ecc59d7-0173-4c2d-9802-ae0c9c9aea6c
    health: HEALTH_WARN

  services:
    mon: 3 daemons, quorum b,a,c
    mgr: a(active)
    osd: 9 osds: 8 up, 8 in

  data:
    pools:   1 pools, 128 pgs
    objects: 1.49 k objects, 4.4 GiB
    usage:   21 GiB used, 210 GiB / 231 GiB avail
    pgs:     128 active+clean

  io:
    client: 9.3 KiB/s wr, 0 op/s rd, 1 op/s wr
We can see that one OSD is down (8 of the 9 OSDs are up and in).
2. Set the noout and nobackfill flags on the Ceph cluster to prevent data from being rebalanced and new backfill operations from starting.
$ kubectl ceph osd set nobackfill
$ kubectl ceph osd set noout
$ kubectl ceph -s
  cluster:
    id:     4ecc59d7-0173-4c2d-9802-ae0c9c9aea6c
    health: HEALTH_WARN
            noout,nobackfill flag(s) set
.....
3. Find the down OSD.
$ kubectl ceph osd tree | grep down
2 hdd 0.01859 osd.2 down 1.00000 1.00000
We can see that the ID of the down OSD is 2.
4. Check the Rook OSD pod status
$ kubectl -n rook get pod | grep osd-2
rook-ceph-osd-2-8567f5668f-72b8g 1/1 Running 1 XXm
5. Verify which host and disk the OSD is stored on.
$ kubectl ceph osd metadata 2 | grep -e devices -e hostname
"container_hostname": "rook-ceph-osd-2-64bf499c74-kp29f",
"devices": "dm-2,sdb",
"hostname": "test-k8s-1",
The block device in which osd.2 is stored is sdb on host test-k8s-1.
6. Connect to test-k8s-1 and check the block device list.
$ lsblk | grep -A1 sdb
sdb 259:4 0 20G 0 disk
└─ceph--eba2294c--dac1--4c30--98c7--d02cd5c9830e-osd--data--16c28b6c--b9da--4280--a6a1--4da469fdd3a4 253:1 0 19G 0 lvm
The disk is currently occupied by the LV previously created by Rook.
Take note of the LV name shown in the output for later use. In this case, it is ceph--eba2294c--dac1--4c30--98c7--d02cd5c9830e-osd--data--16c28b6c--b9da--4280--a6a1--4da469fdd3a4.
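Before wiping anything, it can be worth cross-checking with LVM that this LV really sits on sdb (a read-only check; lv_name, vg_name, and devices are standard lvs output fields):
# The Ceph osd-data LV should list /dev/sdb in its Devices column
$ sudo lvs -o lv_name,vg_name,devices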
7. Back on the control node, purge the down OSD.
<aside> ⚠️ Make sure you purge the correct OSD ID, or you may destroy a working OSD by accident.
</aside>
# Purge the osd
$ kubectl ceph osd out osd.2
marked out osd.2.
$ kubectl ceph osd down osd.2
osd.2 is already down.
$ kubectl ceph osd purge osd.2 --yes-i-really-mean-it
purged osd.2
# Remove the Rook OSD deployment so the old OSD pod is not redeployed
$ kubectl -n rook delete deployment rook-ceph-osd-2
deployment.extensions "rook-ceph-osd-2" deleted
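Before touching the disk, confirm that osd.2 is really gone from the cluster map and that its pod is no longer present (both checks are read-only and should return no output):
# osd.2 should no longer appear in the OSD tree
$ kubectl ceph osd tree | grep osd.2
# No rook-ceph-osd-2 pod should remain in the rook namespace
$ kubectl -n rook get pod | grep osd-2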
8. Connect to test-k8s-1 and clean up the block device (in this case, sdb) so that Rook can use it again.
<aside> ⚠️ Make sure you connect to the target host and operate on the target block device, or you may destroy a working OSD by accident.
</aside>
$ ssh test-k8s-1
# Clean the GPT data structures on the disk
$ sudo sgdisk --zap /dev/sdb
# Erase filesystem signatures (magic strings) from the disk so they are no longer visible to libblkid
$ sudo wipefs -a -f /dev/sdb
# Remove the device from the logical volume
# <CEPH-FOLDER> is the LV name we captured in step 6
$ sudo dmsetup remove /dev/mapper/<CEPH-FOLDER>
# Manually remove the block special file in case it is not removed by dmsetup
$ sudo rm /dev/mapper/<CEPH-FOLDER>
# Check the udev properties of the target disk and confirm it is fully wiped (the command should print nothing)
$ udevadm info --query=property /dev/sdb | grep ID_FS_TYPE
# Remove any outdated Rook files for this OSD on the host
$ sudo rm -rf /var/lib/rook/osd2
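If you need to do this on more than one host, the steps above can be wrapped in a small helper script. The following is only a sketch; the script name, argument order, and confirmation prompt are hypothetical, and it simply re-runs the same commands shown above.
#!/usr/bin/env bash
# Hypothetical helper: clean-osd-disk.sh <device> <ceph-lv-name> <osd-id>
# Example: sudo ./clean-osd-disk.sh sdb <CEPH-FOLDER> 2
set -euo pipefail

DEV="$1"       # block device name, e.g. sdb
LV_NAME="$2"   # the Ceph LV name captured in step 6
OSD_ID="$3"    # the purged OSD ID, e.g. 2

# Safety check: require the operator to retype the device name
read -r -p "This will WIPE /dev/${DEV} on $(hostname). Retype the device name to confirm: " CONFIRM
[ "$CONFIRM" = "$DEV" ] || { echo "Aborted."; exit 1; }

sgdisk --zap "/dev/${DEV}"                        # clean the GPT data structures
wipefs -a -f "/dev/${DEV}"                        # erase filesystem signatures
dmsetup remove "/dev/mapper/${LV_NAME}" || true   # remove the device-mapper entry if present
rm -f "/dev/mapper/${LV_NAME}"                    # remove a stale special file, if any
rm -rf "/var/lib/rook/osd${OSD_ID}"               # remove outdated Rook files for this OSD

# Confirm no filesystem signature remains
udevadm info --query=property "/dev/${DEV}" | grep ID_FS_TYPE \
  || echo "OK: no filesystem signature left on /dev/${DEV}"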
9. Back on the control node, check the CephCluster config to make sure the device name matches the deviceFilter pattern.
$ kubectl get cephcluster -n rook -o yaml | less
.....
    - config:
        osdsPerDevice: "1"
        storeType: bluestore
      deviceFilter: ^sd[b-f]
      name: test-k8s-1
      resources: {}
.....
We can see that sdb matches the deviceFilter pattern ^sd[b-f], so Rook will consider the device again.
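If the device were not matched by the filter, the node entry's deviceFilter in the CephCluster resource would need to be widened before restarting the operator. A minimal, hypothetical example (the resource name placeholder and the example filter values are assumptions):
# Find the CephCluster resource name, then edit the node entry so the
# deviceFilter includes the target device, e.g. widen ^sd[b-e] to ^sd[b-f]
$ kubectl -n rook get cephcluster
$ kubectl -n rook edit cephcluster <cephcluster-name>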
10. Restart the rook-ceph-operator so that it creates a new OSD on the cleaned disk.
# Stop rook-ceph-operator
$ kubectl -n rook-system scale deploy rook-ceph-operator --replicas 0
# Start rook-ceph-operator
$ kubectl -n rook-system scale deploy rook-ceph-operator --replicas 1
11. Follow the pod logs of rook-ceph-operator to monitor the OSD recreation.
# Get the pod name of rook-ceph-operator
$ kubectl -n rook-system get pod | grep rook-ceph-operator
# Follow the pod log of rook-ceph-operator
$ kubectl -n rook-system logs <rook-ceph-operator-pod-name> -f
# You will see a new OSD deployment being created for the node on which the old OSD was previously stored
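Once the operator has created the new OSD and all OSDs report up and in again, remember to remove the flags set in step 2; otherwise the cluster will not backfill data or mark failed OSDs out automatically. For example:
# Verify the recreated OSD is back in the tree and the cluster is healthy
$ kubectl ceph osd tree
$ kubectl ceph -s
# Unset the flags that were set in step 2
$ kubectl ceph osd unset nobackfill
$ kubectl ceph osd unset noout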