When a user's Notebook fails to launch and the Event log on the Notebook spawning page reports "failed: Structure needs cleaning", the file system on a PVC is corrupted and must be repaired with xfs_repair (XFS) or e2fsck (ext4).
To learn which /dev/rbd? device and which pvc-xxxxxxxxxx volume are affected, check the Event log.
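If the Event log text is saved to a file or piped in, both identifiers can be pulled out with grep. This is a minimal sketch, not part of the original procedure: the function names are hypothetical, and the patterns assume the PVC name has the form pvc-<uuid> and the device the form /dev/rbd<N>.

```shell
# Hypothetical helpers: read Event log text on stdin and print the identifiers found.

# Print every distinct PVC name of the form pvc-<uuid>.
extract_pvc() {
  grep -oE 'pvc-[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}' | sort -u
}

# Print every distinct rbd block device path, e.g. /dev/rbd3.
extract_rbd() {
  grep -oE '/dev/rbd[0-9]+' | sort -u
}

# Usage (assumes the Event log was copied into event.log):
#   extract_pvc < event.log
#   extract_rbd < event.log
```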
To learn which node the user pod is assigned to, run the following command. If the user pod has been recycled, launch it again first.
kubectl get -n hub pod -o wide | grep jupyter-xxx
Find the corresponding <rook-agent-pod> on the node where the user pod is running.
# Find the agent-pod on the specific node
kubectl -n rook-system get pods -o wide | grep <node_name>
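A plain grep on the node name also matches longer names that contain it (for example node1 and node10). As a hedged alternative, the sketch below matches the node name as a whole field; the helper name pods_on_node is hypothetical.

```shell
# Hypothetical helper: read `kubectl get pods -o wide` output on stdin and
# print the names of pods that have a field exactly equal to the node name.
pods_on_node() {
  awk -v node="$1" '
    NR > 1 {                       # skip the header line
      for (i = 2; i <= NF; i++)    # field 1 is the pod name itself
        if ($i == node) { print $1; break }
    }'
}

# Usage (not run here):
#   kubectl -n rook-system get pods -o wide | pods_on_node <node_name>
```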
ssh to the node
ssh <node>
Find out whether pvc-xxxxxxx is mapped to a /dev/rbd? device on this node:
# Look for pvc-xxxxxxxxxx and its rbd? in the agent logs
kubectl -n rook-system logs <agent-pod>
# On the node itself, check which /dev/rbd? the PVC is mounted from
mount | grep pvc-xxxxxxxxxx
Check whether the device is mapped to the node (run in the agent pod):
kubectl -n rook-system exec -it <agent-pod> -- rbd device list
You should see output like this (the namespace column is empty):
id  pool         namespace  image                                     snap  device
0   replicapool             pvc-xxxxxxxx-yyyy-zzzz-aaaa-tttttttttttt  -     /dev/rbd?
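When many images are mapped on the node, the rbd device list output can be filtered for a single PVC. A minimal sketch: the helper name rbd_device_for is hypothetical, and it takes the device from the last field because the namespace column may be empty.

```shell
# Hypothetical helper: read `rbd device list` output on stdin and print the
# device mapped for the given image (PVC name). Prints nothing if not mapped.
rbd_device_for() {
  awk -v img="$1" '
    NR > 1 {                       # skip the header line
      for (i = 1; i < NF; i++)     # search every field except the last (device)
        if ($i == img) { print $NF; exit }
    }'
}

# Usage (not run here):
#   kubectl -n rook-system exec -it <agent-pod> -- rbd device list | rbd_device_for pvc-xxxxxxxxxx
```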
Check the file system type with:
sudo file -sL /dev/rbd?
The output contains "ext4 filesystem" for ext4 and "XFS filesystem" for XFS.
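The check can be scripted by matching those two strings in the output of file. A minimal sketch, with a hypothetical helper name; anything that is neither XFS nor ext4 is reported as unknown rather than guessed.

```shell
# Hypothetical helper: read `file -sL <device>` output on stdin and print
# xfs, ext4, or unknown.
fs_type_from_file_output() {
  out=$(cat)
  case "$out" in
    *"XFS filesystem"*)  echo xfs ;;
    *"ext4 filesystem"*) echo ext4 ;;
    *)                   echo unknown ;;
  esac
}

# Usage (not run here):
#   sudo file -sL /dev/rbd? | fs_type_from_file_output
```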
Repair the file system according to its type.
If the file system is XFS, repair the rbd device with xfs_repair:
# for xfs only
sudo umount /dev/rbd?
sudo xfs_repair /dev/rbd?
# If xfs_repair fails with an error, add -L to enable force log zeroing.
# -L discards the metadata log, so recent changes may be lost; use it only as a last resort.
sudo xfs_repair -L /dev/rbd?
If the file system is ext4, repair the rbd device with fsck (e2fsck):
# for ext4 only
# Unmount first: e2fsck should not be run on a mounted file system
sudo umount /dev/rbd?
sudo e2fsck -y /dev/rbd?
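To avoid running the wrong tool on a device, the choice of repair commands can be written as a dry-run helper that only prints what would be executed. This is a sketch under assumptions: the name repair_commands is hypothetical, and the device is unmounted first in both cases because neither tool should run on a mounted file system.

```shell
# Hypothetical dry-run helper: print (do not execute) the repair commands
# for a device, given its file system type (xfs or ext4).
repair_commands() {
  dev=$1
  fstype=$2
  case "$fstype" in
    xfs)
      printf 'sudo umount %s\nsudo xfs_repair %s\n' "$dev" "$dev" ;;
    ext4)
      printf 'sudo umount %s\nsudo e2fsck -y %s\n' "$dev" "$dev" ;;
    *)
      echo "unsupported file system: $fstype" >&2
      return 1 ;;
  esac
}

# Usage: review the printed commands, then run them by hand on the node.
#   repair_commands /dev/rbd? xfs
```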