Symptoms

When a user's Notebook fails to launch, the Event log on the Notebook spawning page shows

failed: Structure needs cleaning

This means the file system on a PVC is corrupted and needs to be repaired with xfs_repair.

Diagnosing the problem

From the Event log, find out which /dev/rbd? device and which pvc-xxxxxxxxxx volume are affected.
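
Pulling the two identifiers out of the Event log text can be sketched as below. This is a minimal sketch, not part of the original procedure: the function name extract_ids is hypothetical, and it assumes that the log text is piped in on stdin, that the volume name starts with pvc-, and that the device appears as /dev/rbd followed by a number.

```shell
# Hypothetical helper: read Event log text on stdin and print every
# PVC name and every /dev/rbdN device that appears in it.
# Assumes grep supports -o and -E (GNU and BSD grep both do).
extract_ids() {
    text="$(cat)"
    # PVC names: "pvc-" followed by letters, digits and dashes
    printf '%s\n' "$text" | grep -oE 'pvc-[0-9a-zA-Z-]+' | sort -u
    # RBD devices: /dev/rbd followed by a number
    printf '%s\n' "$text" | grep -oE '/dev/rbd[0-9]+' | sort -u
}
```

For example, save the Event log text to a file and run `extract_ids < events.txt` (the file name is only an illustration).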

Resolving the problem

  1. Find out which node the user pod is assigned to. If the user pod has already been recycled, launch it again.

    kubectl get -n hub pod -o wide | grep jupyter-xxx
    

    Find the corresponding <agent-pod> on the node where the user pod is running.

    # Find the agent-pod on the specific node
    kubectl -n rook-system get pods -o wide | grep <node_name>
    
  2. SSH to the node

    ssh <node>
    
  3. Find out whether pvc-xxxxxxx is mapped to a /dev/rbd? device on this node

    kubectl -n rook-system logs <agent-pod>
    
    # Look for pvc-xxxxxxxxxx and its rbd?
    kubectl -n rook-system exec -it <agent-pod> -- rbd device list
    
    # On the node itself, check which /dev/rbd? the PVC is mounted from
    mount | grep pvc-xxxxxxxxxx
    

    The rbd device list command (run in the agent pod) shows whether the device is mapped to the node. You should see output like

    id pool        namespace image                                    snap device
    0  replicapool           pvc-xxxxxxxx-yyyy-zzzz-aaaa-tttttttttttt -    /dev/rbd?
    
  4. Check the file system type

    sudo file -sL /dev/rbd?
    

    The output contains "ext4 filesystem" for ext4 and "XFS filesystem" for XFS.

  5. Repair the file system with the tool that matches its type (xfs_repair for XFS).
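
Steps 4 and 5 can be tied together with a small helper like the one below. It is a sketch under stated assumptions, not the original procedure: the function name suggest_repair is hypothetical, and the use of e2fsck -f for ext4 is an assumption, since the text above only names xfs_repair. The helper only prints the suggested command and does not run it. Both tools expect the file system to be unmounted, so make sure the device is not mounted before running the printed command.

```shell
# Hypothetical helper: print (do not run) a repair command based on
# the description reported by `file -sL` for the device.
#   $1 = device path, for example /dev/rbd0
#   $2 = output of: sudo file -sL <device>
suggest_repair() {
    dev="$1"
    desc="$2"
    case "$desc" in
        *"XFS filesystem"*)
            echo "sudo xfs_repair $dev" ;;
        *"ext4 filesystem"*)
            # Assumption: e2fsck -f (forced check) is used for ext4
            echo "sudo e2fsck -f $dev" ;;
        *)
            echo "unrecognized file system on $dev: $desc" >&2
            return 1 ;;
    esac
}
```

A possible invocation, with DEV set to the device found in step 3: `suggest_repair "$DEV" "$(sudo file -sL "$DEV")"`. Review the printed command before running it by hand.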