While using rook-ceph, I hit a case where a PVC would not get provisioned. The PVC's events showed the following errors.

Events:
  Type     Reason                Age                   From                                                                                                        Message
  ----     ------                ----                  ----                                                                                                        -------
  Normal   ExternalProvisioning  12s (x11 over 2m37s)  persistentvolume-controller                                                                                 waiting for a volume to be created, either by external provisioner "rook-ceph.rbd.csi.ceph.com" or manually created by system administrator
  Warning  ProvisioningFailed    7s                    rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-6799bd4cb7-6ljd7_3e85cbaa-2c5a-4f72-abb2-fa3503746d75  failed to provision volume with StorageClass "rook-ceph-block": rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Normal   Provisioning          6s (x3 over 2m37s)    rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-6799bd4cb7-6ljd7_3e85cbaa-2c5a-4f72-abb2-fa3503746d75  External provisioner is provisioning volume for claim "splunk/splunk-search-data"
  Warning  ProvisioningFailed    5s (x2 over 6s)       rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-6799bd4cb7-6ljd7_3e85cbaa-2c5a-4f72-abb2-fa3503746d75  failed to provision volume with StorageClass "rook-ceph-block": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-6fcf747c-a5f7-4028-ac6a-a65101c89c04 already exists

Skimming https://github.com/rook/rook/issues/4896, the first suggestion was to check whether any process with rbd in its name was running. I checked, and there was none.

kubectl exec -it csi-rbdplugin-t7ctk -c csi-rbdplugin -- ps -ef | grep rbd

kubectl exec -it csi-rbdplugin-t7ctk -c csi-rbdplugin -- rbd ls
unable to get monitor info from DNS SRV with service name: ceph-mon
rbd: couldn't connect to the cluster!
rbd: listing images failed: 2021-09-11T10:06:06.002+0000 7f99210632c0 -1 failed for service _ceph-mon._tcp
(2) No such file or directory
2021-09-11T10:06:06.002+0000 7f99210632c0 -1 monclient: get_monmap_and_config cannot identify monitors to contact
command terminated with exit code 2

For the time being, recreating the pods fixed it.

kubectl get pods | grep csi-rbd | awk '{print $1}' | xargs kubectl delete pod
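
The one-liner above matches csi-rbd anywhere in each line of the kubectl get pods output. As a slightly more cautious variant, here is a minimal sketch that matches only on the pod name column and prints the delete commands for review instead of running them. The function names csi_rbd_pod_names and preview_delete are made up for this example, and it assumes the pods live in the current namespace, as in the command above.

```shell
#!/bin/sh

# Read `kubectl get pods` output on stdin and print only the pod names
# that start with "csi-rbd". The header row (NAME ...) never matches.
# (hypothetical helper, not part of rook or kubectl)
csi_rbd_pod_names() {
  awk '$1 ~ /^csi-rbd/ { print $1 }'
}

# Read pod names on stdin and print the delete command for each one,
# so the list can be reviewed before anything is actually removed.
# (hypothetical helper, not part of rook or kubectl)
preview_delete() {
  while read -r pod; do
    printf 'kubectl delete pod %s\n' "$pod"
  done
}
```

Usage would look like `kubectl get pods | csi_rbd_pod_names | preview_delete`. Once the list looks right, the names can be piped into xargs kubectl delete pod as before.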

This is my personal cluster and I have not properly studied rook / ceph, so I am stuck at ad hoc fixes like this...