I ran into a case where a PVC would not provision on a cluster using rook-ceph. The events on the PVC looked like this.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ExternalProvisioning 12s (x11 over 2m37s) persistentvolume-controller waiting for a volume to be created, either by external provisioner "rook-ceph.rbd.csi.ceph.com" or manually created by system administrator
Warning ProvisioningFailed 7s rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-6799bd4cb7-6ljd7_3e85cbaa-2c5a-4f72-abb2-fa3503746d75 failed to provision volume with StorageClass "rook-ceph-block": rpc error: code = DeadlineExceeded desc = context deadline exceeded
Normal Provisioning 6s (x3 over 2m37s) rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-6799bd4cb7-6ljd7_3e85cbaa-2c5a-4f72-abb2-fa3503746d75 External provisioner is provisioning volume for claim "splunk/splunk-search-data"
Warning ProvisioningFailed 5s (x2 over 6s) rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-6799bd4cb7-6ljd7_3e85cbaa-2c5a-4f72-abb2-fa3503746d75 failed to provision volume with StorageClass "rook-ceph-block": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-6fcf747c-a5f7-4028-ac6a-a65101c89c04 already exists
Skimming https://github.com/rook/rook/issues/4896, the first suggestion was to check whether any process with rbd in its name was running, so I checked and found none.
kubectl exec -it csi-rbdplugin-t7ctk -c csi-rbdplugin -- ps -ef | grep rbd
kubectl exec -it csi-rbdplugin-t7ctk -c csi-rbdplugin -- rbd ls
unable to get monitor info from DNS SRV with service name: ceph-mon
rbd: couldn't connect to the cluster!
rbd: listing images failed: 2021-09-11T10:06:06.002+0000 7f99210632c0 -1 failed for service _ceph-mon._tcp
(2) No such file or directory
2021-09-11T10:06:06.002+0000 7f99210632c0 -1 monclient: get_monmap_and_config cannot identify monitors to contact
command terminated with exit code 2
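The error above says rbd could not identify any monitors and fell back to a DNS SRV lookup, which is what happens when no monitor address is supplied. A minimal sketch of supplying one explicitly follows. It assumes the cluster runs in the rook-ceph namespace and that the monitor list is stored in the rook-ceph-mon-endpoints ConfigMap under the key data in the form a=IP:PORT,b=IP:PORT. Neither assumption comes from the notes above, and the helper name mon_hosts_from_endpoints is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: turn "a=10.0.0.1:6789,b=10.0.0.2:6789" (read from stdin)
# into "10.0.0.1:6789,10.0.0.2:6789" by stripping the "name=" prefixes.
mon_hosts_from_endpoints() {
  sed 's/[^,=]*=//g'
}

# Usage sketch (namespace and ConfigMap name are assumptions; <user>, <key>
# and <pool> are placeholders that have to come from your own cluster):
#
#   mons=$(kubectl -n rook-ceph get configmap rook-ceph-mon-endpoints \
#            -o jsonpath='{.data.data}' | mon_hosts_from_endpoints)
#   kubectl -n rook-ceph exec csi-rbdplugin-t7ctk -c csi-rbdplugin -- \
#     rbd ls -m "$mons" --id <user> --key <key> -p <pool>
```

If rbd ls succeeds once the monitors are passed explicitly, the bare rbd ls failure may only reflect missing configuration in the container, not a broken cluster.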
For the time being, recreating the Pods fixed it.
kubectl get pods | grep csi-rbd | awk '{print $1}' | xargs kubectl delete pod
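The one-liner above greps whole lines of the pod listing, so it would also pick up a line that mentions csi-rbd anywhere other than the pod name. A slightly stricter variant, as a sketch only: the helper name csi_rbd_pod_names is made up for illustration, and the rook-ceph namespace in the usage comment is an assumption (the original command relied on the current context's namespace).

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get pods` output on stdin and print only
# the names (first column) of pods whose name starts with "csi-rbd".
csi_rbd_pod_names() {
  awk '$1 ~ /^csi-rbd/ { print $1 }'
}

# Usage sketch (namespace is an assumption):
#
#   kubectl -n rook-ceph get pods --no-headers \
#     | csi_rbd_pod_names \
#     | xargs kubectl -n rook-ceph delete pod
#
# The DaemonSet and Deployment that own these pods recreate them after deletion.
```

Matching on the first column only keeps the header row and other CSI pods such as the CephFS plugin out of the delete list.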
This is a cluster I run for personal use and I have not studied Rook or Ceph properly, so for now this is only a stopgap fix…