Troubleshooting Kube-OVN
In some cases, the Kube-OVN DaemonSet pods may crash or fail to start. This usually indicates a corrupted OVN database, which you can confirm by checking the logs of the Kube-OVN CNI pods.
Get the list of pods in the cozy-kubeovn namespace:
# kubectl get pod -n cozy-kubeovn
NAME                 READY   STATUS    RESTARTS      AGE
kube-ovn-cni-5rsvz   0/1     Running   5 (35s ago)   4m37s
kube-ovn-cni-jq2zz   0/1     Running   5 (33s ago)   4m39s
kube-ovn-cni-p4gz2   0/1     Running   3 (23s ago)   4m38s
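If the namespace contains many unrelated pods, you can narrow the listing by label (this assumes the upstream app=kube-ovn-cni pod label, which may differ in your deployment):
kubectl get pod -n cozy-kubeovn -l app=kube-ovn-cni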
Read the logs of a pod by its name (kube-ovn-cni-jq2zz in this example):
# kubectl logs -n cozy-kubeovn kube-ovn-cni-jq2zz
W0725 08:21:12.479452 87678 ovs.go:35] 100.64.0.4 network not ready after 3 ping to gateway 100.64.0.1
W0725 08:21:15.479600 87678 ovs.go:35] 100.64.0.4 network not ready after 6 ping to gateway 100.64.0.1
W0725 08:21:18.479628 87678 ovs.go:35] 100.64.0.4 network not ready after 9 ping to gateway 100.64.0.1
W0725 08:21:21.479355 87678 ovs.go:35] 100.64.0.4 network not ready after 12 ping to gateway 100.64.0.1
W0725 08:21:24.479322 87678 ovs.go:35] 100.64.0.4 network not ready after 15 ping to gateway 100.64.0.1
W0725 08:21:27.479664 87678 ovs.go:35] 100.64.0.4 network not ready after 18 ping to gateway 100.64.0.1
W0725 08:21:30.478907 87678 ovs.go:35] 100.64.0.4 network not ready after 21 ping to gateway 100.64.0.1
W0725 08:21:33.479738 87678 ovs.go:35] 100.64.0.4 network not ready after 24 ping to gateway 100.64.0.1
W0725 08:21:36.479607 87678 ovs.go:35] 100.64.0.4 network not ready after 27 ping to gateway 100.64.0.1
W0725 08:21:39.479753 87678 ovs.go:35] 100.64.0.4 network not ready after 30 ping to gateway 100.64.0.1
W0725 08:21:42.479480 87678 ovs.go:35] 100.64.0.4 network not ready after 33 ping to gateway 100.64.0.1
W0725 08:21:45.478754 87678 ovs.go:35] 100.64.0.4 network not ready after 36 ping to gateway 100.64.0.1
W0725 08:21:48.479396 87678 ovs.go:35] 100.64.0.4 network not ready after 39 ping to gateway 100.64.0.1
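The repeated warnings show that the node cannot reach its overlay gateway at 100.64.0.1. As an additional check, you can read the logs of the ovn-central pods, which host the OVN databases (the app=ovn-central label is the same one used in the affinity rule of the cleanup manifest below):
kubectl logs -n cozy-kubeovn -l app=ovn-central --tail=50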
To resolve this issue, clean up the OVN database by running a DaemonSet that removes the OVN configuration files from each node. This cleanup is safe: the Kube-OVN DaemonSet will automatically recreate the necessary files from the Kubernetes API.
Apply the following YAML to deploy the cleanup DaemonSet:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ovn-cleanup
  namespace: cozy-kubeovn
spec:
  selector:
    matchLabels:
      app: ovn-cleanup
  template:
    metadata:
      labels:
        app: ovn-cleanup
        component: network
        type: infra
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: ovn-central
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: cleanup
        image: busybox
        command: ["/bin/sh", "-xc", "rm -rf /host-config-ovn/*; rm -rf /host-config-ovn/.*; exec sleep infinity"]
        volumeMounts:
        - name: host-config-ovn
          mountPath: /host-config-ovn
      nodeSelector:
        kubernetes.io/os: linux
        node-role.kubernetes.io/control-plane: ""
      tolerations:
      - operator: "Exists"
      volumes:
      - name: host-config-ovn
        hostPath:
          path: /var/lib/ovn
          type: ""
      hostNetwork: true
      restartPolicy: Always
      terminationGracePeriodSeconds: 1
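Save the manifest to a file and apply it (the file name ovn-cleanup.yaml here is arbitrary):
kubectl apply -f ovn-cleanup.yaml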
Verify that the DaemonSet is running:
# kubectl get pod -n cozy-kubeovn
ovn-cleanup-hjzxb   1/1   Running   0   6s
ovn-cleanup-wmzdv   1/1   Running   0   6s
ovn-cleanup-ztm86   1/1   Running   0   6s
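Because the cleanup container runs its shell with the -x flag, each rm command is echoed to the pod log, so you can confirm the files were removed on every node:
kubectl logs -n cozy-kubeovn -l app=ovn-cleanup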
Once the cleanup is complete, delete the ovn-cleanup DaemonSet and restart the Kube-OVN CNI pods to apply the new configuration:
# Delete the cleanup DaemonSet
kubectl -n cozy-kubeovn delete ds ovn-cleanup
# Trigger a rolling restart of the Kube-OVN CNI DaemonSet
kubectl -n cozy-kubeovn rollout restart ds kube-ovn-cni
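You can then watch the rollout complete; once the Kube-OVN pods have recreated the database files, they should reach the Ready state:
kubectl -n cozy-kubeovn rollout status ds kube-ovn-cni
kubectl get pod -n cozy-kubeovn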