Troubleshooting Kube-OVN
Getting information about Kube-OVN database state
# Northbound DB
kubectl -n cozy-kubeovn exec deploy/ovn-central -c ovn-central -- ovn-appctl \
-t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
# Southbound DB
kubectl -n cozy-kubeovn exec deploy/ovn-central -c ovn-central -- ovn-appctl \
-t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound
Example output:
Name: OVN_Northbound
Cluster ID: abf6 (abf66f15-9382-4b2b-b14c-355d64ae1bda)
Server ID: 8d8a (8d8a2985-c444-43bb-99f6-21c82f05b58d)
Address: ssl:[10.200.1.22]:6643
Status: cluster member
Role: leader
Term: 3
Leader: self
Vote: self
Last Election started 146211 ms ago, reason: leadership_transfer
Last Election won: 146202 ms ago
Election timer: 5000
Log: [2, 1569]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->2a66 ->c23f <-2a66 <-c23f
Disconnections: 30
Servers:
    2a66 (2a66 at ssl:[10.200.1.18]:6643) next_index=1569 match_index=1568 last msg 17 ms ago
    8d8a (8d8a at ssl:[10.200.1.22]:6643) (self) next_index=1471 match_index=1568
    c23f (c23f at ssl:[10.200.1.1]:6643) next_index=1569 match_index=1568 last msg 18 ms ago
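The fields to watch are Status, Role, Leader, and the per-server match_index values under Servers:. Because deploy/ovn-central only picks one replica, it can be useful to compare the Raft role reported by every ovn-central pod. A minimal sketch, assuming the pods carry the label app=ovn-central:
# Print the Status, Role, and Leader reported by each ovn-central replica
# (the label app=ovn-central is an assumption; adjust it to your deployment)
for pod in $(kubectl -n cozy-kubeovn get pod -l app=ovn-central -o name); do
  echo "== ${pod} =="
  kubectl -n cozy-kubeovn exec "${pod}" -c ovn-central -- ovn-appctl \
    -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound | grep -E '^(Status|Role|Leader):'
done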
To kick a server out of the cluster (for example, when the node is down and has to be removed from the Raft membership), use:
# Northbound DB
kubectl -n cozy-kubeovn exec deploy/ovn-central -c ovn-central -- ovn-appctl \
-t /var/run/ovn/ovnnb_db.ctl cluster/kick OVN_Northbound <server-id>
# Southbound DB
kubectl -n cozy-kubeovn exec deploy/ovn-central -c ovn-central -- ovn-appctl \
-t /var/run/ovn/ovnsb_db.ctl cluster/kick OVN_Southbound <server-id>
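The <server-id> is the short identifier shown in the Servers: section of cluster/status (2a66, 8d8a, and c23f in the example above). For instance, removing the member at 10.200.1.18 from the Northbound cluster in that example would look like this (take the server ID from your own cluster/status output):
kubectl -n cozy-kubeovn exec deploy/ovn-central -c ovn-central -- ovn-appctl \
-t /var/run/ovn/ovnnb_db.ctl cluster/kick OVN_Northbound 2a66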
Resolving Kube-OVN pod crashes
In complex cases, you may encounter issues where the Kube-OVN DaemonSet pods crash or fail to start properly. This usually indicates a corrupted OVN database. You can confirm this by checking the logs of the Kube-OVN CNI pods.
Get the list of pods in the cozy-kubeovn namespace:
# kubectl get pod -n cozy-kubeovn
NAME                 READY   STATUS    RESTARTS      AGE
kube-ovn-cni-5rsvz   0/1     Running   5 (35s ago)   4m37s
kube-ovn-cni-jq2zz   0/1     Running   5 (33s ago)   4m39s
kube-ovn-cni-p4gz2   0/1     Running   3 (23s ago)   4m38s
Read the logs of a pod by its name (kube-ovn-cni-jq2zz in this example):
# kubectl logs -n cozy-kubeovn kube-ovn-cni-jq2zz
W0725 08:21:12.479452 87678 ovs.go:35] 100.64.0.4 network not ready after 3 ping to gateway 100.64.0.1
W0725 08:21:15.479600 87678 ovs.go:35] 100.64.0.4 network not ready after 6 ping to gateway 100.64.0.1
W0725 08:21:18.479628 87678 ovs.go:35] 100.64.0.4 network not ready after 9 ping to gateway 100.64.0.1
W0725 08:21:21.479355 87678 ovs.go:35] 100.64.0.4 network not ready after 12 ping to gateway 100.64.0.1
W0725 08:21:24.479322 87678 ovs.go:35] 100.64.0.4 network not ready after 15 ping to gateway 100.64.0.1
W0725 08:21:27.479664 87678 ovs.go:35] 100.64.0.4 network not ready after 18 ping to gateway 100.64.0.1
W0725 08:21:30.478907 87678 ovs.go:35] 100.64.0.4 network not ready after 21 ping to gateway 100.64.0.1
W0725 08:21:33.479738 87678 ovs.go:35] 100.64.0.4 network not ready after 24 ping to gateway 100.64.0.1
W0725 08:21:36.479607 87678 ovs.go:35] 100.64.0.4 network not ready after 27 ping to gateway 100.64.0.1
W0725 08:21:39.479753 87678 ovs.go:35] 100.64.0.4 network not ready after 30 ping to gateway 100.64.0.1
W0725 08:21:42.479480 87678 ovs.go:35] 100.64.0.4 network not ready after 33 ping to gateway 100.64.0.1
W0725 08:21:45.478754 87678 ovs.go:35] 100.64.0.4 network not ready after 36 ping to gateway 100.64.0.1
W0725 08:21:48.479396 87678 ovs.go:35] 100.64.0.4 network not ready after 39 ping to gateway 100.64.0.1
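If several CNI pods are affected, it can be convenient to pull the recent log lines from all of them at once. A sketch, assuming the pods are labeled app=kube-ovn-cni:
# Last 20 log lines from every kube-ovn-cni pod, prefixed with the pod name
# (the label app=kube-ovn-cni is an assumption; adjust it to your deployment)
kubectl -n cozy-kubeovn logs -l app=kube-ovn-cni --tail=20 --prefix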
To resolve this issue, clean up the OVN database files. This is done with a short-lived DaemonSet that removes the OVN database files from the nodes where ovn-central runs. The cleanup is safe: Kube-OVN recreates the necessary files automatically from the state stored in the Kubernetes API.
Apply the following YAML to deploy the cleanup DaemonSet:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ovn-cleanup
  namespace: cozy-kubeovn
spec:
  selector:
    matchLabels:
      app: ovn-cleanup
  template:
    metadata:
      labels:
        app: ovn-cleanup
        component: network
        type: infra
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: ovn-central
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: cleanup
        image: busybox
        command: ["/bin/sh", "-xc", "rm -rf /host-config-ovn/*; rm -rf /host-config-ovn/.*; exec sleep infinity"]
        volumeMounts:
        - name: host-config-ovn
          mountPath: /host-config-ovn
      nodeSelector:
        kubernetes.io/os: linux
        node-role.kubernetes.io/control-plane: ""
      tolerations:
      - operator: "Exists"
      volumes:
      - name: host-config-ovn
        hostPath:
          path: /var/lib/ovn
          type: ""
      hostNetwork: true
      restartPolicy: Always
      terminationGracePeriodSeconds: 1
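Save the manifest to a file (the name ovn-cleanup.yaml below is just an example) and apply it:
kubectl apply -f ovn-cleanup.yaml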
Verify that the DaemonSet is running:
# kubectl get pod -n cozy-kubeovn
ovn-cleanup-hjzxb   1/1   Running   0   6s
ovn-cleanup-wmzdv   1/1   Running   0   6s
ovn-cleanup-ztm86   1/1   Running   0   6s
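The container command runs the shell with the -x flag, so each rm invocation is echoed to the pod log. Checking the log of one of the cleanup pods is a quick way to confirm the files were actually removed (use a pod name from your own listing):
# The -x flag makes the shell print every rm command it executes
kubectl -n cozy-kubeovn logs ovn-cleanup-hjzxb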
Once the cleanup is complete, delete the ovn-cleanup DaemonSet and restart the Kube-OVN CNI pods to apply the new configuration:
# Delete the cleanup DaemonSet
kubectl -n cozy-kubeovn delete ds ovn-cleanup
# Restart Kube-OVN pods by deleting them
kubectl -n cozy-kubeovn delete pod -l app!=ovs
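After the pods are recreated, confirm that the kube-ovn-cni pods reach the Ready state and stop restarting, and that the "network not ready" warnings no longer appear in their logs:
# Watch the pods until they become Ready (Ctrl+C to exit)
kubectl -n cozy-kubeovn get pod -w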