Troubleshooting cheatsheet
Cozystack
Getting basic information
You can see the logs of main installer by executing:
kubectl logs -n cozy-system deploy/cozystack -f
All the platform components are installed using fluxcd HelmRelases.
You can get all installed HelmRelases:
# kubectl get hr -A
NAMESPACE NAME AGE READY STATUS
cozy-cert-manager cert-manager 4m1s True Release reconciliation succeeded
cozy-cert-manager cert-manager-issuers 4m1s True Release reconciliation succeeded
cozy-cilium cilium 4m1s True Release reconciliation succeeded
cozy-cluster-api capi-operator 4m1s True Release reconciliation succeeded
cozy-cluster-api capi-providers 4m1s True Release reconciliation succeeded
cozy-dashboard dashboard 4m1s True Release reconciliation succeeded
cozy-fluxcd cozy-fluxcd 4m1s True Release reconciliation succeeded
cozy-grafana-operator grafana-operator 4m1s True Release reconciliation succeeded
cozy-kamaji kamaji 4m1s True Release reconciliation succeeded
cozy-kubeovn kubeovn 4m1s True Release reconciliation succeeded
cozy-kubevirt-cdi kubevirt-cdi 4m1s True Release reconciliation succeeded
cozy-kubevirt-cdi kubevirt-cdi-operator 4m1s True Release reconciliation succeeded
cozy-kubevirt kubevirt 4m1s True Release reconciliation succeeded
cozy-kubevirt kubevirt-operator 4m1s True Release reconciliation succeeded
cozy-linstor linstor 4m1s True Release reconciliation succeeded
cozy-linstor piraeus-operator 4m1s True Release reconciliation succeeded
cozy-mariadb-operator mariadb-operator 4m1s True Release reconciliation succeeded
cozy-metallb metallb 4m1s True Release reconciliation succeeded
cozy-monitoring monitoring 4m1s True Release reconciliation succeeded
cozy-postgres-operator postgres-operator 4m1s True Release reconciliation succeeded
cozy-rabbitmq-operator rabbitmq-operator 4m1s True Release reconciliation succeeded
cozy-redis-operator redis-operator 4m1s True Release reconciliation succeeded
cozy-telepresence telepresence 4m1s True Release reconciliation succeeded
cozy-victoria-metrics-operator victoria-metrics-operator 4m1s True Release reconciliation succeeded
tenant-root tenant-root 4m1s True Release reconciliation succeeded
Normaly all of them should be Ready
and Release reconciliation succeeded
Diagnosing install retries exhausted
error
Sometimes you can face with the error:
# kubectl get hr -A -n cozy-dashboard dashboard
NAMESPACE NAME AGE READY STATUS
cozy-dashboard dashboard 15m False install retries exhausted
You can try to figure out by checking events:
kubectl describe hr -n cozy-dashboard dashboard
if Events: <none>
then suspend and reume the release:
kubectl patch hr -n cozy-dashboard dashboard -p '{"spec": {"suspend": true}}' --type=merge
kubectl patch hr -n cozy-dashboard dashboard -p '{"spec": {"suspend": null}}' --type=merge
and check the events again:
kubectl describe hr -n cozy-dashboard dashboard
talos-bootstrap
No Talos nodes in maintenance mode found!
If you encounter issues with the talos-bootstrap
script not detecting any nodes, follow these steps to diagnose and resolve the issue:
Verify Network Segment
Ensure that you are running the script within the same network segment as the nodes. This is crucial for the script to be able to communicate with the nodes.
Use Nmap to Discover Nodes
Check if nmap
can discover your node by running the following command:
nmap -Pn -n -p 50000 192.168.0.0/24
This command scans for nodes in the network that are listening on port 50000. The output should list all the nodes in the network segment that are listening on this port, indicating that they are reachable.
Verify talosctl Connectivity
Next, verify if talosctl can connect to a specific node, especially if it is in maintenance mode, using the command below:
talosctl -e "${node}" -n "${node}" get machinestatus -i
If you encounter errors similar to:
rpc error: code = Unimplemented desc = unknown service resource.ResourceService
This indicates that your version of talosctl
is outdated. Updating talosctl
to the latest version should resolve this issue.
Enable Debug Mode for talos-bootstrap
If the above steps do not help, you can debug the talos-bootstrap
script by running it in debug mode. This can provide more insights into what might be going wrong. Execute the script with the -x
option to enable debug mode:
bash -x talos-bootstrap
Pay attention to the last command displayed before the error; it often indicates the command that failed and can provide clues for further troubleshooting.