Troubleshooting LINSTOR controller crash loops
Restarting the controller
If the controller is running but appears idle/unresponsive, try restarting it. This operation is idempotent and safe: pending (unfinished) work will resume after the restart.
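A restart can be sketched as a small shell helper. This is a minimal sketch, assuming the controller runs as a Kubernetes Deployment named linstor-controller in a namespace called piraeus-datastore; both names are assumptions and may differ in your installation.

```shell
#!/bin/sh
# Assumptions: the namespace and the Deployment name below are illustrative;
# override NAMESPACE if your cluster uses a different one.
NAMESPACE="${NAMESPACE:-piraeus-datastore}"
KUBECTL="${KUBECTL:-kubectl}"

# Trigger a rolling restart of the controller and wait until it is rolled out.
restart_controller() {
  "$KUBECTL" -n "$NAMESPACE" rollout restart deployment/linstor-controller
  "$KUBECTL" -n "$NAMESPACE" rollout status deployment/linstor-controller --timeout=5m
}
```

Calling restart_controller performs the restart. Nothing runs until the function is invoked.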
LINSTOR controller crash loop
If linstor-controller can't start and the logs do not contain any useful information, you can increase the log level (the maximum level is TRACE).
Example of a LINSTORCluster CR with an increased log level:
apiVersion: piraeus.io/v1
kind: LINSTORCluster
spec:
  controller:
    podTemplate:
      spec:
        containers:
          - name: linstor-controller
            env:
              # both settings are used by linstor-controller
              - name: LS_LOG_LEVEL
                value: TRACE
              - name: LS_LOG_LEVEL_LINSTOR
                value: TRACE
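Once the log level is raised, the output of the container that crashed is usually the most interesting part. The helper below is a sketch of how it could be fetched. The namespace piraeus-datastore and the Deployment name linstor-controller are assumptions; the container name is taken from the example above.

```shell
#!/bin/sh
# Assumptions: namespace and Deployment name are illustrative.
NAMESPACE="${NAMESPACE:-piraeus-datastore}"
KUBECTL="${KUBECTL:-kubectl}"

# Print the last lines logged by the previous (crashed) controller container.
# $1: number of lines to show (defaults to 200).
controller_previous_logs() {
  "$KUBECTL" -n "$NAMESPACE" logs deployment/linstor-controller \
    -c linstor-controller --previous --tail="${1:-200}"
}
```

For example, controller_previous_logs 500 would show the last 500 lines written before the most recent crash.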
Note: if linstor-controller is not in a crash loop but you need to increase the log level, you can do so temporarily at runtime with the following command:
linstor controller set-log-level --global TRACE
This setting is reset to its initial value when the controller restarts.
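In a Kubernetes deployment, the linstor client is typically run inside the controller pod. The following sketch wraps the command above in kubectl exec. It assumes a Deployment named linstor-controller in the piraeus-datastore namespace and a controller image that ships the linstor client; treat all three as assumptions.

```shell
#!/bin/sh
# Assumptions: namespace and Deployment name are illustrative, and the
# controller container is assumed to include the linstor client.
NAMESPACE="${NAMESPACE:-piraeus-datastore}"
KUBECTL="${KUBECTL:-kubectl}"

# Temporarily change the log level of the running controller.
# $1: log level (defaults to TRACE).
set_runtime_log_level() {
  "$KUBECTL" -n "$NAMESPACE" exec deployment/linstor-controller -- \
    linstor controller set-log-level --global "${1:-TRACE}"
}
```

Calling set_runtime_log_level INFO afterwards would lower the level again without waiting for a restart.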
LINSTOR plays dead after certificate expiration
If you have configured LINSTOR with internal TLS communication, certificates are created and rotated automatically. However, there is an open issue, piraeusdatastore/piraeus-operator#701, about components not picking up new certificates after rotation. The workaround is to restart all LINSTOR components manually.
Follow these steps in order:
- Restart the linstor-controller.
- Restart each satellite one by one. Do not restart them all at once. After each satellite restart, check its logs for errors before proceeding to the next one.
- Restart the linstor-controller again. This is necessary because the controller also initiates connections to satellites and may not automatically reconnect to a satellite that has been restarted.
- Restart all remaining LINSTOR components.
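The steps above can be sketched as a shell script. This is only an illustration of the ordering, not an official procedure: the namespace, the Deployment name, and the satellite label selector are assumptions, and satellites are restarted by deleting their pods on the assumption that they are recreated automatically. The last step only prints a reminder, because the set of remaining components depends on the installation.

```shell
#!/bin/sh
# Assumptions: namespace, Deployment name, and label selector are illustrative.
NAMESPACE="${NAMESPACE:-piraeus-datastore}"
KUBECTL="${KUBECTL:-kubectl}"
SATELLITE_SELECTOR="${SATELLITE_SELECTOR:-app.kubernetes.io/component=linstor-satellite}"

restart_controller() {
  "$KUBECTL" -n "$NAMESPACE" rollout restart deployment/linstor-controller
  "$KUBECTL" -n "$NAMESPACE" rollout status deployment/linstor-controller --timeout=5m
}

# Pause so that the satellite logs can be checked before the next restart.
confirm_next() {
  printf 'Check the logs of the restarted satellite, then press Enter: '
  read -r answer
}

# Delete satellite pods one at a time, never all at once.
restart_satellites() {
  pods="$("$KUBECTL" -n "$NAMESPACE" get pods -l "$SATELLITE_SELECTOR" -o name)"
  for pod in $pods; do
    "$KUBECTL" -n "$NAMESPACE" delete "$pod" --wait=true
    confirm_next
  done
}

restart_after_cert_rotation() {
  restart_controller   # first controller restart
  restart_satellites   # satellites, one by one
  restart_controller   # second restart, so the controller reconnects to satellites
  echo "Now restart the remaining LINSTOR components."
}
```

Running restart_after_cert_rotation walks through the sequence and stops after every satellite until you confirm that its logs look clean.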