ca6c29e463
Before this fix, the deadlock would prevent the leader pod from giving up its lease, which would make it take several minutes for new pods to be allowed to elect a new leader. During that time, no Pinniped controllers could write to the Kube API, so important resources were not being updated during that window. It would also make pod shutdown take about 1 minute. After this fix, the leader gives up its lease immediately, and pod shutdown takes about 1 second. This improves restart/upgrade time and also fixes the problem where there was no leader for several minutes after a restart/upgrade. The deadlock was between the post-start hook and the pre-shutdown hook. The pre-shutdown hook blocked until a certain background goroutine in the post-start hook finished, but that goroutine could not finish until the pre-shutdown hook finished. Thus, they were both blocked, waiting for each other infinitely. Eventually the process would be externally killed. This deadlock was most likely introduced by some change in Kube's generic api server package related to how the many complex channels used during server shutdown interact with each other, and was not noticed when we upgraded to the version which introduced the change. |
||
---|---|---|
.. | ||
leaderelection_test.go | ||
leaderelection.go |