ContainerImage.Pinniped

Author	SHA1	Message	Date
Monis Khan	e2cf9f6b74	leader election test: approximate that followers have observed change Instead of blindly waiting long enough for a disruptive change to have been observed by the old leader and followers, we instead rely on the approximation that checkOnlyLeaderCanWrite provides - i.e. only a single actor believes they are the leader. This does not account for clients that were in the followers list before and after the disruptive change, but it serves as a reasonable approximation. Signed-off-by: Monis Khan <mok@vmware.com>	2021-08-26 12:59:52 -04:00
Monis Khan	74daa1da64	test/integration: run parallel tests concurrently with serial tests Signed-off-by: Monis Khan <mok@vmware.com>	2021-08-26 12:59:52 -04:00
Monis Khan	c71ffdcd1e	leader election: use better duration defaults OpenShift has good defaults for these duration fields that we can use instead of coming up with them ourselves: `e14e06ba8d/pkg/config/leaderelection/leaderelection.go (L87-L109)` Copied here for easy future reference: // We want to be able to tolerate 60s of kube-apiserver disruption without causing pod restarts. // We want the graceful lease re-acquisition fairly quick to avoid waits on new deployments and other rollouts. // We want a single set of guidance for nearly every lease in openshift. If you're special, we'll let you know. // 1. clock skew tolerance is leaseDuration-renewDeadline == 30s // 2. kube-apiserver downtime tolerance is == 78s // lastRetry=floor(renewDeadline/retryPeriod)*retryPeriod == 104 // downtimeTolerance = lastRetry-retryPeriod == 78s // 3. worst non-graceful lease acquisition is leaseDuration+retryPeriod == 163s // 4. worst graceful lease acquisition is retryPeriod == 26s if ret.LeaseDuration.Duration == 0 { ret.LeaseDuration.Duration = 137 * time.Second } if ret.RenewDeadline.Duration == 0 { // this gives 107/26=4 retries and allows for 137-107=30 seconds of clock skew // if the kube-apiserver is unavailable for 60s starting just before t=26 (the first renew), // then we will retry on 26s intervals until t=104 (kube-apiserver came back up at 86), and there will // be 33 seconds of extra time before the lease is lost. ret.RenewDeadline.Duration = 107 * time.Second } if ret.RetryPeriod.Duration == 0 { ret.RetryPeriod.Duration = 26 * time.Second } Signed-off-by: Monis Khan <mok@vmware.com>	2021-08-24 16:21:53 -04:00
Monis Khan	c0617ceda4	leader election: in-memory leader status is stopped before release This change fixes a small race condition that occurred when the current leader failed to renew its lease. Before this change, the leader would first release the lease via the Kube API and then would update its in-memory status to reflect that change. Now those events occur in the reverse (i.e. correct) order. Signed-off-by: Monis Khan <mok@vmware.com>	2021-08-24 15:02:56 -04:00
Monis Khan	132ec0d2ad	leader election test: fix flake related to invalid assumption Even though a client may hold the leader election lock in the Kube lease API, that does not mean it has had a chance to update its internal state to reflect that. Thus we retry the checks in checkOnlyLeaderCanWrite a few times to allow the client to catch up. Signed-off-by: Monis Khan <mok@vmware.com>	2021-08-20 17:04:26 -04:00
Monis Khan	c356710f1f	Add leader election middleware Signed-off-by: Monis Khan <mok@vmware.com>	2021-08-20 12:18:25 -04:00

6 Commits
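Commit e2cf9f6b74 relies on the approximation that only a single actor believes it is the leader. The following is a minimal Go sketch of that check. It assumes a hypothetical `member` type whose `believesLeader` field stands in for each client's in-memory status. None of these names come from the Pinniped test suite.

```go
package main

import "fmt"

// member is a hypothetical stand-in for one leader election participant.
// believesLeader mirrors the participant's in-memory view, which may lag
// behind the lease that is actually recorded in the Kube API.
type member struct {
	name           string
	believesLeader bool
}

// singleLeader returns the name of the only member that believes it is the
// leader. It reports false if zero members, or more than one, believe so.
func singleLeader(members []member) (string, bool) {
	leader := ""
	count := 0
	for _, m := range members {
		if m.believesLeader {
			leader = m.name
			count++
		}
	}
	if count != 1 {
		return "", false
	}
	return leader, true
}

func main() {
	members := []member{{"a", false}, {"b", true}, {"c", false}}
	if name, ok := singleLeader(members); ok {
		fmt.Println("single leader:", name)
	} else {
		fmt.Println("no single leader yet")
	}
}
```

As the commit message notes, this check cannot tell whether clients that were followers both before and after the disruptive change have actually observed it.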
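Commit c0617ceda4 reorders shutdown so that the in-memory leader status is cleared before the lease is released through the Kube API. The following is a minimal sketch of that ordering. It assumes a hypothetical `leaderState` flag and a `release` callback that stands in for the real API call.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// leaderState is a hypothetical in-memory flag that write paths consult
// before mutating anything.
type leaderState struct {
	isLeader atomic.Bool
}

// onLeaseLost follows the order described in commit c0617ceda4: first stop
// believing we are the leader, then release the lease. release is a
// hypothetical callback standing in for the Kube API call.
func (s *leaderState) onLeaseLost(release func()) {
	s.isLeader.Store(false) // update in-memory status first
	release()               // then give up the lease
}

func main() {
	var s leaderState
	s.isLeader.Store(true)
	s.onLeaseLost(func() {
		// By the time the lease is released, local code no longer sees itself as leader.
		fmt.Println("leader during release:", s.isLeader.Load())
	})
}
```

Clearing the flag first closes the window in which the lease has already been given up but local code could still act as leader.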