Commit Graph

2359 Commits

Author SHA1 Message Date
Monis Khan
0d285ce993
Ensure concierge and supervisor gracefully exit
Changes made to both components:

1. Logs are always flushed on process exit
2. Informer cache sync can no longer hang process start up forever

Changes made to concierge:

1. Add pre-shutdown hook that waits for controllers to exit cleanly
2. Informer caches are synced in post-start hook

Changes made to supervisor:

1. Add shutdown code that waits for controllers to exit cleanly
2. Add shutdown code that waits for active connections to become idle

Waiting for controllers to exit cleanly is critical as this allows
the leader election logic to release the lock on exit.  This reduces
the time needed for the next leader to be elected.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-30 20:29:52 -04:00
Matt Moyer
e43bd59688
Merge pull request #830 from mattmoyer/update-youtube-demo-link
Update YouTube demo link to our official page.
2021-08-30 14:30:15 -07:00
Matt Moyer
0c8d885c26
Update YouTube demo link to our official page.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
2021-08-30 16:29:32 -05:00
Mo Khan
d2dfe3634a
Merge pull request #828 from enj/enj/i/supervisor_graceful_exit
supervisor: ensure graceful exit
2021-08-28 13:40:13 -04:00
Monis Khan
5489f68e2f
supervisor: ensure graceful exit
The kubelet will send the SIGTERM signal when it wants a process to
exit.  After a grace period, it will send the SIGKILL signal to
force the process to terminate.  The concierge has always handled
both SIGINT and SIGTERM as indicators for it to gracefully exit
(i.e. stop watches, controllers, etc).  This change updates the
supervisor to do the same (previously it only handled SIGINT).  This
is required to allow the leader election lock release logic to run.
Otherwise it can take a few minutes for new pods to acquire the
lease since they believe it is already held.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-28 11:23:11 -04:00
Ryan Richard
4eb500cc41
Merge pull request #826 from vmware-tanzu/simplify_readme
Simplify the main README.md to reduce duplication with website
2021-08-27 16:40:53 -07:00
Ryan Richard
871a9fb0c6 Simplify the main README.md to reduce duplication with website 2021-08-27 15:52:51 -07:00
Mo Khan
d580695faa
Merge pull request #824 from enj/enj/t/disruptive_hang
test/integration: use short timeouts with distinct requests to prevent hangs
2021-08-27 16:38:39 -04:00
Monis Khan
ba80b691e1
test/integration: use short timeouts with distinct requests to prevent hangs
Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-27 16:10:36 -04:00
Mo Khan
41c017c9da
Merge pull request #821 from enj/enj/t/increase_disruptive_test_timeout
test/integration: increase timeout on disruptive tests
2021-08-27 15:24:43 -04:00
Monis Khan
5078cdbc90
test/integration: increase timeout on disruptive tests
Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-27 14:56:51 -04:00
Margo Crawford
e5718351ba
Merge pull request #695 from vmware-tanzu/active-directory-identity-provider
Active directory identity provider
2021-08-27 08:39:12 -07:00
Mo Khan
36ff0d52da
Merge pull request #818 from enj/enj/i/bump_go1.17
Bump to Go 1.17.0
2021-08-27 10:30:51 -04:00
Monis Khan
ad3086b8f1
Downgrade go mod compat to 1.16 for golangci-lint
Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-27 10:03:48 -04:00
Monis Khan
6c29f347b4
go 1.17 bump: fix unit test failures
Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-27 09:46:58 -04:00
Monis Khan
a86949d0be
Use go 1.17 module lazy loading
See https://golang.org/doc/go1.17#go-command for details.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-27 09:46:58 -04:00
Monis Khan
44f03af4b9
Bump to Go 1.17.0
Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-27 09:00:49 -04:00
Mo Khan
ce5cfde11e
Merge pull request #816 from enj/enj/i/bump_1.22.1
Bump Kube to v0.22.1
2021-08-27 08:40:23 -04:00
Monis Khan
40d70bf1fc
Bump Kube to v0.22.1
Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-27 07:36:12 -04:00
Margo Crawford
19100d68ef Merge branch 'main' of github.com:vmware-tanzu/pinniped into active-directory-identity-provider 2021-08-26 20:42:16 -07:00
Mo Khan
1d44aa945d
Merge pull request #814 from mayankbh/topic/bmayank/inherit-hostnetwork
Allow use of hostNetwork for kube-cert-agent
2021-08-26 21:13:29 -04:00
Mayank Bhatt
68547f767d Copy hostNetwork field for kube-cert-agent
For clusters where the control plane nodes aren't running a CNI, the
kube-cert-agent pods deployed by concierge cannot be scheduled as they
don't know to use `hostNetwork: true`. This change allows embedding the
host network setting in the Concierge configuration. (by copying it from
the kube-controller-manager pod spec when generating the kube-cert-agent
Deployment)

Also fixed a stray double comma in one of the nearby tests.
2021-08-26 17:09:59 -07:00
Margo Crawford
43694777d5 Change some comments on API docs, fix lint error by ignoring it 2021-08-26 16:55:43 -07:00
Ryan Richard
f579b1cb9f
Merge pull request #812 from vmware-tanzu/resources_section_web_site
Add "Resources" section to pinniped.dev web site
2021-08-26 16:23:36 -07:00
Margo Crawford
2d32e0fa7d Merge branch 'main' of github.com:vmware-tanzu/pinniped into active-directory-identity-provider 2021-08-26 16:21:08 -07:00
Margo Crawford
6f221678df Change sAMAccountName env vars to userPrincipalName
and add E2E ActiveDirectory test
also fixed regexes in supervisor_login_test to be anchored to the
beginning and end
2021-08-26 16:18:05 -07:00
Ryan Richard
e24040b0a9 add link to CNCF presentation slides 2021-08-26 15:52:04 -07:00
Mo Khan
1d269d2f6d
Merge pull request #815 from enj/enj/t/integration_parallel_disruptive
test/integration: mark certain tests as disruptive
2021-08-26 17:32:14 -04:00
Monis Khan
d4a7f0b3e1
test/integration: mark certain tests as disruptive
This prevents them from running with any other test, including other
parallel tests.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-26 15:11:47 -04:00
Mo Khan
d22099ac33
Merge pull request #808 from enj/enj/t/integration_parallel
test/integration: run parallel tests concurrently with serial tests
2021-08-26 14:34:18 -04:00
Monis Khan
e2cf9f6b74
leader election test: approximate that followers have observed change
Instead of blindly waiting long enough for a disruptive change to
have been observed by the old leader and followers, we instead rely
on the approximation that checkOnlyLeaderCanWrite provides - i.e.
only a single actor believes they are the leader.  This does not
account for clients that were in the followers list before and after
the disruptive change, but it serves as a reasonable approximation.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-26 12:59:52 -04:00
Monis Khan
74daa1da64
test/integration: run parallel tests concurrently with serial tests
Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-26 12:59:52 -04:00
Ryan Richard
475da05185
Merge pull request #810 from vmware-tanzu/docs_gitops_example
Install docs use more GitOps-friendly style
2021-08-25 16:46:58 -07:00
Ryan Richard
86bfd4f5e4 Number each install step using "1." 2021-08-25 16:37:36 -07:00
Ryan Richard
d453bf3403 Add "Resources" section to pinniped.dev web site 2021-08-25 16:25:53 -07:00
Mo Khan
2b9b034bd2
Merge pull request #811 from vmware-tanzu/test_shell_container_image
Replace one-off usages of busybox and debian images in integration tests
2021-08-25 19:13:13 -04:00
Ryan Richard
d20cab10b9 Replace one-off usages of busybox and debian images in integration tests
Those images that are pulled from Dockerhub will cause pull failures
on some test clusters due to Dockerhub rate limiting.

Because we already have some images that we use for testing, and
because those images are already pre-loaded onto our CI clusters
to make the tests faster, use one of those images and always specify
PullIfNotPresent to avoid pulling the image again during the integration
test.
2021-08-25 15:12:07 -07:00
Ryan Richard
399737e7c6 Install docs use more GitOps-friendly style 2021-08-25 14:33:48 -07:00
Margo Crawford
1c5a2b8892 Add a couple more unit tests 2021-08-25 11:33:42 -07:00
Mo Khan
c17e7bec49
Merge pull request #800 from enj/enj/i/leader_election_release
leader election: fix small race duration lease release
2021-08-25 10:29:19 -04:00
Monis Khan
c71ffdcd1e
leader election: use better duration defaults
OpenShift has good defaults for these duration fields that we can
use instead of coming up with them ourselves:

e14e06ba8d/pkg/config/leaderelection/leaderelection.go (L87-L109)

Copied here for easy future reference:

// We want to be able to tolerate 60s of kube-apiserver disruption without causing pod restarts.
// We want the graceful lease re-acquisition fairly quick to avoid waits on new deployments and other rollouts.
// We want a single set of guidance for nearly every lease in openshift.  If you're special, we'll let you know.
// 1. clock skew tolerance is leaseDuration-renewDeadline == 30s
// 2. kube-apiserver downtime tolerance is == 78s
//      lastRetry=floor(renewDeadline/retryPeriod)*retryPeriod == 104
//      downtimeTolerance = lastRetry-retryPeriod == 78s
// 3. worst non-graceful lease acquisition is leaseDuration+retryPeriod == 163s
// 4. worst graceful lease acquisition is retryPeriod == 26s
if ret.LeaseDuration.Duration == 0 {
	ret.LeaseDuration.Duration = 137 * time.Second
}

if ret.RenewDeadline.Duration == 0 {
	// this gives 107/26=4 retries and allows for 137-107=30 seconds of clock skew
	// if the kube-apiserver is unavailable for 60s starting just before t=26 (the first renew),
	// then we will retry on 26s intervals until t=104 (kube-apiserver came back up at 86), and there will
	// be 33 seconds of extra time before the lease is lost.
	ret.RenewDeadline.Duration = 107 * time.Second
}
if ret.RetryPeriod.Duration == 0 {
	ret.RetryPeriod.Duration = 26 * time.Second
}

Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-24 16:21:53 -04:00
Margo Crawford
c590c8ff41 Merge branch 'main' of github.com:vmware-tanzu/pinniped into active-directory-identity-provider 2021-08-24 12:19:29 -07:00
Monis Khan
c0617ceda4
leader election: in-memory leader status is stopped before release
This change fixes a small race condition that occurred when the
current leader failed to renew its lease.  Before this change, the
leader would first release the lease via the Kube API and then would
update its in-memory status to reflect that change.  Now those
events occur in the reverse (i.e. correct) order.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-24 15:02:56 -04:00
Mo Khan
f7751d13fe
Merge pull request #778 from vmware-tanzu/oidc_password_grant
Optionally allow OIDC password grant for CLI-based login experience
2021-08-24 13:02:07 -04:00
Mo Khan
3077034b2d
Merge branch 'main' into oidc_password_grant 2021-08-24 12:23:52 -04:00
Mo Khan
89cef2ea6c
Merge pull request #796 from enj/enj/i/leader_election_flake
leader election test: fix flake related to invalid assumption
2021-08-20 19:06:51 -04:00
Ryan Richard
211f4b23d1 Log auth endpoint errors with stack traces 2021-08-20 14:41:02 -07:00
Monis Khan
132ec0d2ad
leader election test: fix flake related to invalid assumption
Even though a client may hold the leader election lock in the Kube
lease API, that does not mean it has had a chance to update its
internal state to reflect that.  Thus we retry the checks in
checkOnlyLeaderCanWrite a few times to allow the client to catch up.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-20 17:04:26 -04:00
Mo Khan
ae505d8009
Merge pull request #788 from enj/enj/i/leader_election
Add Leader Election Middleware
2021-08-20 12:58:27 -04:00
Monis Khan
c356710f1f
Add leader election middleware
Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-20 12:18:25 -04:00