Commit Graph

2470 Commits

Author SHA1 Message Date
Ryan Richard b19af2e135
Merge pull request #829 from enj/enj/i/wait_shutdown
Ensure concierge and supervisor gracefully exit
2021-08-31 11:30:35 -07:00
Ryan Richard 883007aa1b
Merge pull request #756 from vmware-tanzu/ad-identity-provider-docs
Document how to configure the ActiveDirectoryIdentityProvider
2021-08-31 10:48:25 -07:00
Anjali Telang ba1470ea9d Add AD changes
Signed-off-by: Anjali Telang <atelang@vmware.com>
2021-08-30 21:04:48 -04:00
Monis Khan 0d285ce993
Ensure concierge and supervisor gracefully exit
Changes made to both components:

1. Logs are always flushed on process exit
2. Informer cache sync can no longer hang process start up forever

Changes made to concierge:

1. Add pre-shutdown hook that waits for controllers to exit cleanly
2. Informer caches are synced in post-start hook

Changes made to supervisor:

1. Add shutdown code that waits for controllers to exit cleanly
2. Add shutdown code that waits for active connections to become idle

Waiting for controllers to exit cleanly is critical as this allows
the leader election logic to release the lock on exit.  This reduces
the time needed for the next leader to be elected.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-30 20:29:52 -04:00
Matt Moyer e43bd59688
Merge pull request #830 from mattmoyer/update-youtube-demo-link
Update YouTube demo link to our official page.
2021-08-30 14:30:15 -07:00
Matt Moyer 0c8d885c26
Update YouTube demo link to our official page.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
2021-08-30 16:29:32 -05:00
Anjali Telang 23fb84029b changes made on ryan's review comments
Signed-off-by: Anjali Telang <atelang@vmware.com>
2021-08-28 15:59:04 -04:00
Mo Khan d2dfe3634a
Merge pull request #828 from enj/enj/i/supervisor_graceful_exit
supervisor: ensure graceful exit
2021-08-28 13:40:13 -04:00
Monis Khan 5489f68e2f
supervisor: ensure graceful exit
The kubelet will send the SIGTERM signal when it wants a process to
exit.  After a grace period, it will send the SIGKILL signal to
force the process to terminate.  The concierge has always handled
both SIGINT and SIGTERM as indicators for it to gracefully exit
(i.e. stop watches, controllers, etc).  This change updates the
supervisor to do the same (previously it only handled SIGINT).  This
is required to allow the leader election lock release logic to run.
Otherwise it can take a few minutes for new pods to acquire the
lease since they believe it is already held.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-28 11:23:11 -04:00
Ryan Richard 4eb500cc41
Merge pull request #826 from vmware-tanzu/simplify_readme
Simplify the main README.md to reduce duplication with website
2021-08-27 16:40:53 -07:00
Ryan Richard 871a9fb0c6 Simplify the main README.md to reduce duplication with website 2021-08-27 15:52:51 -07:00
Anjali Telang 4cb0152ea1 Merge branch 'main' of github.com:anjaltelang/pinniped into main 2021-08-27 17:15:55 -04:00
Anjali Telang 42af8acd1e Fixed yaml format for Aud
Signed-off-by: Anjali Telang <atelang@vmware.com>
2021-08-27 17:14:53 -04:00
Anjali Telang df014dadc3 Remove unnecessary space after image 2021-08-27 17:07:02 -04:00
Anjali Telang bb657e7432 Blog for v0.11.0
Signed-off-by: Anjali Telang <atelang@vmware.com>
2021-08-27 17:00:34 -04:00
Mo Khan d580695faa
Merge pull request #824 from enj/enj/t/disruptive_hang
test/integration: use short timeouts with distinct requests to prevent hangs
2021-08-27 16:38:39 -04:00
Monis Khan ba80b691e1
test/integration: use short timeouts with distinct requests to prevent hangs
Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-27 16:10:36 -04:00
Mo Khan 41c017c9da
Merge pull request #821 from enj/enj/t/increase_disruptive_test_timeout
test/integration: increase timeout on disruptive tests
2021-08-27 15:24:43 -04:00
Monis Khan 5078cdbc90
test/integration: increase timeout on disruptive tests
Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-27 14:56:51 -04:00
Margo Crawford e5718351ba
Merge pull request #695 from vmware-tanzu/active-directory-identity-provider
Active directory identity provider
2021-08-27 08:39:12 -07:00
Mo Khan 36ff0d52da
Merge pull request #818 from enj/enj/i/bump_go1.17
Bump to Go 1.17.0
2021-08-27 10:30:51 -04:00
Monis Khan ad3086b8f1
Downgrade go mod compat to 1.16 for golangci-lint
Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-27 10:03:48 -04:00
Monis Khan 6c29f347b4
go 1.17 bump: fix unit test failures
Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-27 09:46:58 -04:00
Monis Khan a86949d0be
Use go 1.17 module lazy loading
See https://golang.org/doc/go1.17#go-command for details.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-27 09:46:58 -04:00
Monis Khan 44f03af4b9
Bump to Go 1.17.0
Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-27 09:00:49 -04:00
Mo Khan ce5cfde11e
Merge pull request #816 from enj/enj/i/bump_1.22.1
Bump Kube to v0.22.1
2021-08-27 08:40:23 -04:00
Monis Khan 40d70bf1fc
Bump Kube to v0.22.1
Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-27 07:36:12 -04:00
Margo Crawford 19100d68ef Merge branch 'main' of github.com:vmware-tanzu/pinniped into active-directory-identity-provider 2021-08-26 20:42:16 -07:00
Mo Khan 1d44aa945d
Merge pull request #814 from mayankbh/topic/bmayank/inherit-hostnetwork
Allow use of hostNetwork for kube-cert-agent
2021-08-26 21:13:29 -04:00
Mayank Bhatt 68547f767d Copy hostNetwork field for kube-cert-agent
For clusters where the control plane nodes aren't running a CNI, the
kube-cert-agent pods deployed by concierge cannot be scheduled as they
don't know to use `hostNetwork: true`. This change allows embedding the
host network setting in the Concierge configuration. (by copying it from
the kube-controller-manager pod spec when generating the kube-cert-agent
Deployment)

Also fixed a stray double comma in one of the nearby tests.
2021-08-26 17:09:59 -07:00
Margo Crawford 44e5e9d8c9 Add sentence about api docs 2021-08-26 17:02:56 -07:00
Margo Crawford 43694777d5 Change some comments on API docs, fix lint error by ignoring it 2021-08-26 16:55:43 -07:00
Ryan Richard f579b1cb9f
Merge pull request #812 from vmware-tanzu/resources_section_web_site
Add "Resources" section to pinniped.dev web site
2021-08-26 16:23:36 -07:00
Margo Crawford 2d32e0fa7d Merge branch 'main' of github.com:vmware-tanzu/pinniped into active-directory-identity-provider 2021-08-26 16:21:08 -07:00
Margo Crawford 6f221678df Change sAMAccountName env vars to userPrincipalName
and add E2E ActiveDirectory test
also fixed regexes in supervisor_login_test to be anchored to the
beginning and end
2021-08-26 16:18:05 -07:00
Ryan Richard e24040b0a9 add link to CNCF presentation slides 2021-08-26 15:52:04 -07:00
Mo Khan 1d269d2f6d
Merge pull request #815 from enj/enj/t/integration_parallel_disruptive
test/integration: mark certain tests as disruptive
2021-08-26 17:32:14 -04:00
Monis Khan d4a7f0b3e1
test/integration: mark certain tests as disruptive
This prevents them from running with any other test, including other
parallel tests.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-26 15:11:47 -04:00
Mo Khan d22099ac33
Merge pull request #808 from enj/enj/t/integration_parallel
test/integration: run parallel tests concurrently with serial tests
2021-08-26 14:34:18 -04:00
Monis Khan e2cf9f6b74
leader election test: approximate that followers have observed change
Instead of blindly waiting long enough for a disruptive change to
have been observed by the old leader and followers, we instead rely
on the approximation that checkOnlyLeaderCanWrite provides - i.e.
only a single actor believes they are the leader.  This does not
account for clients that were in the followers list before and after
the disruptive change, but it serves as a reasonable approximation.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-26 12:59:52 -04:00
Monis Khan 74daa1da64
test/integration: run parallel tests concurrently with serial tests
Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-26 12:59:52 -04:00
Ryan Richard 475da05185
Merge pull request #810 from vmware-tanzu/docs_gitops_example
Install docs use more GitOps-friendly style
2021-08-25 16:46:58 -07:00
Ryan Richard 86bfd4f5e4 Number each install step using "1." 2021-08-25 16:37:36 -07:00
Ryan Richard d453bf3403 Add "Resources" section to pinniped.dev web site 2021-08-25 16:25:53 -07:00
Mo Khan 2b9b034bd2
Merge pull request #811 from vmware-tanzu/test_shell_container_image
Replace one-off usages of busybox and debian images in integration tests
2021-08-25 19:13:13 -04:00
Ryan Richard d20cab10b9 Replace one-off usages of busybox and debian images in integration tests
Those images that are pulled from Dockerhub will cause pull failures
on some test clusters due to Dockerhub rate limiting.

Because we already have some images that we use for testing, and
because those images are already pre-loaded onto our CI clusters
to make the tests faster, use one of those images and always specify
PullIfNotPresent to avoid pulling the image again during the integration
test.
2021-08-25 15:12:07 -07:00
Ryan Richard 399737e7c6 Install docs use more GitOps-friendly style 2021-08-25 14:33:48 -07:00
Margo Crawford 1c5a2b8892 Add a couple more unit tests 2021-08-25 11:33:42 -07:00
Mo Khan c17e7bec49
Merge pull request #800 from enj/enj/i/leader_election_release
leader election: fix small race duration lease release
2021-08-25 10:29:19 -04:00
Monis Khan c71ffdcd1e
leader election: use better duration defaults
OpenShift has good defaults for these duration fields that we can
use instead of coming up with them ourselves:

e14e06ba8d/pkg/config/leaderelection/leaderelection.go (L87-L109)

Copied here for easy future reference:

// We want to be able to tolerate 60s of kube-apiserver disruption without causing pod restarts.
// We want the graceful lease re-acquisition fairly quick to avoid waits on new deployments and other rollouts.
// We want a single set of guidance for nearly every lease in openshift.  If you're special, we'll let you know.
// 1. clock skew tolerance is leaseDuration-renewDeadline == 30s
// 2. kube-apiserver downtime tolerance is == 78s
//      lastRetry=floor(renewDeadline/retryPeriod)*retryPeriod == 104
//      downtimeTolerance = lastRetry-retryPeriod == 78s
// 3. worst non-graceful lease acquisition is leaseDuration+retryPeriod == 163s
// 4. worst graceful lease acquisition is retryPeriod == 26s
if ret.LeaseDuration.Duration == 0 {
	ret.LeaseDuration.Duration = 137 * time.Second
}

if ret.RenewDeadline.Duration == 0 {
	// this gives 107/26=4 retries and allows for 137-107=30 seconds of clock skew
	// if the kube-apiserver is unavailable for 60s starting just before t=26 (the first renew),
	// then we will retry on 26s intervals until t=104 (kube-apiserver came back up at 86), and there will
	// be 33 seconds of extra time before the lease is lost.
	ret.RenewDeadline.Duration = 107 * time.Second
}
if ret.RetryPeriod.Duration == 0 {
	ret.RetryPeriod.Duration = 26 * time.Second
}

Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-24 16:21:53 -04:00