Commit Graph

582 Commits

Author SHA1 Message Date
Monis Khan 1e17418585
TestSupervisorUpstreamOIDCDiscovery: include AdditionalAuthorizeParametersValid condition
Signed-off-by: Monis Khan <mok@vmware.com>
2021-10-25 10:21:51 -04:00
Ryan Richard 7ec0304472 Add offline_access scope for integration tests when using Dex 2021-10-19 12:25:51 -07:00
Ryan Richard 9e05d175a7 Add integration test: upstream refresh failure during downstream refresh 2021-10-13 15:12:19 -07:00
Monis Khan 266d64f7d1
Do not truncate x509 errors
Signed-off-by: Monis Khan <mok@vmware.com>
2021-09-29 09:38:22 -04:00
Ryan Richard ddf5e566b0 Update a comment 2021-09-21 14:07:08 -07:00
Ryan Richard f700246bfa Allow focused integration tests to be run from the GoLand UI again
This was broken recently by the improvements in #808.
2021-09-21 12:04:45 -07:00
Ryan Richard fca183b203 Show DefaultStrategy as a new printer column for CredentialIssuer 2021-09-21 12:01:30 -07:00
Ryan Richard 1b2a116518 Merge branch 'main' into crd_printcolumns 2021-09-21 09:36:46 -07:00
Ryan Richard 4e98c1bbdb Tests use CertificatesV1 when available, otherwise use CertificatesV1beta1
CertificatesV1beta1 was removed in Kube 1.22, so the tests cannot
blindly rely on it anymore. Use CertificatesV1 whenever the server
reports that is available, and otherwise use the old
CertificatesV1beta1.

Note that CertificatesV1 was introduced in Kube 1.19.
2021-09-20 17:14:58 -07:00
Ryan Richard 0a31f45812 Update the AdditionalPrinterColumns of the CRDs, and add a test for it 2021-09-20 12:47:39 -07:00
Ryan Richard 04544b3d3c Update TestKubeCertAgent to use new "v3" label value 2021-09-15 11:09:07 -07:00
Margo Crawford 05f5bac405 ValidatedSettings is all or nothing
If either the search base or the tls settings is invalid, just
recheck everything.
2021-09-07 13:09:35 -07:00
Margo Crawford 27c1d2144a Make sure search base in the validatedSettings cache is properly updated when the bind secret changes 2021-09-07 13:09:35 -07:00
Monis Khan 0d285ce993
Ensure concierge and supervisor gracefully exit
Changes made to both components:

1. Logs are always flushed on process exit
2. Informer cache sync can no longer hang process start up forever

Changes made to concierge:

1. Add pre-shutdown hook that waits for controllers to exit cleanly
2. Informer caches are synced in post-start hook

Changes made to supervisor:

1. Add shutdown code that waits for controllers to exit cleanly
2. Add shutdown code that waits for active connections to become idle

Waiting for controllers to exit cleanly is critical as this allows
the leader election logic to release the lock on exit.  This reduces
the time needed for the next leader to be elected.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-30 20:29:52 -04:00
Monis Khan ba80b691e1
test/integration: use short timeouts with distinct requests to prevent hangs
Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-27 16:10:36 -04:00
Monis Khan 5078cdbc90
test/integration: increase timeout on disruptive tests
Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-27 14:56:51 -04:00
Margo Crawford 43694777d5 Change some comments on API docs, fix lint error by ignoring it 2021-08-26 16:55:43 -07:00
Margo Crawford 2d32e0fa7d Merge branch 'main' of github.com:vmware-tanzu/pinniped into active-directory-identity-provider 2021-08-26 16:21:08 -07:00
Margo Crawford 6f221678df Change sAMAccountName env vars to userPrincipalName
and add E2E ActiveDirectory test
also fixed regexes in supervisor_login_test to be anchored to the
beginning and end
2021-08-26 16:18:05 -07:00
Monis Khan d4a7f0b3e1
test/integration: mark certain tests as disruptive
This prevents them from running with any other test, including other
parallel tests.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-26 15:11:47 -04:00
Monis Khan e2cf9f6b74
leader election test: approximate that followers have observed change
Instead of blindly waiting long enough for a disruptive change to
have been observed by the old leader and followers, we instead rely
on the approximation that checkOnlyLeaderCanWrite provides - i.e.
only a single actor believes they are the leader.  This does not
account for clients that were in the followers list before and after
the disruptive change, but it serves as a reasonable approximation.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-26 12:59:52 -04:00
Monis Khan 74daa1da64
test/integration: run parallel tests concurrently with serial tests
Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-26 12:59:52 -04:00
Ryan Richard d20cab10b9 Replace one-off usages of busybox and debian images in integration tests
Those images that are pulled from Dockerhub will cause pull failures
on some test clusters due to Dockerhub rate limiting.

Because we already have some images that we use for testing, and
because those images are already pre-loaded onto our CI clusters
to make the tests faster, use one of those images and always specify
PullIfNotPresent to avoid pulling the image again during the integration
test.
2021-08-25 15:12:07 -07:00
Monis Khan c71ffdcd1e
leader election: use better duration defaults
OpenShift has good defaults for these duration fields that we can
use instead of coming up with them ourselves:

e14e06ba8d/pkg/config/leaderelection/leaderelection.go (L87-L109)

Copied here for easy future reference:

// We want to be able to tolerate 60s of kube-apiserver disruption without causing pod restarts.
// We want the graceful lease re-acquisition fairly quick to avoid waits on new deployments and other rollouts.
// We want a single set of guidance for nearly every lease in openshift.  If you're special, we'll let you know.
// 1. clock skew tolerance is leaseDuration-renewDeadline == 30s
// 2. kube-apiserver downtime tolerance is == 78s
//      lastRetry=floor(renewDeadline/retryPeriod)*retryPeriod == 104
//      downtimeTolerance = lastRetry-retryPeriod == 78s
// 3. worst non-graceful lease acquisition is leaseDuration+retryPeriod == 163s
// 4. worst graceful lease acquisition is retryPeriod == 26s
if ret.LeaseDuration.Duration == 0 {
	ret.LeaseDuration.Duration = 137 * time.Second
}

if ret.RenewDeadline.Duration == 0 {
	// this gives 107/26=4 retries and allows for 137-107=30 seconds of clock skew
	// if the kube-apiserver is unavailable for 60s starting just before t=26 (the first renew),
	// then we will retry on 26s intervals until t=104 (kube-apiserver came back up at 86), and there will
	// be 33 seconds of extra time before the lease is lost.
	ret.RenewDeadline.Duration = 107 * time.Second
}
if ret.RetryPeriod.Duration == 0 {
	ret.RetryPeriod.Duration = 26 * time.Second
}

Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-24 16:21:53 -04:00
Margo Crawford c590c8ff41 Merge branch 'main' of github.com:vmware-tanzu/pinniped into active-directory-identity-provider 2021-08-24 12:19:29 -07:00
Monis Khan c0617ceda4
leader election: in-memory leader status is stopped before release
This change fixes a small race condition that occurred when the
current leader failed to renew its lease.  Before this change, the
leader would first release the lease via the Kube API and then would
update its in-memory status to reflect that change.  Now those
events occur in the reverse (i.e. correct) order.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-24 15:02:56 -04:00
Mo Khan 3077034b2d
Merge branch 'main' into oidc_password_grant 2021-08-24 12:23:52 -04:00
Monis Khan 132ec0d2ad
leader election test: fix flake related to invalid assumption
Even though a client may hold the leader election lock in the Kube
lease API, that does not mean it has had a chance to update its
internal state to reflect that.  Thus we retry the checks in
checkOnlyLeaderCanWrite a few times to allow the client to catch up.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-20 17:04:26 -04:00
Monis Khan c356710f1f
Add leader election middleware
Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-20 12:18:25 -04:00
Margo Crawford 5e9087263d Increase timeout for activedirectoryidentityprovider to be loaded 2021-08-18 16:24:05 -07:00
Margo Crawford a20aee5f18 Update test assertions to reflect userPrincipalName as username 2021-08-18 13:18:53 -07:00
Margo Crawford 1c5da35527 Merge remote-tracking branch 'origin' into active-directory-identity-provider 2021-08-18 12:44:12 -07:00
Margo Crawford 26c47d564f Make new combined sAMAccountName@domain attribute the group name
Also change default username attribute to userPrincipalName
2021-08-17 16:53:26 -07:00
Ryan Richard 62c6d53a21 Merge branch 'main' into oidc_password_grant 2021-08-17 15:23:29 -07:00
Monis Khan cf25c308cd
test/integration: ignore restarts associated with test pods
Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-17 12:57:41 -04:00
Ryan Richard 3fb683f64e Update expected error message in e2e integration test 2021-08-16 15:40:34 -07:00
Ryan Richard 52409f86e8 Merge branch 'main' into oidc_password_grant 2021-08-16 15:17:55 -07:00
Monis Khan 7a812ac5ed
impersonatorconfig: only unload dynamiccert when proxy is disabled
In the upstream dynamiccertificates package, we rely on two pieces
of code:

1. DynamicServingCertificateController.newTLSContent which calls
   - clientCA.CurrentCABundleContent
   - servingCert.CurrentCertKeyContent
2. unionCAContent.VerifyOptions which calls
   - unionCAContent.CurrentCABundleContent

This results in calls to our tlsServingCertDynamicCertProvider and
impersonationSigningCertProvider.  If we Unset these providers, we
subtly break these consumers.  At best this results in test slowness
and flakes while we wait for reconcile loops to converge.  At worst,
it results in actual errors during runtime.  For example, we
previously would Unset the impersonationSigningCertProvider on any
sync loop error (even a transient one caused by a network blip or
a conflict between writes from different replicas of the concierge).
This would cause us to transiently fail to issue new certificates
from the token credential require API.  It would also cause us to
transiently fail to authenticate previously issued client certs
(which results in occasional Unauthorized errors in CI).

Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-16 16:07:46 -04:00
Ryan Richard 5b96d014b4 Merge branch 'main' into oidc_password_grant 2021-08-12 11:12:57 -07:00
Ryan Richard 84c3c3aa9c Optionally allow OIDC password grant for CLI-based login experience
- Add `AllowPasswordGrant` boolean field to OIDCIdentityProvider's spec
- The oidc upstream watcher controller copies the value of
  `AllowPasswordGrant` into the configuration of the cached provider
- Add password grant to the UpstreamOIDCIdentityProviderI interface
  which is implemented by the cached provider instance for use in the
  authorization endpoint
- Enhance the IDP discovery endpoint to return the supported "flows"
  for each IDP ("cli_password" and/or "browser_authcode")
- Enhance `pinniped get kubeconfig` to help the user choose the desired
  flow for the selected IDP, and to write the flow into the resulting
  kubeconfg
- Enhance `pinniped login oidc` to have a flow flag to tell it which
  client-side flow it should use for auth (CLI-based or browser-based)
- In the Dex config, allow the resource owner password grant, which Dex
  implements to also return ID tokens, for use in integration tests
- Enhance the authorize endpoint to perform password grant when
  requested by the incoming headers. This commit does not include unit
  tests for the enhancements to the authorize endpoint, which will come
  in the next commit
- Extract some shared helpers from the callback endpoint to share the
  code with the authorize endpoint
- Add new integration tests
2021-08-12 10:45:39 -07:00
Monis Khan 34fd0ea2e2
impersonation proxy: assert nested UID impersonation is disallowed
Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-10 00:03:33 -04:00
Monis Khan 724acdca1d
Update tests for new CSR duration code
Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-09 19:16:50 -04:00
Matt Moyer 58bbffded4
Switch to a slimmer distroless base image.
At a high level, it switches us to a distroless base container image, but that also includes several related bits:

- Add a writable /tmp but make the rest of our filesystems read-only at runtime.

- Condense our main server binaries into a single pinniped-server binary. This saves a bunch of space in
  the image due to duplicated library code. The correct behavior is dispatched based on `os.Args[0]`, and
  the `pinniped-server` binary is symlinked to `pinniped-concierge` and `pinniped-supervisor`.

- Strip debug symbols from our binaries. These aren't really useful in a distroless image anyway and all the
  normal stuff you'd expect to work, such as stack traces, still does.

- Add a separate `pinniped-concierge-kube-cert-agent` binary with "sleep" and "print" functionality instead of
  using builtin /bin/sleep and /bin/cat for the kube-cert-agent. This is split from the main server binary
  because the loading/init time of the main server binary was too large for the tiny resource footprint we
  established in our kube-cert-agent PodSpec. Using a separate binary eliminates this issue and the extra
  binary adds only around 1.5MiB of image size.

- Switch the kube-cert-agent code to use a JSON `{"tls.crt": "<b64 cert>", "tls.key": "<b64 key>"}` format.
  This is more robust to unexpected input formatting than the old code, which simply concatenated the files
  with some extra newlines and split on whitespace.

- Update integration tests that made now-invalid assumptions about the `pinniped-server` image.

Signed-off-by: Matt Moyer <moyerm@vmware.com>
2021-08-09 15:05:13 -04:00
Monis Khan ac7d65c4a8
concierge_impersonation_proxy_test: run slowly for EKS
Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-01 18:19:53 -04:00
Matt Moyer 1e32530d7b
Fix broken TTY after manual auth code prompt.
This may be a temporary fix. It switches the manual auth code prompt to use `promptForValue()` instead of `promptForSecret()`. The `promptForSecret()` function no longer supports cancellation (the v0.9.2 behavior) and the method of cancelling in `promptForValue()` is now based on running the blocking read in a background goroutine, which is allowed to block forever or leak (which is not important for our CLI use case).

This means that the authorization code is now visible in the user's terminal, but this is really not a big deal because of PKCE and the limited lifetime of an auth code.

The main goroutine now correctly waits for the "manual prompt" goroutine to clean up, which now includes printing the extra newline that would normally have been entered by the user in the manual flow.

The text of the manual login prompt is updated to be more concise and less scary (don't use the word "fail").

Signed-off-by: Matt Moyer <moyerm@vmware.com>
2021-07-30 12:45:44 -05:00
Monis Khan 22be97eeda
concierge_impersonation_proxy_test: check all forms of DNS
Signed-off-by: Monis Khan <mok@vmware.com>
2021-07-29 13:35:37 -04:00
Ryan Richard d73093a694 Avoid failures due to impersonation Service having unrelated annotations 2021-07-28 14:19:14 -07:00
Matt Moyer b42b1c1110
Relax the timeout for TestLegacyPodCleaner a bit.
This test is asynchronously waiting for the controller to do something, and in some of our test environments it will take a bit longer than we'd previously allowed.

Signed-off-by: Matt Moyer <moyerm@vmware.com>
2021-07-28 13:08:57 -05:00
Matt Moyer 48c8fabb5c
Fix backwards condition in E2E test assertion.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
2021-07-28 12:40:07 -05:00
Margo Crawford 474266f918 Merge branch 'main' of github.com:vmware-tanzu/pinniped into active-directory-identity-provider 2021-07-27 15:06:58 -07:00