Commit Graph

372 Commits

Author SHA1 Message Date
Monis Khan 266d64f7d1
Do not truncate x509 errors
Signed-off-by: Monis Khan <mok@vmware.com>
2021-09-29 09:38:22 -04:00
Monis Khan 0d6bf9db3e
kubecertagent: attempt to load signer as long as agent labels match
This change updates the kube cert agent to a middle ground behavior
that balances leader election gating with how quickly we load the
signer.

If the agent labels have not changed, we will attempt to load the
signer even if we cannot roll out the latest version of the kube
cert agent deployment.

This gives us the best behavior - we do not have controllers
fighting over the state of the deployment and we still get the
signer loaded quickly.

We will have a minute of downtime when the kube cert agent deployment
changes because the new pods will have to wait to become a leader
and for the new deployment to rollout the new pods.  We would need
to have a per pod deployment if we want to avoid that downtime (but
this would come at the cost of startup time and would require
coordination with the kubelet in regards to pod readiness).

Signed-off-by: Monis Khan <mok@vmware.com>
2021-09-21 16:20:56 -04:00
Monis Khan 91c8f747f4
certauthority: tolerate larger clock skew between API server and pinniped
This change updates our certificate code to use the same 5 minute
backdate that is used by the Kubernetes controller manager.  This
helps to account for clock skews between the API servers and the
kubelets that are running the pinniped pods.  While this backdating
reflects a large percentage of the lifetime of our short lived
certificates (100% for the 5 minute client certificates), even a 10
minute irrevocable client certificate is within our limits.  When
we move to the CSR based short lived certificates, they will always
have at least a 15 minute lifetime (5 minute backdating plus 10 minute
minimum valid duration).

Signed-off-by: Monis Khan <mok@vmware.com>
2021-09-21 09:32:24 -04:00
Monis Khan 09467d3e24
kubecertagent: fix flakey tests
This commit makes the following changes to the kube cert agent tests:

1. Informers are synced on start using the controllerinit code
2. Deployment client and informer are synced per controller sync loop
3. Controller sync loop exits after two consistent errors
4. Use assert instead of require to avoid ending the test early

Signed-off-by: Monis Khan <mok@vmware.com>
2021-09-16 14:48:04 -04:00
Ryan Richard bdcf468e52 Add log statement for when kube cert agent key has been loaded
Because it makes things easier to debug on a real cluster
2021-09-15 14:02:46 -07:00
Ryan Richard 55de160551 Bump the version number of the kube cert agent label
Not required, but within the spirit of using the version number.
Since the existing kube cert agent deployment will get deleted anyway
during an upgrade, it shouldn't hurt to change the version number.
New installations will get the new version number on the new kube cert
agent deployment.
2021-09-14 15:27:15 -07:00
Ryan Richard cec9f3c4d7 Improve the selectors of Deployments and Services
Fixes #801. The solution is complicated by the fact that the Selector
field of Deployments is immutable. It would have been easy to just
make the Selectors of the main Concierge Deployment, the Kube cert agent
Deployment, and the various Services use more specific labels, but
that would break upgrades. Instead, we make the Pod template labels and
the Service selectors more specific, because those not immutable, and
then handle the Deployment selectors in a special way.

For the main Concierge and Supervisor Deployments, we cannot change
their selectors, so they remain "app: app_name", and we make other
changes to ensure that only the intended pods are selected. We keep the
original "app" label on those pods and remove the "app" label from the
pods of the Kube cert agent Deployment. By removing it from the Kube
cert agent pods, there is no longer any chance that they will
accidentally get selected by the main Concierge Deployment.

For the Kube cert agent Deployment, we can change the immutable selector
by deleting and recreating the Deployment. The new selector uses only
the unique label that has always been applied to the pods of that
deployment. Upon recreation, these pods no longer have the "app" label,
so they will not be selected by the main Concierge Deployment's
selector.

The selector of all Services have been updated to use new labels to
more specifically target the intended pods. For the Concierge Services,
this will prevent them from accidentally including the Kube cert agent
pods. For the Supervisor Services, we follow the same convention just
to be consistent and to help future-proof the Supervisor app in case it
ever has a second Deployment added to it.

The selector of the auto-created impersonation proxy Service was
also previously using the "app" label. There is no change to this
Service because that label will now select the correct pods, since
the Kube cert agent pods no longer have that label. It would be possible
to update that selector to use the new more specific label, but then we
would need to invent a way to pass that label into the controller, so
it seemed like more work than was justified.
2021-09-14 13:35:10 -07:00
Margo Crawford 0a1ee9e37c Remove unused functions 2021-09-08 10:34:42 -07:00
Margo Crawford 05f5bac405 ValidatedSettings is all or nothing
If either the search base or the tls settings is invalid, just
recheck everything.
2021-09-07 13:09:35 -07:00
Margo Crawford 0195894a50 Test fix for ldap upstream watcher 2021-09-07 13:09:35 -07:00
Margo Crawford 27c1d2144a Make sure search base in the validatedSettings cache is properly updated when the bind secret changes 2021-09-07 13:09:35 -07:00
Margo Crawford 19100d68ef Merge branch 'main' of github.com:vmware-tanzu/pinniped into active-directory-identity-provider 2021-08-26 20:42:16 -07:00
Mayank Bhatt 68547f767d Copy hostNetwork field for kube-cert-agent
For clusters where the control plane nodes aren't running a CNI, the
kube-cert-agent pods deployed by concierge cannot be scheduled as they
don't know to use `hostNetwork: true`. This change allows embedding the
host network setting in the Concierge configuration. (by copying it from
the kube-controller-manager pod spec when generating the kube-cert-agent
Deployment)

Also fixed a stray double comma in one of the nearby tests.
2021-08-26 17:09:59 -07:00
Margo Crawford 6f221678df Change sAMAccountName env vars to userPrincipalName
and add E2E ActiveDirectory test
also fixed regexes in supervisor_login_test to be anchored to the
beginning and end
2021-08-26 16:18:05 -07:00
Margo Crawford c590c8ff41 Merge branch 'main' of github.com:vmware-tanzu/pinniped into active-directory-identity-provider 2021-08-24 12:19:29 -07:00
Mo Khan 3077034b2d
Merge branch 'main' into oidc_password_grant 2021-08-24 12:23:52 -04:00
Monis Khan c356710f1f
Add leader election middleware
Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-20 12:18:25 -04:00
Margo Crawford 05afae60c2 Review comments--
- Change list of attributeParsingOverrides to a map
- Add unit test for sAMAccountName as group name without the override
- Change some comments in the the type definition.
2021-08-19 14:21:18 -07:00
Margo Crawford 1c5da35527 Merge remote-tracking branch 'origin' into active-directory-identity-provider 2021-08-18 12:44:12 -07:00
Margo Crawford 8657b0e3e7 Cleanup new group attribute behavior and add test coverage 2021-08-18 10:11:18 -07:00
Margo Crawford 26c47d564f Make new combined sAMAccountName@domain attribute the group name
Also change default username attribute to userPrincipalName
2021-08-17 16:53:26 -07:00
Ryan Richard 52409f86e8 Merge branch 'main' into oidc_password_grant 2021-08-16 15:17:55 -07:00
Monis Khan 7a812ac5ed
impersonatorconfig: only unload dynamiccert when proxy is disabled
In the upstream dynamiccertificates package, we rely on two pieces
of code:

1. DynamicServingCertificateController.newTLSContent which calls
   - clientCA.CurrentCABundleContent
   - servingCert.CurrentCertKeyContent
2. unionCAContent.VerifyOptions which calls
   - unionCAContent.CurrentCABundleContent

This results in calls to our tlsServingCertDynamicCertProvider and
impersonationSigningCertProvider.  If we Unset these providers, we
subtly break these consumers.  At best this results in test slowness
and flakes while we wait for reconcile loops to converge.  At worst,
it results in actual errors during runtime.  For example, we
previously would Unset the impersonationSigningCertProvider on any
sync loop error (even a transient one caused by a network blip or
a conflict between writes from different replicas of the concierge).
This would cause us to transiently fail to issue new certificates
from the token credential require API.  It would also cause us to
transiently fail to authenticate previously issued client certs
(which results in occasional Unauthorized errors in CI).

Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-16 16:07:46 -04:00
Ryan Richard 5b96d014b4 Merge branch 'main' into oidc_password_grant 2021-08-12 11:12:57 -07:00
Ryan Richard 84c3c3aa9c Optionally allow OIDC password grant for CLI-based login experience
- Add `AllowPasswordGrant` boolean field to OIDCIdentityProvider's spec
- The oidc upstream watcher controller copies the value of
  `AllowPasswordGrant` into the configuration of the cached provider
- Add password grant to the UpstreamOIDCIdentityProviderI interface
  which is implemented by the cached provider instance for use in the
  authorization endpoint
- Enhance the IDP discovery endpoint to return the supported "flows"
  for each IDP ("cli_password" and/or "browser_authcode")
- Enhance `pinniped get kubeconfig` to help the user choose the desired
  flow for the selected IDP, and to write the flow into the resulting
  kubeconfg
- Enhance `pinniped login oidc` to have a flow flag to tell it which
  client-side flow it should use for auth (CLI-based or browser-based)
- In the Dex config, allow the resource owner password grant, which Dex
  implements to also return ID tokens, for use in integration tests
- Enhance the authorize endpoint to perform password grant when
  requested by the incoming headers. This commit does not include unit
  tests for the enhancements to the authorize endpoint, which will come
  in the next commit
- Extract some shared helpers from the callback endpoint to share the
  code with the authorize endpoint
- Add new integration tests
2021-08-12 10:45:39 -07:00
Monis Khan a027f1ae2c
jwtcachefiller: update to use CAContentProvider
Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-09 19:16:25 -04:00
Matt Moyer 58bbffded4
Switch to a slimmer distroless base image.
At a high level, it switches us to a distroless base container image, but that also includes several related bits:

- Add a writable /tmp but make the rest of our filesystems read-only at runtime.

- Condense our main server binaries into a single pinniped-server binary. This saves a bunch of space in
  the image due to duplicated library code. The correct behavior is dispatched based on `os.Args[0]`, and
  the `pinniped-server` binary is symlinked to `pinniped-concierge` and `pinniped-supervisor`.

- Strip debug symbols from our binaries. These aren't really useful in a distroless image anyway and all the
  normal stuff you'd expect to work, such as stack traces, still does.

- Add a separate `pinniped-concierge-kube-cert-agent` binary with "sleep" and "print" functionality instead of
  using builtin /bin/sleep and /bin/cat for the kube-cert-agent. This is split from the main server binary
  because the loading/init time of the main server binary was too large for the tiny resource footprint we
  established in our kube-cert-agent PodSpec. Using a separate binary eliminates this issue and the extra
  binary adds only around 1.5MiB of image size.

- Switch the kube-cert-agent code to use a JSON `{"tls.crt": "<b64 cert>", "tls.key": "<b64 key>"}` format.
  This is more robust to unexpected input formatting than the old code, which simply concatenated the files
  with some extra newlines and split on whitespace.

- Update integration tests that made now-invalid assumptions about the `pinniped-server` image.

Signed-off-by: Matt Moyer <moyerm@vmware.com>
2021-08-09 15:05:13 -04:00
Matt Moyer 5f679059d5
Add ClusterIP service to impersonator-config-controller informer.
Prior to this fix, this controller did not correctly react to changes to the ClusterIP service. It would still eventually react with a long delay due to our 5 minute resync interval.

Signed-off-by: Matt Moyer <moyerm@vmware.com>
2021-07-28 11:57:18 -05:00
Monis Khan 8b4ed86071
certs_expirer: be specific about what secret to delete
This change fixes a race that can occur because we have multiple
writers with no leader election lock.

1. TestAPIServingCertificateAutoCreationAndRotation/automatic
   expires the current serving certificate
2. CertsExpirerController 1 deletes expired serving certificate
3. CertsExpirerController 2 starts deletion of expired serving
   certificate but has not done so yet
4. CertsManagerController 1 creates new serving certificate
5. TestAPIServingCertificateAutoCreationAndRotation/automatic
   records the new serving certificate
6. CertsExpirerController 2 finishes deletion, and thus deletes the
   newly created serving certificate instead of the old one
7. CertsManagerController 2 creates new serving certificate
8. TestAPIServingCertificateAutoCreationAndRotation/automatic keeps
   running and eventually times out because it is expecting the
   serving certificate created by CertsManagerController 2 to match
   the value it recorded from CertsManagerController 1 (which will
   never happen since that certificate was incorrectly deleted).

Signed-off-by: Monis Khan <mok@vmware.com>
2021-07-28 09:56:05 -04:00
Margo Crawford 474266f918 Merge branch 'main' of github.com:vmware-tanzu/pinniped into active-directory-identity-provider 2021-07-27 15:06:58 -07:00
Margo Crawford bbaa820278 parsing objectGUID as human-readable string version 2021-07-27 11:08:23 -07:00
Margo Crawford 287a5d225a Change SearchBaseFound condition success reason to be a string constant 2021-07-27 10:23:05 -07:00
Ryan Richard f17f7c0c6a Small refactors in impersonator_config.go suggested by @mattmoyer 2021-07-26 17:46:06 -07:00
Margo Crawford cc3875f048 PR feedback 2021-07-26 16:03:12 -07:00
Margo Crawford 5d23068690 Removed a todo that was resolved 2021-07-23 13:01:41 -07:00
Margo Crawford 91085e68f9 Refactoring defaulting logic 2021-07-23 13:01:41 -07:00
Margo Crawford f99f7be836 Default values for ad usersearch and groupsearch 2021-07-23 13:01:41 -07:00
Margo Crawford 890d9c3216 resolve some todos about error handling search base discovery results 2021-07-23 13:01:41 -07:00
Margo Crawford cb0ee07b51 Fetch AD search base from defaultNamingContext when not specified 2021-07-23 13:01:41 -07:00
Margo Crawford 8e1d70562d Remove shared variables from ldap upstream observer 2021-07-23 13:01:41 -07:00
Margo Crawford 5d8d7246c2 Refactor active directory and ldap controllers to share almost everything
Signed-off-by: Ryan Richard <richardry@vmware.com>
2021-07-23 13:01:41 -07:00
Margo Crawford e5c8cbb3a4 One line fix for lint error. Forgot a period in a comment.
Signed-off-by: Ryan Richard <richardry@vmware.com>
2021-07-23 13:01:40 -07:00
Margo Crawford 7696f4256d Move defaulting of ad username and uid attributes to controller
Now the controller uses upstreamldap so there is less duplication,
since they are very similar.

Signed-off-by: Ryan Richard <richardry@vmware.com>
2021-07-23 13:01:40 -07:00
Ryan Richard aaa4861373 Custom API Group overlay for AD
Signed-off-by: Margo Crawford <margaretc@vmware.com>
2021-07-23 13:01:40 -07:00
Margo Crawford be6f9f83ce RBAC rules for activedirectoryidentityprovider 2021-07-23 13:01:40 -07:00
Margo Crawford 3899292e89 Advertise Active Directory idps 2021-07-23 13:01:40 -07:00
Ryan Richard 708164b878 Carefully merge desired annotations into impersonation proxy Service
Don't overwrite annotations that might have come from a human user or
from some other non-Pinniped controller.
2021-07-22 17:09:50 -07:00
Ryan Richard 2bba39d723 TestAgentController unit test is flaky, try to add workaround
TestAgentController really runs the controller and evaluates multiple
calls to the controller's Sync with real informers caching updates.
There is a large amount of non-determinism in this unit test, and it
does not always behave the same way. Because it makes assertions about
the specific errors that should be returned by Sync, it was not
accounting for some errors that are only returned by Sync once in a
while depending on the exact (unpredictable) order of operations.

This commit doesn't fix the non-determinism in the test, but rather
tries to work around it by also allowing other (undesired but
inevitable) error messages to appear in the list of actual error
messages returned by the calls to the Sync function.

Signed-off-by: Margo Crawford <margaretc@vmware.com>
2021-07-15 13:41:31 -07:00
Matt Moyer 9f91c6c884
Merge branch 'main' into oidc-upstream-watcher-supports-proxy 2021-07-09 07:24:52 -07:00
Ryan Richard e130da6daa Add unit test assertion for new OIDC client request timeout 2021-07-08 11:47:49 -07:00