Commit Graph

987 Commits

Author SHA1 Message Date
Monis Khan
eae55a8595
Fix typo in group removed warning
Signed-off-by: Monis Khan <mok@vmware.com>
2022-03-02 12:58:30 -05:00
Margo Crawford
609b55a6d7 Pinniped Supervisor should issue a warning when groups change during refresh 2022-03-01 14:01:57 -08:00
Margo Crawford
fdac4d16f0 Only run group refresh when the skipGroupRefresh boolean isn't set
for AD and LDAP
2022-02-17 12:50:28 -08:00
Margo Crawford
662f2cef9c Integration test for updating group search base
Also a small change to a comment
2022-02-17 11:29:59 -08:00
Margo Crawford
ca523b1f20 Always update groups even if it's nil
Also de-dup groups and various small formatting changes
2022-02-17 11:29:59 -08:00
Margo Crawford
c28602f275 Add unit tests for group parsing overrides 2022-02-17 11:29:59 -08:00
Margo Crawford
dd11c02b6a Add back entries because I think it's actually necessary 2022-02-17 11:29:59 -08:00
Margo Crawford
f890fad90c Rename a function, sort strings inside searchGroupsForUserDN 2022-02-17 11:29:59 -08:00
Margo Crawford
cd7538861a Add integration test where we don't get groups back 2022-02-17 11:29:59 -08:00
Margo Crawford
013b521838 Upstream ldap group refresh:
- Doing it inline on the refresh request
2022-02-17 11:29:59 -08:00
Ryan Richard
e5a60a8c84 Update a comment 2022-02-16 11:09:05 -08:00
Monis Khan
49e88dd74a
Change some single quotes to double quotes in minified JS
Signed-off-by: Monis Khan <mok@vmware.com>
2022-02-10 16:15:26 -05:00
Ryan Richard
5d79d4b9dc Fix form_post.js mistake from recent commit; Better CORS on callback 2022-02-08 17:30:48 -08:00
Ryan Richard
6781bfd7d8 Fix JS bug: form post UI shows manual copy/paste UI upon failed callback
When the POST to the CLI's localhost callback endpoint results in a
non-2XX status code, then treat that as a failed login attempt and
automatically show the manual copy/paste UI.
2022-02-07 16:21:23 -08:00
Margo Crawford
b30dad72ed Fix new refresh token grace period test to have warnings 2022-01-20 14:54:59 -08:00
Margo Crawford
31cdd808ac
Merge pull request #951 from vmware-tanzu/short-session-warning
Supervisor should emit a warning when access token lifetime is too short
2022-01-20 14:44:32 -08:00
Ryan Richard
e85a6c09f6
Merge pull request #953 from vmware-tanzu/dependabot/go_modules/github.com/tdewolff/minify/v2-2.9.29
Bump github.com/tdewolff/minify/v2 from 2.9.26 to 2.9.29
2022-01-20 14:16:05 -08:00
Margo Crawford
38d184fe81 Integration test + making sure we get the session correctly in token handler 2022-01-20 13:48:50 -08:00
Margo Crawford
b0ea7063c7 Supervisor should emit a warning when access token lifetime is too short 2022-01-20 13:48:50 -08:00
Ryan Richard
db789dc2bf
Merge branch 'main' into dependabot/go_modules/github.com/tdewolff/minify/v2-2.9.29 2022-01-20 12:10:24 -08:00
Ryan Richard
6ddc953989
Merge branch 'main' into dependabot/go_modules/github.com/ory/fosite-0.42.0 2022-01-20 12:10:01 -08:00
Ryan Richard
dff53b8144 Changes for Fosite's new RevokeRefreshTokenMaybeGracePeriod() interface
Fosite v0.42.0 introduced a new RevokeRefreshTokenMaybeGracePeriod()
interface function. Updated our code to support this change. We didn't
support grace periods on refresh tokens before, so implemented it by
making the new RevokeRefreshTokenMaybeGracePeriod() method just call
the old RevokeRefreshToken() method, therefore keeping our old behavior.
2022-01-19 13:57:01 -08:00
Ryan Richard
3b1cc30e8d Update unit test to match new JS minify output after minify upgrade 2022-01-19 13:29:07 -08:00
Ryan Richard
a4ca44ca14 Improve error handling when upstream groups is invalid during refresh 2022-01-19 12:57:47 -08:00
Ryan Richard
78bdb1928a
Merge branch 'main' into upstream-oidc-refresh-groups 2022-01-18 16:03:14 -08:00
Monis Khan
1e1789f6d1
Allow configuration of supervisor endpoints
This change allows configuration of the http and https listeners
used by the supervisor.

TCP (IPv4 and IPv6 with any interface and port) and Unix domain
socket based listeners are supported.  Listeners may also be
disabled.

Binding the http listener to TCP addresses other than 127.0.0.1 or
::1 is deprecated.

The deployment now uses https health checks.  The supervisor is
always able to complete a TLS connection with the use of a bootstrap
certificate that is signed by an in-memory certificate authority.

To support sidecar containers used by service meshes, Unix domain
socket based listeners include ACLs that allow writes to the socket
file from any runAsUser specified in the pod's containers.

Signed-off-by: Monis Khan <mok@vmware.com>
2022-01-18 17:43:45 -05:00
Ryan Richard
70bd831099
Merge branch 'main' into upstream-oidc-refresh-groups 2022-01-18 14:36:18 -08:00
Ryan Richard
88f3b29515 Merge branch 'main' into upstream-oidc-refresh-groups 2022-01-14 16:51:12 -08:00
Ryan Richard
75e4093067 Merge branch 'main' into ldap_and_activedirectory_status_conditions_bug 2022-01-14 16:50:34 -08:00
Ryan Richard
548977f579 Update group memberships during refresh for upstream OIDC providers
Update the user's group memberships when possible. Note that we won't
always have enough information to be able to update it (see code
comments).
2022-01-14 16:38:21 -08:00
Ryan Richard
7551af3eb8 Fix code that did not auto-merge correctly in previous merge from main 2022-01-14 10:59:39 -08:00
Ryan Richard
814399324f Merge branch 'main' into upstream_access_revocation_during_gc 2022-01-14 10:49:22 -08:00
Ryan Richard
db0a765b98 Merge branch 'main' into ldap_and_activedirectory_status_conditions_bug 2022-01-14 10:06:16 -08:00
Ryan Richard
092a80f849 Refactor some variable names and update one comment
Change variable names to match previously renamed interface name.
2022-01-14 10:06:00 -08:00
Margo Crawford
5b161be334 Refactored oidcUpstreamRefresh
Various style changes, updated some comments and variable names and
extracted a helper function for validation.
2022-01-12 18:05:22 -08:00
Margo Crawford
62be761ef1 Perform access token based refresh by fetching the userinfo 2022-01-12 18:05:10 -08:00
Ryan Richard
651d392b00 Refuse logins when no upstream refresh token and no userinfo endpoint
Signed-off-by: Margo Crawford <margaretc@vmware.com>
2022-01-12 18:03:25 -08:00
Margo Crawford
6f3977de9d Store access token when refresh not available for authcode flow.
Also refactor oidc downstreamsessiondata code to be shared between
callback handler and auth handler.

Signed-off-by: Ryan Richard <richardry@vmware.com>
2022-01-12 18:03:25 -08:00
Ryan Richard
91924ec685 Revert adding allowAccessTokenBasedRefresh flag to OIDCIdentityProvider
Signed-off-by: Margo Crawford <margaretc@vmware.com>
2022-01-12 18:03:25 -08:00
Margo Crawford
683a2c5b23 WIP adding access token to storage upon login 2022-01-12 18:03:25 -08:00
Ryan Richard
1f146f905a Add struct field for storing upstream access token in downstream session 2022-01-12 18:03:25 -08:00
Margo Crawford
2b744b2eef Add back comment about deferring validation when id token subject is missing 2022-01-12 11:19:43 -08:00
Margo Crawford
2958461970 Addressing PR feedback
store issuer and subject in storage for refresh
Clean up some constants

Signed-off-by: Margo Crawford <margaretc@vmware.com>
2022-01-10 11:03:37 -08:00
Margo Crawford
f2d2144932 rename ValidateToken to ValidateTokenAndMergeWithUserInfo to better reflect what it's doing
Also changed a few comments and small things
2022-01-10 11:03:37 -08:00
Margo Crawford
c9cf13a01f Check for issuer if available
Signed-off-by: Margo Crawford <margaretc@vmware.com>
2022-01-10 11:03:37 -08:00
Margo Crawford
0cd086cf9c Check username claim is unchanged for oidc.
Also add integration tests for claims changing.
2022-01-10 11:03:37 -08:00
Margo Crawford
b098435290 Refactor validatetoken to handle refresh case without id token
Signed-off-by: Margo Crawford <margaretc@vmware.com>
2022-01-10 11:03:37 -08:00
Margo Crawford
74b007ff66 Validate that issuer url and urls returned from discovery are https
and that they have no query or fragment

Signed-off-by: Ryan Richard <richardry@vmware.com>
2022-01-10 11:03:37 -08:00
Margo Crawford
ed96b597c7 Check for subject matching with upstream refresh
Signed-off-by: Margo Crawford <margaretc@vmware.com>
2022-01-10 11:03:37 -08:00
Ryan Richard
7f99d78462 Fix bug where LDAP or AD status conditions were not updated correctly
When the LDAP and AD IDP watcher controllers encountered an update error
while trying to update the status conditions of the IDP resources, then
they would drop the computed desired new value of the condition on the
ground. Next time the controller ran it would not try to update the
condition again because it wants to use the cached settings and had
already forgotten the desired new value of the condition computed during
the previous run of the controller. This would leave the outdated value
of the condition on the IDP resource.

This bug would manifest in CI as random failures in which the expected
condition message and the actual condition message would refer to
different versions numbers of the bind secret. The actual condition
message would refer to an older version of the bind secret because the
update failed and then the new desired message got dropped on the
ground.

This commit changes the in-memory caching strategy to also cache the
computed condition messages, allowing the conditions to be updated
on the IDP resource during future calls to Sync() in the case of a
failed update.
2022-01-07 17:19:13 -08:00
Monis Khan
c155c6e629
Clean up nits in AD code
- Make everything private
- Drop unused AuthTime field
- Use %q format string instead of "%s"
- Only rely on GetRawAttributeValues in AttributeUnchangedSinceLogin

Signed-off-by: Monis Khan <mok@vmware.com>
2021-12-17 08:53:44 -05:00
Monis Khan
a6085c9678
Drop unsafe unwrapper for exec.roundTripper
exec.roundTripper now implements utilnet.RoundTripperWrapper so this
unsafe hack is no longer needed.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-12-17 08:28:30 -05:00
Monis Khan
86f2bea8c5
pinniped CLI: allow all forms of http redirects
For password based login on the CLI (i.e. no browser), this change
relaxes the response code check to allow for any redirect code
handled by the Go standard library.  In the future, we can drop the
rewriteStatusSeeOtherToStatusFoundForBrowserless logic from the
server side code.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-12-17 08:28:29 -05:00
Monis Khan
9599ffcfb9
Update all deps to latest where possible, bump Kube deps to v0.23.1
Highlights from this dep bump:

1. Made a copy of the v0.4.0 github.com/go-logr/stdr implementation
   for use in tests.  We must bump this dep as Kube code uses a
   newer version now.  We would have to rewrite hundreds of test log
   assertions without this copy.
2. Use github.com/felixge/httpsnoop to undo the changes made by
   ory/fosite#636 for CLI based login flows.  This is required for
   backwards compatibility with older versions of our CLI.  A
   separate change after this will update the CLI to be more
   flexible (it is purposefully not part of this change to confirm
   that we did not break anything).  For all browser login flows, we
   now redirect using http.StatusSeeOther instead of http.StatusFound.
3. Drop plog.RemoveKlogGlobalFlags as klog no longer mutates global
   process flags
4. Only bump github.com/ory/x to v0.0.297 instead of the latest
   v0.0.321 because v0.0.298+ pulls in a newer version of
   go.opentelemetry.io/otel/semconv which breaks k8s.io/apiserver.
   We should update k8s.io/apiserver to use the newer code.
5. Migrate all code from k8s.io/apimachinery/pkg/util/clock to
   k8s.io/utils/clock and k8s.io/utils/clock/testing
6. Delete testutil.NewDeleteOptionsRecorder and migrate to the new
   kubetesting.NewDeleteActionWithOptions
7. Updated ExpectedAuthorizeCodeSessionJSONFromFuzzing caused by
   fosite's new rotated_secrets OAuth client field.  This new field
   is currently not relevant to us as we have no private clients.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-12-16 21:15:27 -05:00
Margo Crawford
59d999956c Move ad specific stuff to controller
also make extra refresh attributes a separate field rather than part of
Extra

Signed-off-by: Margo Crawford <margaretc@vmware.com>
2021-12-09 16:16:36 -08:00
Margo Crawford
acaad05341 Make pwdLastSet stuff more generic and not require parsing the timestamp
Signed-off-by: Margo Crawford <margaretc@vmware.com>
2021-12-09 16:16:36 -08:00
Margo Crawford
65f3464995 Fix issue with very high integer value parsing, add unit tests
also add comment about urgent replication
2021-12-09 16:16:36 -08:00
Margo Crawford
ee4f725209 Incorporate PR feedback 2021-12-09 16:16:36 -08:00
Margo Crawford
ef5a04c7ce Check for locked users on ad upstream refresh
Signed-off-by: Margo Crawford <margaretc@vmware.com>
2021-12-09 16:16:36 -08:00
Margo Crawford
f62e9a2d33 Active directory checks for deactivated user
Signed-off-by: Margo Crawford <margaretc@vmware.com>
2021-12-09 16:16:36 -08:00
Margo Crawford
da9b4620b3 Active Directory checks whether password has changed recently during
upstream refresh

Signed-off-by: Margo Crawford <margaretc@vmware.com>
2021-12-09 16:16:35 -08:00
Margo Crawford
8db0203839 Add test for upstream ldap idp not found, wrong idp uid, and malformed
fosite session storage
2021-12-09 16:16:35 -08:00
Ryan Richard
f410d2bd00 Add revocation of upstream access tokens to garbage collector
Also refactor the code that decides which types of revocation failures
are worth retrying. Be more selective by only retrying those types of
errors that are likely to be worth retrying.
2021-12-08 14:29:25 -08:00
Ryan Richard
46008a7235 Add struct field for storing upstream access token in downstream session 2021-12-06 14:43:39 -08:00
Ryan Richard
b981055d31 Support revocation of access tokens in UpstreamOIDCIdentityProviderI
- Rename the RevokeRefreshToken() function to RevokeToken() and make it
  take the token type (refresh or access) as a new parameter.
- This is a prefactor getting ready to support revocation of upstream
  access tokens in the garbage collection handler.
2021-12-03 13:44:24 -08:00
Monis Khan
9d4a932656
phttp: add generic support for RFC 2616 14.46 warnings headers
Signed-off-by: Monis Khan <mok@vmware.com>
2021-11-30 15:11:59 -05:00
Mo Khan
78474cfae9
Merge branch 'main' into upstream_refresh_revocation_during_gc 2021-11-23 19:29:13 -05:00
Ryan Richard
e44540043d Attempt to fix a unit test that always failed on my laptop
Try to make the GCP plugin config less sensitive to the setup of the
computer on which it runs.
2021-11-23 15:47:19 -08:00
Ryan Richard
69be273e01
Merge branch 'main' into upstream_refresh_revocation_during_gc 2021-11-23 14:55:44 -08:00
Ryan Richard
91eed1ab24 Merge branch 'main' into upstream_refresh_revocation_during_gc 2021-11-23 12:11:39 -08:00
Ryan Richard
3ca8c49334 Improve garbage collector log format and some comments 2021-11-23 12:11:17 -08:00
Ryan Richard
b8a93b6b90 Merge branch 'main' into customize_ports 2021-11-18 09:31:18 -08:00
Ryan Richard
3b3641568a GC retries failed upstream revocations for a while, but not forever 2021-11-17 15:58:44 -08:00
Monis Khan
cd686ffdf3
Force the use of secure TLS config
This change updates the TLS config used by all pinniped components.
There are no configuration knobs associated with this change.  Thus
this change tightens our static defaults.

There are four TLS config levels:

1. Secure (TLS 1.3 only)
2. Default (TLS 1.2+ best ciphers that are well supported)
3. Default LDAP (TLS 1.2+ with less good ciphers)
4. Legacy (currently unused, TLS 1.2+ with all non-broken ciphers)

Highlights per component:

1. pinniped CLI
   - uses "secure" config against KAS
   - uses "default" for all other connections
2. concierge
   - uses "secure" config as an aggregated API server
   - uses "default" config as a impersonation proxy API server
   - uses "secure" config against KAS
   - uses "default" config for JWT authenticater (mostly, see code)
   - no changes to webhook authenticater (see code)
3. supervisor
   - uses "default" config as a server
   - uses "secure" config against KAS
   - uses "default" config against OIDC IDPs
   - uses "default LDAP" config against LDAP IDPs

Signed-off-by: Monis Khan <mok@vmware.com>
2021-11-17 16:55:35 -05:00
Ryan Richard
ca2cc40769 Add impersonationProxyServerPort to the Concierge's static ConfigMap
- Used to determine on which port the impersonation proxy will bind
- Defaults to 8444, which is the old hard-coded port value
- Allow the port number to be configured to any value within the
  range 1024 to 65535
- This commit does not include adding new config knobs to the ytt
  values file, so while it is possible to change this port without
  needing to recompile, it is not convenient
2021-11-17 13:27:59 -08:00
Ryan Richard
2383a88612 Add aggregatedAPIServerPort to the Concierge's static ConfigMap
- Allow the port number to be configured to any value within the
  range 1024 to 65535
- This commit does not include adding new config knobs to the ytt
  values file, so while it is possible to change this port without
  needing to recompile, it is not convenient
2021-11-16 16:43:51 -08:00
Ryan Richard
48518e9513 Add trace logging to help observe upstream OIDC refresh token revocation 2021-11-11 12:24:05 -08:00
Ryan Richard
de79f15068 Merge branch 'main' into upstream_refresh_revocation_during_gc 2021-11-10 15:35:42 -08:00
Ryan Richard
2388e25235 Revoke upstream OIDC refresh tokens during GC 2021-11-10 15:34:19 -08:00
Margo Crawford
cb60a44f8a extract ldap refresh search into helper function
also added an integration test for refresh failing after updating the username attribute
2021-11-05 14:22:43 -07:00
Margo Crawford
b5b8cab717 Refactors:
- pull construction of authenticators.Response into searchAndBindUser
- remove information about the identity provider in the error that gets
  returned to users. Put it in debug instead, where it may show up in
  logs.

Signed-off-by: Margo Crawford <margaretc@vmware.com>
2021-11-05 14:22:43 -07:00
Margo Crawford
f988879b6e Addressing code review changes
- changed to use custom authenticators.Response rather than the k8s one
  that doesn't include space for a DN
- Added more checking for correct idp type in token handler
- small style changes

Signed-off-by: Margo Crawford <margaretc@vmware.com>
2021-11-05 14:22:43 -07:00
Margo Crawford
84edfcb541 Refactor out a function, add tests for getting the wrong idp uid 2021-11-05 14:22:43 -07:00
Margo Crawford
8396937503 Updates to tests and some error assertions 2021-11-05 14:22:43 -07:00
Margo Crawford
2c4dc2951d resolved a couple of testing related todos 2021-11-05 14:22:43 -07:00
Margo Crawford
7a58086040 Check that username and subject remain the same for ldap refresh 2021-11-05 14:22:43 -07:00
Margo Crawford
19281313dd Basic upstream LDAP/AD refresh
This stores the user DN in the session data upon login and checks that
the entry still exists upon refresh. It doesn't check anything
else about the entry yet.
2021-11-05 14:22:42 -07:00
Ryan Richard
d0ced1fd74 WIP towards revoking upstream refresh tokens during GC
- Discover the revocation endpoint of the upstream provider in
  oidc_upstream_watcher.go and save it into the cache for future use
  by the garbage collector controller
- Adds RevokeRefreshToken to UpstreamOIDCIdentityProviderI
- Implements the production version of RevokeRefreshToken
- Implements test doubles for RevokeRefreshToken for future use in
  garbage collector's unit tests
- Prefactors the crud and session storage types for future use in the
  garbage collector controller
- See remaining TODOs in garbage_collector.go
2021-10-22 14:32:26 -07:00
Ryan Richard
e0db59fd09 More small updates based on PR feedback 2021-10-22 10:23:21 -07:00
Ryan Richard
dec43289f6 Lots of small updates based on PR feedback 2021-10-20 15:53:25 -07:00
Ryan Richard
c43e019d3a Change default of additionalScopes and disallow "hd" in additionalAuthorizeParameters 2021-10-18 16:41:31 -07:00
Ryan Richard
d68bebeb49 Merge branch 'main' into upstream_refresh 2021-10-18 15:35:46 -07:00
Ryan Richard
c51d7c08b9 Add a comment that might be useful some day 2021-10-18 15:35:22 -07:00
Ryan Richard
ddb23bd2ed Add upstream refresh related config to OIDCIdentityProvider CRD
Also update related docs.
2021-10-14 15:49:44 -07:00
Ryan Richard
a34dae549b When performing an upstream refresh, use the configured http client
Otherwise, the CA and proxy settings will not be used for the call
to the upstream token endpoint while performing the refresh. This
mistake was exposed by the TestSupervisorLogin integration test, so
it has test coverage.
2021-10-13 14:05:00 -07:00
Ryan Richard
79ca1d7fb0 Perform an upstream refresh during downstream refresh for OIDC upstreams
- If the upstream refresh fails, then fail the downstream refresh
- If the upstream refresh returns an ID token, then validate it (we
  use its claims in the future, but not in this commit)
- If the upstream refresh returns a new refresh token, then save it
  into the user's session in storage
- Pass the provider cache into the token handler so it can use the
  cached providers to perform upstream refreshes
- Handle unexpected errors in the token handler where the user's session
  does not contain the expected data. These should not be possible
  in practice unless someone is manually editing the storage, but
  handle them anyway just to be safe.
- Refactor to share the refresh code between the CLI and the token
  endpoint by moving it into the UpstreamOIDCIdentityProviderI
  interface, since the token endpoint needed it to be part of that
  interface anyway
2021-10-13 12:31:20 -07:00
Margo Crawford
1bd346cbeb Require refresh tokens for upstream OIDC and save more session data
- Requiring refresh tokens to be returned from upstream OIDC idps
- Storing refresh tokens (for oidc) and idp information (for all idps) in custom session data during authentication
- Don't pass access=offline all the time
2021-10-08 15:48:21 -07:00
Margo Crawford
43244b6599 Do not pass through downstream prompt param
- throw an error when prompt=none because the spec says we can't ignore
  it
- ignore the other prompt params

Signed-off-by: Ryan Richard <richardry@vmware.com>
2021-10-06 16:30:30 -07:00
Ryan Richard
c6f1d29538 Use PinnipedSession type instead of fosite's DefaultSesssion type
This will allow us to store custom data inside the fosite session
storage for all downstream OIDC sessions.

Signed-off-by: Margo Crawford <margaretc@vmware.com>
2021-10-06 15:28:13 -07:00
Monis Khan
4bf715758f
Do not rotate impersonation proxy signer CA unless necessary
This change fixes a copy paste error that led to the impersonation
proxy signer CA being rotated based on the configuration of the
rotation of the aggregated API serving certificate.  This would lead
to occasional "Unauthorized" flakes in our CI environments that
rotate the serving certificate at a frequent interval.

Updated the certs_expirer controller logs to be more detailed.

Updated CA common names to be more specific (this does not update
any previously generated CAs).

Signed-off-by: Monis Khan <mok@vmware.com>
2021-10-06 12:03:49 -04:00
Monis Khan
266d64f7d1
Do not truncate x509 errors
Signed-off-by: Monis Khan <mok@vmware.com>
2021-09-29 09:38:22 -04:00
Monis Khan
03bbc54023
upstreamoidc: log claim keys at debug level
At debug level:

upstreamoidc.go:213] "claims from ID token and userinfo"
providerName="oidc"
keys=[at_hash aud email email_verified exp iat iss sub]

At all level:

upstreamoidc.go:207] "claims from ID token and userinfo"
providerName="oidc"
claims="{\"at_hash\":\"C55S-BgnHTmr2_TNf...hYmVhYWESBWxvY2Fs\"}"

Signed-off-by: Monis Khan <mok@vmware.com>
2021-09-28 12:58:00 -04:00
Monis Khan
e86488615a
upstreamoidc: directly detect user info support
Avoid reliance on an error string from the Core OS OIDC lib.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-09-28 11:29:38 -04:00
Monis Khan
0d6bf9db3e
kubecertagent: attempt to load signer as long as agent labels match
This change updates the kube cert agent to a middle ground behavior
that balances leader election gating with how quickly we load the
signer.

If the agent labels have not changed, we will attempt to load the
signer even if we cannot roll out the latest version of the kube
cert agent deployment.

This gives us the best behavior - we do not have controllers
fighting over the state of the deployment and we still get the
signer loaded quickly.

We will have a minute of downtime when the kube cert agent deployment
changes because the new pods will have to wait to become a leader
and for the new deployment to rollout the new pods.  We would need
to have a per pod deployment if we want to avoid that downtime (but
this would come at the cost of startup time and would require
coordination with the kubelet in regards to pod readiness).

Signed-off-by: Monis Khan <mok@vmware.com>
2021-09-21 16:20:56 -04:00
Mo Khan
9851035e40
Merge pull request #847 from enj/enj/i/tcr_log
token credential request: fix trace log kind
2021-09-21 12:36:16 -04:00
Mo Khan
aa5ff162b4
Merge pull request #849 from enj/enj/i/clock_skew
certauthority: tolerate larger clock skew between API server and pinniped
2021-09-21 12:18:49 -04:00
Monis Khan
91c8f747f4
certauthority: tolerate larger clock skew between API server and pinniped
This change updates our certificate code to use the same 5 minute
backdate that is used by the Kubernetes controller manager.  This
helps to account for clock skews between the API servers and the
kubelets that are running the pinniped pods.  While this backdating
reflects a large percentage of the lifetime of our short lived
certificates (100% for the 5 minute client certificates), even a 10
minute irrevocable client certificate is within our limits.  When
we move to the CSR based short lived certificates, they will always
have at least a 15 minute lifetime (5 minute backdating plus 10 minute
minimum valid duration).

Signed-off-by: Monis Khan <mok@vmware.com>
2021-09-21 09:32:24 -04:00
Ryan Richard
4e98c1bbdb Tests use CertificatesV1 when available, otherwise use CertificatesV1beta1
CertificatesV1beta1 was removed in Kube 1.22, so the tests cannot
blindly rely on it anymore. Use CertificatesV1 whenever the server
reports that is available, and otherwise use the old
CertificatesV1beta1.

Note that CertificatesV1 was introduced in Kube 1.19.
2021-09-20 17:14:58 -07:00
Monis Khan
e65817ad5b
token credential request: fix trace log kind
Signed-off-by: Monis Khan <mok@vmware.com>
2021-09-20 15:34:05 -04:00
Monis Khan
09467d3e24
kubecertagent: fix flakey tests
This commit makes the following changes to the kube cert agent tests:

1. Informers are synced on start using the controllerinit code
2. Deployment client and informer are synced per controller sync loop
3. Controller sync loop exits after two consistent errors
4. Use assert instead of require to avoid ending the test early

Signed-off-by: Monis Khan <mok@vmware.com>
2021-09-16 14:48:04 -04:00
Ryan Richard
bdcf468e52 Add log statement for when kube cert agent key has been loaded
Because it makes things easier to debug on a real cluster
2021-09-15 14:02:46 -07:00
Ryan Richard
55de160551 Bump the version number of the kube cert agent label
Not required, but within the spirit of using the version number.
Since the existing kube cert agent deployment will get deleted anyway
during an upgrade, it shouldn't hurt to change the version number.
New installations will get the new version number on the new kube cert
agent deployment.
2021-09-14 15:27:15 -07:00
Ryan Richard
cec9f3c4d7 Improve the selectors of Deployments and Services
Fixes #801. The solution is complicated by the fact that the Selector
field of Deployments is immutable. It would have been easy to just
make the Selectors of the main Concierge Deployment, the Kube cert agent
Deployment, and the various Services use more specific labels, but
that would break upgrades. Instead, we make the Pod template labels and
the Service selectors more specific, because those not immutable, and
then handle the Deployment selectors in a special way.

For the main Concierge and Supervisor Deployments, we cannot change
their selectors, so they remain "app: app_name", and we make other
changes to ensure that only the intended pods are selected. We keep the
original "app" label on those pods and remove the "app" label from the
pods of the Kube cert agent Deployment. By removing it from the Kube
cert agent pods, there is no longer any chance that they will
accidentally get selected by the main Concierge Deployment.

For the Kube cert agent Deployment, we can change the immutable selector
by deleting and recreating the Deployment. The new selector uses only
the unique label that has always been applied to the pods of that
deployment. Upon recreation, these pods no longer have the "app" label,
so they will not be selected by the main Concierge Deployment's
selector.

The selector of all Services have been updated to use new labels to
more specifically target the intended pods. For the Concierge Services,
this will prevent them from accidentally including the Kube cert agent
pods. For the Supervisor Services, we follow the same convention just
to be consistent and to help future-proof the Supervisor app in case it
ever has a second Deployment added to it.

The selector of the auto-created impersonation proxy Service was
also previously using the "app" label. There is no change to this
Service because that label will now select the correct pods, since
the Kube cert agent pods no longer have that label. It would be possible
to update that selector to use the new more specific label, but then we
would need to invent a way to pass that label into the controller, so
it seemed like more work than was justified.
2021-09-14 13:35:10 -07:00
Margo Crawford
0a1ee9e37c Remove unused functions 2021-09-08 10:34:42 -07:00
Margo Crawford
05f5bac405 ValidatedSettings is all or nothing
If either the search base or the tls settings is invalid, just
recheck everything.
2021-09-07 13:09:35 -07:00
Margo Crawford
0195894a50 Test fix for ldap upstream watcher 2021-09-07 13:09:35 -07:00
Margo Crawford
27c1d2144a Make sure search base in the validatedSettings cache is properly updated when the bind secret changes 2021-09-07 13:09:35 -07:00
Monis Khan
0d285ce993
Ensure concierge and supervisor gracefully exit
Changes made to both components:

1. Logs are always flushed on process exit
2. Informer cache sync can no longer hang process start up forever

Changes made to concierge:

1. Add pre-shutdown hook that waits for controllers to exit cleanly
2. Informer caches are synced in post-start hook

Changes made to supervisor:

1. Add shutdown code that waits for controllers to exit cleanly
2. Add shutdown code that waits for active connections to become idle

Waiting for controllers to exit cleanly is critical as this allows
the leader election logic to release the lock on exit.  This reduces
the time needed for the next leader to be elected.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-30 20:29:52 -04:00
Monis Khan
5489f68e2f
supervisor: ensure graceful exit
The kubelet will send the SIGTERM signal when it wants a process to
exit.  After a grace period, it will send the SIGKILL signal to
force the process to terminate.  The concierge has always handled
both SIGINT and SIGTERM as indicators for it to gracefully exit
(i.e. stop watches, controllers, etc).  This change updates the
supervisor to do the same (previously it only handled SIGINT).  This
is required to allow the leader election lock release logic to run.
Otherwise it can take a few minutes for new pods to acquire the
lease since they believe it is already held.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-28 11:23:11 -04:00
Margo Crawford
e5718351ba
Merge pull request #695 from vmware-tanzu/active-directory-identity-provider
Active directory identity provider
2021-08-27 08:39:12 -07:00
Monis Khan
6c29f347b4
go 1.17 bump: fix unit test failures
Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-27 09:46:58 -04:00
Margo Crawford
19100d68ef Merge branch 'main' of github.com:vmware-tanzu/pinniped into active-directory-identity-provider 2021-08-26 20:42:16 -07:00
Mayank Bhatt
68547f767d Copy hostNetwork field for kube-cert-agent
For clusters where the control plane nodes aren't running a CNI, the
kube-cert-agent pods deployed by concierge cannot be scheduled as they
don't know to use `hostNetwork: true`. This change allows embedding the
host network setting in the Concierge configuration. (by copying it from
the kube-controller-manager pod spec when generating the kube-cert-agent
Deployment)

Also fixed a stray double comma in one of the nearby tests.
2021-08-26 17:09:59 -07:00
Margo Crawford
2d32e0fa7d Merge branch 'main' of github.com:vmware-tanzu/pinniped into active-directory-identity-provider 2021-08-26 16:21:08 -07:00
Margo Crawford
6f221678df Change sAMAccountName env vars to userPrincipalName
and add E2E ActiveDirectory test
also fixed regexes in supervisor_login_test to be anchored to the
beginning and end
2021-08-26 16:18:05 -07:00
Monis Khan
74daa1da64
test/integration: run parallel tests concurrently with serial tests
Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-26 12:59:52 -04:00
Margo Crawford
1c5a2b8892 Add a couple more unit tests 2021-08-25 11:33:42 -07:00
Monis Khan
c71ffdcd1e
leader election: use better duration defaults
OpenShift has good defaults for these duration fields that we can
use instead of coming up with them ourselves:

e14e06ba8d/pkg/config/leaderelection/leaderelection.go (L87-L109)

Copied here for easy future reference:

// We want to be able to tolerate 60s of kube-apiserver disruption without causing pod restarts.
// We want the graceful lease re-acquisition fairly quick to avoid waits on new deployments and other rollouts.
// We want a single set of guidance for nearly every lease in openshift.  If you're special, we'll let you know.
// 1. clock skew tolerance is leaseDuration-renewDeadline == 30s
// 2. kube-apiserver downtime tolerance is == 78s
//      lastRetry=floor(renewDeadline/retryPeriod)*retryPeriod == 104
//      downtimeTolerance = lastRetry-retryPeriod == 78s
// 3. worst non-graceful lease acquisition is leaseDuration+retryPeriod == 163s
// 4. worst graceful lease acquisition is retryPeriod == 26s
if ret.LeaseDuration.Duration == 0 {
	ret.LeaseDuration.Duration = 137 * time.Second
}

if ret.RenewDeadline.Duration == 0 {
	// this gives 107/26=4 retries and allows for 137-107=30 seconds of clock skew
	// if the kube-apiserver is unavailable for 60s starting just before t=26 (the first renew),
	// then we will retry on 26s intervals until t=104 (kube-apiserver came back up at 86), and there will
	// be 33 seconds of extra time before the lease is lost.
	ret.RenewDeadline.Duration = 107 * time.Second
}
if ret.RetryPeriod.Duration == 0 {
	ret.RetryPeriod.Duration = 26 * time.Second
}

Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-24 16:21:53 -04:00
Margo Crawford
c590c8ff41 Merge branch 'main' of github.com:vmware-tanzu/pinniped into active-directory-identity-provider 2021-08-24 12:19:29 -07:00
Monis Khan
c0617ceda4
leader election: in-memory leader status is stopped before release
This change fixes a small race condition that occurred when the
current leader failed to renew its lease.  Before this change, the
leader would first release the lease via the Kube API and then would
update its in-memory status to reflect that change.  Now those
events occur in the reverse (i.e. correct) order.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-24 15:02:56 -04:00
Mo Khan
3077034b2d
Merge branch 'main' into oidc_password_grant 2021-08-24 12:23:52 -04:00
Ryan Richard
211f4b23d1 Log auth endpoint errors with stack traces 2021-08-20 14:41:02 -07:00
Monis Khan
c356710f1f
Add leader election middleware
Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-20 12:18:25 -04:00
Margo Crawford
05afae60c2 Review comments--
- Change list of attributeParsingOverrides to a map
- Add unit test for sAMAccountName as group name without the override
- Change some comments in the the type definition.
2021-08-19 14:21:18 -07:00
Ryan Richard
6239a567a8 remove one nolint:unparam comment 2021-08-19 10:57:00 -07:00
Ryan Richard
e4d418a076 Merge branch 'main' into oidc_password_grant 2021-08-19 10:55:54 -07:00
Ryan Richard
b4a39ba3c4 Remove unparam linter
We decided that this linter does not provide very useful feedback
for our project.
2021-08-19 10:20:24 -07:00
Margo Crawford
1c5da35527 Merge remote-tracking branch 'origin' into active-directory-identity-provider 2021-08-18 12:44:12 -07:00
Ryan Richard
61c21d2977 Refactor some authorize and callback error handling, and add more tests 2021-08-18 12:06:46 -07:00
Ryan Richard
04b8f0b455 Extract Supervisor authorize endpoint string constants into apis pkg 2021-08-18 10:20:33 -07:00
Margo Crawford
8657b0e3e7 Cleanup new group attribute behavior and add test coverage 2021-08-18 10:11:18 -07:00
Ryan Richard
0089540b07 Extract Supervisor IDP discovery endpoint string constants into apis pkg 2021-08-17 17:50:02 -07:00
Margo Crawford
26c47d564f Make new combined sAMAccountName@domain attribute the group name
Also change default username attribute to userPrincipalName
2021-08-17 16:53:26 -07:00
Ryan Richard
62c6d53a21 Merge branch 'main' into oidc_password_grant 2021-08-17 15:23:29 -07:00
Ryan Richard
96474b3d99 Extract Supervisor IDP discovery endpoint types into apis package 2021-08-17 15:23:03 -07:00
Ryan Richard
964d16110e Some refactors based on PR feedback from @enj 2021-08-17 13:14:09 -07:00
Monis Khan
e0901f4fe5
dynamiccert: prevent misuse of NewServingCert
The Kube API server code that we use will cast inputs in an attempt
to see if they implement optional interfaces.  This change adds a
simple wrapper struct to prevent such casts from causing us any
issues.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-17 12:58:32 -04:00
Ryan Richard
52409f86e8 Merge branch 'main' into oidc_password_grant 2021-08-16 15:17:55 -07:00
Ryan Richard
91c8a3ebed Extract private helper in auth_handler.go 2021-08-16 15:17:30 -07:00
Ryan Richard
52cb0bbc07 More unit tests and small error handling changes for OIDC password grant 2021-08-16 14:27:40 -07:00