After noticing that the upstream OIDC discovery calls can hang
indefinitely, I had tried to impose a one minute timeout on them
by giving them a timeout context. However, I hadn't noticed that the
context also gets passed into the JWKS fetching object, which gets
added to our cache and used later. Therefore the timeout context
was added to the cache and timed out while sitting in the cache,
causing later JWKS fetchers to fail.
This commit is trying again to impose a reasonable timeout on these
discovery and JWKS calls, but this time by using http.Client's Timeout
field, which is documented to be a timeout for *each* request/response
cycle, so hopefully this is a more appropriate way to impose a timeout
for this use case. The http.Client instance ends up in the cache on
the JWKS fetcher object, so the timeout should apply to each JWKS
request as well.
Requests that can hang forever are effectively a server-side resource
leak, which could theoretically be taken advantage of in a denial of
service attempt, so it would be nice to avoid having them.
- Add new optional ytt params for the Supervisor deployment.
- When the Supervisor is making calls to an upstream OIDC provider,
use these variables if they were provided.
- These settings are integration tested in the main CI pipeline by
sometimes setting them on deployments in certain cases, and then
letting the existing integration tests (e.g. TestE2EFullIntegration)
provide the coverage, so there are no explicit changes to the
integration tests themselves in this commit.
- this allows the oidc upsream watcher to honor the
HTTP_PROXY,HTTPS_PROXY,NO_PROXY environment variables
Co-authored-by: Christian Ang <angc@vmware.com>
- Automatically try to fall back to using StartTLS when using TLS
doesn't work. Only complain when both don't work.
- Remember (in-memory) which one worked and keeping using that one
in the future (unless the pod restarts).
- This enhances our LDAP client code to make it possible to optionally
dial an LDAP server without TLS and then use StartTLS to upgrade
the connection to TLS.
- The controller for LDAPIdentityProviders is not using this option
yet. That will come in a future commit.
Reflect the upstream group membership into the Supervisor's
downstream tokens, so they can be added to the user's
identity on the workload clusters.
LDAP group search is configurable on the
LDAPIdentityProvider resource.
This change makes it easier to understand misconfigurations caused
by issuers with extraneous trailing slashes.
Signed-off-by: Mo Khan <mok@vmware.com>
Also force the LDAP server pod to restart whenever the LDIF file
changes, so whenever you redeploy the tools deployment with a new test
user password the server will be updated.
Unfortunately, Secrets do not seem to have a Generation field, so we
use the ResourceVersion field instead. This means that any change to
the Secret will cause us to retry the connection to the LDAP server,
even if the username and password fields in the Secret were not
changed. Seems like an okay trade-off for this early draft of the
controller compared to a more complex implementation.
This early version of the controller is not intended to act as an
ongoing health check for your upstream LDAP server. It will connect
to the LDAP server to essentially "lint" your configuration once.
It will do it again only when you change your configuration. To account
for transient errors, it will keep trying to connect to the server
until it succeeds once.
This commit does not include looking for changes in the associated bind
user username/password Secret.
- The ldap_upstream_watcher.go controller validates the bind secret and
uses the Conditions to report errors. Shares some condition reporting
logic with its sibling controller oidc_upstream_watcher.go, to the
extent which is convenient without generics in golang.
- When the upstream IDP is an LDAP IDP and the user's LDAP username and
password are received as new custom headers, then authenticate the
user and, if authentication was successful, return a redirect with
an authcode. Handle errors according to the OAuth/OIDC specs.
- Still does not support having multiple upstream IDPs defined at the
same time, which was an existing limitation of this endpoint.
- Does not yet include the actual LDAP authentication, which is
hidden behind an interface from the point of view of auth_handler.go
- Move the oidctestutil package to the testutil directory.
- Add an interface for Fosite storage to avoid a cyclical test
dependency.
- Add GetURL() to the UpstreamLDAPIdentityProviderI interface.
- Extract test helpers to be shared between callback_handler_test.go
and auth_handler_test.go because the authcode and fosite storage
assertions should be identical.
- Backfill Content-Type assertions in callback_handler_test.go.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
All controller unit tests were accidentally using a timeout context
for the informers, instead of a cancel context which stays alive until
each test is completely finished. There is no reason to risk
unpredictable behavior of a timeout being reached during an individual
test, even though with the previous 3 second timeout it could only be
reached on a machine which is running orders of magnitude slower than
usual, since each test usually runs in about 100-300 ms. Unfortunately,
sometimes our CI workers might get that slow.
This sparked a review of other usages of timeout contexts in other
tests, and all of them were increased to a minimum value of 1 minute,
under the rule of thumb that our tests will be more reliable on slow
machines if they "pass fast and fail slow".
These are prone to breaking when stdr is upgraded because they rely on the exact ordering of keys in the log message. If we have more problems we can rewrite the assertions to be more robust, but for this time I'm just fixing them to match the new output.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Yes, this is a huge commit.
The middleware allows you to customize the API groups of all of the
*.pinniped.dev API groups.
Some notes about other small things in this commit:
- We removed the internal/client package in favor of pkg/conciergeclient. The
two packages do basically the same thing. I don't think we use the former
anymore.
- We re-enabled cluster-scoped owner assertions in the integration tests.
This code was added in internal/ownerref. See a0546942 for when this
assertion was removed.
- Note: the middlware code is in charge of restoring the GV of a request object,
so we should never need to write mutations that do that.
- We updated the supervisor secret generation to no longer manually set an owner
reference to the deployment since the middleware code now does this. I think we
still need some way to make an initial event for the secret generator
controller, which involves knowing the namespace and the name of the generated
secret, so I still wired the deployment through. We could use a namespace/name
tuple here, but I was lazy.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Co-authored-by: Ryan Richard <richardry@vmware.com>
This change updates our clients to always set an owner ref when:
1. The operation is a create
2. The object does not already have an owner ref set
Signed-off-by: Monis Khan <mok@vmware.com>
- JWKSWriterController
- JWKSObserverController
- FederationDomainSecretsController for HMAC keys
- FederationDomainSecretsController for state signature key
- FederationDomainSecretsController for state encryption key
Signed-off-by: Ryan Richard <richardry@vmware.com>
This implementation is janky because I wanted to make the smallest change
possible to try to get the code back to stable so we can release.
Also deep copy an object so we aren't mutating the cache.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
We believe this API is more forwards compatible with future secrets management
use cases. The implementation is a cry for help, but I was trying to follow the
previously established pattern of encapsulating the secret generation
functionality to a single group of packages.
This commit makes a breaking change to the current OIDCProvider API, but that
OIDCProvider API was added after the latest release, so it is technically still
in development until we release, and therefore we can continue to thrash on it.
I also took this opportunity to make some things private that didn't need to be
public.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This forced us to add labels to the CSRF cookie secret, just as we do
for other Supervisor secrets. Yay tests.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This also sets the CSRF cookie Secret's OwnerReference to the Pod's grandparent
Deployment so that when the Deployment is cleaned up, then the Secret is as
well.
Obviously this controller implementation has a lot of issues, but it will at
least get us started.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
There is still a test failing, but I am sure it is a simple fix hiding in the
code. I think this is the general shape of the controller that we want.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This commit includes a failing test (amongst other compiler failures) for the
dynamic signing key fetcher that we will inject into fosite. We are checking it
in so that we can pass the WIP off.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
This allows the token exchange request to be performed with the correct TLS configuration.
We go to a bit of extra work to make sure the `http.Client` object is cached between reconcile operations so that connection pooling works as expected.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Because we want it to implement an AuthcodeExchanger interface and
do it in a way that will be more unit test-friendly than the underlying
library that we intend to use inside its implementation.
This prevents unnecessary sync loop runs when the controller is
running with a single worker. When the controller is running with
more than one worker, it prevents subtle bugs that can cause the
controller to go "back in time."
Signed-off-by: Monis Khan <mok@vmware.com>
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- Setting a Secret in the supervisor's namespace with a special name
will cause it to get picked up and served as the supervisor's TLS
cert for any request which does not have a matching SNI cert.
- This is especially useful for when there is no DNS record for an
issuer and the user will be accessing it via IP address. This
is not how we would expect it to be used in production, but it
might be useful for other cases.
- Includes a new integration test
- Also suppress all of the warnings about ignoring the error returned by
Close() in lines like `defer x.Close()` to make GoLand happier