All controller unit tests were accidentally using a timeout context
for the informers, instead of a cancel context which stays alive until
each test is completely finished. There is no reason to risk
unpredictable behavior of a timeout being reached during an individual
test, even though with the previous 3 second timeout it could only be
reached on a machine which is running orders of magnitude slower than
usual, since each test usually runs in about 100-300 ms. Unfortunately,
sometimes our CI workers might get that slow.
This sparked a review of other usages of timeout contexts in other
tests, and all of them were increased to a minimum value of 1 minute,
under the rule of thumb that our tests will be more reliable on slow
machines if they "pass fast and fail slow".
In impersonator_config_test.go, instead of waiting for the resource
version to appear in the informers, wait for the actual object to
appear.
This is an attempt to resolve flaky failures that only happen in CI,
but it also cleans up the test a bit by avoiding inventing fake resource
version numbers all over the test.
Signed-off-by: Monis Khan <mok@vmware.com>
- Use `Eventually` when making tls connections because the production
code's handling of starting and stopping the TLS server port
has some async behavior.
- Don't use resource version "0" because that has special meaning
in the informer libraries.
This updates our issuerconfig.UpdateStrategy to sort strategies according to a weighted preference.
The TokenCredentialRequest API strategy is preffered, followed by impersonation proxy, followed by any other unknown types.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- This commit does not include the updates that we plan to make to
the `status.strategies[].frontend` field of the CredentialIssuer.
That will come in a future commit.
This is more than an automatic merge. It also includes a rewrite of the CredentialIssuer API impersonation proxy fields using the new structure, and updates to the CLI to account for that new API.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
These controllers were a bit inconsistent. There were cases where the controllers ran out of the expected order and the custom labels might not have been applied.
We should still plan to remove this label handling or move responsibility into the middleware layer, but this avoids any regression.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- The CA cert will end up in the end user's kubeconfig on their client
machine, so if it changes they would need to fetch the new one and
update their kubeconfig. Therefore, we should avoid changing it as
much as possible.
- Now the controller writes the CA to a different Secret. It writes both
the cert and the key so it can reuse them to create more TLS
certificates in the future.
- For now, it only needs to make more TLS certificates if the old
TLS cert Secret gets deleted or updated to be invalid. This allows
for manual rotation of the TLS certs by simply deleting the Secret.
In the future, we may want to implement some kind of auto rotation.
- For now, rotation of both the CA and TLS certs will also happen if
you manually delete the CA Secret. However, this would cause the end
users to immediately need to get the new CA into their kubeconfig,
so this is not as elegant as a normal rotation flow where you would
have a window of time where you have more than one CA.
Also update concierge_impersonation_proxy_test.go integration test
to use real TLS when calling the impersonator.
Signed-off-by: Ryan Richard <richardry@vmware.com>
These are prone to breaking when stdr is upgraded because they rely on the exact ordering of keys in the log message. If we have more problems we can rewrite the assertions to be more robust, but for this time I'm just fixing them to match the new output.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This is a more reliable way to determine whether the load balancer
is already running.
Also added more unit tests for the load balancer.
Signed-off-by: Ryan Richard <richardry@vmware.com>
- Watch a configmap to read the configuration of the impersonation
proxy and reconcile it.
- Implements "auto" mode by querying the API for control plane nodes.
- WIP: does not create a load balancer or proper TLS certificates yet.
Those will come in future commits.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
When the Pinniped server has been installed with the `api_group_suffix`
option, for example using `mysuffix.com`, then clients who would like to
submit a `TokenCredentialRequest` to the server should set the
`Spec.Authenticator.APIGroup` field as `authentication.concierge.mysuffix.com`.
This makes more sense from the client's point of view than using the
default `authentication.concierge.pinniped.dev` because
`authentication.concierge.mysuffix.com` is the name of the API group
that they can observe their cluster and `authentication.concierge.pinniped.dev`
does not exist as an API group on their cluster.
This commit includes both the client and server-side changes to make
this work, as well as integration test updates.
Co-authored-by: Andrew Keesler <akeesler@vmware.com>
Co-authored-by: Ryan Richard <richardry@vmware.com>
Co-authored-by: Margo Crawford <margaretc@vmware.com>
Yes, this is a huge commit.
The middleware allows you to customize the API groups of all of the
*.pinniped.dev API groups.
Some notes about other small things in this commit:
- We removed the internal/client package in favor of pkg/conciergeclient. The
two packages do basically the same thing. I don't think we use the former
anymore.
- We re-enabled cluster-scoped owner assertions in the integration tests.
This code was added in internal/ownerref. See a0546942 for when this
assertion was removed.
- Note: the middlware code is in charge of restoring the GV of a request object,
so we should never need to write mutations that do that.
- We updated the supervisor secret generation to no longer manually set an owner
reference to the deployment since the middleware code now does this. I think we
still need some way to make an initial event for the secret generator
controller, which involves knowing the namespace and the name of the generated
secret, so I still wired the deployment through. We could use a namespace/name
tuple here, but I was lazy.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Co-authored-by: Ryan Richard <richardry@vmware.com>
This change updates our clients to always set an owner ref when:
1. The operation is a create
2. The object does not already have an owner ref set
Signed-off-by: Monis Khan <mok@vmware.com>
- JWKSWriterController
- JWKSObserverController
- FederationDomainSecretsController for HMAC keys
- FederationDomainSecretsController for state signature key
- FederationDomainSecretsController for state encryption key
Signed-off-by: Ryan Richard <richardry@vmware.com>
- Only sync on add/update of secrets in the same namespace which
have the "storage.pinniped.dev/garbage-collect-after" annotation, and
also during a full resync of the informer whenever secrets in the
same namespace with that annotation exist.
- Ignore deleted secrets to avoid having this controller trigger itself
unnecessarily when it deletes a secret. This controller is never
interested in deleted secrets, since its only job is to delete
existing secrets.
- No change to the self-imposed rate limit logic. That still applies
because secrets with this annotation will be created and updated
regularly while the system is running (not just during rare system
configuration steps).
This implementation is janky because I wanted to make the smallest change
possible to try to get the code back to stable so we can release.
Also deep copy an object so we aren't mutating the cache.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Because the library that we are using which returns that error
formats the timestamp in localtime, which is LMT when running
on a laptop, but is UTC when running in CI.
Signed-off-by: Ryan Richard <richardry@vmware.com>
- Refactor the test to avoid testing a private method and instead
always test the results of running the controller.
- Also remove the `if testing.Short()` check because it will always
be short when running unit tests. This prevented the unit test
from ever running, both locally and in CI.
Signed-off-by: Ryan Richard <richardry@vmware.com>
We believe this API is more forwards compatible with future secrets management
use cases. The implementation is a cry for help, but I was trying to follow the
previously established pattern of encapsulating the secret generation
functionality to a single group of packages.
This commit makes a breaking change to the current OIDCProvider API, but that
OIDCProvider API was added after the latest release, so it is technically still
in development until we release, and therefore we can continue to thrash on it.
I also took this opportunity to make some things private that didn't need to be
public.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This forced us to add labels to the CSRF cookie secret, just as we do
for other Supervisor secrets. Yay tests.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- Also add more log statements to the controller
- Also have the controller apply a rate limit to itself, to avoid
having a very chatty controller that runs way more often than is
needed.
- Also add an integration test for the controller's behavior.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
This also sets the CSRF cookie Secret's OwnerReference to the Pod's grandparent
Deployment so that when the Deployment is cleaned up, then the Secret is as
well.
Obviously this controller implementation has a lot of issues, but it will at
least get us started.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
There is still a test failing, but I am sure it is a simple fix hiding in the
code. I think this is the general shape of the controller that we want.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This commit includes a failing test (amongst other compiler failures) for the
dynamic signing key fetcher that we will inject into fosite. We are checking it
in so that we can pass the WIP off.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
This allows the token exchange request to be performed with the correct TLS configuration.
We go to a bit of extra work to make sure the `http.Client` object is cached between reconcile operations so that connection pooling works as expected.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Because we want it to implement an AuthcodeExchanger interface and
do it in a way that will be more unit test-friendly than the underlying
library that we intend to use inside its implementation.
This prevents unnecessary sync loop runs when the controller is
running with a single worker. When the controller is running with
more than one worker, it prevents subtle bugs that can cause the
controller to go "back in time."
Signed-off-by: Monis Khan <mok@vmware.com>
Signed-off-by: Matt Moyer <moyerm@vmware.com>
And delete the agent pod when it needs its custom labels to be
updated, so that the creator controller will notice that it is missing
and immediately create it with the new custom labels.
This is the first of a few related changes that re-organize our API after the big recent changes that introduced the supervisor component.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- Setting a Secret in the supervisor's namespace with a special name
will cause it to get picked up and served as the supervisor's TLS
cert for any request which does not have a matching SNI cert.
- This is especially useful for when there is no DNS record for an
issuer and the user will be accessing it via IP address. This
is not how we would expect it to be used in production, but it
might be useful for other cases.
- Includes a new integration test
- Also suppress all of the warnings about ignoring the error returned by
Close() in lines like `defer x.Close()` to make GoLand happier
- TLS certificates can be configured on the OIDCProviderConfig using
the `secretName` field.
- When listening for incoming TLS connections, choose the TLS cert
based on the SNI hostname of the incoming request.
- Because SNI hostname information on incoming requests does not include
the port number of the request, we add a validation that
OIDCProviderConfigs where the issuer hostnames (not including port
number) are the same must use the same `secretName`.
- Note that this approach does not yet support requests made to an
IP address instead of a hostname. Also note that `localhost` is
considered a hostname by SNI.
- Add port 443 as a container port to the pod spec.
- A new controller watches for TLS secrets and caches them in memory.
That same in-memory cache is used while servicing incoming connections
on the TLS port.
- Make it easy to configure both port 443 and/or port 80 for various
Service types using our ytt templates for the supervisor.
- When deploying to kind, add another nodeport and forward it to the
host on another port to expose our new HTTPS supervisor port to the
host.
- When two different Issuers have the same host (i.e. they differ
only by path) then they must have the same secretName. This is because
it wouldn't make sense for there to be two different TLS certificates
for one host. Find any that do not have the same secret name to
put an error status on them and to avoid serving OIDC endpoints for
them. The host comparison is case-insensitive.
- Issuer hostnames should be treated as case-insensitive, because
DNS hostnames are case-insensitive. So https://me.com and
https://mE.cOm are duplicate issuers. However, paths are
case-sensitive, so https://me.com/A and https://me.com/a are
different issuers. Fixed this in the issuer validations and in the
OIDC Manager's request router logic.
EC keys are smaller and take less time to generate. Our integration
tests were super flakey because generating an RSA key would take up to
10 seconds *gasp*. The main token verifier that we care about is
Kubernetes, which supports P256, so hopefully it won't be that much of
an issue that our default signing key type is EC. The OIDC spec seems
kinda squirmy when it comes to using non-RSA signing algorithms...
Signed-off-by: Andrew Keesler <akeesler@vmware.com>