Highlights from this dep bump:
1. Made a copy of the v0.4.0 github.com/go-logr/stdr implementation
for use in tests. We must bump this dep as Kube code uses a
newer version now. We would have to rewrite hundreds of test log
assertions without this copy.
2. Use github.com/felixge/httpsnoop to undo the changes made by
ory/fosite#636 for CLI based login flows. This is required for
backwards compatibility with older versions of our CLI. A
separate change after this will update the CLI to be more
flexible (it is purposefully not part of this change to confirm
that we did not break anything). For all browser login flows, we
now redirect using http.StatusSeeOther instead of http.StatusFound.
3. Drop plog.RemoveKlogGlobalFlags as klog no longer mutates global
process flags
4. Only bump github.com/ory/x to v0.0.297 instead of the latest
v0.0.321 because v0.0.298+ pulls in a newer version of
go.opentelemetry.io/otel/semconv which breaks k8s.io/apiserver.
We should update k8s.io/apiserver to use the newer code.
5. Migrate all code from k8s.io/apimachinery/pkg/util/clock to
k8s.io/utils/clock and k8s.io/utils/clock/testing
6. Delete testutil.NewDeleteOptionsRecorder and migrate to the new
kubetesting.NewDeleteActionWithOptions
7. Updated ExpectedAuthorizeCodeSessionJSONFromFuzzing caused by
fosite's new rotated_secrets OAuth client field. This new field
is currently not relevant to us as we have no private clients.
Signed-off-by: Monis Khan <mok@vmware.com>
This change fixes a copy paste error that led to the impersonation
proxy signer CA being rotated based on the configuration of the
rotation of the aggregated API serving certificate. This would lead
to occasional "Unauthorized" flakes in our CI environments that
rotate the serving certificate at a frequent interval.
Updated the certs_expirer controller logs to be more detailed.
Updated CA common names to be more specific (this does not update
any previously generated CAs).
Signed-off-by: Monis Khan <mok@vmware.com>
In the upstream dynamiccertificates package, we rely on two pieces
of code:
1. DynamicServingCertificateController.newTLSContent which calls
- clientCA.CurrentCABundleContent
- servingCert.CurrentCertKeyContent
2. unionCAContent.VerifyOptions which calls
- unionCAContent.CurrentCABundleContent
This results in calls to our tlsServingCertDynamicCertProvider and
impersonationSigningCertProvider. If we Unset these providers, we
subtly break these consumers. At best this results in test slowness
and flakes while we wait for reconcile loops to converge. At worst,
it results in actual errors during runtime. For example, we
previously would Unset the impersonationSigningCertProvider on any
sync loop error (even a transient one caused by a network blip or
a conflict between writes from different replicas of the concierge).
This would cause us to transiently fail to issue new certificates
from the token credential require API. It would also cause us to
transiently fail to authenticate previously issued client certs
(which results in occasional Unauthorized errors in CI).
Signed-off-by: Monis Khan <mok@vmware.com>
This change fixes a race that can occur because we have multiple
writers with no leader election lock.
1. TestAPIServingCertificateAutoCreationAndRotation/automatic
expires the current serving certificate
2. CertsExpirerController 1 deletes expired serving certificate
3. CertsExpirerController 2 starts deletion of expired serving
certificate but has not done so yet
4. CertsManagerController 1 creates new serving certificate
5. TestAPIServingCertificateAutoCreationAndRotation/automatic
records the new serving certificate
6. CertsExpirerController 2 finishes deletion, and thus deletes the
newly created serving certificate instead of the old one
7. CertsManagerController 2 creates new serving certificate
8. TestAPIServingCertificateAutoCreationAndRotation/automatic keeps
running and eventually times out because it is expecting the
serving certificate created by CertsManagerController 2 to match
the value it recorded from CertsManagerController 1 (which will
never happen since that certificate was incorrectly deleted).
Signed-off-by: Monis Khan <mok@vmware.com>
We were previously issuing both client certs and server certs with
both extended key usages included. Split the Issue*() methods into
separate methods for issuing server certs versus client certs so
they can have different extended key usages tailored for each use
case.
Also took the opportunity to clean up the parameters of the Issue*()
methods and New() methods to more closely match how we prefer to call
them. We were always only passing the common name part of the
pkix.Name to New(), so now the New() method just takes the common name
as a string. When making a server cert, we don't need to set the
deprecated common name field, so remove that param. When making a client
cert, we're always making it in the format expected by the Kube API
server, so just accept the username and group as parameters directly.
To make an impersonation request, first make a TokenCredentialRequest
to get a certificate. That cert will either be issued by the Kube
API server's CA or by a new CA specific to the impersonator. Either
way, you can then make a request to the impersonator and present
that client cert for auth and the impersonator will accept it and
make the impesonation call on your behalf.
The impersonator http handler now borrows some Kube library code
to handle request processing. This will allow us to more closely
mimic the behavior of a real API server, e.g. the client cert
auth will work exactly like the real API server.
Signed-off-by: Monis Khan <mok@vmware.com>
All controller unit tests were accidentally using a timeout context
for the informers, instead of a cancel context which stays alive until
each test is completely finished. There is no reason to risk
unpredictable behavior of a timeout being reached during an individual
test, even though with the previous 3 second timeout it could only be
reached on a machine which is running orders of magnitude slower than
usual, since each test usually runs in about 100-300 ms. Unfortunately,
sometimes our CI workers might get that slow.
This sparked a review of other usages of timeout contexts in other
tests, and all of them were increased to a minimum value of 1 minute,
under the rule of thumb that our tests will be more reliable on slow
machines if they "pass fast and fail slow".
This is the first of a few related changes that re-organize our API after the big recent changes that introduced the supervisor component.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- Setting a Secret in the supervisor's namespace with a special name
will cause it to get picked up and served as the supervisor's TLS
cert for any request which does not have a matching SNI cert.
- This is especially useful for when there is no DNS record for an
issuer and the user will be accessing it via IP address. This
is not how we would expect it to be used in production, but it
might be useful for other cases.
- Includes a new integration test
- Also suppress all of the warnings about ignoring the error returned by
Close() in lines like `defer x.Close()` to make GoLand happier
This should fix integration tests running on clusters that don't have
visible controller manager pods (e.g., GKE). Pinniped should boot, not
find any controller manager pods, but still post a status in the CIC.
I also updated a test helper so that we could tell the difference
between when an event was not added and when an event was added with
an empty key.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
3 main reasons:
- The cert and key that we store in this object are not always used for TLS.
- The package name "provider" was a little too generic.
- dynamiccert.Provider reads more go-ish than provider.DynamicCertProvider.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
New resource naming conventions:
- Do not repeat the Kind in the name,
e.g. do not call it foo-cluster-role-binding, just call it foo
- Names will generally start with a prefix to identify our component,
so when a user lists all objects of that kind, they can tell to which
component it is related,
e.g. `kubectl get configmaps` would list one named "pinniped-config"
- It should be possible for an operator to make the word "pinniped"
mostly disappear if they choose, by specifying the app_name in
values.yaml, to the extent that is practical (but not from APIService
names because those are hardcoded in golang)
- Each role/clusterrole and its corresponding binding have the same name
- Pinniped resource names that must be known by the server golang code
are passed to the code at run time via ConfigMap, rather than
hardcoded in the golang code. This also allows them to be prepended
with the app_name from values.yaml while creating the ConfigMap.
- Since the CLI `get-kubeconfig` command cannot guess the name of the
CredentialIssuerConfig resource in advance anymore, it lists all
CredentialIssuerConfig in the app's namespace and returns an error
if there is not exactly one found, and then uses that one regardless
of its name
- The certs manager controller, along with its sibling certs expirer
and certs observer controllers, are generally useful for any process
that wants to create its own CA and TLS certs, but only if the
updating of the APIService is not included in those controllers
- So that functionality for updating APIServices is moved to a new
controller which watches the same Secret which is used by those
other controllers
- Also parameterize `NewCertsManagerController` with the service name
and the CA common name to make the controller more reusable
When we use RSA private keys to sign our test certificates, we run
into strange test timeouts. The internal/controller/apicerts package
was timing out on my machine more than once every 3 runs. When I
changed the RSA crypto to EC crypto, this timeout goes away. I'm not
gonna try to figure out what the deal is here because I think it would
take longer than it would be worth (although I am sure it is some fun
story involving prime numbers; the goroutine traces for timed out
tests would always include some big.Int operations involving prime
numbers...).
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- Controllers will automatically run again when there's an error,
but when we want to update CredentialIssuerConfig from server.go
we should be careful to retry on conflicts
- Add unit tests for `issuerconfig.CreateOrUpdateCredentialIssuerConfig()`
which was covered by integration tests in previous commits, but not
covered by units tests yet.
So that operators won't look at the lifetime of the CA cert and be
like, "wtf, why does the serving cert have the lifetime that I
specified, but its CA cert is valid for 100 years".
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
We are seeing between 1 and 2 minutes of difference between the current time
reported in the API server pod and the pinniped pods on one of our testing
environments. Hopefully this change makes our tests pass again.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
These configuration knobs are much more human-understandable than the
previous percentage-based threshold flag.
We now allow users to set the lifetime of the serving cert via a ConfigMap.
Previously this was hardcoded to 1 year.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
The rotation is forced by a new controller that deletes the serving cert
secret, as other controllers will see this deletion and ensure that a new
serving cert is created.
Note that the integration tests now have an addition worst case runtime of
60 seconds. This is because of the way that the aggregated API server code
reloads certificates. We will fix this in a future story. Then, the
integration tests should hopefully get much faster.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This switches us back to an approach where we use the Pod "exec" API to grab the keys we need, rather than forcing our code to run on the control plane node. It will help us fail gracefully (or dynamically switch to alternate implementations) when the cluster is not self-hosted.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Co-authored-by: Ryan Richard <richardry@vmware.com>
- Add integration test for serving cert auto-generation and rotation
- Add unit test for `WithInitialEvent` of the cert manager controller
- Move UpdateAPIService() into the `apicerts` package, since that is
the only user of the function.
- Add a unit test for each cert controller
- Make DynamicTLSServingCertProvider an interface and use a mutex
internally
- Create a shared ToPEM function instead of having two very similar
functions
- Move the ObservableWithInformerOption test helper to testutils
- Rename some variables and imports
- Refactors the existing cert generation code into controllers
which read and write a Secret containing the certs
- Does not add any new functionality yet, e.g. no new handling
for cert expiration, and no leader election to allow for
multiple servers running simultaneously
- This commit also doesn't add new tests for the cert generation
code, but it should be more unit testable now as controllers