When we use RSA private keys to sign our test certificates, we run
into strange test timeouts. The internal/controller/apicerts package
was timing out on my machine more than once every 3 runs. When I
changed the RSA crypto to EC crypto, the timeouts went away. I'm not
going to try to figure out what the deal is here because I think it would
take longer than it would be worth (although I am sure it is some fun
story involving prime numbers; the goroutine traces for timed-out
tests would always include some big.Int operations involving prime
numbers...).
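For future reference, here is a minimal sketch of what the EC flavor of a
test CA looks like (the helper name and template fields are illustrative,
not our actual test code):

    // Generate a throwaway CA cert signed with an EC key instead of RSA.
    package testutil

    import (
        "crypto/ecdsa"
        "crypto/elliptic"
        "crypto/rand"
        "crypto/x509"
        "crypto/x509/pkix"
        "math/big"
        "time"
    )

    func newTestCA() (*x509.Certificate, *ecdsa.PrivateKey, error) {
        // P-256 key generation runs in near-constant time; rsa.GenerateKey
        // has to search for random primes, which is where the big.Int
        // operations in the goroutine traces above were spending their time.
        key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
        if err != nil {
            return nil, nil, err
        }
        template := &x509.Certificate{
            SerialNumber:          big.NewInt(1),
            Subject:               pkix.Name{CommonName: "test CA"},
            NotBefore:             time.Now(),
            NotAfter:              time.Now().Add(time.Hour),
            IsCA:                  true,
            KeyUsage:              x509.KeyUsageCertSign,
            BasicConstraintsValid: true,
        }
        der, err := x509.CreateCertificate(rand.Reader, template, template, &key.PublicKey, key)
        if err != nil {
            return nil, nil, err
        }
        cert, err := x509.ParseCertificate(der)
        return cert, key, err
    }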
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- Controllers will automatically run again when there's an error,
  but when we want to update CredentialIssuerConfig from server.go
  we should be careful to retry on conflicts (see the sketch after
  this list)
- Add unit tests for `issuerconfig.CreateOrUpdateCredentialIssuerConfig()`
which was covered by integration tests in previous commits, but not
covered by unit tests yet.
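A minimal sketch of the conflict-retry pattern we have in mind, using
client-go's retry helper (the wrapper below is illustrative; only
CreateOrUpdateCredentialIssuerConfig is real):

    package issuerconfig

    import "k8s.io/client-go/util/retry"

    // applyWithConflictRetries is a hypothetical wrapper. RetryOnConflict
    // re-runs the closure whenever it returns an HTTP 409 Conflict error,
    // so each attempt can re-read the object and write against the latest
    // resourceVersion.
    func applyWithConflictRetries(createOrUpdateCIC func() error) error {
        return retry.RetryOnConflict(retry.DefaultRetry, createOrUpdateCIC)
    }

server.go would wrap its call to
issuerconfig.CreateOrUpdateCredentialIssuerConfig() in something like
this, so a 409 caused by a concurrent controller write just triggers a
re-read and another attempt.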
So that operators won't look at the lifetime of the CA cert and be
like, "wtf, why does the serving cert have the lifetime that I
specified, but its CA cert is valid for 100 years".
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This should simplify our build/test setup quite a bit, since it means we have only a single module (at the top level) with all hand-written code. I'll leave `module.sh` alone for now but we may be able to simplify that a bit more.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- Controller and aggregated API server are allowed to run
- Keep retrying to borrow the cluster signing key in case the failure
  to get it was transient
- The CredentialRequest endpoint will always return an authentication
failure as long as the cluster signing key cannot be borrowed
- Update which integration tests are skipped to reflect what should
and should not work based on the cluster's capability under this
new behavior
- Move CreateOrUpdateCredentialIssuerConfig() and related methods
to their own file
- Update the CredentialIssuerConfig's Status every time we try to
refresh the cluster signing key
- Indicate the success or failure of the cluster signing key strategy
- Also introduce the concept of "capabilities" of an integration test
  cluster to allow the integration tests to be run against clusters
  that do or don't allow the borrowing of the cluster signing key
  (see the sketch after this list)
- Tests that are not expected to pass on clusters that lack the
  signing key borrowing capability are now skipped by calling
  the new library.SkipUnlessClusterHasCapability test helper
- Rename library.Getenv to library.GetEnv
- Add copyrights where they were missing
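To illustrate the capability gating, here is roughly what the helper could
look like; the Capability type and the env var plumbing are assumptions,
and only the SkipUnlessClusterHasCapability name comes from this change:

    package library

    import (
        "os"
        "strings"
        "testing"
    )

    type Capability string

    const ClusterSigningKeyIsAvailable Capability = "clusterSigningKeyIsAvailable"

    func SkipUnlessClusterHasCapability(t *testing.T, capability Capability) {
        t.Helper()
        // Assume the test cluster's capabilities arrive as a comma-separated
        // env var; the real mechanism may differ.
        for _, c := range strings.Split(os.Getenv("TEST_CLUSTER_CAPABILITIES"), ",") {
            if Capability(strings.TrimSpace(c)) == capability {
                return
            }
        }
        t.Skipf("skipping: cluster lacks the %q capability", capability)
    }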
We are seeing between 1 and 2 minutes of difference between the current time
reported in the API server pod and the pinniped pods on one of our testing
environments. Hopefully this change makes our tests pass again.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
These configuration knobs are much more human-understandable than the
previous percentage-based threshold flag.
We now allow users to set the lifetime of the serving cert via a ConfigMap.
Previously this was hardcoded to 1 year.
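A sketch of reading that knob, assuming the lifetime arrives as a seconds
value in the ConfigMap data (the key name and default below are
illustrative):

    package config

    import (
        "fmt"
        "strconv"
        "time"

        corev1 "k8s.io/api/core/v1"
    )

    // Fall back to the previously hardcoded 1 year.
    const defaultServingCertDuration = 365 * 24 * time.Hour

    func servingCertDuration(cm *corev1.ConfigMap) (time.Duration, error) {
        raw, ok := cm.Data["servingCertificateDurationSeconds"] // key name assumed
        if !ok {
            return defaultServingCertDuration, nil
        }
        seconds, err := strconv.ParseInt(raw, 10, 64)
        if err != nil {
            return 0, fmt.Errorf("invalid servingCertificateDurationSeconds %q: %w", raw, err)
        }
        return time.Duration(seconds) * time.Second, nil
    }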
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
We need a way to validate that this generated code is up to date. I added
a long-term engineering TODO for this.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
The rotation is forced by a new controller that deletes the serving cert
secret, as other controllers will see this deletion and ensure that a new
serving cert is created.
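Roughly, the new controller's sync looks like the sketch below; the
controller shape, secret data key, and rotation threshold are assumptions,
and only the delete-to-rotate idea is from this change:

    package apicerts

    import (
        "context"
        "crypto/x509"
        "encoding/pem"
        "fmt"
        "time"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        corev1client "k8s.io/client-go/kubernetes/typed/core/v1"
    )

    type certsExpirerController struct {
        secretName string
        secrets    corev1client.SecretInterface
    }

    func (c *certsExpirerController) sync(ctx context.Context) error {
        secret, err := c.secrets.Get(ctx, c.secretName, metav1.GetOptions{})
        if err != nil {
            return err
        }
        block, _ := pem.Decode(secret.Data["tlsCertificateChain"]) // data key assumed
        if block == nil {
            return fmt.Errorf("no PEM data in secret %q", c.secretName)
        }
        cert, err := x509.ParseCertificate(block.Bytes)
        if err != nil {
            return err
        }
        // Delete the secret once 80% of the cert's lifetime has elapsed;
        // the cert-creating controller sees the deletion and mints a new one.
        lifetime := cert.NotAfter.Sub(cert.NotBefore)
        if time.Since(cert.NotBefore) > lifetime*8/10 {
            return c.secrets.Delete(ctx, c.secretName, metav1.DeleteOptions{})
        }
        return nil
    }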
Note that the integration tests now have an additional worst-case runtime of
60 seconds. This is because of the way that the aggregated API server code
reloads certificates. We will fix this in a future story. Then, the
integration tests should hopefully get much faster.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This switches us back to an approach where we use the Pod "exec" API to grab the keys we need, rather than forcing our code to run on the control plane node. It will help us fail gracefully (or dynamically switch to alternate implementations) when the cluster is not self-hosted.
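A sketch of the exec-based approach, using client-go's standard
remotecommand machinery (the package and function names here are
illustrative):

    package kubecertagent

    import (
        "bytes"

        corev1 "k8s.io/api/core/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/kubernetes/scheme"
        "k8s.io/client-go/rest"
        "k8s.io/client-go/tools/remotecommand"
    )

    // execCat runs `cat path` inside the given pod (e.g. a
    // kube-controller-manager pod) and returns its stdout, which lets us
    // read the cluster signing key without scheduling onto the node.
    func execCat(config *rest.Config, client kubernetes.Interface, namespace, pod, path string) ([]byte, error) {
        req := client.CoreV1().RESTClient().Post().
            Resource("pods").Namespace(namespace).Name(pod).
            SubResource("exec").
            VersionedParams(&corev1.PodExecOptions{
                Command: []string{"cat", path},
                Stdout:  true,
                Stderr:  true,
            }, scheme.ParameterCodec)

        executor, err := remotecommand.NewSPDYExecutor(config, "POST", req.URL())
        if err != nil {
            return nil, err
        }
        var stdout, stderr bytes.Buffer
        if err := executor.Stream(remotecommand.StreamOptions{Stdout: &stdout, Stderr: &stderr}); err != nil {
            return nil, err
        }
        return stdout.Bytes(), nil
    }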
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Co-authored-by: Ryan Richard <richardry@vmware.com>
- We want to follow the <noun>Request convention.
- The actual operation does not log in a user, but it does retrieve a
  credential with which they can log in.
- This commit includes changes to all LoginRequest-related symbols and
constants to try to update their names to follow the new
CredentialRequest type.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
As discussed in API review, this field exists for convenience right
now. Since the username/groups are encoded in the Credential sent in
the LoginRequestStatus, the client still has access to their
user/groups information. We want to remove this for now to be
conservative and limit our API surface area (smaller surface area =
less to maintain). We can always add this back in the future.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- Add integration test for serving cert auto-generation and rotation
- Add unit test for `WithInitialEvent` of the cert manager controller
- Move UpdateAPIService() into the `apicerts` package, since that is
the only user of the function.
- Add a unit test for each cert controller
- Make DynamicTLSServingCertProvider an interface and use a mutex
  internally (see the sketch after this list)
- Create a shared ToPEM function instead of having two very similar
functions
- Move the ObservableWithInformerOption test helper to testutils
- Rename some variables and imports
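A minimal sketch of the interface-plus-mutex shape (the method set is an
assumption):

    package provider

    import "sync"

    type DynamicTLSServingCertProvider interface {
        SetCertKeyContent(certPEM, keyPEM []byte)
        CurrentCertKeyContent() (certPEM, keyPEM []byte)
    }

    type dynamicProvider struct {
        mu      sync.RWMutex
        certPEM []byte
        keyPEM  []byte
    }

    func NewDynamicTLSServingCertProvider() DynamicTLSServingCertProvider {
        return &dynamicProvider{}
    }

    // SetCertKeyContent swaps in new PEM bytes under the write lock, so the
    // cert controllers can rotate the serving cert while requests are live.
    func (p *dynamicProvider) SetCertKeyContent(certPEM, keyPEM []byte) {
        p.mu.Lock()
        defer p.mu.Unlock()
        p.certPEM, p.keyPEM = certPEM, keyPEM
    }

    func (p *dynamicProvider) CurrentCertKeyContent() ([]byte, []byte) {
        p.mu.RLock()
        defer p.mu.RUnlock()
        return p.certPEM, p.keyPEM
    }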
- Refactors the existing cert generation code into controllers
which read and write a Secret containing the certs
- Does not add any new functionality yet, e.g. no new handling
for cert expiration, and no leader election to allow for
multiple servers running simultaneously
- This commit also doesn't add new tests for the cert generation
code, but it should be more unit testable now as controllers
- No functional changes
- Move all the stuff about clients and controllers into the controller
package
- Add more comments and organize the code into more helper
  functions to make each function smaller
- Previously the golang code would create a Service and an APIService.
The APIService would be given an owner reference which pointed to
the namespace in which the app was installed.
- This prevented the app from being uninstalled. The namespace would
refuse to delete, so `kapp delete` or `kubectl delete` would fail.
- The new approach is to statically define the Service and APIService
  in the deployment.yaml, except for the caBundle of the APIService.
  Then the golang code performs an update at runtime to fill in the
  caBundle (see the sketch after this list).
- When the user uses `kapp deploy` or `kubectl apply` either tool will
notice that the caBundle is not declared in the yaml and will
therefore avoid editing that field.
- When the user uses `kapp delete` or `kubectl delete` either tool will
destroy the objects because they are statically declared with names
in the yaml, just like all of the other objects. There are no
ownerReferences used, so nothing should prevent the namespace from
being deleted.
- This approach also allows us to have less golang code to maintain.
- In the future, if our golang controllers want to dynamically add
an Ingress or other objects, they can still do that. An Ingress
would point to our statically defined Service as its backend.
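A sketch of that runtime update, using the kube-aggregator client (the
APIService name below is illustrative):

    package apicerts

    import (
        "context"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        aggregatorclient "k8s.io/kube-aggregator/pkg/client/clientset_generated/clientset"
    )

    func updateAPIServiceCABundle(ctx context.Context, client aggregatorclient.Interface, caPEM []byte) error {
        apiServices := client.ApiregistrationV1().APIServices()
        apiService, err := apiServices.Get(ctx, "v1alpha1.placeholder.suzerain-io.github.io", metav1.GetOptions{})
        if err != nil {
            return err
        }
        // Only touch the one field that the static yaml leaves unset, so
        // kapp/kubectl never see a diff on the fields they own.
        apiService.Spec.CABundle = caPEM
        _, err = apiServices.Update(ctx, apiService, metav1.UpdateOptions{})
        return err
    }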
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- The next step seems to be allowing an override of the CA bundle; I
  didn't do that here to keep this commit simple, but it seems like the
  right thing to do in the future.
- Also includes bumping the api and client-go dependencies to the newer
  versions, which moved LoginDiscoveryConfig to the
  crds.placeholder.suzerain-io.github.io group in the generated code