It looks like requests to our aggregated API service on GKE vacillate
between success and failure until they reach a converged successful
state. I think this has to do with our pods updating the API serving
cert at different times. If only one pod updates its serving cert to
the correct value, then it should respond with success. However, the
other pod would respond with failure. Depending on the load balancing
algorithm that GKE uses to send traffic to pods in a service, we could
end up with a success that we interpret as "all pods have rotated
their certs" when it really just means "at least one pod has rotated
its certs."
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- Controllers will automatically run again when there's an error,
but when we want to update CredentialIssuerConfig from server.go
we should be careful to retry on conflicts
- Add unit tests for `issuerconfig.CreateOrUpdateCredentialIssuerConfig()`
which was covered by integration tests in previous commits, but not
covered by units tests yet.
So that operators won't look at the lifetime of the CA cert and be
like, "wtf, why does the serving cert have the lifetime that I
specified, but its CA cert is valid for 100 years".
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- Upgrade from `1.19.0-rc.0` to the newly-release `1.19.0`.
- Downgrade from `1.18.6` to `1.18.2` to match some downstream consumers.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This should simplify our build/test setup quite a bit, since it means we have only a single module (at the top level) with all hand-written code. I'll leave `module.sh` alone for now but we may be able to simplify that a bit more.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
We were using this at one point to control which tests ran with `go test ./...`, but now we're also using the `-short` flag to differentiate unit vs. integration tests.
Hopefully this will simplify things a bit.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
kubectl pulls these in in their main package...I wonder if we should do
the same for our main packages?
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- Controller and aggregated API server are allowed to run
- Keep retrying to borrow the cluster signing key in case the failure
to get it was caused by a transient failure
- The CredentialRequest endpoint will always return an authentication
failure as long as the cluster signing key cannot be borrowed
- Update which integration tests are skipped to reflect what should
and should not work based on the cluster's capability under this
new behavior
- Move CreateOrUpdateCredentialIssuerConfig() and related methods
to their own file
- Update the CredentialIssuerConfig's Status every time we try to
refresh the cluster signing key
- Indicate the success or failure of the cluster signing key strategy
- Also introduce the concept of "capabilities" of an integration test
cluster to allow the integration tests to be run against clusters
that do or don't allow the borrowing of the cluster signing key
- Tests that are not expected to pass on clusters that lack the
borrowing of the signing key capability are now ignored by
calling the new library.SkipUnlessClusterHasCapability test helper
- Rename library.Getenv to library.GetEnv
- Add copyrights where they were missing
These never worked quite right, so let's disable them for now: #51
We can probably come up with some better solution now with the new codegen scripts, but I'll leave that for later.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
We are seeing between 1 and 2 minutes of difference between the current time
reported in the API server pod and the pinniped pods on one of our testing
environments. Hopefully this change makes our tests pass again.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Didn't fix CI. I didn't think it would.
I have never seen the integration tests fail like this locally, so I
have to imagine the failure has something to do with the environment
on which we are testing.
This reverts commit ba2e2f509a.