Sometimes the container runtime detects the exiting PID 1 very quickly and shuts down the entire container while the `killall` process is still running.
When this happens, we see it as exit code 137 (SIGKILL).
This never failed for me in Kind locally, but fails pretty often in CI (probably due to timing differences).
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This causes a CI failure because I modified the generated directory manually. These versions don't really matter because they will be overridden by the parent go.mod file anyway.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This change adjusts the kube-cert-agent "deleter" controller to also delete pods that are unusable because they are no longer "Running".
This should make the Concierge recover from scenarios where clusters are suspended and resumed, as well as other edge cases where the `sleep` process in the agent pod exits for some reason.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This required some small adjustments to the produciton code to make it more feasible to test.
The new test takes an existing agent pod and terminates the `sleep` process, causing the pod to go into an `Error` status.
The agent controllers _should_ respond to this by deleting and recreating that failed pod, but the current code just gets stuck.
This is meant to replicate the situation when a cluster is suspended and resumed, which also causes the agent pod to be in this terminal error state.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
The group claims read from the session cache file are loaded as `[]interface{}` (slice of empty interfaces) so when we previously did a `groups, _ := idTokenClaims[oidc.DownstreamGroupsClaim].([]string)`, then `groups` would always end up nil.
The solution I tried here was to convert the expected value to also be `[]interface{}` so that `require.Equal(t, ...)` does the right thing.
This bug only showed up in our acceptance environnment against Okta, since we don't have any other integration test coverage with IDPs that pass a groups claim.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
We were seeing a race in this test code since the require.NoError() and
require.Eventually() would write to the same testing.T state on separate
goroutines. Hopefully this helper function should cover the cases when we want
to require.NoError() inside a require.Eventually() without causing a race.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Co-authored-by: Margo Crawford <margaretc@vmware.com>
Co-authored-by: Monis Khan <i@monis.app>
This change updates our clients to always set an owner ref when:
1. The operation is a create
2. The object does not already have an owner ref set
Signed-off-by: Monis Khan <mok@vmware.com>
See comment. This is at least a first step to make our GKE acceptance
environment greener. Previously, this test assumed that the Pinniped-under-test
had been deployed in (roughly) the last 10 minutes, which is not an assumption
that we make anywhere else in the integration test suite.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>