In the upstream dynamiccertificates package, we rely on two pieces
of code:
1. DynamicServingCertificateController.newTLSContent which calls
- clientCA.CurrentCABundleContent
- servingCert.CurrentCertKeyContent
2. unionCAContent.VerifyOptions which calls
- unionCAContent.CurrentCABundleContent
This results in calls to our tlsServingCertDynamicCertProvider and
impersonationSigningCertProvider. If we Unset these providers, we
subtly break these consumers. At best this results in test slowness
and flakes while we wait for reconcile loops to converge. At worst,
it results in actual errors during runtime. For example, we
previously would Unset the impersonationSigningCertProvider on any
sync loop error (even a transient one caused by a network blip or
a conflict between writes from different replicas of the concierge).
This would cause us to transiently fail to issue new certificates
from the token credential require API. It would also cause us to
transiently fail to authenticate previously issued client certs
(which results in occasional Unauthorized errors in CI).
Signed-off-by: Monis Khan <mok@vmware.com>
At a high level, it switches us to a distroless base container image, but that also includes several related bits:
- Add a writable /tmp but make the rest of our filesystems read-only at runtime.
- Condense our main server binaries into a single pinniped-server binary. This saves a bunch of space in
the image due to duplicated library code. The correct behavior is dispatched based on `os.Args[0]`, and
the `pinniped-server` binary is symlinked to `pinniped-concierge` and `pinniped-supervisor`.
- Strip debug symbols from our binaries. These aren't really useful in a distroless image anyway and all the
normal stuff you'd expect to work, such as stack traces, still does.
- Add a separate `pinniped-concierge-kube-cert-agent` binary with "sleep" and "print" functionality instead of
using builtin /bin/sleep and /bin/cat for the kube-cert-agent. This is split from the main server binary
because the loading/init time of the main server binary was too large for the tiny resource footprint we
established in our kube-cert-agent PodSpec. Using a separate binary eliminates this issue and the extra
binary adds only around 1.5MiB of image size.
- Switch the kube-cert-agent code to use a JSON `{"tls.crt": "<b64 cert>", "tls.key": "<b64 key>"}` format.
This is more robust to unexpected input formatting than the old code, which simply concatenated the files
with some extra newlines and split on whitespace.
- Update integration tests that made now-invalid assumptions about the `pinniped-server` image.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
this will hopefully fix some flakes where aws provisioned a host for the
load balancer but the tests weren't able to resolve it.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
This test did not tolerate this connection failing, which can happen for any number of flaky networking-related reasons. This change moves the connection setup into an "eventually" retry loop so it's allowed to fail temporarily as long as it eventually connects.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This fixes some rare test flakes caused by a data race inherent in the way we use `assert.Eventually()` with extra variables for followup assertions. This function is tricky to use correctly because it runs the passed function in a separate goroutine, and you have no guarantee that any shared variables are in a coherent state when the `assert.Eventually()` call returns. Even if you add manual mutexes, it's tricky to get the semantics right. This has been a recurring pain point and the cause of several test flakes.
This change introduces a new `library.RequireEventually()` that works by internally constructing a per-loop `*require.Assertions` and running everything on a single goroutine (using `wait.PollImmediate()`). This makes it very easy to write eventual assertions.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This change updates the impersonator to always authorize every
request instead of relying on the Kuberentes API server to perform
the check on the impersonated request. This protects us from
scenarios where we fail to correctly impersonate the user due to
some bug in our proxy logic. We still rely completely on the API
server to perform admission checks on the impersonated requests.
Signed-off-by: Monis Khan <mok@vmware.com>
This change updates the impersonation proxy code to run as a
distinct service account that only has permission to impersonate
identities. Thus any future vulnerability that causes the
impersonation headers to be dropped will fail closed instead of
escalating to the concierge's default service account which has
significantly more permissions.
Signed-off-by: Monis Khan <mok@vmware.com>
When anonymous authentication is disabled, the impersonation proxy
will no longer authenticate anonymous requests other than calls to
the token credential request API (this API is used to retrieve
credentials and thus must be accessed anonymously).
Signed-off-by: Benjamin A. Petersen <ben@benjaminapetersen.me>
Signed-off-by: Monis Khan <mok@vmware.com>
The `require.Eventually()` function runs the body of the check in a separate goroutine, so it's not safe to use other `require` assertions as we did here. Our `library.RequireEventuallyWithoutError()` function does not spawn a goroutine, so it's safer to use here.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This test felt overly complex and some of the cleanup logic wasn't 100% correct (it didn't clean up in all cases).
The new code is essentially the same flow but hopefully easier to read.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
We see that occasionally kubectl returns 11 lines (probably related to https://github.com/kubernetes/kubernetes/issues/72628).
This test doesn't need to be so picky, so now it allows +/- one line from the expected count.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
We had this one test that mutated the CredentialIssuer, which could cause the impersonation proxy to blip on one or both of the running concierge pods. This would sometimes break other concurrently running tests.
Instead, this bit of code is split into a separate non-concurrent test.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This test setup should tolerate when the TokenCredentialRequest API isn't quite ready to authenticate the user or issue a cert.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This check is no longer valid, because there can be ephemeral, recoverable errors that show as ErrorDuringSetup.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
The admin kubeconfigs we have on EKS clusters are a bit different from others, because there is no certificate/key (EKS does not use certificate auth).
This code didn't quite work correctly in that case. The fix is to allow the case where `tlsConfig.GetClientCertificate` is non-nil, but returns a value with no certificates.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This change updates the impersonator logic to pass through requests
that authenticated via a bearer token that asserts a UID. This
allows us to support service account tokens (as well as any other
form of token based authentication).
Signed-off-by: Monis Khan <mok@vmware.com>
This change updates the impersonator logic to use the delegated
authorizer for all non-rest verbs such as impersonate. This allows
it to correctly perform authorization checks for incoming requests
that set impersonation headers while not performing unnecessary
checks that are already handled by KAS.
The audit layer is enabled to track the original user who made the
request. This information is then included in a reserved extra
field original-user-info.impersonation-proxy.concierge.pinniped.dev
as a JSON blob.
Signed-off-by: Monis Khan <mok@vmware.com>
- Rename the test/deploy/dex directory to test/deploy/tools
- Rename the dex namespace to tools
- Add a new ytt value called `pinny_ldap_password` for the tools
ytt templates
- This new value is not used on main at this time. We intend to use
it in the forthcoming ldap branch. We're defining it on main so
that the CI scripts can use it across all branches and PRs.
Signed-off-by: Ryan Richard <richardry@vmware.com>
This test could flake if the load balancer hostname was provisioned but is not yet resolving in DNS from the test process.
The fix is to retry this step for up to 5 minutes.
Signed-off-by: Matt Moyer <moyerm@vmware.com>