this will hopefully fix some flakes where aws provisioned a host for the
load balancer but the tests weren't able to resolve it.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
TestAgentController really runs the controller and evaluates multiple
calls to the controller's Sync with real informers caching updates.
There is a large amount of non-determinism in this unit test, and it
does not always behave the same way. Because it makes assertions about
the specific errors that should be returned by Sync, it was not
accounting for some errors that are only returned by Sync once in a
while depending on the exact (unpredictable) order of operations.
This commit doesn't fix the non-determinism in the test, but rather
tries to work around it by also allowing other (undesired but
inevitable) error messages to appear in the list of actual error
messages returned by the calls to the Sync function.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
Previously, the ytt install docs suggested that you use ytt templates
from the HEAD of main with the container image from the latest public
release, which could result in a mismatch.
It seems like page.ClearCookies() only clears cookies for the current
domain, so there doesn't seem to be a function to clear all browser
cookies. Instead, we'll just start a whole new browser each test.
They start fast enough that it shouldn't be a problem.
Our actual CLI code behaved correctly, but this test made some invalid assumptions about the "upstream" IDP we're testing. It assumed that the upstream didn't support `response_mode=form_post`, but Okta does. This means that when we end up on the localhost callback page, there are no URL query parameters.
Adjusting this regex makes the test pass as expected.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
I found that there are some situations with `response_mode=form_post` where Chrome will open additional speculative TCP connections. These connections will be idle so they block server shutdown until the (previously 5s) timeout. Lowering this to 500ms should be safe and makes any added latency at login much less noticeable.
More information about Chrome's TCP-level behavior here: https://bugs.chromium.org/p/chromium/issues/detail?id=116982#c5
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Using the same fake TTY trick we used to test LDAP login, this new subtest runs through the "manual"/"jump box" login flow. It runs the login with a `--skip-listen` flag set, causing the CLI to skip opening the localhost listener. We can then wait for the login URL to be printed, visit it with the browser and log in, and finally simulate "manually" copying the auth code from the browser and entering it into the waiting CLI prompt.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This flag is (for now) meant only to facilitate end-to-end testing, allowing us to force the "manual" login flow. If it ends up being useful we can un-hide it, but this seemed like the safest option to start with.
There is also a corresponding `--oidc-skip-listen` on the `pinniped get kubeconfig` command.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
For some reason our headless Chrome test setup behaves slightly differently on Linux and macOS hosts. On Linux, the emoji characters are not recognized as valid text, so they are URL encoded. This change updates the test to cope with both cases correctly.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This adds a new login flow that allows manually pasting the authorization code instead of receiving a browser-based callback.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This is a more restrictive library interface that more closely matches the use cases of our new form_post login flow.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This allows the prompts to be cancelled, which we need to be able to do in the case where we prompt for a manually-pasted auth code but the automatic callback succeeds.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This is a new pacakge internal/oidc/provider/formposthtml containing a number of static files embedded using the relatively recent Go "//go:embed" functionality introduced in Go 1.16 (https://blog.golang.org/go1.16).
The Javascript and CSS files are minifiied and injected to make a single self-contained HTML response. There is a special Content-Security-Policy helper to calculate hash-based script-src and style-src rules.
This new code is covered by a new integration test that exercises the JS/HTML functionality in a real browser outside of the rest of the Supervisor.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Our Supervisor callback handler now needs to load JS and CSS from the provider endpoint, and this JS needs to make a `fetch()` call across origins (to post the form to the CLI callback). This requires a custom Content-Security-Policy compared to other pages we render.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This test would occasionally flake for me when running locally. This change moves more of the assertions into the "eventually" loop, so they can temporarily fail as long as they converge on the expected values.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This test did not tolerate this connection failing, which can happen for any number of flaky networking-related reasons. This change moves the connection setup into an "eventually" retry loop so it's allowed to fail temporarily as long as it eventually connects.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This CredentialIssuer field is called `spec.impersonationProxy.service.type`, not `spec.impersonationProxy.service.mode`.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
TestSimultaneousLDAPRequestsOnSingleProvider proved to be unreliable
on AKS due to some kind of kubectl port-forward issue, so only
run the LDAP client's integration tests on Kind. They are testing
the integration between the client code and the OpenLDAP test server,
not testing anything about Kubernetes, so running only on Kind should
give us sufficient test coverage.
After noticing that the upstream OIDC discovery calls can hang
indefinitely, I had tried to impose a one minute timeout on them
by giving them a timeout context. However, I hadn't noticed that the
context also gets passed into the JWKS fetching object, which gets
added to our cache and used later. Therefore the timeout context
was added to the cache and timed out while sitting in the cache,
causing later JWKS fetchers to fail.
This commit is trying again to impose a reasonable timeout on these
discovery and JWKS calls, but this time by using http.Client's Timeout
field, which is documented to be a timeout for *each* request/response
cycle, so hopefully this is a more appropriate way to impose a timeout
for this use case. The http.Client instance ends up in the cache on
the JWKS fetcher object, so the timeout should apply to each JWKS
request as well.
Requests that can hang forever are effectively a server-side resource
leak, which could theoretically be taken advantage of in a denial of
service attempt, so it would be nice to avoid having them.