Also:
- Make our code generator script work with Go 1.17
- Make our update.sh script work on linux
- Update the patch versions of the old Kube versions that we were using
to generate code (see kube-versions.txt)
- Use our container images from ghcr instead of
projects.registry.vmware.com for codegen purposes
- Make it easier to debug in the future by passing "-v" to the Kube
codegen scripts
- Updated copyright years to make commit checks pass
The purpose of this change is to allow Helm to be used to deploy Pinniped
into the local KinD cluster for the local integration tests. That said,
the change allows any alternate deployment mechanism, I just happen
to be using it with Helm.
All default behavior is preserved. This won't change how anyone uses the
script today, it just allows me not to copy/paste the whole setup for the
integration tests.
Changes:
1) An option called `--alternate-deploy <path-to-deploy-script>` has been
added, that when enabled calls the specified script instead of using ytt
and kapp. The alternate deploy script is called with the app to deploy
and the tag of the docker image to use. We set the default value of
the alternate_deploy variable to undefined, and there is a check that
tests if the alternate deploy is defined. For the superivsor it looks
like this:
```
if [ "$alternate_deploy" != "undefined" ]; then
log_note "The Pinniped Supervisor will be deployed with $alternate_deploy pinniped-supervisor $tag..."
$alternate_deploy pinniped-supervisor $tag
else
normal ytt/kapp deploy
fi
```
2) Additional log_note entries have been added to enumerate all values passed
into the ytt/kapp deploy. Used while I was trying to reach parity in the integration
tests, but I think they are useful for debugging.
3) The manifests produced by ytt and written to /tmp are now named individually.
This is so an easy comparison can be made between manifests produced by a ytt/kapp
run of integration tests and manifests produced by helm run of the integration tests.
If something is not working I have been comparing the manifests after these runs to
find differences.
This change allows configuration of the http and https listeners
used by the supervisor.
TCP (IPv4 and IPv6 with any interface and port) and Unix domain
socket based listeners are supported. Listeners may also be
disabled.
Binding the http listener to TCP addresses other than 127.0.0.1 or
::1 is deprecated.
The deployment now uses https health checks. The supervisor is
always able to complete a TLS connection with the use of a bootstrap
certificate that is signed by an in-memory certificate authority.
To support sidecar containers used by service meshes, Unix domain
socket based listeners include ACLs that allow writes to the socket
file from any runAsUser specified in the pod's containers.
Signed-off-by: Monis Khan <mok@vmware.com>
This change updates the TLS config used by all pinniped components.
There are no configuration knobs associated with this change. Thus
this change tightens our static defaults.
There are four TLS config levels:
1. Secure (TLS 1.3 only)
2. Default (TLS 1.2+ best ciphers that are well supported)
3. Default LDAP (TLS 1.2+ with less good ciphers)
4. Legacy (currently unused, TLS 1.2+ with all non-broken ciphers)
Highlights per component:
1. pinniped CLI
- uses "secure" config against KAS
- uses "default" for all other connections
2. concierge
- uses "secure" config as an aggregated API server
- uses "default" config as a impersonation proxy API server
- uses "secure" config against KAS
- uses "default" config for JWT authenticater (mostly, see code)
- no changes to webhook authenticater (see code)
3. supervisor
- uses "default" config as a server
- uses "secure" config against KAS
- uses "default" config against OIDC IDPs
- uses "default LDAP" config against LDAP IDPs
Signed-off-by: Monis Khan <mok@vmware.com>
This should make it easier for us to to notice if something is wrong
with our service (especially in any future kubectl tests we add).
Signed-off-by: Monis Khan <mok@vmware.com>
Those images that are pulled from Dockerhub will cause pull failures
on some test clusters due to Dockerhub rate limiting.
Because we already have some images that we use for testing, and
because those images are already pre-loaded onto our CI clusters
to make the tests faster, use one of those images and always specify
PullIfNotPresent to avoid pulling the image again during the integration
test.
This required a weird hack because some of the Fosite tests (or a transitive dependency of them) depends on a newer version of gRPC that's incompatible with the Kubernetes runtime version we use. It wasn't as simple as just replacing the gRPC module with an older version, because in the latest versions of gRPC, they split out the "examples" packages into their own module. This new module name doesn't exist at the old version.
Ultimately, the workaround was to make a fake "examples" module locally. This module can be empty because we never actually depend on that code (it's only used in transitive dependency tests).
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- The linux base64 command is different, so avoid using it at all.
On linux the default is to split the output into multiple lines,
which messes up the integration-test-env file. The flag used to
disable this behavior on linux ("-w0") does not exist on MacOS's
base64.
- On debian linux, the latest version of Docker from apt-get still
requires DOCKER_BUILDKIT=1 or else it barfs.
$PINNIPED_TEST_SUPERVISOR_UPSTREAM_OIDC_ISSUER_CA_BUNDLE was recently
changed to be a base64 encoded value, so this script does not need to
base64 encode the value itself anymore.
Avoid them because they can't be used in GoLand for running integration
tests in the UI, like running in the debugger.
Also adds optional PINNIPED_TEST_TOOLS_NAMESPACE because we need it
on the LDAP feature branch where we are developing the upcoming LDAP
support for the Supervisor.
- Rename the test/deploy/dex directory to test/deploy/tools
- Rename the dex namespace to tools
- Add a new ytt value called `pinny_ldap_password` for the tools
ytt templates
- This new value is not used on main at this time. We intend to use
it in the forthcoming ldap branch. We're defining it on main so
that the CI scripts can use it across all branches and PRs.
Signed-off-by: Ryan Richard <richardry@vmware.com>
A demo of running the Supervisor and Concierge on
a kind cluster. Can be used to quickly set up an
environment for manual testing.
Also added some missing copyright headers to other
hack scripts.
It takes a lot of manual steps to get ready to manually test the
impersonation proxy on a kind cluster, which makes it error prone,
so encapsulate them into a script to make it easier.
It also works on the slightly older MacOS Catalina.
This script is only used on development laptops, so hopefully
this will work for more laptop OS's now.
Signed-off-by: Ryan Richard <richardry@vmware.com>
Because otherwise `go test` will panic/crash your test if it takes
longer than 10 minutes, which is an annoying way for an integration
test to fail since it skips all of the t.Cleanup's.
This change adds a new virtual aggregated API that can be used by
any user to echo back who they are currently authenticated as. This
has general utility to end users and can be used in tests to
validate if authentication was successful.
Signed-off-by: Monis Khan <mok@vmware.com>
Makes it easy to deploy Pinniped under a different API group for manual
testing and iterating on integration tests on your laptop.
Signed-off-by: Ryan Richard <richardry@vmware.com>
We need this in CI when we want to configure Dex with the redirect URI for both
primary and secondary deploys at one time (since we only stand up Dex once).
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Previously, when triggering a Tilt reload via a *.go file change, a reload would
take ~13 seconds and we would see this error message in the Tilt logs for each
component.
Live Update failed with unexpected error:
command terminated with exit code 2
Falling back to a full image build + deploy
Now, Tilt should reload images a lot faster (~3 seconds) since we are running
the images as root.
Note! Reloading the Concierge component still takes ~13 seconds because there
are 2 containers running in the Concierge namespace that use the Concierge
image: the main Concierge app and the kube cert agent pod. Tilt can't live
reload both of these at once, so the reload takes longer and we see this error
message.
Will not perform Live Update because:
Error retrieving container info: can only get container info for a single pod; image target image:image/concierge has 2 pods
Falling back to a full image build + deploy
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
kubectl 1.20 prints "Kubernetes control plane" instead of "Kubernetes master".
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Prior to this we re-used the CLI testing client to test the authorize flow of the supervisor, but they really need to be separate upstream clients. For example, the supervisor client should be a non-public client with a client secret and a different callback endpoint.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Before, we did this in an init container, which meant if the Dex pod restarted we would have fresh certs, but our Tilt/bash setup didn't account for this.
Now, the certs are generated by a Job which runs once and saves the generated files into a Secret. This should be a bit more stable.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This change deploys a small Squid-based proxy into the `dex` namespace in our integration test environment. This lets us use the cluster-local DNS name (`http://dex.dex.svc.cluster.local/dex`) as the OIDC issuer. It will make generating certificates easier, and most importantly it will mean that our CLI can see Dex at the same name/URL as the supervisor running inside the cluster.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This check predates the API renaming we did. Now that our API groups have `concierge`/`supervisor` in the name, we don't need to maintain a specific set of `cp` commands and keep them in sync, so we don't really need this check.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
These only really make sense for aggregated API types where we need `conversion-gen` to do version conversion.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
I tried to follow a principle of encapsulation here - we can still default to
peeps making connections to 80/443 on a Service object, but internally we will
use 8080/8443.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This only matters for local development, since we don't use this script directly in CI. Makes the full codegen ste take ~90s on my laptop.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This is the first of a few related changes that re-organize our API after the big recent changes that introduced the supervisor component.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- TLS certificates can be configured on the OIDCProviderConfig using
the `secretName` field.
- When listening for incoming TLS connections, choose the TLS cert
based on the SNI hostname of the incoming request.
- Because SNI hostname information on incoming requests does not include
the port number of the request, we add a validation that
OIDCProviderConfigs where the issuer hostnames (not including port
number) are the same must use the same `secretName`.
- Note that this approach does not yet support requests made to an
IP address instead of a hostname. Also note that `localhost` is
considered a hostname by SNI.
- Add port 443 as a container port to the pod spec.
- A new controller watches for TLS secrets and caches them in memory.
That same in-memory cache is used while servicing incoming connections
on the TLS port.
- Make it easy to configure both port 443 and/or port 80 for various
Service types using our ytt templates for the supervisor.
- When deploying to kind, add another nodeport and forward it to the
host on another port to expose our new HTTPS supervisor port to the
host.
When using kind we forward the node's port to the host, so we only
really care about the `nodePort` value. For acceptance clusters,
we put an Ingress in front of a NodePort Service, so we only really
care about the `port` value.
- When using `local()` in the Tiltfile it will not know
to watch those files for changes, so each time we use
`local()` we now also use `watch_file()`
- As a result, editing a ytt template file now causes
an immediate `kubectl apply` of the results
- Tiltfile and prepare-for-integration-tests.sh both specify the
NodePort Service using `--data-value-yaml 'service_nodeport_port=31234'`
- Also rename the namespaces used by the Concierge and Supervisor apps
during integration tests running locally
- Also continue renaming things related to the concierge app
- Enhance the uninstall test to also test uninstalling the supervisor
and local-user-authenticator apps
- Variables specific to concierge add it to their name
- All variables now start with `PINNIPED_TEST_` which makes it clear
that they are for tests and also helps them not conflict with the
env vars that are used in the Pinniped CLI code
- Intended to be a red test in this commit; will make it go
green in a future commit
- Enhance env.go and prepare-for-integration-tests.sh to make it
possible to write integration tests for the supervisor app
by setting more env vars and by exposing the service to the kind
host on a localhost port
- Add `--clean` option to prepare-for-integration-tests.sh
to make it easier to start fresh
- Make prepare-for-integration-tests.sh advise you to run
`go test -v -count 1 ./test/integration` because this does
not buffer the test output
- Make concierge_api_discovery_test.go pass by adding expectations
for the new OIDCProviderConfig type
- Prevent the server binary from lying about its version number
by having it report "?.?.?" as its version number for now.
- Later we can devise a way for CI to inject the version number
for the server into the container image at release time,
not at build time, since the version number is not known
at build time.
- Pre-release builds of the binary from before the release stage or
builds on developer workstation will also report "?.?.?" as its
version number, which is fine since they are not official releases
and shouldn't find their way to the public.
New resource naming conventions:
- Do not repeat the Kind in the name,
e.g. do not call it foo-cluster-role-binding, just call it foo
- Names will generally start with a prefix to identify our component,
so when a user lists all objects of that kind, they can tell to which
component it is related,
e.g. `kubectl get configmaps` would list one named "pinniped-config"
- It should be possible for an operator to make the word "pinniped"
mostly disappear if they choose, by specifying the app_name in
values.yaml, to the extent that is practical (but not from APIService
names because those are hardcoded in golang)
- Each role/clusterrole and its corresponding binding have the same name
- Pinniped resource names that must be known by the server golang code
are passed to the code at run time via ConfigMap, rather than
hardcoded in the golang code. This also allows them to be prepended
with the app_name from values.yaml while creating the ConfigMap.
- Since the CLI `get-kubeconfig` command cannot guess the name of the
CredentialIssuerConfig resource in advance anymore, it lists all
CredentialIssuerConfig in the app's namespace and returns an error
if there is not exactly one found, and then uses that one regardless
of its name