Commit Graph

21 Commits

Author SHA1 Message Date
Matt Moyer
f0a1555aca
Fix broken "read only" fields added in v0.11.0.
These fields were changed as a minor hardening attempt when we switched to Distroless, but I bungled the field names and we never noticed because Kapp doesn't apply API validations.

This change fixes the field names so they act as was originally intended. We should also follow up with a change that validates all of our installation manifest in CI.

Signed-off-by: Matt Moyer <moyerm@vmware.com>
2021-09-02 16:12:39 -05:00
Monis Khan
66ddcf98d3
Provide good defaults for NO_PROXY
This change updates the default NO_PROXY for the supervisor to not
proxy requests to the Kubernetes API and other Kubernetes endpoints
such as Kubernetes services.

It also adds https_proxy and no_proxy settings for the concierge
with the same default.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-17 10:03:19 -04:00
Matt Moyer
58bbffded4
Switch to a slimmer distroless base image.
At a high level, it switches us to a distroless base container image, but that also includes several related bits:

- Add a writable /tmp but make the rest of our filesystems read-only at runtime.

- Condense our main server binaries into a single pinniped-server binary. This saves a bunch of space in
  the image due to duplicated library code. The correct behavior is dispatched based on `os.Args[0]`, and
  the `pinniped-server` binary is symlinked to `pinniped-concierge` and `pinniped-supervisor`.

- Strip debug symbols from our binaries. These aren't really useful in a distroless image anyway and all the
  normal stuff you'd expect to work, such as stack traces, still does.

- Add a separate `pinniped-concierge-kube-cert-agent` binary with "sleep" and "print" functionality instead of
  using builtin /bin/sleep and /bin/cat for the kube-cert-agent. This is split from the main server binary
  because the loading/init time of the main server binary was too large for the tiny resource footprint we
  established in our kube-cert-agent PodSpec. Using a separate binary eliminates this issue and the extra
  binary adds only around 1.5MiB of image size.

- Switch the kube-cert-agent code to use a JSON `{"tls.crt": "<b64 cert>", "tls.key": "<b64 key>"}` format.
  This is more robust to unexpected input formatting than the old code, which simply concatenated the files
  with some extra newlines and split on whitespace.

- Update integration tests that made now-invalid assumptions about the `pinniped-server` image.

Signed-off-by: Matt Moyer <moyerm@vmware.com>
2021-08-09 15:05:13 -04:00
Ryan Richard
f1e63c55d4 Add https_proxy and no_proxy settings for the Supervisor
- Add new optional ytt params for the Supervisor deployment.
- When the Supervisor is making calls to an upstream OIDC provider,
  use these variables if they were provided.
- These settings are integration tested in the main CI pipeline by
  sometimes setting them on deployments in certain cases, and then
  letting the existing integration tests (e.g. TestE2EFullIntegration)
  provide the coverage, so there are no explicit changes to the
  integration tests themselves in this commit.
2021-07-07 12:50:13 -07:00
Ryan Richard
616211c1bc
deploy: wire API group suffix through YTT templates
I didn't advertise this feature in the deploy README's since (hopefully) not
many people will want to use it?

Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2021-01-19 17:23:06 -05:00
Andrew Keesler
af11d8cd58
Run Tilt images as root for faster reload
Previously, when triggering a Tilt reload via a *.go file change, a reload would
take ~13 seconds and we would see this error message in the Tilt logs for each
component.

  Live Update failed with unexpected error:
    command terminated with exit code 2
  Falling back to a full image build + deploy

Now, Tilt should reload images a lot faster (~3 seconds) since we are running
the images as root.

Note! Reloading the Concierge component still takes ~13 seconds because there
are 2 containers running in the Concierge namespace that use the Concierge
image: the main Concierge app and the kube cert agent pod. Tilt can't live
reload both of these at once, so the reload takes longer and we see this error
message.

  Will not perform Live Update because:
    Error retrieving container info: can only get container info for a single pod; image target image:image/concierge has 2 pods
  Falling back to a full image build + deploy

Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2021-01-15 11:34:53 -05:00
Andrew Keesler
e17bc31b29
Pass CSRF cookie signing key from controller to cache
This also sets the CSRF cookie Secret's OwnerReference to the Pod's grandparent
Deployment so that when the Deployment is cleaned up, then the Secret is as
well.

Obviously this controller implementation has a lot of issues, but it will at
least get us started.

Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-12-11 11:49:27 -05:00
Andrew Keesler
724c0d3eb0
Add YTT template value for setting log level
This is helpful for us, amongst other users, because we want to enable "debug"
logging whenever we deploy components for testing.

See a5643e3 for addition of log level.

Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-11-11 09:01:38 -05:00
Ryan Richard
05cf56a0fa
Merge pull request #180 from vmware-tanzu/limits
Add CPU/memory limits to our deployments
2020-11-02 16:22:37 -08:00
Ryan Richard
05233963fb Add CPU requests and limits to the Concierge and Supervisor deployments 2020-11-02 15:47:20 -08:00
Ryan Richard
781f86d18c
deploy: add memory limits
This is the beginning of a change to add cpu/memory limits to our pods.
We are doing this because some consumers require this, and it is generally
a good practice.

The limits == requests for "Guaranteed" QoS.

Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-11-02 14:57:39 -05:00
Andrew Keesler
fcea48c8f9
Run as non-root
I tried to follow a principle of encapsulation here - we can still default to
peeps making connections to 80/443 on a Service object, but internally we will
use 8080/8443.

Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-11-02 12:51:15 -05:00
Ryan Richard
29e0ce5662 Configure name of the supervisor default TLS cert secret via ConfigMap
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-10-28 11:56:50 -07:00
Ryan Richard
8b7c30cfbd Supervisor listens for HTTPS on port 443 with configurable TLS certs
- TLS certificates can be configured on the OIDCProviderConfig using
  the `secretName` field.
- When listening for incoming TLS connections, choose the TLS cert
  based on the SNI hostname of the incoming request.
- Because SNI hostname information on incoming requests does not include
  the port number of the request, we add a validation that
  OIDCProviderConfigs where the issuer hostnames (not including port
  number) are the same must use the same `secretName`.
- Note that this approach does not yet support requests made to an
  IP address instead of a hostname. Also note that `localhost` is
  considered a hostname by SNI.
- Add port 443 as a container port to the pod spec.
- A new controller watches for TLS secrets and caches them in memory.
  That same in-memory cache is used while servicing incoming connections
  on the TLS port.
- Make it easy to configure both port 443 and/or port 80 for various
  Service types using our ytt templates for the supervisor.
- When deploying to kind, add another nodeport and forward it to the
  host on another port to expose our new HTTPS supervisor port to the
  host.
2020-10-26 17:03:26 -07:00
Ryan Richard
122f7cffdb Make the supervisor healthz endpoint public
Based on our experiences today with GKE, it will be easier for our users
to configure Ingress health checks if the healthz endpoint is available
on the same public port as the OIDC endpoints.

Also add an integration test for the healthz endpoint now that it is
public.

Also add the optional `containers[].ports.containerPort` to the
supervisor Deployment because the GKE docs say that GKE will look
at that field while inferring how to invoke the health endpoint. See
https://cloud.google.com/kubernetes-engine/docs/concepts/ingress#def_inf_hc
2020-10-21 15:24:58 -07:00
Andrew Keesler
fa5f653de6 Implement readinessProbe and livenessProbe for supervisor
Signed-off-by: Ryan Richard <richardry@vmware.com>
2020-10-21 11:51:31 -07:00
Andrew Keesler
617c5608ca Supervisor controllers apply custom labels to JWKS secrets
Signed-off-by: Ryan Richard <richardry@vmware.com>
2020-10-15 12:40:56 -07:00
Ryan Richard
94f20e57b1 Concierge controllers add labels to all created resources 2020-10-15 10:14:23 -07:00
Ryan Richard
1301018655 Support installing concierge and supervisor into existing namespace
- New optional ytt value called `into_namespace` means install into that
  preexisting namespace rather than creating a new namespace for each app
- Also ensure that every resource that is created statically by our yaml
  at install-time by either app is labeled consistently
- Also support adding custom labels to all of those resources from a
  new ytt value called `custom_labels`
2020-10-14 15:05:42 -07:00
Ryan Richard
354b922e48 Allow creation of different Service types in Supervisor ytt templates
- Tiltfile and prepare-for-integration-tests.sh both specify the
  NodePort Service using `--data-value-yaml 'service_nodeport_port=31234'`
- Also rename the namespaces used by the Concierge and Supervisor apps
  during integration tests running locally
2020-10-09 16:00:11 -07:00
Ryan Richard
f5a6a0bb1e Move all three deployment dirs under a new top-level deploy/ dir 2020-10-09 10:00:22 -07:00