Commit Graph

184 Commits

Author SHA1 Message Date
Andrew Keesler
2e784e006c
Merge remote-tracking branch 'upstream/main' into secret-generation 2020-12-15 13:24:33 -05:00
Andrew Keesler
50f9b434e7
SameIssuerHostMustUseSameSecret is a valid OIDCProvider status
I saw this message in our CI logs, which led me to this fix.
  could not update status: OIDCProvider.config.supervisor.pinniped.dev "acceptance-provider" is invalid: status.status: Unsupported value: "SameIssuerHostMustUseSameSecret": supported values: "Success", "Duplicate", "Invalid"

Also - correct an integration test error message that was misleading.

Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-12-15 11:53:53 -05:00
Andrew Keesler
82ae98d9d0
Set secret names on OIDCProvider status field
We believe this API is more forwards compatible with future secrets management
use cases. The implementation is a cry for help, but I was trying to follow the
previously established pattern of encapsulating the secret generation
functionality to a single group of packages.

This commit makes a breaking change to the current OIDCProvider API, but that
OIDCProvider API was added after the latest release, so it is technically still
in development until we release, and therefore we can continue to thrash on it.

I also took this opportunity to make some things private that didn't need to be
public.

Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-12-15 09:13:01 -05:00
Andrew Keesler
e17bc31b29
Pass CSRF cookie signing key from controller to cache
This also sets the CSRF cookie Secret's OwnerReference to the Pod's grandparent
Deployment so that when the Deployment is cleaned up, then the Secret is as
well.

Obviously this controller implementation has a lot of issues, but it will at
least get us started.

Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-12-11 11:49:27 -05:00
Andrew Keesler
946b0539d2
Add JWTAuthenticator API type
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-12-08 15:41:48 -05:00
Rajat Goyal
31810a97e1 Remove duplicate docs from the repo and change all links to point to the Hugo site 2020-12-02 23:58:19 +05:30
Mo Khan
c32e452db8
Add nonroot SCC to work on OpenShift clusters 2020-11-18 17:08:45 -05:00
Matt Moyer
e867fb82b9
Add spec.tls field to UpstreamOIDCProvider API.
This allows for a custom CA bundle to be used when connecting to the upstream issuer.

Signed-off-by: Matt Moyer <moyerm@vmware.com>
2020-11-16 20:23:20 -06:00
Matt Moyer
d3d8ef44a0
Make more fields in UpstreamOIDCProvider optional.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
2020-11-13 15:28:37 -06:00
Matt Moyer
2e7d869ccc
Add generated API/client code for new UpstreamOIDCProvider CRD.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
2020-11-13 11:38:50 -06:00
Matt Moyer
bac3c19bec
Add UpstreamOIDCProvider API type definition.
This is essentially just a copy of Andrew's work from https://github.com/vmware-tanzu/pinniped/pull/135.

Signed-off-by: Matt Moyer <moyerm@vmware.com>
2020-11-13 11:38:49 -06:00
Matt Moyer
7f2c43cd62
Put all of our APIs into a "pinniped" category, and never use "all".
We want to have our APIs respond to `kubectl get pinniped`, and we shouldn't use `all` because we don't think most average users should have permission to see our API types, which means if we put our types there, they would get an error from `kubectl get all`.

I also added some tests to assert these properties on all `*.pinniped.dev` API resources.

Signed-off-by: Matt Moyer <moyerm@vmware.com>
2020-11-12 16:26:34 -06:00
Andrew Keesler
724c0d3eb0
Add YTT template value for setting log level
This is helpful for us, amongst other users, because we want to enable "debug"
logging whenever we deploy components for testing.

See a5643e3 for addition of log level.

Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-11-11 09:01:38 -05:00
Ryan Richard
1223cf7877
Merge pull request #154 from vmware-tanzu/change_release_static_yaml_names
Rename static yaml files in release process
2020-11-02 17:09:11 -08:00
Matt Moyer
c451604816
Merge pull request #182 from mattmoyer/more-renames
Rename more APIs before we cut a release with longer-term API compatibility
2020-11-02 18:34:26 -06:00
Ryan Richard
05cf56a0fa
Merge pull request #180 from vmware-tanzu/limits
Add CPU/memory limits to our deployments
2020-11-02 16:22:37 -08:00
Matt Moyer
2bf5c8b48b
Replace the OIDCProvider field SNICertificateSecretName with a TLS.SecretName field.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
2020-11-02 18:15:03 -06:00
Ryan Richard
05233963fb Add CPU requests and limits to the Concierge and Supervisor deployments 2020-11-02 15:47:20 -08:00
Matt Moyer
2b8773aa54
Rename OIDCProviderConfig to OIDCProvider.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
2020-11-02 17:40:39 -06:00
Matt Moyer
59263ea733
Rename CredentialIssuerConfig to CredentialIssuer.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
2020-11-02 17:39:42 -06:00
Matt Moyer
b13a8075e4
Merge pull request #183 from vmware-tanzu/non-root
Run as non-root
2020-11-02 17:39:14 -06:00
Matt Moyer
935577f8e7
Give the concierge access to use any PodSecurityPolicy.
This is needed on clusters with PodSecurityPolicy enabled by default, but should be harmless in other cases.

This is generally needed because a restrictive PodSecurityPolicy will usually otherwise prevent the `hostPath` volume mount needed by the dynamically-created cert agent pod.

Signed-off-by: Matt Moyer <moyerm@vmware.com>
2020-11-02 15:10:00 -06:00
Ryan Richard
781f86d18c
deploy: add memory limits
This is the beginning of a change to add cpu/memory limits to our pods.
We are doing this because some consumers require this, and it is generally
a good practice.

The limits == requests for "Guaranteed" QoS.

Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-11-02 14:57:39 -05:00
Andrew Keesler
fcea48c8f9
Run as non-root
I tried to follow a principle of encapsulation here - we can still default to
peeps making connections to 80/443 on a Service object, but internally we will
use 8080/8443.

Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-11-02 12:51:15 -05:00
Matt Moyer
9e1922f1ed
Split the config CRDs into two API groups.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
2020-10-30 19:22:46 -05:00
Ryan Richard
01f4fdb5c3 Remove namespace from a ClusterRoleBinding, which are not namespaced 2020-10-30 16:10:04 -07:00
Matt Moyer
0f25657a35
Rename WebhookIdentityProvider to WebhookAuthenticator.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
2020-10-30 15:11:53 -05:00
Matt Moyer
e69183aa8a
Rename idp.concierge.pinniped.dev to authentication.concierge.pinniped.dev.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
2020-10-30 14:07:40 -05:00
Matt Moyer
81390bba89
Rename idp.pinniped.dev to idp.concierge.pinniped.dev.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
2020-10-30 14:07:39 -05:00
Matt Moyer
f0320dfbd8
Rename login API to login.concierge.pinniped.dev.
This is the first of a few related changes that re-organize our API after the big recent changes that introduced the supervisor component.

Signed-off-by: Matt Moyer <moyerm@vmware.com>
2020-10-30 09:58:28 -05:00
Ryan Richard
059b6e885f Allow ytt templating of the loadBalancerIP for the supervisor 2020-10-28 16:45:23 -07:00
Ryan Richard
01dddd3cae Add some docs for configuring supervisor TLS
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-10-28 13:42:02 -07:00
Ryan Richard
29e0ce5662 Configure name of the supervisor default TLS cert secret via ConfigMap
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-10-28 11:56:50 -07:00
Ryan Richard
eeb110761e Rename secretName to SNICertificateSecretName in OIDCProviderConfig 2020-10-26 17:25:45 -07:00
Ryan Richard
8b7c30cfbd Supervisor listens for HTTPS on port 443 with configurable TLS certs
- TLS certificates can be configured on the OIDCProviderConfig using
  the `secretName` field.
- When listening for incoming TLS connections, choose the TLS cert
  based on the SNI hostname of the incoming request.
- Because SNI hostname information on incoming requests does not include
  the port number of the request, we add a validation that
  OIDCProviderConfigs where the issuer hostnames (not including port
  number) are the same must use the same `secretName`.
- Note that this approach does not yet support requests made to an
  IP address instead of a hostname. Also note that `localhost` is
  considered a hostname by SNI.
- Add port 443 as a container port to the pod spec.
- A new controller watches for TLS secrets and caches them in memory.
  That same in-memory cache is used while servicing incoming connections
  on the TLS port.
- Make it easy to configure both port 443 and/or port 80 for various
  Service types using our ytt templates for the supervisor.
- When deploying to kind, add another nodeport and forward it to the
  host on another port to expose our new HTTPS supervisor port to the
  host.
2020-10-26 17:03:26 -07:00
Ryan Richard
25a91019c2 Add spec.secretName to OPC and handle case-insensitive hostnames
- When two different Issuers have the same host (i.e. they differ
  only by path) then they must have the same secretName. This is because
  it wouldn't make sense for there to be two different TLS certificates
  for one host. Find any that do not have the same secret name to
  put an error status on them and to avoid serving OIDC endpoints for
  them. The host comparison is case-insensitive.
- Issuer hostnames should be treated as case-insensitive, because
  DNS hostnames are case-insensitive. So https://me.com and
  https://mE.cOm are duplicate issuers. However, paths are
  case-sensitive, so https://me.com/A and https://me.com/a are
  different issuers. Fixed this in the issuer validations and in the
  OIDC Manager's request router logic.
2020-10-23 16:25:44 -07:00
Andrew Keesler
f928ef4752 Also mention using a service mesh is an option for supervisor ingress
Signed-off-by: Ryan Richard <richardry@vmware.com>
2020-10-23 10:23:17 -07:00
Ryan Richard
eafdef7b11 Add docs for creating an Ingress for the Supervisor
Note that some of these new docs mention things that will not be
implemented until we finish the next story.
2020-10-22 16:57:50 -07:00
Ryan Richard
397ec61e57 Specify the supervisor NodePort Service's port and nodePort separately
When using kind we forward the node's port to the host, so we only
really care about the `nodePort` value. For acceptance clusters,
we put an Ingress in front of a NodePort Service, so we only really
care about the `port` value.
2020-10-22 15:37:35 -07:00
Ryan Richard
122f7cffdb Make the supervisor healthz endpoint public
Based on our experiences today with GKE, it will be easier for our users
to configure Ingress health checks if the healthz endpoint is available
on the same public port as the OIDC endpoints.

Also add an integration test for the healthz endpoint now that it is
public.

Also add the optional `containers[].ports.containerPort` to the
supervisor Deployment because the GKE docs say that GKE will look
at that field while inferring how to invoke the health endpoint. See
https://cloud.google.com/kubernetes-engine/docs/concepts/ingress#def_inf_hc
2020-10-21 15:24:58 -07:00
Andrew Keesler
fa5f653de6 Implement readinessProbe and livenessProbe for supervisor
Signed-off-by: Ryan Richard <richardry@vmware.com>
2020-10-21 11:51:31 -07:00
Andrew Keesler
617c5608ca Supervisor controllers apply custom labels to JWKS secrets
Signed-off-by: Ryan Richard <richardry@vmware.com>
2020-10-15 12:40:56 -07:00
Ryan Richard
f8e461dfc3 Merge branch 'main' into label_every_resource 2020-10-15 10:19:03 -07:00
Ryan Richard
94f20e57b1 Concierge controllers add labels to all created resources 2020-10-15 10:14:23 -07:00
Ryan Richard
1301018655 Support installing concierge and supervisor into existing namespace
- New optional ytt value called `into_namespace` means install into that
  preexisting namespace rather than creating a new namespace for each app
- Also ensure that every resource that is created statically by our yaml
  at install-time by either app is labeled consistently
- Also support adding custom labels to all of those resources from a
  new ytt value called `custom_labels`
2020-10-14 15:05:42 -07:00
Andrew Keesler
6aed025c79
supervisor-generate-key: initial spike
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-10-14 09:47:34 -04:00
Andrew Keesler
3d5937a8e8
deploy/supervisor: type: eaxmple -> example
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-10-14 09:22:15 -04:00
Ryan Richard
478b0a0fd8 Add supervisor yaml and rename concierge yaml in release process
Add install-pinniped-supervisor.yaml and rename install-pinniped.yaml
to install-pinniped-concierge.yaml in the release process and
installation/demo documentation.
2020-10-12 09:43:52 -07:00
Ryan Richard
171f3ed906 Add some docs for how to configure the Supervisor app after installing 2020-10-09 16:28:34 -07:00
Ryan Richard
354b922e48 Allow creation of different Service types in Supervisor ytt templates
- Tiltfile and prepare-for-integration-tests.sh both specify the
  NodePort Service using `--data-value-yaml 'service_nodeport_port=31234'`
- Also rename the namespaces used by the Concierge and Supervisor apps
  during integration tests running locally
2020-10-09 16:00:11 -07:00
Ryan Richard
34549b779b Make tilt work with the supervisor app and add more uninstall testing
- Also continue renaming things related to the concierge app
- Enhance the uninstall test to also test uninstalling the supervisor
  and local-user-authenticator apps
2020-10-09 14:25:34 -07:00
Ryan Richard
b71959961d Merge branch 'main' into supervisor-with-discovery 2020-10-09 10:00:50 -07:00
Ryan Richard
f5a6a0bb1e Move all three deployment dirs under a new top-level deploy/ dir 2020-10-09 10:00:22 -07:00
Andrew Keesler
c555c14ccb
supervisor-oidc: add OIDCProviderConfig.Status.LastUpdateTime
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-10-09 11:54:50 -04:00
Ryan Richard
a4389562e3 Fix mistake in deployment.yaml where service selector was hardcoded 2020-10-08 16:20:21 -07:00
Andrew Keesler
da00fc708f
supervisor-oidc: checkpoint: add status to provider CRD
Signed-off-by: Ryan Richard <richardry@vmware.com>
2020-10-08 13:27:45 -04:00
Andrew Keesler
20ce142f90
Merge remote-tracking branch 'upstream/main' into supervisor-with-discovery 2020-10-07 11:37:33 -04:00
Ryan Richard
14f1d86833
supervisor-oidc: add OIDCProviderConfig CRD
This will hopefully come in handy later if we ever decide to add
support for multiple OIDC providers as a part of one supervisor.

Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-10-06 15:20:29 -04:00
Matt Moyer
c1c75a8f22
Add kube_cert_agent_image value to main ytt template.
This needs to be overridden for Tilt usage, since the main image referenced by Tilt isn't actually pullable.

Signed-off-by: Matt Moyer <moyerm@vmware.com>
2020-10-05 15:18:38 -05:00
Ryan Richard
82f8094de7 Update documentation to use the deployment YAML files from the releases 2020-09-24 17:56:21 -07:00
Andrew Keesler
d853cbc7ff
Plumb through ImagePullSecrets to agent pod
Right now in the YTT templates we assume that the agent pods are gonna use
the same image as the main Pinniped deployment, so we can use the same logic
for the image pull secrets.

Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-09-24 15:52:05 -04:00
Andrew Keesler
0f8437bc3a
Integration tests are passing ayooooooooooooooo 2020-09-23 12:47:04 -04:00
Andrew Keesler
db9a97721f
Merge remote-tracking branch 'upstream/main' into 1-19-exec-strategy 2020-09-22 11:54:47 -04:00
Andrew Keesler
1a4f9e3466
kubecertagent: get integration tests passing again
Note: the non-kubecertagent integration tests are still failing :).
2020-09-22 11:38:13 -04:00
Matt Moyer
9beb3855b5
Create webhooks per-test and explicitly in demo.md instead of with ytt in ./deploy.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
2020-09-22 10:03:32 -05:00
Andrew Keesler
e18b6fdddc
deploy: add kube-cert-agent deployment knobs 2020-09-21 14:16:32 -04:00
Ryan Richard
6989e5da63 Merge branch 'main' into rename_stuff 2020-09-18 16:39:58 -07:00
Ryan Richard
80a520390b Rename many of resources that are created in Kubernetes by Pinniped
New resource naming conventions:
- Do not repeat the Kind in the name,
  e.g. do not call it foo-cluster-role-binding, just call it foo
- Names will generally start with a prefix to identify our component,
  so when a user lists all objects of that kind, they can tell to which
  component it is related,
  e.g. `kubectl get configmaps` would list one named "pinniped-config"
- It should be possible for an operator to make the word "pinniped"
  mostly disappear if they choose, by specifying the app_name in
  values.yaml, to the extent that is practical (but not from APIService
  names because those are hardcoded in golang)
- Each role/clusterrole and its corresponding binding have the same name
- Pinniped resource names that must be known by the server golang code
  are passed to the code at run time via ConfigMap, rather than
  hardcoded in the golang code. This also allows them to be prepended
  with the app_name from values.yaml while creating the ConfigMap.
- Since the CLI `get-kubeconfig` command cannot guess the name of the
  CredentialIssuerConfig resource in advance anymore, it lists all
  CredentialIssuerConfig in the app's namespace and returns an error
  if there is not exactly one found, and then uses that one regardless
  of its name
2020-09-18 15:56:50 -07:00
Matt Moyer
78ac27c262
Remove deprecated "pinniped.dev" API group.
This has been replaced by the "login.pinniped.dev" group with a slightly different API.

Signed-off-by: Matt Moyer <moyerm@vmware.com>
2020-09-18 17:32:15 -05:00
Matt Moyer
907ccb68f5
Move CredentialIssuerConfig into new "config.pinniped.dev" API group.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
2020-09-18 16:38:45 -05:00
Ryan Richard
eabe51c446 local-user-authenticator can be deployed from a private registry image
- Also add more comment to the values.yaml files to make the options
  more clear
2020-09-17 16:07:31 -07:00
Matt Moyer
10793ac11f
Allow anonymous access to TokenCredentialRequests.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
2020-09-17 09:52:23 -05:00
Matt Moyer
7ce760a5dd
Register a second APIService for the login.pinniped.dev.
This is handled by a second instance of the APIServiceUpdaterController.

Signed-off-by: Matt Moyer <moyerm@vmware.com>
2020-09-17 09:52:23 -05:00
Andrew Keesler
6c75de9334 Use public container images for codegen as as defaults when deploying
Signed-off-by: Ryan Richard <richardry@vmware.com>
2020-09-16 15:46:51 -07:00
Andrew Keesler
e7b389ae6c
Update copyright to reference Pinniped contributors
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-09-16 10:05:51 -04:00
Ryan Richard
db98f2810f
Merge pull request #98 from suzerain-io/get_kubeconfig_cli
Organize Pinniped CLI into subcommands; Add get-kubeconfig subcommand
2020-09-15 13:34:14 -07:00
Ryan Richard
cecd691a84 Add demo instructions
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-09-15 12:10:20 -07:00
Matt Moyer
2bdbac3e15
Move the ytt webhook config out into the CRD.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
2020-09-15 12:02:33 -05:00
Matt Moyer
5b9f2ec9fc
Give our controller access to all our CRD types.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
2020-09-15 12:02:33 -05:00
Matt Moyer
557fd0df26
Define the WebhookIdentityProvider CRD.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
2020-09-15 11:44:23 -05:00
Ryan Richard
b7bdb7f3b1 Rename test-webhook to local-user-authenticator
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-09-10 15:20:02 -07:00
Andrew Keesler
89d01b84f8
deploy/README.md: fix markdown link to test webhook README.md
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-09-10 09:33:46 -04:00
Ryan Richard
2565f67824 Create a deployment for test-webhook
- For now, build the test-webhook binary in the same container image as
  the pinniped-server binary, to make it easier to distribute
- Also fix lots of bugs from the first draft of the test-webhook's
  `/authenticate` implementation from the previous commit
- Add a detailed README for the new deploy-test-webhook directory
2020-09-09 19:06:39 -07:00
Matt Moyer
b0315e5e9f Add a ytt template value for replica count.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
2020-09-08 11:36:32 -05:00
Andrew Keesler
f8f16fadb9
Merge pull request #69 from ankeesler/pod-anti-affinity
Add pod anti-affinity to make our HA deployment more HA
2020-09-08 11:01:55 -04:00
Andrew Keesler
1415fcc6dc
Add pod anti-affinity to make our HA deployment more HA
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-09-08 10:08:34 -04:00
Matt Moyer
2959b54e7b Generate CRD YAML using controller-tools, update doc strings.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
2020-08-31 16:38:48 -05:00
Ryan Richard
f0c400235a
Add memory request to pinniped deployment
- We are not setting an upper limit because Kubernetes might randomly
  decide to unschedule our pod in ways that we can't anticipate in
  advance, causing very hard to reproduce production bugs.
- We noticed that our app currently uses ~30 MB of memory when idle,
  and ~35 MB of memory under some load. So a memory request of 128
  MB should be reasonable.

Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-08-28 15:19:16 -04:00
Ryan Richard
1375df185d Doc updates 2020-08-27 10:14:03 -07:00
Andrew Keesler
7502190135
Fix some copy issues in the docs 2020-08-27 08:39:57 -04:00
Ryan Richard
6e59596285 Upon pod startup, update the Status of CredentialIssuerConfig
- Indicate the success or failure of the cluster signing key strategy
- Also introduce the concept of "capabilities" of an integration test
  cluster to allow the integration tests to be run against clusters
  that do or don't allow the borrowing of the cluster signing key
- Tests that are not expected to pass on clusters that lack the
  borrowing of the signing key capability are now ignored by
  calling the new library.SkipUnlessClusterHasCapability test helper
- Rename library.Getenv to library.GetEnv
- Add copyrights where they were missing
2020-08-24 18:07:34 -07:00
Ryan Richard
6d43d7ba19 Update the schema of CredentialIssuerConfig
- Move the current info from spec to status
- Add schema for new stuff that we will use in a future commit to status
- Regenerate the generated code
2020-08-21 17:00:42 -07:00
Ryan Richard
ace01c86de Rename PinnipedDiscoveryInfo to CredentialIssuerConfig
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-08-21 16:16:34 -07:00
Andrew Keesler
2b297c28d5
Get rid of TODO that was completed in ecde8fa8
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-08-21 10:38:28 -04:00
Ryan Richard
88f3b41e71
deploy: add API cert config map values
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-08-20 17:14:16 -04:00
Ryan Richard
3929fa672e Rename project 2020-08-20 10:54:15 -07:00
Andrew Keesler
6b90dc8bb7
Auto-rotate serving certificate
The rotation is forced by a new controller that deletes the serving cert
secret, as other controllers will see this deletion and ensure that a new
serving cert is created.

Note that the integration tests now have an addition worst case runtime of
60 seconds. This is because of the way that the aggregated API server code
reloads certificates. We will fix this in a future story. Then, the
integration tests should hopefully get much faster.

Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-08-20 10:03:36 -04:00
Matt Moyer
1b9a70d089
Switch back to an exec-based approach to grab the controller-manager CA. (#65)
This switches us back to an approach where we use the Pod "exec" API to grab the keys we need, rather than forcing our code to run on the control plane node. It will help us fail gracefully (or dynamically switch to alternate implementations) when the cluster is not self-hosted.

Signed-off-by: Matt Moyer <moyerm@vmware.com>
Co-authored-by: Ryan Richard <richardry@vmware.com>
2020-08-19 13:21:07 -05:00
Ryan Richard
003aef75d2 For liveness and readiness, succeed quickly and fail slowly
- No reason to wait a long time before the first check, since our
  app should start quickly
2020-08-18 09:18:51 -07:00
Ryan Richard
ecde8fa8af Implement basic liveness and readiness probes
- Call the auto-generated /healthz endpoint of our aggregated API server
- Use http for liveness even though tcp seems like it might be
  more appropriate, because tcp probes cause TLS handshake errors
  to appear in our logs every few seconds
- Use conservative timeouts and retries on the liveness probe to avoid
  having our container get restarted when it is temporarily slow due
  to running in an environment under resource pressure
- Use less conservative timeouts and retries for the readiness probe
  to remove an unhealthy pod from the service less conservatively than
  restarting the container
- Tuning the settings for retries and timeouts seem to be a mysterious
  art, so these are just a first draft
2020-08-17 16:44:42 -07:00