Commit Graph

63 Commits

Author SHA1 Message Date
Ryan Richard
743cb2d250 kube cert agent pod requests 0 cpu to avoid scheduling failures 2023-07-25 10:09:30 -07:00
Ryan Richard
7ff3b3d9cb Code changes to support Kube 0.26 deps 2023-01-18 14:39:22 -08:00
Ryan Richard
2f9b8b105d update copyright to 2023 in files changed by this PR 2023-01-17 15:54:16 -08:00
Ryan Richard
976035115e Stop using pointer pkg functions that were deprecated by dependency bump 2022-12-14 08:47:16 -08:00
Ryan Richard
c6c2c525a6 Upgrade the linter and fix all new linter warnings
Also fix some tests that were broken by bumping golang and dependencies
in the previous commits.

Note that in addition to changes made to satisfy the linter which do not
impact the behavior of the code, this commit also adds ReadHeaderTimeout
to all usages of http.Server to satisfy the linter (and because it
seemed like a good suggestion).
2022-08-24 14:45:55 -07:00
Ryan Richard
7751c0bf59
Bump project deps, including kube 0.23.6->0.24.1 and Go 1.18.1->1.18.3
Several API changes in Kube required changes in Pinniped code.

Signed-off-by: Monis Khan <mok@vmware.com>
2022-06-07 15:26:30 -04:00
Monis Khan
0674215ef3
Switch to go.uber.org/zap for JSON formatted logging
Signed-off-by: Monis Khan <mok@vmware.com>
2022-05-24 11:17:42 -04:00
Margo Crawford
53597bb824 Introduce FIPS compatibility
Signed-off-by: Margo Crawford <margaretc@vmware.com>
2022-03-29 16:58:41 -07:00
Ryan Richard
814399324f Merge branch 'main' into upstream_access_revocation_during_gc 2022-01-14 10:49:22 -08:00
Ryan Richard
651d392b00 Refuse logins when no upstream refresh token and no userinfo endpoint
Signed-off-by: Margo Crawford <margaretc@vmware.com>
2022-01-12 18:03:25 -08:00
Monis Khan
9599ffcfb9
Update all deps to latest where possible, bump Kube deps to v0.23.1
Highlights from this dep bump:

1. Made a copy of the v0.4.0 github.com/go-logr/stdr implementation
   for use in tests.  We must bump this dep as Kube code uses a
   newer version now.  We would have to rewrite hundreds of test log
   assertions without this copy.
2. Use github.com/felixge/httpsnoop to undo the changes made by
   ory/fosite#636 for CLI based login flows.  This is required for
   backwards compatibility with older versions of our CLI.  A
   separate change after this will update the CLI to be more
   flexible (it is purposefully not part of this change to confirm
   that we did not break anything).  For all browser login flows, we
   now redirect using http.StatusSeeOther instead of http.StatusFound.
3. Drop plog.RemoveKlogGlobalFlags as klog no longer mutates global
   process flags
4. Only bump github.com/ory/x to v0.0.297 instead of the latest
   v0.0.321 because v0.0.298+ pulls in a newer version of
   go.opentelemetry.io/otel/semconv which breaks k8s.io/apiserver.
   We should update k8s.io/apiserver to use the newer code.
5. Migrate all code from k8s.io/apimachinery/pkg/util/clock to
   k8s.io/utils/clock and k8s.io/utils/clock/testing
6. Delete testutil.NewDeleteOptionsRecorder and migrate to the new
   kubetesting.NewDeleteActionWithOptions
7. Updated ExpectedAuthorizeCodeSessionJSONFromFuzzing caused by
   fosite's new rotated_secrets OAuth client field.  This new field
   is currently not relevant to us as we have no private clients.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-12-16 21:15:27 -05:00
Monis Khan
cd686ffdf3
Force the use of secure TLS config
This change updates the TLS config used by all pinniped components.
There are no configuration knobs associated with this change.  Thus
this change tightens our static defaults.

There are four TLS config levels:

1. Secure (TLS 1.3 only)
2. Default (TLS 1.2+ best ciphers that are well supported)
3. Default LDAP (TLS 1.2+ with less good ciphers)
4. Legacy (currently unused, TLS 1.2+ with all non-broken ciphers)

Highlights per component:

1. pinniped CLI
   - uses "secure" config against KAS
   - uses "default" for all other connections
2. concierge
   - uses "secure" config as an aggregated API server
   - uses "default" config as a impersonation proxy API server
   - uses "secure" config against KAS
   - uses "default" config for JWT authenticater (mostly, see code)
   - no changes to webhook authenticater (see code)
3. supervisor
   - uses "default" config as a server
   - uses "secure" config against KAS
   - uses "default" config against OIDC IDPs
   - uses "default LDAP" config against LDAP IDPs

Signed-off-by: Monis Khan <mok@vmware.com>
2021-11-17 16:55:35 -05:00
Monis Khan
0d6bf9db3e
kubecertagent: attempt to load signer as long as agent labels match
This change updates the kube cert agent to a middle ground behavior
that balances leader election gating with how quickly we load the
signer.

If the agent labels have not changed, we will attempt to load the
signer even if we cannot roll out the latest version of the kube
cert agent deployment.

This gives us the best behavior - we do not have controllers
fighting over the state of the deployment and we still get the
signer loaded quickly.

We will have a minute of downtime when the kube cert agent deployment
changes because the new pods will have to wait to become a leader
and for the new deployment to rollout the new pods.  We would need
to have a per pod deployment if we want to avoid that downtime (but
this would come at the cost of startup time and would require
coordination with the kubelet in regards to pod readiness).

Signed-off-by: Monis Khan <mok@vmware.com>
2021-09-21 16:20:56 -04:00
Monis Khan
09467d3e24
kubecertagent: fix flakey tests
This commit makes the following changes to the kube cert agent tests:

1. Informers are synced on start using the controllerinit code
2. Deployment client and informer are synced per controller sync loop
3. Controller sync loop exits after two consistent errors
4. Use assert instead of require to avoid ending the test early

Signed-off-by: Monis Khan <mok@vmware.com>
2021-09-16 14:48:04 -04:00
Ryan Richard
bdcf468e52 Add log statement for when kube cert agent key has been loaded
Because it makes things easier to debug on a real cluster
2021-09-15 14:02:46 -07:00
Ryan Richard
55de160551 Bump the version number of the kube cert agent label
Not required, but within the spirit of using the version number.
Since the existing kube cert agent deployment will get deleted anyway
during an upgrade, it shouldn't hurt to change the version number.
New installations will get the new version number on the new kube cert
agent deployment.
2021-09-14 15:27:15 -07:00
Ryan Richard
cec9f3c4d7 Improve the selectors of Deployments and Services
Fixes #801. The solution is complicated by the fact that the Selector
field of Deployments is immutable. It would have been easy to just
make the Selectors of the main Concierge Deployment, the Kube cert agent
Deployment, and the various Services use more specific labels, but
that would break upgrades. Instead, we make the Pod template labels and
the Service selectors more specific, because those not immutable, and
then handle the Deployment selectors in a special way.

For the main Concierge and Supervisor Deployments, we cannot change
their selectors, so they remain "app: app_name", and we make other
changes to ensure that only the intended pods are selected. We keep the
original "app" label on those pods and remove the "app" label from the
pods of the Kube cert agent Deployment. By removing it from the Kube
cert agent pods, there is no longer any chance that they will
accidentally get selected by the main Concierge Deployment.

For the Kube cert agent Deployment, we can change the immutable selector
by deleting and recreating the Deployment. The new selector uses only
the unique label that has always been applied to the pods of that
deployment. Upon recreation, these pods no longer have the "app" label,
so they will not be selected by the main Concierge Deployment's
selector.

The selector of all Services have been updated to use new labels to
more specifically target the intended pods. For the Concierge Services,
this will prevent them from accidentally including the Kube cert agent
pods. For the Supervisor Services, we follow the same convention just
to be consistent and to help future-proof the Supervisor app in case it
ever has a second Deployment added to it.

The selector of the auto-created impersonation proxy Service was
also previously using the "app" label. There is no change to this
Service because that label will now select the correct pods, since
the Kube cert agent pods no longer have that label. It would be possible
to update that selector to use the new more specific label, but then we
would need to invent a way to pass that label into the controller, so
it seemed like more work than was justified.
2021-09-14 13:35:10 -07:00
Mayank Bhatt
68547f767d Copy hostNetwork field for kube-cert-agent
For clusters where the control plane nodes aren't running a CNI, the
kube-cert-agent pods deployed by concierge cannot be scheduled as they
don't know to use `hostNetwork: true`. This change allows embedding the
host network setting in the Concierge configuration. (by copying it from
the kube-controller-manager pod spec when generating the kube-cert-agent
Deployment)

Also fixed a stray double comma in one of the nearby tests.
2021-08-26 17:09:59 -07:00
Monis Khan
c356710f1f
Add leader election middleware
Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-20 12:18:25 -04:00
Monis Khan
7a812ac5ed
impersonatorconfig: only unload dynamiccert when proxy is disabled
In the upstream dynamiccertificates package, we rely on two pieces
of code:

1. DynamicServingCertificateController.newTLSContent which calls
   - clientCA.CurrentCABundleContent
   - servingCert.CurrentCertKeyContent
2. unionCAContent.VerifyOptions which calls
   - unionCAContent.CurrentCABundleContent

This results in calls to our tlsServingCertDynamicCertProvider and
impersonationSigningCertProvider.  If we Unset these providers, we
subtly break these consumers.  At best this results in test slowness
and flakes while we wait for reconcile loops to converge.  At worst,
it results in actual errors during runtime.  For example, we
previously would Unset the impersonationSigningCertProvider on any
sync loop error (even a transient one caused by a network blip or
a conflict between writes from different replicas of the concierge).
This would cause us to transiently fail to issue new certificates
from the token credential require API.  It would also cause us to
transiently fail to authenticate previously issued client certs
(which results in occasional Unauthorized errors in CI).

Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-16 16:07:46 -04:00
Matt Moyer
58bbffded4
Switch to a slimmer distroless base image.
At a high level, it switches us to a distroless base container image, but that also includes several related bits:

- Add a writable /tmp but make the rest of our filesystems read-only at runtime.

- Condense our main server binaries into a single pinniped-server binary. This saves a bunch of space in
  the image due to duplicated library code. The correct behavior is dispatched based on `os.Args[0]`, and
  the `pinniped-server` binary is symlinked to `pinniped-concierge` and `pinniped-supervisor`.

- Strip debug symbols from our binaries. These aren't really useful in a distroless image anyway and all the
  normal stuff you'd expect to work, such as stack traces, still does.

- Add a separate `pinniped-concierge-kube-cert-agent` binary with "sleep" and "print" functionality instead of
  using builtin /bin/sleep and /bin/cat for the kube-cert-agent. This is split from the main server binary
  because the loading/init time of the main server binary was too large for the tiny resource footprint we
  established in our kube-cert-agent PodSpec. Using a separate binary eliminates this issue and the extra
  binary adds only around 1.5MiB of image size.

- Switch the kube-cert-agent code to use a JSON `{"tls.crt": "<b64 cert>", "tls.key": "<b64 key>"}` format.
  This is more robust to unexpected input formatting than the old code, which simply concatenated the files
  with some extra newlines and split on whitespace.

- Update integration tests that made now-invalid assumptions about the `pinniped-server` image.

Signed-off-by: Matt Moyer <moyerm@vmware.com>
2021-08-09 15:05:13 -04:00
Ryan Richard
2bba39d723 TestAgentController unit test is flaky, try to add workaround
TestAgentController really runs the controller and evaluates multiple
calls to the controller's Sync with real informers caching updates.
There is a large amount of non-determinism in this unit test, and it
does not always behave the same way. Because it makes assertions about
the specific errors that should be returned by Sync, it was not
accounting for some errors that are only returned by Sync once in a
while depending on the exact (unpredictable) order of operations.

This commit doesn't fix the non-determinism in the test, but rather
tries to work around it by also allowing other (undesired but
inevitable) error messages to appear in the list of actual error
messages returned by the calls to the Sync function.

Signed-off-by: Margo Crawford <margaretc@vmware.com>
2021-07-15 13:41:31 -07:00
Matt Moyer
657488fe90
Create CredentialIssuer at install, not runtime.
Previously, our controllers would automatically create a CredentialIssuer with a singleton name. The helpers we had for this also used "raw" client access and did not take advantage of the informer cache pattern.

With this change, the CredentialIssuer is always created at install time in the ytt YAML. The controllers now only update the existing CredentialIssuer status, and they do so using the informer cache as much as possible.

This change is targeted at only the kubecertagent controller to start. The impersonatorconfig controller will be updated in a following PR along with other changes.

Signed-off-by: Matt Moyer <moyerm@vmware.com>
2021-05-19 17:15:25 -05:00
Matt Moyer
b80cbb8cc5
Run kube-cert-agent pod as Concierge ServiceAccount.
Since 0dfb3e95c5, we no longer directly create the kube-cert-agent Pod, so our "use"
permission on PodSecurityPolicies no longer has the intended effect. Since the deployments controller is now the
one creating pods for us, we need to get the permission on the PodSpec of the target pod instead, which we do somewhat
simply by using the same service account as the main Concierge pods.

We still set `automountServiceAccountToken: false`, so this should not actually give any useful permissions to the
agent pod when running.

Signed-off-by: Matt Moyer <moyerm@vmware.com>
2021-05-03 16:20:13 -05:00
Matt Moyer
e532a88647
Add a new "legacy pod cleaner" controller.
This controller is responsible for cleaning up kube-cert-agent pods that were deployed by previous versions.

They are easily identified because they use a different `kube-cert-agent.pinniped.dev` label compared to the new agent pods (`true` vs. `v2`).

Signed-off-by: Matt Moyer <moyerm@vmware.com>
2021-04-26 08:19:45 -06:00
Matt Moyer
54a8297cc4
Add generated mocks for kubecertagent.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
2021-04-26 08:19:45 -06:00
Matt Moyer
2843c4f8cb
Refactor kube-cert-agent controllers to use a Deployment.
This is a relatively large rewrite of much of the kube-cert-agent controllers. Instead of managing raw Pod objects, they now create a single Deployment and let the builtin k8s controller handle it from there.

This reduces the amount of code we need and should handle a number of edge cases better, especially those where a Pod becomes "wedged" and needs to be recreated.

Signed-off-by: Matt Moyer <moyerm@vmware.com>
2021-04-26 08:19:45 -06:00
Monis Khan
00694c9cb6
dynamiccert: split into serving cert and CA providers
Signed-off-by: Monis Khan <mok@vmware.com>
2021-03-15 12:24:07 -04:00
Monis Khan
2d28d1da19
Implement all optional methods in dynamic certs provider
Signed-off-by: Monis Khan <mok@vmware.com>
2021-03-11 16:24:08 -05:00
Ryan Richard
d8c6894cbc All controller unit tests should not cancel context until test is over
All controller unit tests were accidentally using a timeout context
for the informers, instead of a cancel context which stays alive until
each test is completely finished. There is no reason to risk
unpredictable behavior of a timeout being reached during an individual
test, even though with the previous 3 second timeout it could only be
reached on a machine which is running orders of magnitude slower than
usual, since each test usually runs in about 100-300 ms. Unfortunately,
sometimes our CI workers might get that slow.

This sparked a review of other usages of timeout contexts in other
tests, and all of them were increased to a minimum value of 1 minute,
under the rule of thumb that our tests will be more reliable on slow
machines if they "pass fast and fail slow".
2021-03-04 17:26:01 -08:00
Matt Moyer
2a29303e3f
Fix label handling in kubecertagent controllers.
These controllers were a bit inconsistent. There were cases where the controllers ran out of the expected order and the custom labels might not have been applied.

We should still plan to remove this label handling or move responsibility into the middleware layer, but this avoids any regression.

Signed-off-by: Matt Moyer <moyerm@vmware.com>
2021-03-02 13:59:46 -06:00
Matt Moyer
643c60fd7a
Drop NewKubeConfigInfoPublisherController, start populating strategy frontend from kubecertagent execer controller.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
2021-03-02 13:09:25 -06:00
Matt Moyer
c94ee7188c
Factor out issuerconfig.UpdateStrategy helper.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
2021-03-01 16:21:10 -06:00
Matt Moyer
6565265bee
Use new 'go.pinniped.dev/generated/latest' package.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
2021-02-16 13:00:08 -06:00
Monis Khan
0a9f446893
Update credential issuer logic to use status subresource
Signed-off-by: Monis Khan <mok@vmware.com>
2021-02-10 21:52:10 -05:00
Monis Khan
89b00e3702
Declare war on namespaces
Signed-off-by: Monis Khan <mok@vmware.com>
2021-02-10 21:52:07 -05:00
Margo Crawford
5611212ea9 Changing references from 1.19 to 1.20 2021-01-07 15:25:47 -08:00
Monis Khan
15a5332428
Reduce log spam
Signed-off-by: Monis Khan <mok@vmware.com>
2020-11-10 10:22:27 -05:00
Monis Khan
418f4d20ae
Use parent func to indicate when the controller queue is a singleton
This prevents unnecessary sync loop runs when the controller is
running with a single worker.  When the controller is running with
more than one worker, it prevents subtle bugs that can cause the
controller to go "back in time."

Signed-off-by: Monis Khan <mok@vmware.com>
Signed-off-by: Matt Moyer <moyerm@vmware.com>
2020-11-04 11:08:10 -06:00
Matt Moyer
59263ea733
Rename CredentialIssuerConfig to CredentialIssuer.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
2020-11-02 17:39:42 -06:00
Ryan Richard
75c35e74cc Refactor and add unit tests for previous commit to run agent pod as root 2020-11-02 15:03:37 -08:00
Ryan Richard
a01921012d
kubecertagent: explicitly run as root
We need root here because the files that this pod reads are
most likely restricted to root access.

Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-11-02 16:33:46 -05:00
Ryan Richard
ab5c04b1f3
Merge pull request #176 from vmware-tanzu/agent_pod_additional_label_handling
Handle custom labels better in the agent pod controllers
2020-11-02 09:08:42 -08:00
Ryan Richard
7597b12a51 Small unit test changes for deleter_test.go 2020-11-02 08:40:39 -08:00
Ryan Richard
f76b9857da Don't use custom labels when selecting an agent pod
And delete the agent pod when it needs its custom labels to be
updated, so that the creator controller will notice that it is missing
and immediately create it with the new custom labels.
2020-10-30 17:41:17 -07:00
Matt Moyer
9e1922f1ed
Split the config CRDs into two API groups.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
2020-10-30 19:22:46 -05:00
Ryan Richard
94f20e57b1 Concierge controllers add labels to all created resources 2020-10-15 10:14:23 -07:00
Andrew Keesler
b21b43c654
Fix expected CIC status message on non-hosted control planes 2020-09-24 17:56:55 -04:00
Andrew Keesler
9e0195e024
kubecertagent: use initial event for when key can't be found
This should fix integration tests running on clusters that don't have
visible controller manager pods (e.g., GKE). Pinniped should boot, not
find any controller manager pods, but still post a status in the CIC.

I also updated a test helper so that we could tell the difference
between when an event was not added and when an event was added with
an empty key.

Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-09-24 16:54:20 -04:00
Andrew Keesler
d853cbc7ff
Plumb through ImagePullSecrets to agent pod
Right now in the YTT templates we assume that the agent pods are gonna use
the same image as the main Pinniped deployment, so we can use the same logic
for the image pull secrets.

Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-09-24 15:52:05 -04:00