We want to run all of the fosite validations in the authorize
endpoint, but we don't need to store anything yet because
we are storing what we need for later in the upstream state
parameter.
Signed-off-by: Ryan Richard <richardry@vmware.com>
- Add a new helper method to plog to make a consistent way to log
expected errors at the info level (as opposed to unexpected
system errors that would be logged using plog.Error)
Signed-off-by: Ryan Richard <richardry@vmware.com>
Also move definition of our oauth client and the general fosite
configuration to a helper so we can use the same config to construct
the handler for both test and production code.
Signed-off-by: Ryan Richard <richardry@vmware.com>
This prevents unnecessary sync loop runs when the controller is
running with a single worker. When the controller is running with
more than one worker, it prevents subtle bugs that can cause the
controller to go "back in time."
Signed-off-by: Monis Khan <mok@vmware.com>
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Previously we were injecting the whole oauth handler chain into this function,
which meant we were essentially writing unit tests to test our tests. Let's push
some of this logic into the source code.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Does not validate incoming request parameters yet. Also is not
served on the http/https ports yet. Those will come in future commits.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
I tried to follow a principle of encapsulation here - we can still default to
peeps making connections to 80/443 on a Service object, but internally we will
use 8080/8443.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
And delete the agent pod when it needs its custom labels to be
updated, so that the creator controller will notice that it is missing
and immediately create it with the new custom labels.
This is the first of a few related changes that re-organize our API after the big recent changes that introduced the supervisor component.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- Setting a Secret in the supervisor's namespace with a special name
will cause it to get picked up and served as the supervisor's TLS
cert for any request which does not have a matching SNI cert.
- This is especially useful for when there is no DNS record for an
issuer and the user will be accessing it via IP address. This
is not how we would expect it to be used in production, but it
might be useful for other cases.
- Includes a new integration test
- Also suppress all of the warnings about ignoring the error returned by
Close() in lines like `defer x.Close()` to make GoLand happier
- TLS certificates can be configured on the OIDCProviderConfig using
the `secretName` field.
- When listening for incoming TLS connections, choose the TLS cert
based on the SNI hostname of the incoming request.
- Because SNI hostname information on incoming requests does not include
the port number of the request, we add a validation that
OIDCProviderConfigs where the issuer hostnames (not including port
number) are the same must use the same `secretName`.
- Note that this approach does not yet support requests made to an
IP address instead of a hostname. Also note that `localhost` is
considered a hostname by SNI.
- Add port 443 as a container port to the pod spec.
- A new controller watches for TLS secrets and caches them in memory.
That same in-memory cache is used while servicing incoming connections
on the TLS port.
- Make it easy to configure both port 443 and/or port 80 for various
Service types using our ytt templates for the supervisor.
- When deploying to kind, add another nodeport and forward it to the
host on another port to expose our new HTTPS supervisor port to the
host.
- When two different Issuers have the same host (i.e. they differ
only by path) then they must have the same secretName. This is because
it wouldn't make sense for there to be two different TLS certificates
for one host. Find any that do not have the same secret name to
put an error status on them and to avoid serving OIDC endpoints for
them. The host comparison is case-insensitive.
- Issuer hostnames should be treated as case-insensitive, because
DNS hostnames are case-insensitive. So https://me.com and
https://mE.cOm are duplicate issuers. However, paths are
case-sensitive, so https://me.com/A and https://me.com/a are
different issuers. Fixed this in the issuer validations and in the
OIDC Manager's request router logic.
EC keys are smaller and take less time to generate. Our integration
tests were super flakey because generating an RSA key would take up to
10 seconds *gasp*. The main token verifier that we care about is
Kubernetes, which supports P256, so hopefully it won't be that much of
an issue that our default signing key type is EC. The OIDC spec seems
kinda squirmy when it comes to using non-RSA signing algorithms...
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- The OIDCProviderConfigWatcherController synchronizes the
OIDCProviderConfig settings to dynamically mount and unmount the
OIDC discovery endpoints for each provider
- Integration test passes but unit tests need to be added still
This should fix integration tests running on clusters that don't have
visible controller manager pods (e.g., GKE). Pinniped should boot, not
find any controller manager pods, but still post a status in the CIC.
I also updated a test helper so that we could tell the difference
between when an event was not added and when an event was added with
an empty key.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Right now in the YTT templates we assume that the agent pods are gonna use
the same image as the main Pinniped deployment, so we can use the same logic
for the image pull secrets.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Simplifies the implementation, makes it more consistent with other
updaters of the cic (CredentialIssuerConfig), and also retries on
update conflicts
Signed-off-by: Ryan Richard <richardry@vmware.com>
- Only inject things through the constructor that the controller
will need
- Use pkg private constants when possible for things that are not
actually configurable by the user
- Make the agent pod template private to the pkg
- Introduce a test helper to reduce some duplicated test code
- Remove some `it.Focus` lines that were accidentally committed, and
repair the broken tests that they were hiding
I think we want to reconcile on these pod template fields so that if
someone were to redeploy Pinniped with a new image for the agent, the
agent would get updated immediately. Before this change, the agent image
wouldn't get updated until the agent pod was deleted.
This thing is supposed to be used to help our CredentialRequest handler issue certs with a dynamic
CA keypair.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
3 main reasons:
- The cert and key that we store in this object are not always used for TLS.
- The package name "provider" was a little too generic.
- dynamiccert.Provider reads more go-ish than provider.DynamicCertProvider.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- Lots of TODOs added that need to be resolved to finish this WIP
- execer_test.go seems like it should be passing, but it fails (sigh)
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This also has fallback compatibility support if no IDP is specified and there is exactly one IDP in the cache.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- All of the `kubecertagent` controllers now take two informers
- This is moving in the direction of creating the agent pods in the
Pinniped installation namespace, but that will come in a future
commit
New resource naming conventions:
- Do not repeat the Kind in the name,
e.g. do not call it foo-cluster-role-binding, just call it foo
- Names will generally start with a prefix to identify our component,
so when a user lists all objects of that kind, they can tell to which
component it is related,
e.g. `kubectl get configmaps` would list one named "pinniped-config"
- It should be possible for an operator to make the word "pinniped"
mostly disappear if they choose, by specifying the app_name in
values.yaml, to the extent that is practical (but not from APIService
names because those are hardcoded in golang)
- Each role/clusterrole and its corresponding binding have the same name
- Pinniped resource names that must be known by the server golang code
are passed to the code at run time via ConfigMap, rather than
hardcoded in the golang code. This also allows them to be prepended
with the app_name from values.yaml while creating the ConfigMap.
- Since the CLI `get-kubeconfig` command cannot guess the name of the
CredentialIssuerConfig resource in advance anymore, it lists all
CredentialIssuerConfig in the app's namespace and returns an error
if there is not exactly one found, and then uses that one regardless
of its name
Eventually we could refactor to remove support for the old APIs, but they are so similar that a single implementation seems to handle both easily.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This was previously using the unpadded (raw) base64 encoder, which worked sometimes (if the CA happened to be a length that didn't require padding). The correct encoding is the `base64.StdEncoding` one that includes padding.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- Add flag parsing and help messages for root command,
`exchange-credential` subcommand, and new `get-kubeconfig` subcommand
- The new `get-kubeconfig` subcommand is a work in progress in this
commit
- Also add here.Doc() and here.Docf() to enable nice heredocs in
our code
- The certs manager controller, along with its sibling certs expirer
and certs observer controllers, are generally useful for any process
that wants to create its own CA and TLS certs, but only if the
updating of the APIService is not included in those controllers
- So that functionality for updating APIServices is moved to a new
controller which watches the same Secret which is used by those
other controllers
- Also parameterize `NewCertsManagerController` with the service name
and the CA common name to make the controller more reusable
When we use RSA private keys to sign our test certificates, we run
into strange test timeouts. The internal/controller/apicerts package
was timing out on my machine more than once every 3 runs. When I
changed the RSA crypto to EC crypto, this timeout goes away. I'm not
gonna try to figure out what the deal is here because I think it would
take longer than it would be worth (although I am sure it is some fun
story involving prime numbers; the goroutine traces for timed out
tests would always include some big.Int operations involving prime
numbers...).
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- Controllers will automatically run again when there's an error,
but when we want to update CredentialIssuerConfig from server.go
we should be careful to retry on conflicts
- Add unit tests for `issuerconfig.CreateOrUpdateCredentialIssuerConfig()`
which was covered by integration tests in previous commits, but not
covered by units tests yet.
So that operators won't look at the lifetime of the CA cert and be
like, "wtf, why does the serving cert have the lifetime that I
specified, but its CA cert is valid for 100 years".
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This should simplify our build/test setup quite a bit, since it means we have only a single module (at the top level) with all hand-written code. I'll leave `module.sh` alone for now but we may be able to simplify that a bit more.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- Controller and aggregated API server are allowed to run
- Keep retrying to borrow the cluster signing key in case the failure
to get it was caused by a transient failure
- The CredentialRequest endpoint will always return an authentication
failure as long as the cluster signing key cannot be borrowed
- Update which integration tests are skipped to reflect what should
and should not work based on the cluster's capability under this
new behavior
- Move CreateOrUpdateCredentialIssuerConfig() and related methods
to their own file
- Update the CredentialIssuerConfig's Status every time we try to
refresh the cluster signing key
- Indicate the success or failure of the cluster signing key strategy
- Also introduce the concept of "capabilities" of an integration test
cluster to allow the integration tests to be run against clusters
that do or don't allow the borrowing of the cluster signing key
- Tests that are not expected to pass on clusters that lack the
borrowing of the signing key capability are now ignored by
calling the new library.SkipUnlessClusterHasCapability test helper
- Rename library.Getenv to library.GetEnv
- Add copyrights where they were missing
We are seeing between 1 and 2 minutes of difference between the current time
reported in the API server pod and the pinniped pods on one of our testing
environments. Hopefully this change makes our tests pass again.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
These configuration knobs are much more human-understandable than the
previous percentage-based threshold flag.
We now allow users to set the lifetime of the serving cert via a ConfigMap.
Previously this was hardcoded to 1 year.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
We need a way to validate that this generated code is up to date. I added
a long-term engineering TODO for this.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
The rotation is forced by a new controller that deletes the serving cert
secret, as other controllers will see this deletion and ensure that a new
serving cert is created.
Note that the integration tests now have an addition worst case runtime of
60 seconds. This is because of the way that the aggregated API server code
reloads certificates. We will fix this in a future story. Then, the
integration tests should hopefully get much faster.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This switches us back to an approach where we use the Pod "exec" API to grab the keys we need, rather than forcing our code to run on the control plane node. It will help us fail gracefully (or dynamically switch to alternate implementations) when the cluster is not self-hosted.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Co-authored-by: Ryan Richard <richardry@vmware.com>
- We want to follow the <noun>Request convention.
- The actual operation does not login a user, but it does retrieve a
credential with which they can login.
- This commit includes changes to all LoginRequest-related symbols and
constants to try to update their names to follow the new
CredentialRequest type.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
As discussed in API review, this field exists for convenience right
now. Since the username/groups are encoded in the Credential sent in
the LoginRequestStatus, the client still has access to their
user/groups information. We want to remove this for now to be
conservative and limit our API surface area (smaller surface area =
less to maintain). We can always add this back in the future.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- Add integration test for serving cert auto-generation and rotation
- Add unit test for `WithInitialEvent` of the cert manager controller
- Move UpdateAPIService() into the `apicerts` package, since that is
the only user of the function.
- Add a unit test for each cert controller
- Make DynamicTLSServingCertProvider an interface and use a mutex
internally
- Create a shared ToPEM function instead of having two very similar
functions
- Move the ObservableWithInformerOption test helper to testutils
- Rename some variables and imports
- Refactors the existing cert generation code into controllers
which read and write a Secret containing the certs
- Does not add any new functionality yet, e.g. no new handling
for cert expiration, and no leader election to allow for
multiple servers running simultaneously
- This commit also doesn't add new tests for the cert generation
code, but it should be more unit testable now as controllers
- No functional changes
- Move all the stuff about clients and controllers into the controller
package
- Add more comments and organize the code more into more helper
functions to make each function smaller
- Previously the golang code would create a Service and an APIService.
The APIService would be given an owner reference which pointed to
the namespace in which the app was installed.
- This prevented the app from being uninstalled. The namespace would
refuse to delete, so `kapp delete` or `kubectl delete` would fail.
- The new approach is to statically define the Service and an APIService
in the deployment.yaml, except for the caBundle of the APIService.
Then the golang code will perform an update to add the caBundle at
runtime.
- When the user uses `kapp deploy` or `kubectl apply` either tool will
notice that the caBundle is not declared in the yaml and will
therefore avoid editing that field.
- When the user uses `kapp delete` or `kubectl delete` either tool will
destroy the objects because they are statically declared with names
in the yaml, just like all of the other objects. There are no
ownerReferences used, so nothing should prevent the namespace from
being deleted.
- This approach also allows us to have less golang code to maintain.
- In the future, if our golang controllers want to dynamically add
an Ingress or other objects, they can still do that. An Ingress
would point to our statically defined Service as its backend.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- Seems like the next step is to allow override of the CA bundle; I didn't
do that here for simplicity of the commit, but seems like it is the right
thing to do in the future.
- Also includes bumping the api and client-go dependencies to the newer
version which also moved LoginDiscoveryConfig to the
crds.placeholder.suzerain-io.github.io group in the generated code
- Also, don't repeat `spec.Parallel()` because, according to the docs
for the spec package, "options are inherited by subgroups and subspecs"
- Two tests are left pending to be filled in on the next commit
The error was:
```
internal/certauthority/certauthority.go:68:15: err113: do not define dynamic errors, use wrapped static errors instead: "fmt.Errorf(\"expected CA to be a single certificate, found %d certificates\", certCount)" (goerr113)
return nil, fmt.Errorf("expected CA to be a single certificate, found %d certificates", certCount)
^
exit status 1
```
I'm not sure if I love this err113 linter.
I think we may still split this apart into multiple packages, but for now it works pretty well in both use cases.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- Also fix mistakes in the deployment.yaml
- Also hardcode the ownerRef kind and version because otherwise we get an error
Signed-off-by: Monis Khan <mok@vmware.com>