When the token exchange grant type is used to get a cluster-scoped
ID token, the returned token has a new audience value. The client ID
of the client which performed the authorization was lost. This didn't
matter before, since the only client was `pinniped-cli`, but now that
dynamic clients can be registered, the information would be lost in the
cluster-scoped ID token. It could be useful for logging, tracing, or
auditing, so preserve the information by putting the client ID into the
`azp` claim in every ID token (authcode exchange, clsuter-scoped, and
refreshed ID tokens).
- For backwards compatibility with older Pinniped CLIs, the pinniped-cli
client does not need to request the username or groups scopes for them
to be granted. For dynamic clients, the usual OAuth2 rules apply:
the client must be allowed to request the scopes according to its
configuration, and the client must actually request the scopes in the
authorization request.
- If the username scope was not granted, then there will be no username
in the ID token, and the cluster-scoped token exchange will fail since
there would be no username in the resulting cluster-scoped ID token.
- The OIDC well-known discovery endpoint lists the username and groups
scopes in the scopes_supported list, and lists the username and groups
claims in the claims_supported list.
- Add username and groups scopes to the default list of scopes
put into kubeconfig files by "pinniped get kubeconfig" CLI command,
and the default list of scopes used by "pinniped login oidc" when
no list of scopes is specified in the kubeconfig file
- The warning header about group memberships changing during upstream
refresh will only be sent to the pinniped-cli client, since it is
only intended for kubectl and it could leak the username to the
client (which may not have the username scope granted) through the
warning message text.
- Add the user's username to the session storage as a new field, so that
during upstream refresh we can compare the original username from the
initial authorization to the refreshed username, even in the case when
the username scope was not granted (and therefore the username is not
stored in the ID token claims of the session storage)
- Bump the Supervisor session storage format version from 2 to 3
due to the username field being added to the session struct
- Extract commonly used string constants related to OIDC flows to api
package.
- Change some import names to make them consistent:
- Always import github.com/coreos/go-oidc/v3/oidc as "coreosoidc"
- Always import go.pinniped.dev/generated/latest/apis/supervisor/oidc
as "oidcapi"
- Always import go.pinniped.dev/internal/oidc as "oidc"
- Add dynamic client unit tests for the upstream OIDC callback and
POST login endpoints.
- Enhance a few log statements to print the full fosite error messages
into the logs where they were previously only printing the name of
the error type.
- Enhance the token exchange to check that the same client is used
compared to the client used during the original authorization and
token requests, and also check that the client has the token-exchange
grant type allowed in its configuration.
- Reduce the minimum required bcrypt cost for OIDCClient secrets
because 15 is too slow for real-life use, especially considering
that every login and every refresh flow will require two client auths.
- In unit tests, use bcrypt hashes with a cost of 4, because bcrypt
slows down by 13x when run with the race detector, and we run our
tests with the race detector enabled, causing the tests to be
unacceptably slow. The production code uses a higher minimum cost.
- Centralize all pre-computed bcrypt hashes used by unit tests to a
single place. Also extract some other useful test helpers for
unit tests related to OIDCClients.
- Add tons of unit tests for the token endpoint related to dynamic
clients for authcode exchanges, token exchanges, and refreshes.
This is only a first commit towards making this feature work.
- Hook dynamic clients into fosite by returning them from the storage
interface (after finding and validating them)
- In the auth endpoint, prevent the use of the username and password
headers for dynamic clients to force them to use the browser-based
login flows for all the upstream types
- Add happy path integration tests in supervisor_login_test.go
- Add lots of comments (and some small refactors) in
supervisor_login_test.go to make it much easier to understand
- Add lots of unit tests for the auth endpoint regarding dynamic clients
(more unit tests to be added for other endpoints in follow-up commits)
- Enhance crud.go to make lifetime=0 mean never garbage collect,
since we want client secret storage Secrets to last forever
- Move the OIDCClient validation code to a package where it can be
shared between the controller and the fosite storage interface
- Make shared test helpers for tests that need to create OIDC client
secret storage Secrets
- Create a public const for "pinniped-cli" now that we are using that
string in several places in the production code
The following validation is enforced:
1. Names must start with client.oauth.pinniped.dev-
2. Redirect URIs must start with https://
or http://127.0.0.1
or http://::1
3. All spec lists must not have duplicates
Added an integration test to assert all static validations.
Signed-off-by: Monis Khan <mok@vmware.com>
Also fix some test failures on the callback handler, register the
new login handler in manager.go and add a (half baked) integration test
Signed-off-by: Margo Crawford <margaretc@vmware.com>
Go 1.18.1 started using MacOS' x509 verification APIs on Macs
rather than Go's own. The error messages are different.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
- Two of the linters changed their names
- Updated code and nolint comments to make all linters pass with 1.44.2
- Added a new hack/install-linter.sh script to help developers install
the expected version of the linter for local development
This allows us to target browser based tests with the regex:
go test -v -race -count 1 -timeout 0 ./test/integration -run '/_Browser'
New tests that call browsertest.Open will automatically be forced to
follow this convention.
Signed-off-by: Monis Khan <mok@vmware.com>
When the test was going to fail, a goroutine would accidentally block
on writing to an unbuffered channel, and the spawnTestGoroutine helper
would wait for that goroutine to end on cleanup, causing the test to
hang forever while it was trying to fail.
This change allows configuration of the http and https listeners
used by the supervisor.
TCP (IPv4 and IPv6 with any interface and port) and Unix domain
socket based listeners are supported. Listeners may also be
disabled.
Binding the http listener to TCP addresses other than 127.0.0.1 or
::1 is deprecated.
The deployment now uses https health checks. The supervisor is
always able to complete a TLS connection with the use of a bootstrap
certificate that is signed by an in-memory certificate authority.
To support sidecar containers used by service meshes, Unix domain
socket based listeners include ACLs that allow writes to the socket
file from any runAsUser specified in the pod's containers.
Signed-off-by: Monis Khan <mok@vmware.com>
- Make everything private
- Drop unused AuthTime field
- Use %q format string instead of "%s"
- Only rely on GetRawAttributeValues in AttributeUnchangedSinceLogin
Signed-off-by: Monis Khan <mok@vmware.com>
This change updates the new TLS integration tests to:
1. Only create the supervisor default TLS serving cert if needed
2. Port forward the node port supervisor service since that is
available in all environments
Signed-off-by: Monis Khan <mok@vmware.com>
This change updates the TLS config used by all pinniped components.
There are no configuration knobs associated with this change. Thus
this change tightens our static defaults.
There are four TLS config levels:
1. Secure (TLS 1.3 only)
2. Default (TLS 1.2+ best ciphers that are well supported)
3. Default LDAP (TLS 1.2+ with less good ciphers)
4. Legacy (currently unused, TLS 1.2+ with all non-broken ciphers)
Highlights per component:
1. pinniped CLI
- uses "secure" config against KAS
- uses "default" for all other connections
2. concierge
- uses "secure" config as an aggregated API server
- uses "default" config as a impersonation proxy API server
- uses "secure" config against KAS
- uses "default" config for JWT authenticater (mostly, see code)
- no changes to webhook authenticater (see code)
3. supervisor
- uses "default" config as a server
- uses "secure" config against KAS
- uses "default" config against OIDC IDPs
- uses "default LDAP" config against LDAP IDPs
Signed-off-by: Monis Khan <mok@vmware.com>
- pull construction of authenticators.Response into searchAndBindUser
- remove information about the identity provider in the error that gets
returned to users. Put it in debug instead, where it may show up in
logs.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
- changed to use custom authenticators.Response rather than the k8s one
that doesn't include space for a DN
- Added more checking for correct idp type in token handler
- small style changes
Signed-off-by: Margo Crawford <margaretc@vmware.com>
This stores the user DN in the session data upon login and checks that
the entry still exists upon refresh. It doesn't check anything
else about the entry yet.
CertificatesV1beta1 was removed in Kube 1.22, so the tests cannot
blindly rely on it anymore. Use CertificatesV1 whenever the server
reports that is available, and otherwise use the old
CertificatesV1beta1.
Note that CertificatesV1 was introduced in Kube 1.19.
Changes made to both components:
1. Logs are always flushed on process exit
2. Informer cache sync can no longer hang process start up forever
Changes made to concierge:
1. Add pre-shutdown hook that waits for controllers to exit cleanly
2. Informer caches are synced in post-start hook
Changes made to supervisor:
1. Add shutdown code that waits for controllers to exit cleanly
2. Add shutdown code that waits for active connections to become idle
Waiting for controllers to exit cleanly is critical as this allows
the leader election logic to release the lock on exit. This reduces
the time needed for the next leader to be elected.
Signed-off-by: Monis Khan <mok@vmware.com>
Instead of blindly waiting long enough for a disruptive change to
have been observed by the old leader and followers, we instead rely
on the approximation that checkOnlyLeaderCanWrite provides - i.e.
only a single actor believes they are the leader. This does not
account for clients that were in the followers list before and after
the disruptive change, but it serves as a reasonable approximation.
Signed-off-by: Monis Khan <mok@vmware.com>
Those images that are pulled from Dockerhub will cause pull failures
on some test clusters due to Dockerhub rate limiting.
Because we already have some images that we use for testing, and
because those images are already pre-loaded onto our CI clusters
to make the tests faster, use one of those images and always specify
PullIfNotPresent to avoid pulling the image again during the integration
test.
OpenShift has good defaults for these duration fields that we can
use instead of coming up with them ourselves:
e14e06ba8d/pkg/config/leaderelection/leaderelection.go (L87-L109)
Copied here for easy future reference:
// We want to be able to tolerate 60s of kube-apiserver disruption without causing pod restarts.
// We want the graceful lease re-acquisition fairly quick to avoid waits on new deployments and other rollouts.
// We want a single set of guidance for nearly every lease in openshift. If you're special, we'll let you know.
// 1. clock skew tolerance is leaseDuration-renewDeadline == 30s
// 2. kube-apiserver downtime tolerance is == 78s
// lastRetry=floor(renewDeadline/retryPeriod)*retryPeriod == 104
// downtimeTolerance = lastRetry-retryPeriod == 78s
// 3. worst non-graceful lease acquisition is leaseDuration+retryPeriod == 163s
// 4. worst graceful lease acquisition is retryPeriod == 26s
if ret.LeaseDuration.Duration == 0 {
ret.LeaseDuration.Duration = 137 * time.Second
}
if ret.RenewDeadline.Duration == 0 {
// this gives 107/26=4 retries and allows for 137-107=30 seconds of clock skew
// if the kube-apiserver is unavailable for 60s starting just before t=26 (the first renew),
// then we will retry on 26s intervals until t=104 (kube-apiserver came back up at 86), and there will
// be 33 seconds of extra time before the lease is lost.
ret.RenewDeadline.Duration = 107 * time.Second
}
if ret.RetryPeriod.Duration == 0 {
ret.RetryPeriod.Duration = 26 * time.Second
}
Signed-off-by: Monis Khan <mok@vmware.com>
This change fixes a small race condition that occurred when the
current leader failed to renew its lease. Before this change, the
leader would first release the lease via the Kube API and then would
update its in-memory status to reflect that change. Now those
events occur in the reverse (i.e. correct) order.
Signed-off-by: Monis Khan <mok@vmware.com>
Even though a client may hold the leader election lock in the Kube
lease API, that does not mean it has had a chance to update its
internal state to reflect that. Thus we retry the checks in
checkOnlyLeaderCanWrite a few times to allow the client to catch up.
Signed-off-by: Monis Khan <mok@vmware.com>