Commit Graph

14 Commits

Author SHA1 Message Date
Ryan Richard
ca6c29e463 Fix deadlock during shutdown which prevented leader election cleanup
Before this fix, the deadlock would prevent the leader pod from giving
up its lease, which would make it take several minutes for new pods to
be allowed to elect a new leader. During that time, no Pinniped
controllers could write to the Kube API, so important resources were not
being updated during that window. It would also make pod shutdown take
about 1 minute.

After this fix, the leader gives up its lease immediately, and pod
shutdown takes about 1 second. This improves restart/upgrade time and
also fixes the problem where there was no leader for several minutes
after a restart/upgrade.

The deadlock was between the post-start hook and the pre-shutdown hook.
The pre-shutdown hook blocked until a certain background goroutine in
the post-start hook finished, but that goroutine could not finish until
the pre-shutdown hook finished. Thus, they were both blocked, waiting
for each other infinitely. Eventually the process would be externally
killed.

This deadlock was most likely introduced by some change in Kube's
generic api server package related to how the many complex channels used
during server shutdown interact with each other, and was not noticed
when we upgraded to the version which introduced the change.
2023-09-20 16:54:24 -07:00
Benjamin A. Petersen
3160b5bad1 Reorganized FederationDomain packages to avoid circular dependency
Co-authored-by: Ryan Richard <richardry@vmware.com>
2023-09-11 11:09:50 -07:00
Joshua Casey
38230fc518 Use pversion to retrieve buildtime information 2023-08-28 11:54:27 -05:00
Ryan Richard
c6c2c525a6 Upgrade the linter and fix all new linter warnings
Also fix some tests that were broken by bumping golang and dependencies
in the previous commits.

Note that in addition to changes made to satisfy the linter which do not
impact the behavior of the code, this commit also adds ReadHeaderTimeout
to all usages of http.Server to satisfy the linter (and because it
seemed like a good suggestion).
2022-08-24 14:45:55 -07:00
Monis Khan
0d285ce993
Ensure concierge and supervisor gracefully exit
Changes made to both components:

1. Logs are always flushed on process exit
2. Informer cache sync can no longer hang process start up forever

Changes made to concierge:

1. Add pre-shutdown hook that waits for controllers to exit cleanly
2. Informer caches are synced in post-start hook

Changes made to supervisor:

1. Add shutdown code that waits for controllers to exit cleanly
2. Add shutdown code that waits for active connections to become idle

Waiting for controllers to exit cleanly is critical as this allows
the leader election logic to release the lock on exit.  This reduces
the time needed for the next leader to be elected.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-08-30 20:29:52 -04:00
Ryan Richard
c82f568b2c certauthority.go: Refactor issuing client versus server certs
We were previously issuing both client certs and server certs with
both extended key usages included. Split the Issue*() methods into
separate methods for issuing server certs versus client certs so
they can have different extended key usages tailored for each use
case.

Also took the opportunity to clean up the parameters of the Issue*()
methods and New() methods to more closely match how we prefer to call
them. We were always only passing the common name part of the
pkix.Name to New(), so now the New() method just takes the common name
as a string. When making a server cert, we don't need to set the
deprecated common name field, so remove that param. When making a client
cert, we're always making it in the format expected by the Kube API
server, so just accept the username and group as parameters directly.
2021-03-12 16:09:37 -08:00
Ryan Richard
0b300cbe42 Use TokenCredentialRequest instead of base64 token with impersonator
To make an impersonation request, first make a TokenCredentialRequest
to get a certificate. That cert will either be issued by the Kube
API server's CA or by a new CA specific to the impersonator. Either
way, you can then make a request to the impersonator and present
that client cert for auth and the impersonator will accept it and
make the impesonation call on your behalf.

The impersonator http handler now borrows some Kube library code
to handle request processing. This will allow us to more closely
mimic the behavior of a real API server, e.g. the client cert
auth will work exactly like the real API server.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-03-10 10:30:06 -08:00
Monis Khan
abc941097c
Add WhoAmIRequest Aggregated Virtual REST API
This change adds a new virtual aggregated API that can be used by
any user to echo back who they are currently authenticated as.  This
has general utility to end users and can be used in tests to
validate if authentication was successful.

Signed-off-by: Monis Khan <mok@vmware.com>
2021-02-22 20:02:41 -05:00
Monis Khan
f7958ae75b
Add no-op list support to token credential request
This allows us to keep all of our resources in the pinniped category
while not having kubectl return errors for calls such as:

kubectl get pinniped -A

Signed-off-by: Monis Khan <mok@vmware.com>
2021-02-05 10:59:39 -05:00
Monis Khan
efe1fa89fe Allow multiple Pinnipeds to work on same cluster
Yes, this is a huge commit.

The middleware allows you to customize the API groups of all of the
*.pinniped.dev API groups.

Some notes about other small things in this commit:
- We removed the internal/client package in favor of pkg/conciergeclient. The
  two packages do basically the same thing. I don't think we use the former
  anymore.
- We re-enabled cluster-scoped owner assertions in the integration tests.
  This code was added in internal/ownerref. See a0546942 for when this
  assertion was removed.
- Note: the middlware code is in charge of restoring the GV of a request object,
  so we should never need to write mutations that do that.
- We updated the supervisor secret generation to no longer manually set an owner
  reference to the deployment since the middleware code now does this. I think we
  still need some way to make an initial event for the secret generator
  controller, which involves knowing the namespace and the name of the generated
  secret, so I still wired the deployment through. We could use a namespace/name
  tuple here, but I was lazy.

Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Co-authored-by: Ryan Richard <richardry@vmware.com>
2021-02-02 15:18:41 -08:00
Margo Crawford
5611212ea9 Changing references from 1.19 to 1.20 2021-01-07 15:25:47 -08:00
Monis Khan
15a5332428
Reduce log spam
Signed-off-by: Monis Khan <mok@vmware.com>
2020-11-10 10:22:27 -05:00
Matt Moyer
f0320dfbd8
Rename login API to login.concierge.pinniped.dev.
This is the first of a few related changes that re-organize our API after the big recent changes that introduced the supervisor component.

Signed-off-by: Matt Moyer <moyerm@vmware.com>
2020-10-30 09:58:28 -05:00
Ryan Richard
5b3dd5fc7d
Rename pinniped-server -> pinniped-concierge
Do we like this? We don't know yet.

Signed-off-by: Andrew Keesler <akeesler@vmware.com>
2020-10-06 14:59:03 -04:00