Update audit log proposal
This commit is contained in:
parent
831abc315e
commit
3cf3b28c5b
@ -77,8 +77,7 @@ There will be very few user-facing configuration options for audit logging in th
|
|||||||
found to be needed, more configuration could be added in future versions.
|
found to be needed, more configuration could be added in future versions.
|
||||||
|
|
||||||
This proposal recommends adding a single on/off install-time configuration option for disabling audit logs. By default,
|
This proposal recommends adding a single on/off install-time configuration option for disabling audit logs. By default,
|
||||||
audit logs will be enabled. An admin user who is concerned about logging identities, for example because usernames may
|
audit logs will be disabled. Usernames may be considered PII, so disabled by default avoids potentially logging PII.
|
||||||
be considered PII, may disable audit logging.
|
|
||||||
|
|
||||||
Like other install-time configuration options, this option would appear in the values.yaml file of the Supervisor and
|
Like other install-time configuration options, this option would appear in the values.yaml file of the Supervisor and
|
||||||
Concierge deployment directories. The selected value would be rendered into the "static" ConfigMap, and read by the
|
Concierge deployment directories. The selected value would be rendered into the "static" ConfigMap, and read by the
|
||||||
@ -106,20 +105,23 @@ The Supervisor's audit logs would include events such as:
|
|||||||
- Downstream login (started, succeeded, failed)
|
- Downstream login (started, succeeded, failed)
|
||||||
- Downstream token exchange (succeeded, failed)
|
- Downstream token exchange (succeeded, failed)
|
||||||
- Session expired
|
- Session expired
|
||||||
- Maybe: The equivalent of access log events for all Supervisor endpoints, since there is no other component providing
|
- The equivalent of access log events for all Supervisor endpoints, since there is no other component providing
|
||||||
access logs. This would include logging things like calls to the Supervisor's OIDC well-known discovery endpoint.
|
access logs. This would include logging things like calls to the Supervisor's OIDC well-known discovery endpoint.
|
||||||
These logs could help an investigator determine more about the usage pattern of a suspicious client.
|
These logs could help an investigator determine more about the usage pattern of a suspicious client.
|
||||||
- Maybe: Newly authenticated user is associated with “admin” RBAC. Note that the Supervisor is not directly aware of
|
- The identity (username, group memberships) of newly authenticated users
|
||||||
RBAC, so determining this would require otherwise unnecessary calls to the Kubernetes API server, which would degrade
|
- Newly authenticated user is associated with “admin-like” RBAC. Any user that is allowed to perform
|
||||||
the performance of the Supervisor. It's also not clear what would constitute "admin" level access, since RBAC is
|
`verbs=* groups=* resources=*` according to a subject access review API call shall be considered "admin-like".
|
||||||
configurable at a very fine-grained level. On the other hand, the Supervisor is directly aware of the user's group
|
This would only indicate that the user has "admin-like" permissions on the Supervisor cluster itself, not on other
|
||||||
memberships, which could be logged.
|
workload clusters, since the Supervisor is not aware of the RBAC settings on the workload clusters.
|
||||||
|
|
||||||
The Concierge's audit logs would include events such as:
|
The Concierge's audit logs would include events such as:
|
||||||
|
|
||||||
- Token credential request (succeeded, failed, maybe maps to admin RBAC). While already captured by the API server audit
|
- Token credential request (succeeded, failed, maps to admin RBAC). While already captured by the API server audit
|
||||||
logs, those should likely be set to metadata. Duplicating the event allows for more controlled capture & management of
|
logs, those should likely be set to metadata. Duplicating the event allows for more controlled capture & management of
|
||||||
data.
|
data.
|
||||||
|
- Similar to the Supervisor, the TCR endpoint could log when an authenticated user is associated with “admin-like”
|
||||||
|
RBAC. Any user that is allowed to perform `verbs=* groups=* resources=*` according to a subject access review API
|
||||||
|
call shall be considered "admin-like".
|
||||||
- WhoAmI Request. While already captured by the API server audit logs, duplicating the event allows for more controlled
|
- WhoAmI Request. While already captured by the API server audit logs, duplicating the event allows for more controlled
|
||||||
capture & management of data.
|
capture & management of data.
|
||||||
|
|
||||||
@ -140,7 +142,8 @@ might lose a few lines of logs. A normal pod shutdown should be able to flush th
|
|||||||
will be added to both the Concierge and Supervisor apps Deployments' Pods. These containers will tail those audit logs
|
will be added to both the Concierge and Supervisor apps Deployments' Pods. These containers will tail those audit logs
|
||||||
to stdout, thus effectively moving those log lines from files on the Pod to Kubernetes container logs. Those sidecar
|
to stdout, thus effectively moving those log lines from files on the Pod to Kubernetes container logs. Those sidecar
|
||||||
container images can be minimal with just enough in the image to support the unix `tail` command (or similar Go binary,
|
container images can be minimal with just enough in the image to support the unix `tail` command (or similar Go binary,
|
||||||
such as [hpcloud/tail](https://github.com/hpcloud/tail)).
|
such as [hpcloud/tail](https://github.com/hpcloud/tail), although that particular example library may not be maintained
|
||||||
|
anymore).
|
||||||
|
|
||||||
Kubernetes will take care of concerns such as log rotation for the container logs. For the files on the Pod's disk
|
Kubernetes will take care of concerns such as log rotation for the container logs. For the files on the Pod's disk
|
||||||
output by the Supervisor and Concierge apps, we should research whether Pinniped should have code to avoid allowing
|
output by the Supervisor and Concierge apps, we should research whether Pinniped should have code to avoid allowing
|
||||||
@ -189,7 +192,7 @@ action type may be added. All keys should be included in documentation for the a
|
|||||||
|
|
||||||
Every event should include these keys:
|
Every event should include these keys:
|
||||||
|
|
||||||
- `time`: the timestamp of the event
|
- `timestamp`: the timestamp of the event
|
||||||
- `event`: the event type, which is a brief description of what happened, with no string interpolation, so it will
|
- `event`: the event type, which is a brief description of what happened, with no string interpolation, so it will
|
||||||
always be the same for a given event type (e.g. `upstream refresh succeeded`)
|
always be the same for a given event type (e.g. `upstream refresh succeeded`)
|
||||||
- `v`: a number specifying the format version of the event type, starting with `1`, to give us flexibility to make
|
- `v`: a number specifying the format version of the event type, starting with `1`, to give us flexibility to make
|
||||||
@ -198,7 +201,7 @@ Every event should include these keys:
|
|||||||
|
|
||||||
Depending on the event type, an event might include other keys, such as:
|
Depending on the event type, an event might include other keys, such as:
|
||||||
|
|
||||||
- `msg`: a freeform warning or error message meant to be read by a human (e.g. the error message that was returned by an
|
- `message`: a freeform warning or error message meant to be read by a human (e.g. the error message that was returned by an
|
||||||
upstream IDP during a failed login attempt)
|
upstream IDP during a failed login attempt)
|
||||||
- `requestID`: a unique ID for the request, if the event is related to an API request
|
- `requestID`: a unique ID for the request, if the event is related to an API request
|
||||||
- `requestURI`: the path of the endpoint, if the event is related to an API request
|
- `requestURI`: the path of the endpoint, if the event is related to an API request
|
||||||
@ -209,7 +212,8 @@ Depending on the event type, an event might include other keys, such as:
|
|||||||
there is one
|
there is one
|
||||||
|
|
||||||
The names of many of these keys are purposefully similar to the names of the keys used by Kubernetes audit events to
|
The names of many of these keys are purposefully similar to the names of the keys used by Kubernetes audit events to
|
||||||
make them feel familiar.
|
make them feel familiar. Also, where it makes sense, the key names should be similar to
|
||||||
|
[those used in the Pinniped Pod logs](https://github.com/vmware-tanzu/pinniped/blob/main/internal/plog/zap.go#L104-L120).
|
||||||
|
|
||||||
The details of these additional keys will be worked out as the details of the specific events are being worked out,
|
The details of these additional keys will be worked out as the details of the specific events are being worked out,
|
||||||
during implementation of this proposal.
|
during implementation of this proposal.
|
||||||
@ -233,7 +237,7 @@ It would be desirable for a timestamp to:
|
|||||||
2. Be easily parsable by log parsers, especially fluentbit
|
2. Be easily parsable by log parsers, especially fluentbit
|
||||||
3. Be expressed in UTC time
|
3. Be expressed in UTC time
|
||||||
4. Use at least millisecond precision
|
4. Use at least millisecond precision
|
||||||
5. Use the consistent JSON key name `time`
|
5. Use the consistent JSON key name `timestamp`
|
||||||
|
|
||||||
Golang's standard library's [interpretation](https://pkg.go.dev/time#pkg-constants) of RFC 3339 with nanosecond
|
Golang's standard library's [interpretation](https://pkg.go.dev/time#pkg-constants) of RFC 3339 with nanosecond
|
||||||
precision defines a timestamp format which meets the above goals. An example timestamp in this format, printed
|
precision defines a timestamp format which meets the above goals. An example timestamp in this format, printed
|
||||||
@ -247,7 +251,7 @@ Given this timestamp format, the following fluentbit configuration could be used
|
|||||||
[PARSER]
|
[PARSER]
|
||||||
Name json
|
Name json
|
||||||
Format json
|
Format json
|
||||||
Time_Key time
|
Time_Key timestamp
|
||||||
Time_Format %Y-%m-%dT%H:%M:%S.%LZ
|
Time_Format %Y-%m-%dT%H:%M:%S.%LZ
|
||||||
```
|
```
|
||||||
|
|
||||||
@ -319,12 +323,13 @@ None yet.
|
|||||||
|
|
||||||
## Open Questions
|
## Open Questions
|
||||||
|
|
||||||
- Should we output events that can function similar to access logs for the Supervisor endoints?
|
None.
|
||||||
- Should we try to somehow detect that a user is "root-like"?
|
|
||||||
|
|
||||||
## Answered Questions
|
## Answered Questions
|
||||||
|
|
||||||
None yet.
|
- Should we output events that can function similar to access logs for the Supervisor endpoints?
|
||||||
|
Yes (paragraphs above updated).
|
||||||
|
- Should we try to somehow detect that a user is "root-like"? Yes (paragraphs above updated).
|
||||||
|
|
||||||
## Implementation Plan
|
## Implementation Plan
|
||||||
|
|
||||||
|
Loading…
Reference in New Issue
Block a user