From 973c3102bb550929ca77c00bfc0bc88bc373eba8 Mon Sep 17 00:00:00 2001 From: Ryan Richard Date: Thu, 21 Apr 2022 14:50:48 -0700 Subject: [PATCH 1/4] add audit logging proposal --- proposals/1141_audit-logging/README.md | 323 +++++++++++++++++++++++++ 1 file changed, 323 insertions(+) create mode 100644 proposals/1141_audit-logging/README.md diff --git a/proposals/1141_audit-logging/README.md b/proposals/1141_audit-logging/README.md new file mode 100644 index 00000000..d6ae0341 --- /dev/null +++ b/proposals/1141_audit-logging/README.md @@ -0,0 +1,323 @@ +--- +title: "Audit Logging" +authors: [ "@cfryanr" ] +status: "in-review" +sponsor: [ ] +approval_date: "" +--- + +*Disclaimer*: Proposals are point-in-time designs and decisions. Once approved and implemented, they become historical +documents. If you are reading an old proposal, please be aware that the features described herein might have continued +to evolve since. + +# Audit Logging + +## Problem Statement + +Audit logging is a requirement from most compliance standards (e.g. FedRAMP, PCI-DSS). The Pinniped Supervisor and +Concierge components should provide audit logs to help users meet these compliance requirements. + +The Kubernetes API server already supports +rich [audit logging features](https://kubernetes.io/docs/tasks/debug-application-cluster/audit/) which are implemented +by vendors of Kubernetes distributions. The Pinniped audit logs are meant to augment, not replace, the Kubernetes audit +logs. + +### How Pinniped Works Today (as of version v0.16.0) + +The Pinniped Supervisor and Concierge components are Kubernetes Deployments. Today, each Pod has a single container, +which is the Supervisor or Concierge app. Kubernetes captures the stdout of the app into the Pod logs. + +Today, the Pinniped Supervisor and Concierge log many interesting events to their Pod logs. These logs are meant +primarily to help an admin user debug problems with their Pinniped configuration or with their cluster. 
The Supervisor
+and Concierge each offer an install-time configuration option to turn up the verbosity of these Pod logs.
+
+However, these logs are not meant to be audit logs. They generally focus on logging problems, not on logging successes.
+They try to avoid logging anything that might be confidential or PII (personally identifiable information). Since email
+addresses might be considered PII, these logs generally avoid including usernames at the default log level, since
+usernames could be email addresses in some configurations. Logging the identity of actors (usernames) is a key aspect
+of audit logs.
+
+## Terminology / Concepts
+
+None.
+
+## Proposal
+
+The goal of an audit log is to log events that could be helpful in a forensic investigation of past usage, including the
+actor (the username) and the actions that were taken on the system.
+
+### Goals and Non-goals
+
+Goals
+
+- Auditing events relating to upstream identity provider (IDP) authentication, refresh, and sessions.
+- Auditing events relating to minting and validating cluster credentials.
+- Enabling auditors to easily stitch together authentication events into an audit trail.
+- Providing consistent data across auditable events.
+- Providing the ability to enable and disable auditing.
+- Providing the ability to route audit logs to a separate destination from the rest of Pinniped’s logs.
+
+Non-goals
+
+- Enabling auditing in the impersonation proxy. If needed, this will be handled in a separate feature.
+- Providing the ability to filter or choose which audit events to capture.
+- Auditing the management of CRs (e.g. OIDCIdentityProvider). These events are captured by the API server audit logs.
+
+### Specification / How it Solves the Use Cases
+
+This proposal recommends following the Kubernetes docs' recommendation to create a separate Pod container log.
+This new container log will contain the audit logs (and only the audit logs).
+
+#### API Changes
+
+##### Configuration Options
+
+There will be very few user-facing configuration options for audit logging in the first version of the feature. If later
+found to be needed, more configuration could be added in future versions.
+
+This proposal recommends adding a single on/off install-time configuration option for disabling audit logs. By default,
+audit logs will be enabled. An admin user who is concerned about logging identities, for example because usernames may
+be considered PII, may disable audit logging.
+
+Like other install-time configuration options, this option would appear in the values.yaml file of the Supervisor and
+Concierge deployment directories. The selected value would be rendered into the "static" ConfigMap, and read by the
+Supervisor or Concierge app's Golang code at Pod startup time.
+
+##### Event Data
+
+Deciding on every specific audit event is an implementation detail beyond the scope of this proposal.
+
+Generally, the following data should be included with every audit event, whenever possible:
+
+- What type of event occurred (e.g. login)
+- Outcome of the event (success or failure)
+- When the event occurred
+- Where the event occurred (Kubernetes Pod logs automatically include the ID of the Pod, which should be sufficient)
+- Source of the event (e.g. requester IP address)
+- The identity of individuals or subjects associated with the event (who initiated, who participated, etc.)
+- Details involving any objects accessed
+
+The Supervisor's audit logs would include events such as:
+
+- Upstream logins for all IdP types (started, succeeded, failed)
+- Upstream refresh for all IdP types (succeeded, failed)
+- Upstream group refresh for all IdP types (succeeded, failed)
+- Downstream login (started, succeeded, failed)
+- Downstream token exchange (succeeded, failed)
+- Session expired
+- Maybe: The equivalent of access log events for all Supervisor endpoints, since there is no other component providing
+  access logs.
This would include logging things like calls to the Supervisor's OIDC well-known discovery endpoint.
+  These logs could help an investigator determine more about the usage pattern of a suspicious client.
+- Maybe: Newly authenticated user is associated with “admin” RBAC. Note that the Supervisor is not directly aware of
+  RBAC, so determining this would require otherwise unnecessary calls to the Kubernetes API server, which would degrade
+  the performance of the Supervisor. It's also not clear what would constitute "admin" level access, since RBAC is
+  configurable at a very fine-grained level. On the other hand, the Supervisor is directly aware of the user's group
+  memberships, which could be logged.
+
+The Concierge's audit logs would include events such as:
+
+- Token credential request (succeeded, failed, maybe maps to admin RBAC). While already captured by the API server audit
+  logs, those should likely be set to metadata. Duplicating the event allows for more controlled capture & management of
+  data.
+- WhoAmI Request. While already captured by the API server audit logs, duplicating the event allows for more controlled
+  capture & management of data.
+
+Other events may be useful to auditors and may be included in the audit logs, such as:
+
+- Application startup with version information
+- Graceful application shutdown
+
+##### Audit Logs as Separate Log Files
+
+The Concierge and Supervisor apps could each send audit logs to separate files on disk in JSON format. The performance
+impact of logging to a file should be acceptable thanks to file buffering, but this assumption should be tested. Note
+that this approach would not guarantee that the log statement is flushed to the file before the action is performed,
+because then we would lose the benefit of buffering. Writing to the file would be "best effort", e.g. the process crashing
+might lose a few lines of logs. A normal pod shutdown should be able to flush the file without any loss.
+
+[A new streaming sidecar container](https://kubernetes.io/docs/concepts/cluster-administration/logging/#sidecar-container-with-logging-agent)
+will be added to both the Concierge and Supervisor apps' Deployments' Pods. These containers will tail those audit logs
+to stdout, thus effectively moving those log lines from files on the Pod to Kubernetes container logs. Those sidecar
+container images can be minimal with just enough in the image to support the unix `tail` command (or similar Go binary,
+such as [hpcloud/tail](https://github.com/hpcloud/tail)).
+
+Kubernetes will take care of concerns such as log rotation for the container logs. For the files on the Pod's disk
+output by the Supervisor and Concierge apps, we should research whether Pinniped should have code to keep
+those files from growing too large. Old lines can be discarded since the sidecar container should have already streamed
+them.
+
+Container logs in JSON format are easy for node-level logging agents, e.g. fluentbit, to ingest/annotate/parse/filter
+and send to numerous sink destinations. These containers could still run when audit logs are disabled by the admin, but
+would produce no log lines in that case.
+
+##### Parsing, Filtering, and Sending Audit Logs to an External Destination
+
+Many users will use the popular [fluentbit](https://fluentbit.io) project to filter and extract Pod logs from their
+cluster. This project implements
+a [node-level log agent](https://kubernetes.io/docs/concepts/cluster-administration/logging/#using-a-node-logging-agent)
+which understands the Kubernetes directory and file layout for Pod logs. It also has a feature to further enrich the
+logs
+by [automatically adding more information about the source Pod](https://docs.fluentbit.io/manual/pipeline/filters/kubernetes)
+to each event (line) in the log.
It supports many configurable options
+for [parsing](https://docs.fluentbit.io/manual/pipeline/parsers),
+[filtering](https://docs.fluentbit.io/manual/pipeline/filters), and sending logs
+to [many destinations](https://docs.fluentbit.io/manual/pipeline/outputs).
+
+By putting the Supervisor and Concierge audit logs into their own Pod logs, Pinniped will be compatible with any
+existing node-level agent software which can extract logs from a Kubernetes cluster. This allows the Pinniped code to
+focus on generating the logs as JSON, without worrying about providing any configuration options for filtering or
+sending to various destinations.
+
+##### Audit Log JSON Format
+
+Each line of audit log will represent an event. Each line will be a complete JSON object,
+i.e. `{"key1":"value1","key2":"value2"}`.
+
+Some, but not all, events will be the result of a user making an API request to an endpoint. One API request from a user
+may cause more than one event to be logged. If possible, a unique ID will be determined for each incoming request, and
+will be included in all events caused by that request.
+
+Where possible, the top-level keys of the JSON object will use standardized names. Other top-level keys specific to that
+action type may be added. All keys should be included in documentation for the audit log feature.
+
+Every event should include these keys:
+
+- `time`: the timestamp of the event
+- `event`: the event type, which is a brief description of what happened, with no string interpolation, so it will
+  always be the same for a given event type (e.g. `upstream refresh succeeded`)
+- `v`: a number specifying the format version of the event type, starting with `1`, to give us flexibility to make
+  breaking changes to the format of an event type in future releases (e.g.
change the name of the JSON keys, or change
+  the data type of the value of an existing key)
+
+Depending on the event type, an event might include other keys, such as:
+
+- `msg`: a freeform warning or error message meant to be read by a human (e.g. the error message that was returned by an
+  upstream IDP during a failed login attempt)
+- `requestID`: a unique ID for the request, if the event is related to an API request
+- `requestPath`: the path of the endpoint, if the event is related to an API request
+- `requestorIP`: the client's IP, if the event is related to an API request
+- `user`: the username of the user performing the action, if there is one
+- `groups`: the group memberships of the user performing the action, if the action is related to determining or changing
+  their group memberships
+
+The details of these additional keys will be worked out as the details of the specific events are being worked out,
+during implementation of this proposal.
+
+##### Audit Log Timestamps
+
+The date format used in the audit logs should be something which can be easily parsed by fluentbit, to make it easy for
+users to configure fluentbit. We could easily document this to provide instructions on how to configure a custom
+fluentbit parser for Pinniped audit logs. We should probably
+avoid [fluentbit's default json parser's](https://github.com/fluent/fluent-bit/blob/845b6ae8576077fd512dbe64fb8e16ff4b15abdb/conf/parsers.conf#L35-L39)
+date format, which assumes dates will be in an ugly format and also lacks sub-second precision
+(e.g. `08/Apr/2022:19:24:01 +0000`).
+
+fluentbit uses [strptime](https://linux.die.net/man/3/strptime)
+with [an extension for fractional seconds](https://docs.fluentbit.io/manual/pipeline/parsers/configuring-parser#time-resolution-and-fractional-seconds)
+to parse timestamps.
+
+It would be desirable for a timestamp to:
+
+1. Be human-readable (e.g. not seconds since an epoch)
+2. Be easily parsable by log parsers, especially fluentbit
+3.
Be expressed in UTC time
+4. Use at least millisecond precision
+5. Use the consistent JSON key name `time`
+
+[Syslog's RFC 5424](https://datatracker.ietf.org/doc/html/rfc5424#section-6.2.3) defines a timestamp format which meets
+the above goals. An example timestamp in this format is `2003-10-11T22:14:15.003` which represents UTC time on
+October 11, 2003 at 10:14:15 pm, 3 milliseconds into the next second.
+
+Given this timestamp format, the following fluentbit configuration could be used to parse Pinniped's audit logs.
+
+```
+    [PARSER]
+        Name json
+        Format json
+        Time_Key time
+        Time_Format %Y-%m-%dT%H:%M:%S.%L
+```
+
+#### Upgrades
+
+Since audit logs will be output to a new location, there are not any backward compatibility concerns for them in the
+first release.
+
+Adding a second container to the Pods is generally not noticeable by a user, but may have some impact on existing
+installations in some rare cases, so it should be explained in the release notes. For example, a GKE Ingress will, by
+default, read the Pod's container definition to try to guess the health check endpoint for the backend Service of the
+Ingress. When there is only one container, it will try to guess, but when there is more than one container it will give
+up on guessing and instead expect the user to configure the health checks. So upgrading could break the health checks of
+a GKE Ingress, if no health checks were configured.
+
+#### Tests
+
+Audit logging will be a user-facing feature, and the format of the logs should be considered a kind of documented API.
+Unnecessary changes to the format should be avoided after the first release. Therefore, all audit log events should be
+covered by unit tests.
+
+This implies that it may be desirable for the implementation to involve passing around a pointer to some interface to
+all code which needs to add events to the audit log. Such an implementation would make the audit logs more testable.
A
+production code implementation of the interface should take care of common concerns, such as adding the timestamp,
+deciding required key names, and formatting the output as JSON. A test implementation of the interface could handle
+those common concerns differently to make testing easier.
+
+#### New Dependencies
+
+- We might want to consider using a library like [zap](https://github.com/uber-go/zap) to aid in implementation, and it
+  is already an indirect dependency of Pinniped.
+- The new streaming sidecar container will need a container image. Using the existing pinniped-server container image
+  seems desirable. It is a distroless image, which is good for security. And it is the only image that we currently ship
+  in Pinniped releases. One option to make this happen would be to implement the tail command in Go, but any binary that
+  can work in a distroless image should be okay. We should avoid adding Linux standard libraries to the container image,
+  so the binary should be statically linked with no external dependencies. The binary should support the same OS and
+  architecture that our existing Go binary supports.
+
+#### Performance Considerations
+
+By using buffered output to write to the audit log files, there should not be any meaningful performance impact. This
+assumption should be tested.
+
+#### Observability Considerations
+
+Auditing will improve operator observability, as described in the other sections of this document.
+
+#### Security Considerations
+
+The audit logs will be Pod container logs, so the contents of the logs will be protected by Kubernetes like any Pod
+container logs.
+
+#### Usability Considerations
+
+By using Pod container logs, the user will have many options to manage these logs.
+
+#### Documentation Considerations
+
+The supported audit event types, and the JSON keys output for each event type, should be documented. Users should be
+able to build their own parsers for these events based on the documentation.
+
+If the production code implementation of the audit interface used Golang constants for all allowed JSON key names and
+event type names, and otherwise enforced certain standards, then it may be possible to auto-generate (or nearly
+auto-generate) the documentation for the audit event types.
+
+### Other Approaches Considered
+
+None yet.
+
+## Open Questions
+
+- Should we output events that can function similarly to access logs for the Supervisor endpoints?
+- Should we try to somehow detect that a user is "root-like"?
+
+## Answered Questions
+
+None yet.
+
+## Implementation Plan
+
+The maintainers will implement these features. It might fit into one PR.
+
+## Implementation PRs
+
+*This section is a placeholder to list the PRs that implement this proposal. This section should be left empty until
+after the proposal is approved. After implementation, the proposal can be updated to list related implementation PRs.*

From dfbc33b933cdb5e43982c821fe5fbf65192b95be Mon Sep 17 00:00:00 2001
From: Ryan Richard
Date: Mon, 2 May 2022 09:47:09 -0700
Subject: [PATCH 2/4] Apply suggestions from code review

Co-authored-by: Mo Khan

---
 proposals/1141_audit-logging/README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/proposals/1141_audit-logging/README.md b/proposals/1141_audit-logging/README.md
index d6ae0341..bc64a476 100644
--- a/proposals/1141_audit-logging/README.md
+++ b/proposals/1141_audit-logging/README.md
@@ -25,7 +25,7 @@ logs.
 
 ### How Pinniped Works Today (as of version v0.16.0)
 
 The Pinniped Supervisor and Concierge components are Kubernetes Deployments. Today, each Pod has a single container,
-which is the Supervisor or Concierge app. Kubernetes captures the stdout of the app into the Pod logs.
+which is the Supervisor or Concierge app. Kubernetes captures the stdout and stderr of the app into the Pod logs.
 
 Today, the Pinniped Supervisor and Concierge log many interesting events to their Pod logs.
These logs are meant primarily to help an admin user debug problems with their Pinniped configuration or with their cluster. The Supervisor @@ -59,7 +59,7 @@ Goals Non-goals -- Enabling auditing in the impersonation proxy. If needed, this will be handled in a separate feature. +- Enabling Kubernetes API request auditing in the impersonation proxy. If needed, this will be handled in a separate feature. - Providing the ability to filter or choose which audit events to capture. - Auditing the management of CRs (e.g. OIDCIdentityProvider). These events are captured by the API server audit logs. @@ -252,7 +252,7 @@ a GKE Ingress, if no health checks were configured. #### Tests -Audit logging will be a user-facing feature, and the format of the logs should be considered a kind of documented API. +Audit logging will be a user-facing feature, and the format of the logs should be considered a documented and versioned API. Unnecessary changes to the format should be avoided after the first release. Therefore, all audit log events should be covered by unit tests. From 831abc315ee10fccf099fb68023e49342dbb2cf0 Mon Sep 17 00:00:00 2001 From: Ryan Richard Date: Mon, 9 May 2022 14:45:18 -0700 Subject: [PATCH 3/4] Update audit log proposal key names and timestamp format --- proposals/1141_audit-logging/README.md | 39 +++++++++++++++++--------- 1 file changed, 26 insertions(+), 13 deletions(-) diff --git a/proposals/1141_audit-logging/README.md b/proposals/1141_audit-logging/README.md index bc64a476..4ebe9cc9 100644 --- a/proposals/1141_audit-logging/README.md +++ b/proposals/1141_audit-logging/README.md @@ -59,7 +59,8 @@ Goals Non-goals -- Enabling Kubernetes API request auditing in the impersonation proxy. If needed, this will be handled in a separate feature. +- Enabling Kubernetes API request auditing in the impersonation proxy. If needed, this will be handled in a separate + feature. - Providing the ability to filter or choose which audit events to capture. 
- Auditing the management of CRs (e.g. OIDCIdentityProvider). These events are captured by the API server audit logs. @@ -170,6 +171,12 @@ sending to various destinations. ##### Audit Log JSON Format +The +[format of Kubernetes audit logs](https://github.com/kubernetes/kubernetes/blob/d0832102a7017e83bf47a5137b690e52f19c267c/staging/src/k8s.io/apiserver/pkg/apis/audit/v1/types.go#L72-L142) +is not a perfect fit for Pinniped. The Kubernetes audit logs are strongly oriented towards API requests for Kubernetes +resources, with many of the fields representing the details of a request and response. The format of the Pinniped audit +logs will draw inspiration from the Kubernetes audit events without trying to directly copy them. + Each line of audit log will represent an event. Each line will be a complete JSON object, i.e. `{"key1":"value1","key2":"value2"}`. @@ -194,11 +201,15 @@ Depending on the event type, an event might include other keys, such as: - `msg`: a freeform warning or error message meant to be read by a human (e.g. 
the error message that was returned by an
 upstream IDP during a failed login attempt)
- `requestID`: a unique ID for the request, if the event is related to an API request
-- `requestPath`: the path of the endpoint, if the event is related to an API request
-- `requestorIP`: the client's IP, if the event is related to an API request
-- `user`: the username of the user performing the action, if there is one
-- `groups`: the group memberships of the user performing the action, if the action is related to determining or changing
-  their group memberships
+- `requestURI`: the path of the endpoint, if the event is related to an API request
+- `verb`: the REST method called on the endpoint, if the event is related to an API request
+- `sourceIPs`: the client's IPs, if the event is related to an API request
+- `userAgent`: the user agent string reported by the client, if the event is related to an API request
+- `user`: a nested structure which can include the `username`, `groups`, and `uid` of the user performing the action, if
+  there is one
+
+The names of many of these keys are purposefully similar to the names of the keys used by Kubernetes audit events to
+make them feel familiar.
 
 The details of these additional keys will be worked out as the details of the specific events are being worked out,
 during implementation of this proposal.
 
@@ -224,9 +235,11 @@ It would be desirable for a timestamp to:
 4. Use at least millisecond precision
 5. Use the consistent JSON key name `time`
 
-[Syslog's RFC 5424](https://datatracker.ietf.org/doc/html/rfc5424#section-6.2.3) defines a timestamp format which meets
-the above goals. An example timestamp in this format is `2003-10-11T22:14:15.003` which represents UTC time on
-October 11, 2003 at 10:14:15 pm, 3 milliseconds into the next second.
+Golang's standard library's [interpretation](https://pkg.go.dev/time#pkg-constants) of RFC 3339 with nanosecond
+precision defines a timestamp format which meets the above goals.
An example timestamp in this format, printed
+by `fmt.Println(time.Now().UTC().Format(time.RFC3339Nano))`, is `2022-05-09T21:32:59.811913Z`, which represents UTC time
+on May 9, 2022, at 21:32:59, 811913 microseconds into the next second. Note that trailing zeros on the fractional seconds
+are dropped, so the length of the fractional seconds field is variable in the output.
 
 Given this timestamp format, the following fluentbit configuration could be used to parse Pinniped's audit logs.
 
@@ -235,7 +248,7 @@ Given this timestamp format, the following fluentbit configuration could be used
         Name json
         Format json
         Time_Key time
-        Time_Format %Y-%m-%dT%H:%M:%S.%L
+        Time_Format %Y-%m-%dT%H:%M:%S.%LZ
 ```
 
 #### Upgrades
@@ -252,7 +265,9 @@ a GKE Ingress, if no health checks were configured.
 
 #### Tests
 
-Audit logging will be a user-facing feature, and the format of the logs should be considered a documented and versioned API.
-Unnecessary changes to the format should be avoided after the first release. Therefore, all audit log events should be
-covered by unit tests.
+Audit logging will be a user-facing feature, and the format of the logs should be considered a documented and versioned
+API. Unnecessary changes to the format should be avoided after the first release. Therefore, all audit log events should
+be covered by unit tests.
 
 This implies that it may be desirable for the implementation to involve passing around a pointer to some interface to
 all code which needs to add events to the audit log. Such an implementation would make the audit logs more testable.
A

From 3cf3b28c5b0a24b5f9adf08fb40cb0716e42034e Mon Sep 17 00:00:00 2001
From: Ryan Richard
Date: Wed, 22 Jun 2022 15:12:28 -0700
Subject: [PATCH 4/4] Update audit log proposal

---
 proposals/1141_audit-logging/README.md | 41 +++++++++++++++-----------
 1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/proposals/1141_audit-logging/README.md b/proposals/1141_audit-logging/README.md
index 4ebe9cc9..994b996c 100644
--- a/proposals/1141_audit-logging/README.md
+++ b/proposals/1141_audit-logging/README.md
@@ -77,8 +77,7 @@ There will be very few user-facing configuration options for audit logging in th
 found to be needed, more configuration could be added in future versions.
 
 This proposal recommends adding a single on/off install-time configuration option for disabling audit logs. By default,
-audit logs will be enabled. An admin user who is concerned about logging identities, for example because usernames may
-be considered PII, may disable audit logging.
+audit logs will be disabled. Usernames may be considered PII, so disabling them by default avoids potentially logging PII.
 
 Like other install-time configuration options, this option would appear in the values.yaml file of the Supervisor and
 Concierge deployment directories. The selected value would be rendered into the "static" ConfigMap, and read by the
 Supervisor or Concierge app's Golang code at Pod startup time.
@@ -106,20 +105,23 @@ The Supervisor's audit logs would include events such as:
 - Downstream login (started, succeeded, failed)
 - Downstream token exchange (succeeded, failed)
 - Session expired
-- Maybe: The equivalent of access log events for all Supervisor endpoints, since there is no other component providing
+- The equivalent of access log events for all Supervisor endpoints, since there is no other component providing
   access logs. This would include logging things like calls to the Supervisor's OIDC well-known discovery endpoint.
   These logs could help an investigator determine more about the usage pattern of a suspicious client.
-- Maybe: Newly authenticated user is associated with “admin” RBAC. Note that the Supervisor is not directly aware of - RBAC, so determining this would require otherwise unnecessary calls to the Kubernetes API server, which would degrade - the performance of the Supervisor. It's also not clear what would constitute "admin" level access, since RBAC is - configurable at a very fine-grained level. On the other hand, the Supervisor is directly aware of the user's group - memberships, which could be logged. +- The identity (username, group memberships) of newly authenticated users +- Newly authenticated user is associated with “admin-like” RBAC. Any user that is allowed to perform + `verbs=* groups=* resources=*` according to a subject access review API call shall be considered "admin-like". + This would only indicate that the user has "admin-like" permissions on the Supervisor cluster itself, not on other + workload clusters, since the Supervisor is not aware of the RBAC settings on the workload clusters. The Concierge's audit logs would include events such as: -- Token credential request (succeeded, failed, maybe maps to admin RBAC). While already captured by the API server audit +- Token credential request (succeeded, failed, maps to admin RBAC). While already captured by the API server audit logs, those should likely be set to metadata. Duplicating the event allows for more controlled capture & management of data. + - Similar to the Supervisor, the TCR endpoint could log when an authenticated user is associated with “admin-like” + RBAC. Any user that is allowed to perform `verbs=* groups=* resources=*` according to a subject access review API + call shall be considered "admin-like". - WhoAmI Request. While already captured by the API server audit logs, duplicating the event allows for more controlled capture & management of data. @@ -140,7 +142,8 @@ might lose a few lines of logs. 
A normal pod shutdown should be able to flush the file without any loss.
 
 [A new streaming sidecar container](https://kubernetes.io/docs/concepts/cluster-administration/logging/#sidecar-container-with-logging-agent)
 will be added to both the Concierge and Supervisor apps' Deployments' Pods. These containers will tail those audit logs
 to stdout, thus effectively moving those log lines from files on the Pod to Kubernetes container logs. Those sidecar
 container images can be minimal with just enough in the image to support the unix `tail` command (or similar Go binary,
-such as [hpcloud/tail](https://github.com/hpcloud/tail)).
+such as [hpcloud/tail](https://github.com/hpcloud/tail), although that particular example library may not be maintained
+anymore).
 
 Kubernetes will take care of concerns such as log rotation for the container logs. For the files on the Pod's disk
 output by the Supervisor and Concierge apps, we should research whether Pinniped should have code to keep
@@ -189,7 +192,7 @@ Where possible, the top-level keys of the JSON object will use standardized name
 
 Every event should include these keys:
 
-- `time`: the timestamp of the event
+- `timestamp`: the timestamp of the event
 - `event`: the event type, which is a brief description of what happened, with no string interpolation, so it will
   always be the same for a given event type (e.g. `upstream refresh succeeded`)
 - `v`: a number specifying the format version of the event type, starting with `1`, to give us flexibility to make
@@ -198,7 +201,7 @@ Depending on the event type, an event might include other keys, such as:
 
-- `msg`: a freeform warning or error message meant to be read by a human (e.g. the error message that was returned by an
+- `message`: a freeform warning or error message meant to be read by a human (e.g.
the error message that was returned by an
 upstream IDP during a failed login attempt)
 - `requestID`: a unique ID for the request, if the event is related to an API request
 - `requestURI`: the path of the endpoint, if the event is related to an API request
@@ -209,7 +212,8 @@ Depending on the event type, an event might include other keys, such as:
 there is one
 
 The names of many of these keys are purposefully similar to the names of the keys used by Kubernetes audit events to
-make them feel familiar.
+make them feel familiar. Also, where it makes sense, the key names should be similar to
+[those used in the Pinniped Pod logs](https://github.com/vmware-tanzu/pinniped/blob/main/internal/plog/zap.go#L104-L120).
 
 The details of these additional keys will be worked out as the details of the specific events are being worked out,
 during implementation of this proposal.
@@ -233,7 +237,7 @@ It would be desirable for a timestamp to:
 2. Be easily parsable by log parsers, especially fluentbit
 3. Be expressed in UTC time
 4. Use at least millisecond precision
-5. Use the consistent JSON key name `time`
+5. Use the consistent JSON key name `timestamp`
 
 Golang's standard library's [interpretation](https://pkg.go.dev/time#pkg-constants) of RFC 3339 with nanosecond
 precision defines a timestamp format which meets the above goals. An example timestamp in this format, printed
@@ -247,7 +251,7 @@ Given this timestamp format, the following fluentbit configuration could be used
     [PARSER]
         Name json
         Format json
-        Time_Key time
+        Time_Key timestamp
         Time_Format %Y-%m-%dT%H:%M:%S.%LZ
 ```
 
@@ -319,12 +323,13 @@ None yet.
 
 ## Open Questions
 
-- Should we output events that can function similarly to access logs for the Supervisor endpoints?
-- Should we try to somehow detect that a user is "root-like"?
+None.
 
 ## Answered Questions
 
-None yet.
+- Should we output events that can function similarly to access logs for the Supervisor endpoints?
+  Yes (paragraphs above updated).
+- Should we try to somehow detect that a user is "root-like"? Yes (paragraphs above updated). ## Implementation Plan