Implement basic liveness and readiness probes

- Call the auto-generated /healthz endpoint of our aggregated API server
- Use HTTP for the liveness probe even though TCP might seem more
  appropriate, because TCP probes cause TLS handshake errors to
  appear in our logs every few seconds
- Use conservative timeouts and retries on the liveness probe to avoid
  having our container get restarted when it is temporarily slow due
  to running in an environment under resource pressure
- Use less conservative timeouts and retries for the readiness probe,
  so that an unhealthy pod is removed from the service sooner than
  its container would be restarted
- Tuning the retry and timeout settings seems to be a mysterious
  art, so these values are just a first draft
Ryan Richard 2020-08-17 16:44:42 -07:00
parent 29654c39a5
commit ecde8fa8af

@@ -88,6 +88,24 @@ spec:
               mountPath: /etc/podinfo
             - name: k8s-certs
               mountPath: /etc/kubernetes/pki
+          livenessProbe:
+            httpGet:
+              path: /healthz
+              port: 443
+              scheme: HTTPS
+            initialDelaySeconds: 20
+            timeoutSeconds: 15
+            periodSeconds: 10
+            failureThreshold: 5
+          readinessProbe:
+            httpGet:
+              path: /healthz
+              port: 443
+              scheme: HTTPS
+            initialDelaySeconds: 20
+            timeoutSeconds: 3
+            periodSeconds: 10
+            failureThreshold: 3
       volumes:
         - name: config-volume
           configMap:
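
To make the difference between the two probes concrete, here is a rough sketch of how long a run of failures would take to trigger each one, using the values from the diff above. It is a simplified model and not the kubelet's actual scheduling logic: it assumes probes start once per period, and that a probe which hangs uses its full timeout before the next one starts. The `Probe` class and `seconds_until_action` function are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """The three probe settings that drive the estimate (hypothetical helper type)."""
    period_seconds: int
    timeout_seconds: int
    failure_threshold: int


def seconds_until_action(probe: Probe, probes_hang: bool) -> int:
    """Estimate seconds from the start of the first failing probe until
    failure_threshold consecutive failures have been recorded.

    Simplified model (an assumption, not the kubelet's exact behavior):
    - a fast failure (for example a refused connection) takes no time, so
      only the gaps between probes count;
    - a hanging probe takes the full timeout, and the next probe cannot
      start before it finishes.
    """
    gaps = probe.failure_threshold - 1
    if probes_hang:
        spacing = max(probe.period_seconds, probe.timeout_seconds)
        return gaps * spacing + probe.timeout_seconds
    return gaps * probe.period_seconds


# Values copied from the livenessProbe and readinessProbe in the diff.
LIVENESS = Probe(period_seconds=10, timeout_seconds=15, failure_threshold=5)
READINESS = Probe(period_seconds=10, timeout_seconds=3, failure_threshold=3)

if __name__ == "__main__":
    for name, probe in (("liveness", LIVENESS), ("readiness", READINESS)):
        fast = seconds_until_action(probe, probes_hang=False)
        slow = seconds_until_action(probe, probes_hang=True)
        print(f"{name}: ~{fast}s if probes fail fast, ~{slow}s if probes hang")
```

Under this model the readiness probe takes the pod out of the service after roughly 20 to 23 seconds of failures, while the liveness probe waits roughly 40 to 75 seconds before the container is restarted, which matches the intent described in the commit message.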