Implement basic liveness and readiness probes

- Call the auto-generated /healthz endpoint of our aggregated API server
- Use HTTP for liveness even though TCP seems like it might be more appropriate, because TCP probes cause TLS handshake errors to appear in our logs every few seconds
- Use conservative timeouts and retries on the liveness probe to avoid having our container restarted when it is temporarily slow due to running in an environment under resource pressure
- Use less conservative timeouts and retries for the readiness probe, so that an unhealthy pod is removed from the service sooner than its container would be restarted
- Tuning the settings for retries and timeouts seems to be a mysterious art, so these are just a first draft
2020-08-17 16:44:42 -07:00 · ecde8fa8af
commit ecde8fa8af
parent 29654c39a5
1 changed file with 18 additions and 0 deletions
--- a/deploy/deployment.yaml
+++ b/deploy/deployment.yaml
@@ -88,6 +88,24 @@ spec:
              mountPath: /etc/podinfo
            - name: k8s-certs
              mountPath: /etc/kubernetes/pki
+          livenessProbe:
+            httpGet:
+              path: /healthz
+              port: 443
+              scheme: HTTPS
+            initialDelaySeconds: 20
+            timeoutSeconds: 15
+            periodSeconds: 10
+            failureThreshold: 5
+          readinessProbe:
+            httpGet:
+              path: /healthz
+              port: 443
+              scheme: HTTPS
+            initialDelaySeconds: 20
+            timeoutSeconds: 3
+            periodSeconds: 10
+            failureThreshold: 3
      volumes:
        - name: config-volume
          configMap:
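
To make the "conservative" versus "less conservative" trade-off in the commit message concrete, here is a rough sketch that estimates how long each probe tolerates failures before the kubelet acts (restarting the container for liveness, removing the pod from the service for readiness). The `ProbeSettings` class and `seconds_until_action` function are hypothetical helpers, not part of Kubernetes. The estimate assumes probes for a container run one at a time, so a probe that hangs until its timeout delays the next one; actual timing also depends on kubelet scheduling jitter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeSettings:
    """Hypothetical holder for the probe fields set in the diff above."""
    initial_delay_seconds: int
    timeout_seconds: int
    period_seconds: int
    failure_threshold: int


def seconds_until_action(probe: ProbeSettings, probes_hang: bool) -> int:
    """Estimate seconds from the start of the first failing probe until
    failure_threshold consecutive failures have been observed.

    probes_hang=False: each probe fails immediately (e.g. connection refused).
    probes_hang=True: each probe hangs until timeout_seconds elapses.
    Assumption: probes do not overlap, so a hanging probe pushes back the
    next one when timeout_seconds exceeds period_seconds.
    """
    if probes_hang:
        spacing = max(probe.period_seconds, probe.timeout_seconds)
        return (probe.failure_threshold - 1) * spacing + probe.timeout_seconds
    return (probe.failure_threshold - 1) * probe.period_seconds


# Values copied from the livenessProbe and readinessProbe in the diff.
LIVENESS = ProbeSettings(initial_delay_seconds=20, timeout_seconds=15,
                         period_seconds=10, failure_threshold=5)
READINESS = ProbeSettings(initial_delay_seconds=20, timeout_seconds=3,
                          period_seconds=10, failure_threshold=3)

if __name__ == "__main__":
    for name, probe in (("liveness", LIVENESS), ("readiness", READINESS)):
        fast = seconds_until_action(probe, probes_hang=False)
        slow = seconds_until_action(probe, probes_hang=True)
        print(f"{name}: ~{fast}s if probes fail fast, ~{slow}s if they hang")
```

Under these assumptions the readiness probe takes the pod out of rotation in roughly 20 to 23 seconds, while the liveness probe waits roughly 40 to 75 seconds before a restart, which matches the intent stated in the commit message.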