Scale Deployment to Zero

If a Deployment's Pods are seen crashing multiple times, it usually indicates there is an issue that must be manually resolved. Removing the failing Pods and marking the Deployment for follow-up is often a useful troubleshooting step. This policy watches existing Pods and, if any are observed to have restarted more than once, indicating a potential crashloop, Kyverno scales the parent Deployment to zero and writes an annotation signaling to an SRE team that troubleshooting is needed. It may be necessary to grant additional privileges to the Kyverno ServiceAccount, via one of the existing ClusterRoleBindings or a new one, so it can modify Deployments. The policy detects frequently restarting Pods by monitoring `Pod.status` for `restartCount` updates, which are performed by the kubelet. No `resourceFilter` modifications are needed if matching on `Pod` and `Pod.status`. Note: since Kyverno 1.10, for this policy to work you must modify Kyverno's ConfigMap to remove or change the `excludeGroups: system:nodes` entry.
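For reference, the additional RBAC might look like the following. This is a minimal sketch: the ClusterRole and ClusterRoleBinding names are illustrative, and the subject assumes a default Kyverno 1.10+ installation where the background controller ServiceAccount (`kyverno-background-controller` in the `kyverno` namespace) processes mutate-existing rules; verify the ServiceAccount your installation actually uses.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  # Illustrative name; any name will do.
  name: kyverno-scale-deployments
rules:
# Allow Kyverno to read and patch Deployments so it can set replicas to zero
# and add the troubleshooting annotation.
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kyverno-scale-deployments
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kyverno-scale-deployments
subjects:
# Assumed ServiceAccount and namespace for a default installation.
- kind: ServiceAccount
  name: kyverno-background-controller
  namespace: kyverno
```

The `excludeGroups` change is made in Kyverno's ConfigMap. The snippet below is likewise a sketch assuming the default ConfigMap name and namespace (`kyverno` in `kyverno`); only the relevant key is shown.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kyverno       # default name; may differ per installation
  namespace: kyverno  # default namespace; may differ per installation
data:
  # The default since 1.10 is "system:nodes", which filters out the kubelet's
  # Pod/status updates this policy relies on; clear it or narrow it.
  excludeGroups: ""
```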

Policy Definition

/other/scale-deployment-zero/scale-deployment-zero.yaml

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: scale-deployment-zero
  annotations:
    policies.kyverno.io/title: Scale Deployment to Zero
    policies.kyverno.io/category: Other
    policies.kyverno.io/severity: medium
    policies.kyverno.io/subject: Deployment
    kyverno.io/kyverno-version: 1.7.0
    policies.kyverno.io/minversion: 1.7.0
    kyverno.io/kubernetes-version: "1.23"
    policies.kyverno.io/description: >-
      If a Deployment's Pods are seen crashing multiple times it usually indicates
      there is an issue that must be manually resolved. Removing the failing Pods and
      marking the Deployment is often a useful troubleshooting step. This policy watches
      existing Pods and if any are observed to have restarted more than
      once, indicating a potential crashloop, Kyverno scales its parent deployment to zero
      and writes an annotation signaling to an SRE team that troubleshooting is needed.
      It may be necessary to grant additional privileges to the Kyverno ServiceAccount,
      via one of the existing ClusterRoleBindings or a new one, so it can modify Deployments.
      This policy scales down deployments with frequently restarting pods by monitoring `Pod.status`
      for `restartCount` updates, which are performed by the kubelet. No `resourceFilter` modifications
      are needed if matching on `Pod` and `Pod.status`.
      Note: For this policy to work, you must modify Kyverno's ConfigMap to remove or change the line
      `excludeGroups: system:nodes` since version 1.10.
spec:
  rules:
  - name: annotate-deployment-rule
    match:
      any:
      - resources:
          kinds:
          - v1/Pod.status
    preconditions:
      all:
      - key: "{{request.operation || 'BACKGROUND'}}"
        operator: Equals
        value: UPDATE
      - key: "{{ sum(request.object.status.containerStatuses[*].restartCount || [`0`]) }}"
        operator: GreaterThan
        value: 1
    context:
    - name: rsname
      variable:
        jmesPath: "request.object.metadata.ownerReferences[0].name"
        default: ''
    - name: deploymentname
      apiCall:
        urlPath: "/apis/apps/v1/namespaces/{{request.namespace}}/replicasets"
        jmesPath: "items[?metadata.name=='{{rsname}}'].metadata.ownerReferences[0].name | [0]"
    mutate:
      targets:
        - apiVersion: apps/v1
          kind: Deployment
          name: "{{deploymentname}}"
          namespace: "{{request.namespace}}"
      patchStrategicMerge:
        metadata:
          annotations:
            sre.corp.org/troubleshooting-needed: "true"
        spec:
          replicas: 0
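To observe the behavior end to end, one option is to create a Deployment that crashes on purpose. The manifest below is purely illustrative (the name, namespace, and image are hypothetical choices, not part of the policy): once a container has restarted more than once, the kubelet's update to `Pod.status` triggers the rule, the `deploymentname` context variable resolves the owning ReplicaSet to its Deployment, and that Deployment is patched with `replicas: 0` and the `sre.corp.org/troubleshooting-needed: "true"` annotation.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: crashloop-demo   # illustrative name
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: crashloop-demo
  template:
    metadata:
      labels:
        app: crashloop-demo
    spec:
      containers:
      - name: crash
        image: busybox
        # Exit non-zero after a short sleep so the kubelet restarts the
        # container and restartCount climbs past the policy's threshold of 1.
        command: ["sh", "-c", "sleep 5; exit 1"]
```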