I was recently tasked with upgrading a Tanzu Kubernetes Cluster to v1.26.5 (v1.26.5+vmware.2-fips.1, to be specific). Once the upgrade completed, to my surprise, a "kubectl get pods" showed that some of my workload pods had gone missing. When I went digging for known issues, I noticed something related to Pod Security that I had overlooked in the release notes.
What this change means, and what it does to your cluster workloads, is what I attempt to clarify in this blog.
Pod Security Admission Controller
The Pod Security Admission Controller, or PSA, is the replacement for the previously infamous Pod Security Policies (PSPs). The PSA controller built into a Kubernetes cluster enforces the Pod Security Standards, which are broadly categorised into the following three levels:
- Privileged – Unrestricted; provides the widest possible permissions (allows known privilege escalations)
- Baseline – Minimally restrictive; allows the default pod configuration (prevents known privilege escalations)
- Restricted – Heavily restricted; follows current pod hardening best practices
To read further about the Pod Security Standards, you can refer to the upstream documentation. These standards are applied to a cluster either per namespace or via a cluster-wide configuration of the admission controller. Before we discuss the how, let's look at the three enforcement modes available (a combined example follows the list).
- enforce – Pods that violate the configured standard are rejected.
- audit – Violations do not impact workloads, but an audit event is recorded in the audit log.
- warn – Violations do not impact workloads, but a user-facing warning is returned.
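Each mode is set independently as a namespace label, so they can be combined. For example, the following (using a hypothetical namespace called my-namespace) enforces baseline while only warning and auditing against restricted:
kubectl label --overwrite ns my-namespace \
  pod-security.kubernetes.io/enforce=baseline \
  pod-security.kubernetes.io/audit=restricted \
  pod-security.kubernetes.io/warn=restricted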
Enforcing Pod Security via Namespace Labels
To enforce the least-privileged standard (Restricted) on users, nothing needs to be done, as this is the default setting on TKR 1.26 based clusters.
To enforce Baseline standards, admins can label the namespace with the following command:
kubectl label --overwrite ns NAMESPACE pod-security.kubernetes.io/enforce=baseline
Doing so allows them to run their pods with minimal security restrictions, and with enforce=privileged they can run with no security restrictions at all, which may not be acceptable in a heavily scrutinised environment.
Note: You can also run "kubectl label ns --all pod-security.kubernetes.io/enforce=baseline" to apply this setting to all namespaces within the cluster.
Enforcing Pod Security via Cluster-wide Configuration
To enforce this across the cluster, we will have to edit the kube-apiserver configuration. SSH into the master node of your Guest cluster using the procedure mentioned here. To retrieve the login password, you can use the procedure mentioned here.
Once you are logged in to your master node, run the first command below to identify which file holds the Pod Security admission controller configuration, and the second to view its contents.
cat /etc/kubernetes/manifests/kube-apiserver.yaml | grep -i admission-control-config
cat /etc/kubernetes/extra-config/admission-control-config.yaml
Update the defaults section to enforce: "baseline" or enforce: "privileged" based on your needs, and restart the kube-apiserver.
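For reference, the file follows the upstream AdmissionConfiguration format and looks roughly like the snippet below. The exact contents shipped on your nodes may differ, so treat this as an illustration of where the defaults live rather than an exact copy:
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: PodSecurity
  configuration:
    apiVersion: pod-security.admission.config.k8s.io/v1
    kind: PodSecurityConfiguration
    # Defaults applied to any namespace without explicit pod-security labels
    defaults:
      enforce: "baseline"        # change to "privileged" to effectively disable enforcement
      enforce-version: "latest"
      audit: "restricted"
      audit-version: "latest"
      warn: "restricted"
      warn-version: "latest"
    exemptions:
      usernames: []
      runtimeClasses: []
      namespaces: []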
Note: kube-apiserver runs as a static pod on your control plane nodes, so this change needs to be applied on every control plane node in your cluster. Using "kubectl label ns --all" is the easier option in my opinion 😀
Upgrade Scenario
Coming back to the upgrade scenario, I upgraded my cluster from v1.25.7 to v1.26.5 without making any of the configuration changes above, and here's what happened.
Prior to upgrade
As you can see from the output, on v1.25.7, creating a pod that violates the security standard throws a warning, but the pod is still admitted.
As the "k get nodes" output shows, the nodes were running v1.25.7. I created two namespaces, "ns1" and "ns2", for the purpose of this demonstration. One of the two is labelled, as you can see from the "k get ns ns2 --show-labels" output: I set it to enforce=privileged, which means no workload in that namespace will be rejected.
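For anyone who wants to reproduce the setup, it was roughly the following (the deployment name and image are placeholders; any image that runs as root will do to trigger the violation):
kubectl create namespace ns1
kubectl create namespace ns2
kubectl label --overwrite ns ns2 pod-security.kubernetes.io/enforce=privileged
# nginx runs as root by default, so it violates the restricted standard
kubectl create deployment demo --image=nginx -n ns1
kubectl create deployment demo --image=nginx -n ns2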
After upgrade
And there you go: the pod in ns1 has gone missing, while the one in ns2 was recreated. Let's explore what's wrong with the pod in ns1.
If you look at the screenshot, you will notice that the deployment in ns1 is not ready. Let's describe the associated replica set and find out what the problem is.
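Something along these lines will surface the events (substitute the replica set name that "kubectl get rs -n ns1" returns in your cluster):
kubectl get rs -n ns1
kubectl describe rs <replicaset-name> -n ns1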
There it is: pod creation failed because it violates PodSecurity "restricted:latest". Since namespace ns1 carries no pod-security label, the cluster-wide default of enforce=restricted applies, and the pod is prevented from coming up because it doesn't comply with the restricted Pod Security Standard.
Also, a point to note here is that pod security is applied only to Pod objects and not to the workload objects (e.g. Deployments or Jobs) used to create them, which is why the Deployment itself was accepted but its pods never appeared.
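For context, to comply with the restricted standard a pod needs security settings along the lines of the snippet below. This is illustrative only: the pod name and image are placeholders, the image itself must actually run as a non-root user, and the upstream Pod Security Standards page has the complete list of requirements.
apiVersion: v1
kind: Pod
metadata:
  name: restricted-demo            # placeholder name
  namespace: ns1
spec:
  securityContext:
    runAsNonRoot: true             # the image must support running as non-root
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: nginxinc/nginx-unprivileged   # placeholder; any non-root image works
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]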
Workaround
On the namespace where the replica set controller is complaining about the violation, apply the namespace label after the upgrade:
kubectl label --overwrite ns ns1 pod-security.kubernetes.io/enforce=baseline
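You can confirm that the label took effect and watch the replica set bring the pod back with:
kubectl get ns ns1 --show-labels
kubectl get rs,pods -n ns1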
Once the label is applied, the pod comes back up as soon as the replica set controller reconciles. That's it, folks. Good luck with your upgrades!
Happy learning!