Pod Security Standards

A detailed look at the different policy levels defined in the Pod Security Standards.

The Pod Security Standards define three different policies to broadly cover the security spectrum. These policies are cumulative and range from highly-permissive to highly-restrictive. This guide outlines the requirements of each policy.

Profile Description
Privileged Unrestricted policy, providing the widest possible level of permissions. This policy allows for known privilege escalations.
Baseline Minimally restrictive policy which prevents known privilege escalations. Allows the default (minimally specified) Pod configuration.
Restricted Heavily restricted policy, following current Pod hardening best practices.

Profile Details

Privileged

The Privileged policy is purposely-open, and entirely unrestricted. This type of policy is typically aimed at system- and infrastructure-level workloads managed by privileged, trusted users.

The Privileged policy is defined by an absence of restrictions. For allow-by-default enforcement mechanisms (such as gatekeeper), the Privileged policy may be an absence of applied constraints rather than an instantiated profile. In contrast, for a deny-by-default mechanism (such as Pod Security Policy) the Privileged policy should enable all controls (disable all restrictions).

Baseline

The Baseline policy is aimed at ease of adoption for common containerized workloads while preventing known privilege escalations. This policy is targeted at application operators and developers of non-critical applications. The following listed controls should be enforced/disallowed:

Note: In this table, wildcards (*) indicate all elements in a list. For example, spec.containers[*].securityContext refers to the Security Context object for all defined containers. If any of the listed containers fails to meet the requirements, the entire pod will fail validation.
Baseline policy specification
Control Policy
Host Namespaces

Sharing the host namespaces must be disallowed.

Restricted Fields

  • spec.hostNetwork
  • spec.hostPID
  • spec.hostIPC

Allowed Values

  • Undefined/nil
  • false
Privileged Containers

Privileged Pods disable most security mechanisms and must be disallowed.

Restricted Fields

  • spec.containers[*].securityContext.privileged
  • spec.initContainers[*].securityContext.privileged
  • spec.ephemeralContainers[*].securityContext.privileged

Allowed Values

  • Undefined/nil
  • false
Capabilities

Adding additional capabilities beyond those listed below must be disallowed.

Restricted Fields

  • spec.containers[*].securityContext.capabilities.add
  • spec.initContainers[*].securityContext.capabilities.add
  • spec.ephemeralContainers[*].securityContext.capabilities.add

Allowed Values

  • Undefined/nil
  • AUDIT_WRITE
  • CHOWN
  • DAC_OVERRIDE
  • FOWNER
  • FSETID
  • KILL
  • MKNOD
  • NET_BIND_SERVICE
  • SETFCAP
  • SETGID
  • SETPCAP
  • SETUID
  • SYS_CHROOT
HostPath Volumes

HostPath volumes must be forbidden.

Restricted Fields

  • spec.volumes[*].hostPath

Allowed Values

  • Undefined/nil
Host Ports

HostPorts should be disallowed, or at minimum restricted to a known list.

Restricted Fields

  • spec.containers[*].ports[*].hostPort
  • spec.initContainers[*].ports[*].hostPort
  • spec.ephemeralContainers[*].ports[*].hostPort

Allowed Values

  • Undefined/nil
  • Known list
  • 0
AppArmor

On supported hosts, the runtime/default AppArmor profile is applied by default. The baseline policy should prevent overriding or disabling the default AppArmor profile, or restrict overrides to an allowed set of profiles.

Restricted Fields

  • metadata.annotations["container.apparmor.security.beta.kubernetes.io/*"]

Allowed Values

  • Undefined/nil
  • runtime/default
  • localhost/*
SELinux

Setting the SELinux type is restricted, and setting a custom SELinux user or role option is forbidden.

Restricted Fields

  • spec.securityContext.seLinuxOptions.type
  • spec.containers[*].securityContext.seLinuxOptions.type
  • spec.initContainers[*].securityContext.seLinuxOptions.type
  • spec.ephemeralContainers[*].securityContext.seLinuxOptions.type

Allowed Values

  • Undefined/""
  • container_t
  • container_init_t
  • container_kvm_t

Restricted Fields

  • spec.securityContext.seLinuxOptions.user
  • spec.containers[*].securityContext.seLinuxOptions.user
  • spec.initContainers[*].securityContext.seLinuxOptions.user
  • spec.ephemeralContainers[*].securityContext.seLinuxOptions.user
  • spec.securityContext.seLinuxOptions.role
  • spec.containers[*].securityContext.seLinuxOptions.role
  • spec.initContainers[*].securityContext.seLinuxOptions.role
  • spec.ephemeralContainers[*].securityContext.seLinuxOptions.role

Allowed Values

  • Undefined/""
/proc Mount Type

The default /proc masks are set up to reduce attack surface, and should be required.

Restricted Fields

  • spec.containers[*].securityContext.procMount
  • spec.initContainers[*].securityContext.procMount
  • spec.ephemeralContainers[*].securityContext.procMount

Allowed Values

  • Undefined/nil
  • Default
Seccomp

Seccomp profile must not be explicitly set to Unconfined.

Restricted Fields

  • spec.securityContext.seccompProfile.type
  • spec.containers[*].securityContext.seccompProfile.type
  • spec.initContainers[*].securityContext.seccompProfile.type
  • spec.ephemeralContainers[*].securityContext.seccompProfile.type

Allowed Values

  • Undefined/nil
  • RuntimeDefault
  • Localhost
Sysctls

Sysctls can disable security mechanisms or affect all containers on a host, and should be disallowed except for an allowed "safe" subset. A sysctl is considered safe if it is namespaced in the container or the Pod, and it is isolated from other Pods or processes on the same Node.

Restricted Fields

  • spec.securityContext.sysctls[*].name

Allowed Values

  • Undefined/nil
  • kernel.shm_rmid_forced
  • net.ipv4.ip_local_port_range
  • net.ipv4.ip_unprivileged_port_start
  • net.ipv4.tcp_syncookies
  • net.ipv4.ping_group_range

Restricted

The Restricted policy is aimed at enforcing current Pod hardening best practices, at the expense of some compatibility. It is targeted at operators and developers of security-critical applications, as well as lower-trust users. The following listed controls should be enforced/disallowed:

Note: In this table, wildcards (*) indicate all elements in a list. For example, spec.containers[*].securityContext refers to the Security Context object for all defined containers. If any of the listed containers fails to meet the requirements, the entire pod will fail validation.
Restricted policy specification
Control Policy
Everything from the baseline profile.
Volume Types

In addition to restricting HostPath volumes, the restricted policy limits usage of non-core volume types to those defined through PersistentVolumes.

Restricted Fields

  • spec.volumes[*].hostPath
  • spec.volumes[*].gcePersistentDisk
  • spec.volumes[*].awsElasticBlockStore
  • spec.volumes[*].gitRepo
  • spec.volumes[*].nfs
  • spec.volumes[*].iscsi
  • spec.volumes[*].glusterfs
  • spec.volumes[*].rbd
  • spec.volumes[*].flexVolume
  • spec.volumes[*].cinder
  • spec.volumes[*].cephfs
  • spec.volumes[*].flocker
  • spec.volumes[*].fc
  • spec.volumes[*].azureFile
  • spec.volumes[*].vsphereVolume
  • spec.volumes[*].quobyte
  • spec.volumes[*].azureDisk
  • spec.volumes[*].portworxVolume
  • spec.volumes[*].scaleIO
  • spec.volumes[*].storageos
  • spec.volumes[*].photonPersistentDisk

Allowed Values

  • Undefined/nil
Privilege Escalation (v1.8+)

Privilege escalation (such as via set-user-ID or set-group-ID file mode) should not be allowed.

Restricted Fields

  • spec.containers[*].securityContext.allowPrivilegeEscalation
  • spec.initContainers[*].securityContext.allowPrivilegeEscalation
  • spec.ephemeralContainers[*].securityContext.allowPrivilegeEscalation

Allowed Values

  • false
Running as Non-root

Containers must be required to run as non-root users.

Restricted Fields

  • spec.securityContext.runAsNonRoot
  • spec.containers[*].securityContext.runAsNonRoot
  • spec.initContainers[*].securityContext.runAsNonRoot
  • spec.ephemeralContainers[*].securityContext.runAsNonRoot

Allowed Values

  • true
The container fields may be undefined/nil if the pod-level spec.securityContext.runAsNonRoot is set to true.
Non-root groups (optional)

Containers should be forbidden from running with a root primary or supplementary GID.

Restricted Fields

  • spec.securityContext.runAsGroup
  • spec.securityContext.supplementalGroups[*]
  • spec.securityContext.fsGroup
  • spec.containers[*].securityContext.runAsGroup
  • spec.initContainers[*].securityContext.runAsGroup
  • spec.ephemeralContainers[*].securityContext.runAsGroup

Allowed Values

  • Undefined/nil (except for *.runAsGroup)
  • Non-zero
Seccomp (v1.19+)

Seccomp profile must be explicitly set to one of the allowed values. Both the Unconfined profile and the absence of a profile are prohibited.

Restricted Fields

  • spec.securityContext.seccompProfile.type
  • spec.containers[*].securityContext.seccompProfile.type
  • spec.initContainers[*].securityContext.seccompProfile.type
  • spec.ephemeralContainers[*].securityContext.seccompProfile.type

Allowed Values

  • RuntimeDefault
  • Localhost
The container fields may be undefined/nil if the pod-level spec.securityContext.seccompProfile.type field is set appropriately. Conversely, the pod-level field may be undefined/nil if _all_ container- level fields are set.
Capabilities (v1.22+)

Containers must drop ALL capabilities, and are only permitted to add back the NET_BIND_SERVICE capability.

Restricted Fields

  • spec.containers[*].securityContext.capabilities.drop
  • spec.initContainers[*].securityContext.capabilities.drop
  • spec.ephemeralContainers[*].securityContext.capabilities.drop

Allowed Values

  • Any list of capabilities that includes ALL

Restricted Fields

  • spec.containers[*].securityContext.capabilities.add
  • spec.initContainers[*].securityContext.capabilities.add
  • spec.ephemeralContainers[*].securityContext.capabilities.add

Allowed Values

  • Undefined/nil
  • NET_BIND_SERVICE

Policy Instantiation

Decoupling policy definition from policy instantiation allows for a common understanding and consistent language of policies across clusters, independent of the underlying enforcement mechanism.

As mechanisms mature, they will be defined below on a per-policy basis. The methods of enforcement of individual policies are not defined here.

Pod Security Admission Controller

PodSecurityPolicy (Deprecated)

FAQ

Why isn't there a profile between privileged and baseline?

The three profiles defined here have a clear linear progression from most secure (restricted) to least secure (privileged), and cover a broad set of workloads. Privileges required above the baseline policy are typically very application specific, so we do not offer a standard profile in this niche. This is not to say that the privileged profile should always be used in this case, but that policies in this space need to be defined on a case-by-case basis.

SIG Auth may reconsider this position in the future, should a clear need for other profiles arise.

What's the difference between a security profile and a security context?

Security Contexts configure Pods and Containers at runtime. Security contexts are defined as part of the Pod and container specifications in the Pod manifest, and represent parameters to the container runtime.

Security profiles are control plane mechanisms to enforce specific settings in the Security Context, as well as other related parameters outside the Security Context. As of July 2021, Pod Security Policies are deprecated in favor of the built-in Pod Security Admission Controller.

Other alternatives for enforcing security profiles are being developed in the Kubernetes ecosystem, such as OPA Gatekeeper.

What profiles should I apply to my Windows Pods?

Windows in Kubernetes has some limitations and differentiators from standard Linux-based workloads. Specifically, the Pod SecurityContext fields have no effect on Windows. As such, no standardized Pod Security profiles currently exist.

What about sandboxed Pods?

There is not currently an API standard that controls whether a Pod is considered sandboxed or not. Sandbox Pods may be identified by the use of a sandboxed runtime (such as gVisor or Kata Containers), but there is no standard definition of what a sandboxed runtime is.

The protections necessary for sandboxed workloads can differ from others. For example, the need to restrict privileged permissions is lessened when the workload is isolated from the underlying kernel. This allows for workloads requiring heightened permissions to still be isolated.

Additionally, the protection of sandboxed workloads is highly dependent on the method of sandboxing. As such, no single recommended profile is recommended for all sandboxed workloads.

Last modified July 13, 2021 at 2:07 AM PST : incorporating initial round of feedback (e0d4b53b1)