Install Drivers and Allocate Devices with DRA

FEATURE STATE: Kubernetes v1.35 [stable](enabled by default)

This tutorial shows you how to install Dynamic Resource Allocation (DRA) drivers in your cluster and how to use them in conjunction with the DRA APIs to allocate devices to Pods. This page is intended for cluster administrators.

Dynamic Resource Allocation (DRA) lets a cluster manage availability and allocation of hardware resources to satisfy Pod-based claims for hardware requirements and preferences. To support this, a mixture of Kubernetes built-in components (like the Kubernetes scheduler, kubelet, and kube-controller-manager) and third-party drivers from device owners (called DRA drivers) share the responsibility to advertise, allocate, prepare, mount, healthcheck, unprepare, and cleanup resources throughout the Pod lifecycle. These components share information via a series of DRA specific APIs in the resource.k8s.io API group including DeviceClasses, ResourceSlices, ResourceClaims, as well as new fields in the Pod spec itself.

Objectives

Deploy an example DRA driver
Deploy a Pod requesting a hardware claim using DRA APIs
Delete a Pod that has a claim

Before you begin

Your cluster should support RBAC. You can try this tutorial with a cluster using a different authorization mechanism, but in that case you will have to adapt the steps around defining roles and permissions.

You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube or you can use one of these Kubernetes playgrounds:

This tutorial has been tested with Linux nodes, though it may also work with other types of nodes.

Your Kubernetes server must be at or later than version v1.34.

To check the version, enter kubectl version.

If your cluster is not currently running Kubernetes 1.35 then please check the documentation for the version of Kubernetes that you plan to use.

Explore the initial cluster state

You can spend some time to observe the initial state of a cluster with DRA enabled, especially if you have not used these APIs extensively before. If you set up a new cluster for this tutorial, with no driver installed and no Pod claims yet to satisfy, the output of these commands won't show any resources.

Get a list of DeviceClasses:
```
kubectl get deviceclasses
```
The output is similar to this:
```
No resources found
```
Get a list of ResourceSlices:
```
kubectl get resourceslices
```
The output is similar to this:
```
No resources found
```

Get a list of ResourceClaims and ResourceClaimTemplates

kubectl get resourceclaims -A
kubectl get resourceclaimtemplates -A

The output is similar to this:

No resources found
No resources found

At this point, you have confirmed that DRA is enabled and configured properly in the cluster, and that no DRA drivers have advertised any resources to the DRA APIs yet.

Install an example DRA driver

DRA drivers are third-party applications that run on each node of your cluster to interface with the hardware of that node and Kubernetes' built-in DRA components. The installation procedure depends on the driver you choose, but is likely deployed as a DaemonSet to all or a selection of the nodes (using selectors or similar mechanisms) in your cluster.

Check your driver's documentation for specific installation instructions, which might include a Helm chart, a set of manifests, or other deployment tooling.

This tutorial uses an example driver which can be found in the kubernetes-sigs/dra-example-driver repository to demonstrate driver installation. This example driver advertises simulated GPUs to Kubernetes for your Pods to interact with.

Prepare your cluster for driver installation

To simplify cleanup, create a namespace named dra-tutorial:

Create the namespace:
```
kubectl create namespace dra-tutorial 
```

In a production environment, you would likely be using a previously released or qualified image from the driver vendor or your own organization, and your nodes would need to have access to the image registry where the driver image is hosted. In this tutorial, you will use a publicly released image of the dra-example-driver to simulate access to a DRA driver image.

Confirm your nodes have access to the image by running the following from within one of your cluster's nodes:
```
docker pull registry.k8s.io/dra-example-driver/dra-example-driver:v0.2.0
```

Deploy the DRA driver components

For this tutorial, you will install the critical example resource driver components individually with kubectl.

Create the DeviceClass representing the device types this DRA driver supports:

dra/driver-install/deviceclass.yaml

apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: gpu.example.com
spec:
  selectors:
  - cel: 
      expression: "device.driver == 'gpu.example.com'"

kubectl apply --server-side -f http://k8s.io/examples/dra/driver-install/deviceclass.yaml

Create the ServiceAccount, ClusterRole and ClusterRoleBinding that will be used by the driver to gain permissions to interact with the Kubernetes API on this cluster:

Create the Service Account:

dra/driver-install/serviceaccount.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: dra-example-driver-service-account
  namespace: dra-tutorial
  labels:
    app.kubernetes.io/name: dra-example-driver
    app.kubernetes.io/instance: dra-example-driver

kubectl apply --server-side -f http://k8s.io/examples/dra/driver-install/serviceaccount.yaml

Create the ClusterRole:

dra/driver-install/clusterrole.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: dra-example-driver-role
rules:
- apiGroups: ["resource.k8s.io"]
  resources: ["resourceclaims"]
  verbs: ["get"]
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get"]
- apiGroups: ["resource.k8s.io"]
  resources: ["resourceslices"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

kubectl apply --server-side -f http://k8s.io/examples/dra/driver-install/clusterrole.yaml

Create the ClusterRoleBinding:

dra/driver-install/clusterrolebinding.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: dra-example-driver-role-binding
subjects:
- kind: ServiceAccount
  name: dra-example-driver-service-account
  namespace: dra-tutorial
roleRef:
  kind: ClusterRole
  name: dra-example-driver-role
  apiGroup: rbac.authorization.k8s.io

kubectl apply --server-side -f http://k8s.io/examples/dra/driver-install/clusterrolebinding.yaml

Create a PriorityClass for the DRA driver. The PriorityClass prevents preemption of th DRA driver component, which is responsible for important lifecycle operations for Pods with claims. Learn more about pod priority and preemption here.

dra/driver-install/priorityclass.yaml

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: dra-driver-high-priority
value: 1000000
globalDefault: false
description: "This priority class should be used for DRA driver pods only."

kubectl apply --server-side -f http://k8s.io/examples/dra/driver-install/priorityclass.yaml

Deploy the actual DRA driver as a DaemonSet configured to run the example driver binary with the permissions provisioned above. The DaemonSet has the permissions that you granted to the ServiceAccount in the previous steps.

dra/driver-install/daemonset.yaml

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: dra-example-driver-kubeletplugin
  namespace: dra-tutorial
  labels:
    app.kubernetes.io/name: dra-example-driver
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: dra-example-driver
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app.kubernetes.io/name: dra-example-driver
    spec:
      priorityClassName: dra-driver-high-priority
      serviceAccountName: dra-example-driver-service-account
      securityContext:
        {}
      containers:
      - name: plugin
        securityContext:
          privileged: true
        image: registry.k8s.io/dra-example-driver/dra-example-driver:v0.2.0
        imagePullPolicy: IfNotPresent
        command: ["dra-example-kubeletplugin"]
        resources:
          {}
        # Production drivers should always implement a liveness probe
        # For the tutorial we simply omit it
        # livenessProbe:
        #   grpc:
        #     port: 51515
        #     service: liveness
        #   failureThreshold: 3
        #   periodSeconds: 10
        env:
        - name: CDI_ROOT
          value: /var/run/cdi
        - name: KUBELET_REGISTRAR_DIRECTORY_PATH
          value: "/var/lib/kubelet/plugins_registry"
        - name: KUBELET_PLUGINS_DIRECTORY_PATH
          value: "/var/lib/kubelet/plugins"
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        # Simulated number of devices the example driver will pretend to have.
        - name: NUM_DEVICES
          value: "9"
        - name: HEALTHCHECK_PORT
          value: "51515"
        volumeMounts:
        - name: plugins-registry
          mountPath: "/var/lib/kubelet/plugins_registry"
        - name: plugins
          mountPath: "/var/lib/kubelet/plugins"
        - name: cdi
          mountPath: /var/run/cdi
      volumes:
      - name: plugins-registry
        hostPath:
          path: "/var/lib/kubelet/plugins_registry"
      - name: plugins
        hostPath:
          path: "/var/lib/kubelet/plugins"
      - name: cdi
        hostPath:
          path: /var/run/cdi

kubectl apply --server-side -f http://k8s.io/examples/dra/driver-install/daemonset.yaml

The DaemonSet is configured with the volume mounts necessary to interact with the underlying Container Device Interface (CDI) directory, and to expose its socket to kubelet via the kubelet/plugins directory.

Verify the DRA driver installation

Get a list of the Pods of the DRA driver DaemonSet across all worker nodes:

kubectl get pod -l app.kubernetes.io/name=dra-example-driver -n dra-tutorial

The output is similar to this:

NAME                                     READY   STATUS    RESTARTS   AGE
dra-example-driver-kubeletplugin-4sk2x   1/1     Running   0          13s
dra-example-driver-kubeletplugin-cttr2   1/1     Running   0          13s

The initial responsibility of each node's local DRA driver is to update the cluster with what devices are available to Pods on that node, by publishing its metadata to the ResourceSlices API. You can check that API to see that each node with a driver is advertising the device class it represents.
Check for available ResourceSlices:
```
kubectl get resourceslices
```
The output is similar to this:
```
NAME                                 NODE           DRIVER            POOL           AGE
kind-worker-gpu.example.com-k69gd    kind-worker    gpu.example.com   kind-worker    19s
kind-worker2-gpu.example.com-qdgpn   kind-worker2   gpu.example.com   kind-worker2   19s
```

At this point, you have successfully installed the example DRA driver, and confirmed its initial configuration. You're now ready to use DRA to schedule Pods.

Claim resources and deploy a Pod

To request resources using DRA, you create ResourceClaims or ResourceClaimTemplates that define the resources that your Pods need. In the example driver, a memory capacity attribute is exposed for mock GPU devices. This section shows you how to use Common Expression Language to express your requirements in a ResourceClaim, select that ResourceClaim in a Pod specification, and observe the resource allocation.

This tutorial showcases only one basic example of a DRA ResourceClaim. Read Dynamic Resource Allocation to learn more about ResourceClaims.

Create the ResourceClaim

In this section, you create a ResourceClaim and reference it in a Pod. Whatever the claim, the deviceClassName is a required field, narrowing down the scope of the request to a specific device class. The request itself can include a Common Expression Language expression that references attributes that may be advertised by the driver managing that device class.

In this example, you will create a request for any GPU advertising over 10Gi memory capacity. The attribute exposing capacity from the example driver takes the form device.capacity['gpu.example.com'].memory. Note also that the name of the claim is set to some-gpu.

dra/driver-install/example/resourceclaim.yaml

apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
 name: some-gpu
 namespace: dra-tutorial
spec:
   devices:
     requests:
     - name: some-gpu
       exactly:
         deviceClassName: gpu.example.com
         selectors:
         - cel:
             expression: "device.capacity['gpu.example.com'].memory.compareTo(quantity('10Gi')) >= 0"

kubectl apply --server-side -f http://k8s.io/examples/dra/driver-install/example/resourceclaim.yaml

Create the Pod that references that ResourceClaim

Below is the Pod manifest referencing the ResourceClaim you just made, some-gpu, in the spec.resourceClaims.resourceClaimName field. The local name for that claim, gpu, is then used in the spec.containers.resources.claims.name field to allocate the claim to the Pod's underlying container.

dra/driver-install/example/pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: pod0
  namespace: dra-tutorial
  labels:
    app: pod
spec:
  containers:
  - name: ctr0
    image: ubuntu:24.04
    command: ["bash", "-c"]
    args: ["export; trap 'exit 0' TERM; sleep 9999 & wait"]
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    resourceClaimName: some-gpu

kubectl apply --server-side -f http://k8s.io/examples/dra/driver-install/example/pod.yaml

Confirm the pod has deployed:

kubectl get pod pod0 -n dra-tutorial

The output is similar to this:

NAME   READY   STATUS    RESTARTS   AGE
pod0   1/1     Running   0          9s

Explore the DRA state

After you create the Pod, the cluster tries to schedule that Pod to a node where Kubernetes can satisfy the ResourceClaim. In this tutorial, the DRA driver is deployed on all nodes, and is advertising mock GPUs on all nodes, all of which have enough capacity advertised to satisfy the Pod's claim, so Kubernetes can schedule this Pod on any node and can allocate any of the mock GPUs on that node.

When Kubernetes allocates a mock GPU to a Pod, the example driver adds environment variables in each container it is allocated to in order to indicate which GPUs would have been injected into them by a real resource driver and how they would have been configured, so you can check those environment variables to see how the Pods have been handled by the system.

Check the Pod logs, which report the name of the mock GPU that was allocated:

kubectl logs pod0 -c ctr0 -n dra-tutorial | grep -E "GPU_DEVICE_[0-9]+=" | grep -v "RESOURCE_CLAIM"

The output is similar to this:

declare -x GPU_DEVICE_0="gpu-0"

Check the state of the ResourceClaim object:
```
kubectl get resourceclaims -n dra-tutorial
```
The output is similar to this:
```
NAME       STATE                AGE
some-gpu   allocated,reserved   34s
```
In this output, the STATE column shows that the ResourceClaim is allocated and reserved.

Check the details of the some-gpu ResourceClaim. The status stanza of the ResourceClaim has information about the allocated device and the Pod it has been reserved for:

kubectl get resourceclaim some-gpu -n dra-tutorial -o yaml

The output is similar to this:

 2 3 4  5  6  7  8  9 10 1112 13 14 15 16 17 18 19 20 21 22 2324 25 26 2728293031 32 33 34 35 36 37 38394041 1apiVersion: resource.k8s.io/v1 style=color:green;font-weight:700>kind: ResourceClaim style=color:green;font-weight:700>metadata: creationTimestamp: "2025-08-20T18:17:31Z" finalizers: - resource.kubernetes.io/delete-protection name: some-gpu namespace: dra-tutorial resourceVersion: "2326" uid: d3e48dbf-40da-47c3-a7b9-f7d54d1051c3 style=color:green;font-weight:700>spec: devices: requests: - exactly: allocationMode: ExactCount count: 1 deviceClassName: gpu.example.com selectors: - cel: expression: device.capacity['gpu.example.com'].memory.compareTo(quantity('10Gi')) >= 0 name: some-gpu style=color:green;font-weight:700>status: allocation: devices: results: style=color:#bbb>        - device: gpu-0 style=color:#bbb>            driver: gpu.example.com style=color:#bbb>            pool: kind-worker style=color:#bbb>            request: some-gpu nodeSelector: nodeSelectorTerms: - matchFields: - key: metadata.name operator: In values: - kind-worker style=color:#bbb>    reservedFor: style=color:#bbb>    - name: pod0 style=color:#bbb>        resource: pods style=color:#bbb>        uid: c4dadf20-392a-474d-a47b-ab82080c8bd7

To check how the driver handled device allocation, get the logs for the driver DaemonSet Pods:

kubectl logs -l app.kubernetes.io/name=dra-example-driver -n dra-tutorial

The output is similar to this:

I0820 18:17:44.131324       1 driver.go:106] PrepareResourceClaims is called: number of claims: 1
I0820 18:17:44.135056       1 driver.go:133] Returning newly prepared devices for claim 'd3e48dbf-40da-47c3-a7b9-f7d54d1051c3': [{[some-gpu] kind-worker gpu-0 [k8s.gpu.example.com/gpu=common k8s.gpu.example.com/gpu=d3e48dbf-40da-47c3-a7b9-f7d54d1051c3-gpu-0]}]

You have now successfully deployed a Pod that claims devices using DRA, verified that the Pod was scheduled to an appropriate node, and saw that the associated DRA APIs kinds were updated with the allocation status.

Delete a Pod that has a claim

When a Pod with a claim is deleted, the DRA driver deallocates the resource so it can be available for future scheduling. To validate this behavior, delete the Pod that you created in the previous steps and watch the corresponding changes to the ResourceClaim and driver.

Delete the pod0 Pod:

kubectl delete pod pod0 -n dra-tutorial

The output is similar to this:

pod "pod0" deleted

Observe the DRA state

When the Pod is deleted, the driver deallocates the device from the ResourceClaim and updates the ResourceClaim resource in the Kubernetes API. The ResourceClaim has a pending state until it's referenced in a new Pod.

Check the state of the some-gpu ResourceClaim:

kubectl get resourceclaims -n dra-tutorial

The output is similar to this:

NAME       STATE     AGE
some-gpu   pending   76s

Verify that the driver has processed unpreparing the device for this claim by checking the driver logs:

kubectl logs -l app.kubernetes.io/name=dra-example-driver -n dra-tutorial

The output is similar to this:

I0820 18:22:15.629376       1 driver.go:138] UnprepareResourceClaims is called: number of claims: 1

You have now deleted a Pod that had a claim, and observed that the driver took action to unprepare the underlying hardware resource and update the DRA APIs to reflect that the resource is available again for future scheduling.

Cleaning up

To clean up the resources that you created in this tutorial, follow these steps:

kubectl delete namespace dra-tutorial
kubectl delete deviceclass gpu.example.com
kubectl delete clusterrole dra-example-driver-role
kubectl delete clusterrolebinding dra-example-driver-role-binding
kubectl delete priorityclass dra-driver-high-priority

What's next

Last modified September 19, 2025 at 9:38 PM PST: fix: typos (a5d40c68e0)

Install Drivers and Allocate Devices with DRA

Objectives

Before you begin

Explore the initial cluster state

Install an example DRA driver

Prepare your cluster for driver installation

Deploy the DRA driver components

Verify the DRA driver installation

Claim resources and deploy a Pod

Create the ResourceClaim

Create the Pod that references that ResourceClaim

Explore the DRA state

Delete a Pod that has a claim

Observe the DRA state

Cleaning up

What's next

Feedback