
Kubernetes 1.20: Kubernetes Volume Snapshot Moves to GA

By Xing Yang (VMware), Xiangqian Yu (Google) | Thursday, December 10, 2020

The Kubernetes Volume Snapshot feature is now GA in Kubernetes v1.20. It was introduced as alpha in Kubernetes v1.12, followed by a second alpha with breaking changes in Kubernetes v1.13, and promotion to beta in Kubernetes v1.17. This blog post summarizes the changes made in moving the feature from beta to GA.

What is a volume snapshot?

Many storage systems (like Google Cloud Persistent Disks, Amazon Elastic Block Store, and many on-premises storage systems) provide the ability to create a “snapshot” of a persistent volume. A snapshot represents a point-in-time copy of a volume. A snapshot can be used either to rehydrate a new volume (pre-populated with the snapshot data) or to restore an existing volume to a previous state (represented by the snapshot).

Why add volume snapshots to Kubernetes?

Kubernetes aims to create an abstraction layer between distributed applications and underlying clusters so that applications can be agnostic to the specifics of the cluster they run on and application deployment requires no “cluster-specific” knowledge.

The Kubernetes Storage SIG identified snapshot operations as critical functionality for many stateful workloads. For example, a database administrator may want to snapshot a database’s volumes before starting a database operation.

By providing a standard way to trigger volume snapshot operations in Kubernetes, this feature allows Kubernetes users to incorporate snapshot operations in a portable manner on any Kubernetes environment regardless of the underlying storage.

Additionally, these Kubernetes snapshot primitives act as basic building blocks that unlock the ability to develop advanced enterprise-grade storage administration features for Kubernetes, including application or cluster level backup solutions.

What’s new since beta?

With the promotion of Volume Snapshot to GA, the feature is enabled by default on standard Kubernetes deployments and cannot be turned off.

Many enhancements have been made to improve the quality of this feature and to make it production-grade.

The Volume Snapshot APIs and client library were moved to a separate Go module.
A snapshot validation webhook has been added to perform necessary validation on volume snapshot objects. More details can be found in the Volume Snapshot Validation Webhook Kubernetes Enhancement Proposal.
Along with the validation webhook, the volume snapshot controller will start labeling invalid snapshot objects that already existed. This allows users to identify and remove any invalid objects and to correct their workflows. Once the API is switched to the v1 type, those invalid objects will not be deletable from the system.
To provide better insights into how the snapshot feature is performing, an initial set of operation metrics has been added to the volume snapshot controller.
There are more end-to-end tests, running on GCP, that validate the feature in a real Kubernetes cluster. Stress tests (based on Google Persistent Disk and hostPath CSI Drivers) have been introduced to test the robustness of the system.

Other than introducing tighter validation, there is no difference between the v1beta1 and v1 Kubernetes volume snapshot API. In this release (with Kubernetes 1.20), both v1 and v1beta1 are served while the stored API version is still v1beta1. Future releases will switch the stored version to v1 and gradually remove v1beta1 support.
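Because the two versions share the same schema, an existing v1beta1 manifest can usually be moved to v1 by changing only its apiVersion. The following is a minimal sketch of that comparison; the object, class, and PVC names are illustrative placeholders rather than anything a cluster provides.

```yaml
# v1beta1 form: still served in Kubernetes 1.20, shown only for comparison.
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: test-snapshot                        # placeholder name
spec:
  volumeSnapshotClassName: test-snapclass    # placeholder class
  source:
    persistentVolumeClaimName: test-pvc      # placeholder PVC
---
# v1 form of the same object: only apiVersion differs.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: test-snapshot
spec:
  volumeSnapshotClassName: test-snapclass
  source:
    persistentVolumeClaimName: test-pvc
```

As far as the tighter validation goes, the objects most likely to be affected are those whose spec.source sets both persistentVolumeClaimName and volumeSnapshotContentName, or neither. That reading is based on the validation webhook proposal, which remains the authoritative list of rules.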

Which CSI drivers support volume snapshots?

Snapshots are only supported for CSI drivers, not for in-tree or FlexVolume drivers. Ensure the deployed CSI driver on your cluster has implemented the snapshot interfaces. For more information, see Container Storage Interface (CSI) for Kubernetes GA.

Currently more than 50 CSI drivers support the Volume Snapshot feature. The GCE Persistent Disk CSI Driver has gone through the tests for upgrading from volume snapshots beta to GA. GA level support for other CSI drivers should be available soon.

Who builds products using volume snapshots?

As of the publishing of this blog, the following participants from the Kubernetes Data Protection Working Group are building products or have already built products using Kubernetes volume snapshots.

How to deploy volume snapshots?

The Volume Snapshot feature consists of the following components:

Kubernetes Volume Snapshot CRDs
Volume snapshot controller
Snapshot validation webhook
CSI Driver along with CSI Snapshotter sidecar

It is strongly recommended that Kubernetes distributors bundle and deploy the volume snapshot controller, CRDs, and validation webhook as part of their Kubernetes cluster management process (independent of any CSI Driver).

Warning:

The snapshot validation webhook is a critical component for transitioning smoothly from the v1beta1 API to the v1 API. Without the webhook installed, nothing prevents invalid volume snapshot objects from being created or updated, which in turn will block deletion of those invalid objects in coming upgrades.

If your cluster does not come pre-installed with the correct components, you may manually install them. See the CSI Snapshotter README for details.

How to use volume snapshots?

Assuming all the required components (including the CSI driver) are already deployed and running on your cluster, you can create volume snapshots using the VolumeSnapshot API object, or use an existing VolumeSnapshot to restore a PVC by specifying the VolumeSnapshot as its data source. For more details, see the volume snapshot documentation.

Note:

The Kubernetes Snapshot API does not provide any application consistency guarantees. To get a consistent snapshot, you have to prepare your application (for example, pause the application or freeze the filesystem) before taking the snapshot, either manually or using higher level APIs/controllers.

Dynamically provision a volume snapshot

To dynamically provision a volume snapshot, create a VolumeSnapshotClass API object first.

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: test-snapclass
driver: testdriver.csi.k8s.io
deletionPolicy: Delete
parameters:
  csi.storage.k8s.io/snapshotter-secret-name: mysecret
  csi.storage.k8s.io/snapshotter-secret-namespace: mysecretnamespace

Then create a VolumeSnapshot API object from a PVC by specifying the volume snapshot class.

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: test-snapshot
  namespace: ns1
spec:
  volumeSnapshotClassName: test-snapclass
  source:
    persistentVolumeClaimName: test-pvc
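
Once the snapshot controller and the CSI driver have processed the request, the status of the VolumeSnapshot reports whether it is bound and ready to use. The excerpt below is an illustrative sketch of what reading the object back might show; the content name, timestamp, and size are made-up placeholder values, not output from a real cluster.

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: test-snapshot
  namespace: ns1
spec:
  volumeSnapshotClassName: test-snapclass
  source:
    persistentVolumeClaimName: test-pvc
status:
  # VolumeSnapshotContent bound to this snapshot (placeholder name).
  boundVolumeSnapshotContentName: snapcontent-example
  creationTime: "2020-12-10T00:00:00Z"   # placeholder timestamp
  # true once the snapshot can be used as a data source.
  readyToUse: true
  restoreSize: 1Gi                        # placeholder size
```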

Importing an existing volume snapshot with Kubernetes

To import a pre-existing volume snapshot into Kubernetes, manually create a VolumeSnapshotContent object first.

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  name: test-content
spec:
  deletionPolicy: Delete
  driver: testdriver.csi.k8s.io
  source:
    snapshotHandle: 7bdd0de3-xxx
  volumeSnapshotRef:
    name: test-snapshot
    namespace: default

Then create a VolumeSnapshot object pointing to the VolumeSnapshotContent object.

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: test-snapshot
spec:
  source:
    volumeSnapshotContentName: test-content
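
For comparison, when a snapshot is dynamically provisioned, the snapshot controller creates the VolumeSnapshotContent itself and records the source volume rather than a pre-existing snapshot handle. A rough sketch of such an object follows; the object name and volume handle are placeholders, and the exact fields a given driver populates may differ.

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  name: snapcontent-example               # placeholder; the controller generates the real name
spec:
  deletionPolicy: Delete
  driver: testdriver.csi.k8s.io
  volumeSnapshotClassName: test-snapclass
  source:
    # ID of the volume that was snapshotted (placeholder value).
    # An imported snapshot sets snapshotHandle here instead.
    volumeHandle: example-volume-handle
  volumeSnapshotRef:
    name: test-snapshot
    namespace: ns1
```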

Rehydrate volume from snapshot

A bound and ready VolumeSnapshot object can be used to rehydrate a new volume with data pre-populated from snapshotted data as shown here:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-restore
  namespace: demo-namespace  # the VolumeSnapshot named in dataSource must exist in this same namespace
spec:
  storageClassName: test-storageclass
  dataSource:
    name: test-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
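
To check the restored data, the new claim can be mounted like any other PVC. The Pod below is a minimal sketch under a few assumptions: the Pod name, container image, command, and mount path are hypothetical choices made for illustration, and only the claim name and namespace come from the example above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: restore-check            # hypothetical name
  namespace: demo-namespace      # same namespace as the PVC
spec:
  containers:
    - name: app
      image: busybox             # any image with a shell would do
      # List the restored files, then stay running for inspection.
      command: ["sh", "-c", "ls -la /data && sleep 3600"]
      volumeMounts:
        - name: restored-data
          mountPath: /data       # hypothetical mount path
  volumes:
    - name: restored-data
      persistentVolumeClaim:
        claimName: pvc-restore   # the PVC rehydrated from the snapshot
```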

How to add support for snapshots in a CSI driver?

See the CSI spec and the Kubernetes-CSI Driver Developer Guide for more details on how to implement the snapshot feature in a CSI driver.

What are the limitations?

The GA implementation of volume snapshots for Kubernetes has the following limitations:

Does not support reverting an existing PVC to an earlier state represented by a snapshot (only supports provisioning a new volume from a snapshot).

How to learn more?

The code repository for snapshot APIs and controller is here: https://github.com/kubernetes-csi/external-snapshotter

Check out additional documentation on the snapshot feature here: http://k8s.io/docs/concepts/storage/volume-snapshots and https://kubernetes-csi.github.io/docs/

How to get involved?

This project, like all of Kubernetes, is the result of hard work by many contributors from diverse backgrounds working together.

We offer a huge thank you to the contributors who stepped up these last few quarters to help the project reach GA. We want to thank Saad Ali, Michelle Au, Tim Hockin, and Jordan Liggitt for their insightful reviews and thorough consideration of the design; Andi Li for his work on adding support for the snapshot validation webhook; Grant Griffiths for implementing metrics support in the snapshot controller and handling password rotation in the validation webhook; Chris Henzie, Raunak Shah, and Manohar Reddy for writing critical e2e tests to meet the scalability and stability requirements for graduation; Kartik Sharma for moving the snapshot APIs and client library to a separate Go module; and Raunak Shah and Prafull Ladha for their help with upgrade testing from beta to GA.

There are many more people who have helped to move the snapshot feature from beta to GA. We want to thank everyone who has contributed to this effort:

For those interested in getting involved with the design and development of CSI or any part of the Kubernetes Storage system, join the Kubernetes Storage Special Interest Group (SIG). We’re rapidly growing and always welcome new contributors.

We also hold regular Data Protection Working Group meetings. New attendees are welcome to join in discussions.