How we built a dynamic Kubernetes API Server for the API Aggregation Layer in Cozystack

Hi there! I'm Andrei Kvapil, but you might know me as @kvaps in communities dedicated to Kubernetes and cloud-native tools. In this article, I want to share how we implemented our own extension api-server in the open-source PaaS platform, Cozystack.

Kubernetes truly amazes me with its powerful extensibility features. You're probably already familiar with the controller concept and frameworks like kubebuilder and operator-sdk that help you implement it. In a nutshell, they allow you to extend your Kubernetes cluster by defining custom resources (CRDs) and writing additional controllers that handle your business logic for reconciling and managing these kinds of resources. This approach is well-documented, with a wealth of information available online on how to develop your own operators.

However, this is not the only way to extend the Kubernetes API. For more complex scenarios, such as implementing imperative logic, managing subresources, and dynamically generating responses, the Kubernetes API aggregation layer provides an effective alternative. Through the aggregation layer, you can develop a custom extension API server and seamlessly integrate it within the broader Kubernetes API framework.

In this article, I will explore the API aggregation layer, the types of challenges it is well-suited to address, cases where it may be less appropriate, and how we utilized this model to implement our own extension API server in Cozystack.

What Is the API Aggregation Layer?

First, let's get definitions straight to avoid any confusion down the road. The API aggregation layer is a feature in Kubernetes, while an extension api-server is a specific implementation of an API server for the aggregation layer. An extension API server is just like the standard Kubernetes API server, except it runs separately and handles requests for your specific resource types.

So, the aggregation layer lets you write your own extension API server, integrate it easily into Kubernetes, and directly process requests for resources in a certain group. Unlike the CRD mechanism, an extension API is registered in Kubernetes as an APIService, which tells Kubernetes that a separate API server exists and which APIs it serves.

You can execute this command to list all registered apiservices:

kubectl get apiservices.apiregistration.k8s.io

Example APIService:

NAME                         SERVICE                     AVAILABLE   AGE
v1alpha1.apps.cozystack.io   cozy-system/cozystack-api   True        7h29m

As soon as the Kubernetes api-server receives a request for a resource in the apps.cozystack.io/v1alpha1 API group, it redirects that request to our extension api-server, which handles it according to the business logic we've built into it.
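
For reference, this registration is itself just a Kubernetes object. A minimal APIService manifest matching the entry above might look like the following; the port and priority values here are illustrative:

apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1alpha1.apps.cozystack.io
spec:
  group: apps.cozystack.io
  version: v1alpha1
  groupPriorityMinimum: 1000
  versionPriority: 15
  service:
    name: cozystack-api
    namespace: cozy-system
    port: 443
  # plus either caBundle or insecureSkipTLSVerify, so the Kubernetes API server
  # can verify (or skip verifying) the extension server's TLS certificate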

When to use the API Aggregation Layer

The API Aggregation Layer helps solve several problems where the usual CRD mechanism might not be enough. Let's break them down.

Imperative Logic and Subresources

Besides regular resources, Kubernetes also has something called subresources.

In Kubernetes, subresources are additional actions or operations you can perform on primary resources (like Pods, Deployments, Services) via the Kubernetes API. They provide interfaces to manage specific aspects of resources without affecting the entire object.

A simple example is status, which is traditionally exposed as a separate subresource that you can access independently from the parent object. The status field isn't meant to be changed by users directly; controllers update it to reflect the observed state of the object.

But beyond /status, Pods in Kubernetes also have subresources like /exec, /portforward, and /log. Interestingly, unlike the usual declarative resources in Kubernetes, these are endpoints for imperative operations: viewing logs, proxying connections, executing commands in a running container, and so on.

To support such imperative commands in your own API, you need to implement an extension API and an extension API server. Here are some well-known examples:

  • KubeVirt: An add-on for Kubernetes that extends its API capabilities to run traditional virtual machines. The extension api-server created as part of KubeVirt handles subresources like /restart, /console, and /vnc for virtual machines.
  • Knative: A Kubernetes add-on that extends its capabilities for serverless computing, implementing the /scale subresource to set up autoscaling for its resource types.

By the way, even though subresource logic in Kubernetes can be imperative, you can manage access to subresources declaratively using the standard Kubernetes RBAC model.

For example, this is how you can control access to the /log and /exec subresources of the Pod kind:

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: default
  name: pod-and-pod-logs-reader
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["pods/exec"]
  verbs: ["create"]

You're not tied to etcd

Usually, the Kubernetes API server uses etcd for its backend. However, implementing your own API server doesn't lock you into using only etcd. If it doesn't make sense to store your server's state in etcd, you can store information in any other system and generate responses on the fly. Here are a few cases to illustrate:

  • metrics-server is a standard extension for Kubernetes that lets you view real-time metrics of your nodes and pods. It serves its own pods and nodes resource kinds in the metrics.k8s.io API group, and requests to them are translated into live metrics from the kubelet. So when you run kubectl top node or kubectl top pod, metrics-server fetches metrics from cAdvisor in real time and returns them to you. Since the information is generated on the fly and is only relevant at the moment of the request, there is no need to store it in etcd, which saves resources.

  • If needed, you can use a backend other than etcd and even implement a Kubernetes-compatible API on top of it. For example, if you use Postgres, you can create a transparent representation of its entities in the Kubernetes API: databases, users, and grants within Postgres would appear as regular Kubernetes resources, thanks to your extension API server. You could manage them using kubectl or any other Kubernetes-compatible tool. Unlike controllers, which implement business logic through custom resources and reconciliation, an extension API server eliminates the need for a separate controller for every kind, so you don't have to sync state between the Kubernetes API and your backend.

One-Time resources

  • Kubernetes has a special API used to provide users with information about their permissions. This is implemented using the SelfSubjectAccessReview API. One unusual detail of these resources is that you can't view them using get or list verbs. You can only create them (using the create verb) and receive output with information about what you have access to at that moment.

    If you try to run kubectl get selfsubjectaccessreviews directly, you'll just get an error like this:

    Error from server (MethodNotAllowed): the server does not allow this method on the requested resource
    

    The reason is that the Kubernetes API server doesn't support any other interaction with this type of resource (you can only CREATE them).

    The SelfSubjectAccessReview API supports commands such as:

    kubectl auth can-i create deployments --namespace dev
    

    When you run the command above, kubectl creates a SelfSubjectAccessReview through the Kubernetes API. Kubernetes uses it to check what your user is allowed to do and generates a personalized response in real time; nothing is simply read back from etcd. (You can see what such a request looks like right after this list.)

  • Similarly, in KubeVirt's CDI (Containerized Data Importer) extension, which allows file uploads into a PVC from a local machine using the virtctl tool, a special token is required before the upload process begins. This token is generated by creating an UploadTokenRequest resource via the Kubernetes API. Kubernetes routes (proxies) all UploadTokenRequest resource creation requests to the CDI extension API server, which generates and returns the token in response.
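
To make the first example concrete, here is roughly the object kubectl creates on your behalf when you run kubectl auth can-i create deployments --namespace dev. You can also submit one yourself (the file name below is just an example):

apiVersion: authorization.k8s.io/v1
kind: SelfSubjectAccessReview
spec:
  resourceAttributes:
    namespace: dev
    verb: create
    group: apps
    resource: deployments

Running kubectl create -f ssar.yaml -o yaml returns the same object back with the status field populated, telling you whether the action is allowed.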

Full control over conversion, validation, and output formatting

  • Your own API server can have all the capabilities of the vanilla Kubernetes API server. The resources you create in your API server can be validated immediately on the server side without additional webhooks. While CRDs also support server-side validation using Common Expression Language (CEL) for declarative validation and ValidatingAdmissionPolicies without the need for webhooks, a custom API server allows for more complex and tailored validation logic if needed.

    Kubernetes allows you to serve multiple API versions for each resource type, traditionally v1alpha1, v1beta1, and v1. Only one version can be designated as the storage version, and requests for the other versions are automatically converted to it. With CRDs, this mechanism is implemented using conversion webhooks. In an extension API server, you can implement your own conversion mechanism, mix storage versions (one object might be serialized as v1, another as v2), or rely on an external backing API.

  • Directly implementing the Kubernetes API also lets you format table output however you like, without following the additionalPrinterColumns logic of CRDs, where column values can only be extracted with JSONPath. In your own API server, you can write your own formatter that computes and inserts values on the fly and lays out the table however you wish (see the sketch below).
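
To give a feel for this, below is a minimal sketch of such a formatter in Go. It assumes a storage type called REST (the name is a placeholder) and implements the ConvertToTable method that the generic apiserver machinery (rest.TableConvertor in k8s.io/apiserver) calls when kubectl asks for table output:

import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/util/duration"
)

// REST is a placeholder for the storage type of your resource.
type REST struct{}

// ConvertToTable builds `kubectl get` output by hand: the columns and the cell
// values are computed here, not extracted from the object with JSONPath.
func (r *REST) ConvertToTable(ctx context.Context, obj runtime.Object, tableOptions runtime.Object) (*metav1.Table, error) {
	table := &metav1.Table{
		ColumnDefinitions: []metav1.TableColumnDefinition{
			{Name: "Name", Type: "string", Format: "name"},
			{Name: "Age", Type: "string"},
		},
	}

	addRow := func(o runtime.Object) error {
		acc, err := meta.Accessor(o)
		if err != nil {
			return err
		}
		age := duration.HumanDuration(time.Since(acc.GetCreationTimestamp().Time))
		table.Rows = append(table.Rows, metav1.TableRow{
			Cells:  []interface{}{acc.GetName(), age},
			Object: runtime.RawExtension{Object: o},
		})
		return nil
	}

	if meta.IsListType(obj) {
		if err := meta.EachListItem(obj, addRow); err != nil {
			return nil, err
		}
		return table, nil
	}
	if err := addRow(obj); err != nil {
		return nil, err
	}
	return table, nil
}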

Dynamic resource registration

  • The resources served by an extension api-server don't need to be pre-registered as CRDs. Once your extension API server is registered using an APIService, Kubernetes starts polling it to discover APIs and resources it can serve. After receiving a discovery response, the Kubernetes API server automatically registers all available types for this API group. Although this isn't considered common practice, you can implement logic that dynamically registers the resource types you need in your Kubernetes cluster.
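
You can observe this discovery mechanism from the client side: once the APIService is registered, the group answers the standard discovery endpoint, and the APIResourceList it returns is exactly what the Kubernetes API server uses to register the available kinds.

kubectl get --raw /apis/apps.cozystack.io/v1alpha1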

When not to use the API Aggregation Layer

There are some anti-patterns where using the API Aggregation Layer isn't recommended. Let's go through them.

Unstable backend

If your API server stops responding, whether due to an unavailable backend or other issues, it may block some Kubernetes functionality. For example, when deleting a namespace, Kubernetes waits for a response from your API server to check whether any resources remain. If the response doesn't come, the namespace deletion is blocked.

Also, you might have encountered a situation where, with metrics-server unavailable, every kubectl call (even one unrelated to metrics) prints an extra message to stderr stating that metrics.k8s.io is unavailable. This is another example of how the API Aggregation Layer can cause problems when the api-server handling the requests is down.

Slow requests

If you can't guarantee an instant response for user requests, it's better to consider using a CustomResourceDefinition and controller. Otherwise, you might make your cluster less stable. Many projects implement an extension API server only for a limited set of resources, particularly for imperative logic and subresources. This recommendation is also mentioned in the official Kubernetes documentation.

Why we needed it in Cozystack

As a reminder, we're developing the open-source PaaS platform Cozystack, which can also be used as a framework for building your own private cloud. Therefore, the ability to easily extend the platform is crucial for us.

Cozystack is built on top of FluxCD. Every application is packaged into its own Helm chart, ready for deployment in a tenant namespace. Deploying an application on the platform comes down to creating a HelmRelease resource that specifies the chart name and the application's parameters; all the remaining logic is handled by FluxCD. This pattern makes it easy to extend the platform: adding a new application only requires packaging it into an appropriately structured Helm chart.

Interface of the Cozystack platform

So, in our platform, everything is configured through HelmRelease resources. However, we ran into two problems: the limitations of the RBAC model and the need for a public API. Let's delve into each of them.

Limitations of the RBAC model

The standard RBAC model in Kubernetes doesn't let you restrict access to a set of resources of the same kind based on labels or fields in their spec. When creating a role, you can narrow access within a kind only by listing specific resource names in resourceNames. This works for verbs like get or update, but not for list: a list rule can be limited to a kind, not to particular names.
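
To illustrate, here is a Role that grants access to a single HelmRelease by name; the get rule behaves as you'd expect, but a list rule applies to the whole collection and can't be narrowed the same way (the names below are examples):

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: tenant-kvaps
  name: single-release-reader
rules:
- apiGroups: ["helm.toolkit.fluxcd.io"]
  resources: ["helmreleases"]
  resourceNames: ["redis-test"]   # works for get, update, delete
  verbs: ["get"]
- apiGroups: ["helm.toolkit.fluxcd.io"]
  resources: ["helmreleases"]
  verbs: ["list"]                 # grants listing of all HelmReleases in the namespace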

So, we decided to introduce new resource types named after the Helm charts they use and to generate the list of available kinds dynamically at runtime in our extension api-server. This way, we can reuse the standard Kubernetes RBAC model to manage access to specific resource types.

Need for a public API

Since our platform provides capabilities for deploying various managed services, we want to organize public access to the platform's API. However, we can't allow users to interact directly with resources like HelmRelease because that would let them specify arbitrary names and parameters for Helm charts to deploy, potentially compromising our system.

We wanted users to be able to deploy a specific service simply by creating a resource of the corresponding kind in Kubernetes, with the kind named after the chart it's deployed from. Here are some examples:

  • kind: Kubernetes → chart: kubernetes
  • kind: Postgres → chart: postgres
  • kind: Redis → chart: redis
  • kind: VirtualMachine → chart: virtual-machine

Moreover, we didn't want to add a new type to codegen and recompile our extension API server every time a new chart needs to be served. The schema should be updated dynamically or provided by the administrator via a ConfigMap.

Two-Way conversion

We already have integrations and a dashboard that work with HelmRelease resources, and at this stage we didn't want to lose support for that API. Since we're simply translating one resource into another, support is maintained and works both ways: if you create a HelmRelease, you get the corresponding custom resource in Kubernetes, and if you create a custom resource, it also shows up as a HelmRelease.

We don't have any additional controllers that synchronize state between these resources. All requests to resources in our extension API server are transparently proxied to HelmRelease and vice versa. This eliminates intermediate states and the need to write controllers and synchronization logic.

Implementation

To implement the Aggregation API, you might consider starting with the following projects:

  • apiserver-builder: Currently in alpha and hasn't been updated for two years. It works like kubebuilder, providing a framework for creating an extension API server, allowing you to sequentially create a project structure and generate code for your resources.
  • sample-apiserver: A ready-made example of an implemented API server, based on official Kubernetes libraries, which you can use as a foundation for your project.

For practical reasons, we chose the second project. Here's what we needed to do:

Disable etcd support

In our case, we don't need it since all resources are stored directly in the Kubernetes API.

You can disable etcd options by passing nil to RecommendedOptions.Etcd:
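
A minimal sketch, following the sample-apiserver options layout (the struct and package names are illustrative, and the exact NewRecommendedOptions signature may vary between library versions):

import (
	genericoptions "k8s.io/apiserver/pkg/server/options"
)

type CozystackServerOptions struct {
	RecommendedOptions *genericoptions.RecommendedOptions
}

func NewCozystackServerOptions() *CozystackServerOptions {
	o := &CozystackServerOptions{
		RecommendedOptions: genericoptions.NewRecommendedOptions(
			defaultEtcdPathPrefix,
			apiserver.Codecs.LegacyCodec(v1alpha1.SchemeGroupVersion),
		),
	}
	// Setting Etcd to nil tells the generic apiserver library not to configure
	// etcd-backed storage at all; our registry talks to the Kubernetes API instead.
	o.RecommendedOptions.Etcd = nil
	return o
}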

Generate a common resource kind

We called it Application, and it looks like this:
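
A simplified sketch of such a type (the exact fields in the Cozystack repository differ, and the deepcopy methods required by runtime.Object are generated separately):

import (
	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// Application is a single generic type backing every chart-based kind
// (Postgres, Redis, Kubernetes, ...). The concrete Kind is assigned at
// registration time from the configuration.
type Application struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// Spec carries arbitrary chart values, so it stays schemaless.
	Spec   apiextensionsv1.JSON `json:"spec,omitempty"`
	Status ApplicationStatus    `json:"status,omitempty"`
}

type ApplicationStatus struct {
	Version string `json:"version,omitempty"`
	Ready   bool   `json:"ready,omitempty"`
}

// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// ApplicationList is the list form of Application.
type ApplicationList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`

	Items []Application `json:"items"`
}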

This is a generic type used for any application type, and its handling logic is the same for all charts.

Set up configuration loading

Since we want to configure our extension api-server via a config file, we defined the following config structure in Go:
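
A simplified sketch of what such a structure can look like; the field names here are illustrative, and the real one lives in the Cozystack repository:

// CozystackConfig tells the extension api-server which kinds to expose
// and which Helm charts they map to.
type CozystackConfig struct {
	Resources []ResourceConfig `yaml:"resources" json:"resources"`
}

type ResourceConfig struct {
	// Kind to register in the apps.cozystack.io/v1alpha1 group,
	// for example "Postgres" or "VirtualMachine".
	Kind    string        `yaml:"kind" json:"kind"`
	Release ReleaseConfig `yaml:"release" json:"release"`
}

type ReleaseConfig struct {
	// Prefix prepended to the HelmRelease name, for example "bucket-".
	Prefix string      `yaml:"prefix" json:"prefix"`
	Chart  ChartConfig `yaml:"chart" json:"chart"`
}

type ChartConfig struct {
	Name      string    `yaml:"name" json:"name"`
	SourceRef SourceRef `yaml:"sourceRef" json:"sourceRef"`
}

type SourceRef struct {
	Kind      string `yaml:"kind" json:"kind"`
	Name      string `yaml:"name" json:"name"`
	Namespace string `yaml:"namespace" json:"namespace"`
}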

We also modified the resource registration logic so that the types we serve are registered in the scheme with different Kind values:
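
A rough sketch of that registration step, reusing the Application and config types sketched above; the real code also wires up defaulting and conversions:

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

func registerDynamicKinds(scheme *runtime.Scheme, cfg *CozystackConfig) {
	gv := schema.GroupVersion{Group: "apps.cozystack.io", Version: "v1alpha1"}
	for _, res := range cfg.Resources {
		// The same Go type is registered once per configured Kind.
		scheme.AddKnownTypeWithName(gv.WithKind(res.Kind), &Application{})
		scheme.AddKnownTypeWithName(gv.WithKind(res.Kind+"List"), &ApplicationList{})
	}
	metav1.AddToGroupVersion(scheme, gv)
}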

As a result, we got a config where you can pass all possible types and specify what they should map to:
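
For example, a config mirroring the structure sketched above could look like this; the prefixes and chart names follow the examples in this article where possible, while the HelmRepository reference and the exact field names are illustrative:

resources:
  - kind: Bucket
    release:
      prefix: bucket-
      chart:
        name: bucket
        sourceRef:
          kind: HelmRepository
          name: cozystack-apps
          namespace: cozy-public
  - kind: Kubernetes
    release:
      prefix: kubernetes-
      chart:
        name: kubernetes
        sourceRef:
          kind: HelmRepository
          name: cozystack-apps
          namespace: cozy-public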

Implement our own registry

To store state not in etcd but translate it directly into Kubernetes HelmRelease resources (and vice versa), we wrote conversion functions from Application to HelmRelease and from HelmRelease to Application:
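
Below is a rough sketch of the two directions. To keep it self-contained it works with unstructured objects and the ResourceConfig type sketched above; the real implementation may use the typed Flux API instead, and the HelmRelease API version depends on the Flux version in use:

import (
	"strings"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

func applicationToHelmRelease(app *unstructured.Unstructured, res ResourceConfig) (*unstructured.Unstructured, error) {
	hr := &unstructured.Unstructured{}
	hr.SetAPIVersion("helm.toolkit.fluxcd.io/v2")
	hr.SetKind("HelmRelease")
	hr.SetNamespace(app.GetNamespace())
	hr.SetName(res.Release.Prefix + app.GetName())

	// The chart and its source come from the config, the values from the app spec.
	spec := map[string]interface{}{
		"chart": map[string]interface{}{
			"spec": map[string]interface{}{
				"chart": res.Release.Chart.Name,
				"sourceRef": map[string]interface{}{
					"kind":      res.Release.Chart.SourceRef.Kind,
					"name":      res.Release.Chart.SourceRef.Name,
					"namespace": res.Release.Chart.SourceRef.Namespace,
				},
			},
		},
	}
	if values, found, err := unstructured.NestedMap(app.Object, "spec"); err != nil {
		return nil, err
	} else if found {
		spec["values"] = values
	}
	if err := unstructured.SetNestedMap(hr.Object, spec, "spec"); err != nil {
		return nil, err
	}
	return hr, nil
}

func helmReleaseToApplication(hr *unstructured.Unstructured, res ResourceConfig) (*unstructured.Unstructured, error) {
	app := &unstructured.Unstructured{}
	app.SetAPIVersion("apps.cozystack.io/v1alpha1")
	app.SetKind(res.Kind)
	app.SetNamespace(hr.GetNamespace())
	app.SetName(strings.TrimPrefix(hr.GetName(), res.Release.Prefix))

	// Chart values become the application's spec.
	if values, found, err := unstructured.NestedMap(hr.Object, "spec", "values"); err != nil {
		return nil, err
	} else if found {
		if err := unstructured.SetNestedMap(app.Object, values, "spec"); err != nil {
			return nil, err
		}
	}
	return app, nil
}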

We implemented logic to filter resources by chart name, sourceRef, and prefix in the HelmRelease name:
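
A sketch of that check, again over an unstructured HelmRelease and the ResourceConfig sketched above:

import (
	"strings"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// matchesResource reports whether a HelmRelease belongs to one of our kinds:
// the chart name and sourceRef must match the config, and the release name
// must carry the configured prefix.
func matchesResource(hr *unstructured.Unstructured, res ResourceConfig) bool {
	chart, _, _ := unstructured.NestedString(hr.Object, "spec", "chart", "spec", "chart")
	srcKind, _, _ := unstructured.NestedString(hr.Object, "spec", "chart", "spec", "sourceRef", "kind")
	srcName, _, _ := unstructured.NestedString(hr.Object, "spec", "chart", "spec", "sourceRef", "name")

	return chart == res.Release.Chart.Name &&
		srcKind == res.Release.Chart.SourceRef.Kind &&
		srcName == res.Release.Chart.SourceRef.Name &&
		strings.HasPrefix(hr.GetName(), res.Release.Prefix)
}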

Then, using this logic, we implemented the methods Get(), Delete(), List(), Create().

You can see the full example in the Cozystack source code.

At the end of each method, we set the correct Kind and return an unstructured.Unstructured{} object so that Kubernetes serializes the object correctly. Otherwise, it would always serialize them with kind: Application, which we don't want.
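
As an illustration, here is a rough sketch of what Get() can look like when it's backed by HelmRelease objects. It reuses helmReleaseToApplication and ResourceConfig from the sketches above; the dynamic client, the struct fields, and the Flux API version are assumptions rather than the exact Cozystack code:

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"
	genericapirequest "k8s.io/apiserver/pkg/endpoints/request"
	"k8s.io/client-go/dynamic"
)

var helmReleaseGVR = schema.GroupVersionResource{
	Group:    "helm.toolkit.fluxcd.io",
	Version:  "v2", // or v2beta1, depending on the Flux version
	Resource: "helmreleases",
}

type REST struct {
	client   dynamic.Interface
	resource ResourceConfig
}

// Get satisfies rest.Getter: it fetches the underlying HelmRelease, converts it
// back into an application object, and fixes up the Kind so that kubectl sees
// "Postgres", "Kubernetes", etc. instead of the generic "Application".
func (r *REST) Get(ctx context.Context, name string, options *metav1.GetOptions) (runtime.Object, error) {
	ns, _ := genericapirequest.NamespaceFrom(ctx)

	hr, err := r.client.Resource(helmReleaseGVR).Namespace(ns).
		Get(ctx, r.resource.Release.Prefix+name, metav1.GetOptions{})
	if err != nil {
		return nil, err
	}

	app, err := helmReleaseToApplication(hr, r.resource)
	if err != nil {
		return nil, err
	}

	// Returning an unstructured object with the right Kind makes the apiserver
	// serialize it as the dynamically registered type.
	app.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "apps.cozystack.io",
		Version: "v1alpha1",
		Kind:    r.resource.Kind,
	})
	return app, nil
}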

What did we achieve?

In Cozystack, all our types from the ConfigMap are now available in Kubernetes as-is:

kubectl api-resources | grep cozystack
buckets                   apps.cozystack.io/v1alpha1      true        Bucket
clickhouses               apps.cozystack.io/v1alpha1      true        ClickHouse
etcds                     apps.cozystack.io/v1alpha1      true        Etcd
ferretdb                  apps.cozystack.io/v1alpha1      true        FerretDB
httpcaches                apps.cozystack.io/v1alpha1      true        HTTPCache
ingresses                 apps.cozystack.io/v1alpha1      true        Ingress
kafkas                    apps.cozystack.io/v1alpha1      true        Kafka
kuberneteses              apps.cozystack.io/v1alpha1      true        Kubernetes
monitorings               apps.cozystack.io/v1alpha1      true        Monitoring
mysqls                    apps.cozystack.io/v1alpha1      true        MySQL
natses                    apps.cozystack.io/v1alpha1      true        NATS
postgreses                apps.cozystack.io/v1alpha1      true        Postgres
rabbitmqs                 apps.cozystack.io/v1alpha1      true        RabbitMQ
redises                   apps.cozystack.io/v1alpha1      true        Redis
seaweedfses               apps.cozystack.io/v1alpha1      true        SeaweedFS
tcpbalancers              apps.cozystack.io/v1alpha1      true        TCPBalancer
tenants                   apps.cozystack.io/v1alpha1      true        Tenant
virtualmachines           apps.cozystack.io/v1alpha1      true        VirtualMachine
vmdisks                   apps.cozystack.io/v1alpha1      true        VMDisk
vminstances               apps.cozystack.io/v1alpha1      true        VMInstance
vpns                      apps.cozystack.io/v1alpha1      true        VPN

We can work with them just like regular Kubernetes resources.

Listing S3 Buckets:

kubectl get buckets.apps.cozystack.io -n tenant-kvaps

Example output:

NAME         READY   AGE    VERSION
foo          True    22h    0.1.0
testaasd     True    27h    0.1.0

Listing Kubernetes Clusters:

kubectl get kuberneteses.apps.cozystack.io -n tenant-kvaps

Example output:

NAME     READY   AGE    VERSION
abc      False   19h    0.14.0
asdte    True    22h    0.13.0

Listing Virtual Machine Disks:

kubectl get vmdisks.apps.cozystack.io -n tenant-kvaps

Example output:

NAME               READY   AGE    VERSION
docker             True    21d    0.1.0
test               True    18d    0.1.0
win2k25-iso        True    21d    0.1.0
win2k25-system     True    21d    0.1.0

Listing Virtual Machine Instances:

kubectl get vminstances.apps.cozystack.io -n tenant-kvaps

Example output:

NAME        READY   AGE    VERSION
docker      True    21d    0.1.0
test        True    18d    0.1.0
win2k25     True    20d    0.1.0

We can create, modify, and delete each of them, and every interaction with them is translated into HelmRelease resources with the expected structure and name prefix applied.

To see all related Helm releases:

kubectl get helmreleases -n tenant-kvaps -l cozystack.io/ui

Example output:

NAME                     AGE    READY
bucket-foo               22h    True
bucket-testaasd          27h    True
kubernetes-abc           19h    False
kubernetes-asdte         22h    True
redis-test               18d    True
redis-yttt               12d    True
vm-disk-docker           21d    True
vm-disk-test             18d    True
vm-disk-win2k25-iso      21d    True
vm-disk-win2k25-system   21d    True
vm-instance-docker       21d    True
vm-instance-test         18d    True
vm-instance-win2k25      20d    True

Next Steps

We don’t intend to stop here with our API. In the future, we plan to add new features:

  • Add validation based on an OpenAPI spec generated directly from Helm charts.
  • Develop a controller that collects release notes from deployed releases and shows users access information for specific services.
  • Revamp our dashboard to work directly with the new API.

Conclusion

The API Aggregation Layer allowed us to quickly and efficiently solve our problem by providing a flexible mechanism for extending the Kubernetes API with dynamically registered resources and converting them on the fly. Ultimately, this made our platform even more flexible and extensible without the need to write code for each new resource.

You can test the API yourself in the open-source PaaS platform Cozystack, starting from version v0.18.