Implementation details
Kubernetes v1.10 [stable]
kubeadm init
and kubeadm join
together provide a nice user experience for creating a
bare Kubernetes cluster from scratch, that aligns with the best-practices.
However, it might not be obvious how kubeadm does that.
This document provides additional details on what happens under the hood, with the aim of sharing knowledge on the best practices for a Kubernetes cluster.
Core design principles
The cluster that kubeadm init
and kubeadm join
set up should be:
- Secure: It should adopt latest best-practices like:
- enforcing RBAC
- using the Node Authorizer
- using secure communication between the control plane components
- using secure communication between the API server and the kubelets
- lock-down the kubelet API
- locking down access to the API for system components like the kube-proxy and CoreDNS
- locking down what a Bootstrap Token can access
- User-friendly: The user should not have to run anything more than a couple of commands:
kubeadm init
export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl apply -f <network-plugin-of-choice.yaml>
kubeadm join --token <token> <endpoint>:<port>
- Extendable:
- It should not favor any particular network provider. Configuring the cluster network is out-of-scope
- It should provide the possibility to use a config file for customizing various parameters
Constants and well-known values and paths
In order to reduce complexity and to simplify development of higher level tools that build on top of kubeadm, it uses a limited set of constant values for well-known paths and file names.
The Kubernetes directory /etc/kubernetes
is a constant in the application, since it is clearly the given path
in a majority of cases, and the most intuitive location; other constant paths and file names are:
-
/etc/kubernetes/manifests
as the path where the kubelet should look for static Pod manifests. Names of static Pod manifests are:etcd.yaml
kube-apiserver.yaml
kube-controller-manager.yaml
kube-scheduler.yaml
-
/etc/kubernetes/
as the path where kubeconfig files with identities for control plane components are stored. Names of kubeconfig files are:kubelet.conf
(bootstrap-kubelet.conf
during TLS bootstrap)controller-manager.conf
scheduler.conf
admin.conf
for the cluster admin and kubeadm itselfsuper-admin.conf
for the cluster super-admin that can bypass RBAC
-
Names of certificates and key files:
ca.crt
,ca.key
for the Kubernetes certificate authorityapiserver.crt
,apiserver.key
for the API server certificateapiserver-kubelet-client.crt
,apiserver-kubelet-client.key
for the client certificate used by the API server to connect to the kubelets securelysa.pub
,sa.key
for the key used by the controller manager when signing ServiceAccountfront-proxy-ca.crt
,front-proxy-ca.key
for the front proxy certificate authorityfront-proxy-client.crt
,front-proxy-client.key
for the front proxy client
The kubeadm configuration file format
Most kubeadm commands support a --config
flag which allows passing a configuration file from
disk. The configuration file format follows the common Kubernetes API apiVersion
/ kind
scheme,
but is considered a component configuration format. Several Kubernetes components, such as the kubelet,
also support file-based configuration.
Different kubeadm subcommands require a different kind
of configuration file.
For example, InitConfiguration
for kubeadm init
, JoinConfiguration
for kubeadm join
, UpgradeConfiguration
for kubeadm upgrade
and ResetConfiguration
for kubeadm reset
.
The command kubeadm config migrate
can be used to migrate an older format configuration
file to a newer (current) configuration format. The kubeadm tool only supports migrating from
deprecated configuration formats to the current format.
See the kubeadm configuration reference page for more details.
kubeadm init workflow internal design
The kubeadm init
consists of a sequence of atomic work tasks to perform,
as described in the kubeadm init
internal workflow.
The kubeadm init phase
command allows
users to invoke each task individually, and ultimately offers a reusable and composable
API/toolbox that can be used by other Kubernetes bootstrap tools, by any IT automation tool or by
an advanced user for creating custom clusters.
Preflight checks
Kubeadm executes a set of preflight checks before starting the init, with the aim to verify
preconditions and avoid common cluster startup problems.
The user can skip specific preflight checks or all of them with the --ignore-preflight-errors
option.
- [Warning] if the Kubernetes version to use (specified with the
--kubernetes-version
flag) is at least one minor version higher than the kubeadm CLI version. - Kubernetes system requirements:
- if running on linux:
- [Error] if Kernel is older than the minimum required version
- [Error] if required cgroups subsystem aren't set up
- if running on linux:
- [Error] if the CRI endpoint does not answer
- [Error] if user is not root
- [Error] if the machine hostname is not a valid DNS subdomain
- [Warning] if the host name cannot be reached via network lookup
- [Error] if kubelet version is lower that the minimum kubelet version supported by kubeadm (current minor -1)
- [Error] if kubelet version is at least one minor higher than the required controlplane version (unsupported version skew)
- [Warning] if kubelet service does not exist or if it is disabled
- [Warning] if firewalld is active
- [Error] if API server bindPort or ports 10250/10251/10252 are used
- [Error] if
/etc/kubernetes/manifest
folder already exists and it is not empty - [Error] if swap is on
- [Error] if
ip
,iptables
,mount
,nsenter
commands are not present in the command path - [Warning] if
ethtool
,tc
,touch
commands are not present in the command path - [Warning] if extra arg flags for API server, controller manager, scheduler contains some invalid options
- [Warning] if connection to https://API.AdvertiseAddress:API.BindPort goes through proxy
- [Warning] if connection to services subnet goes through proxy (only first address checked)
- [Warning] if connection to Pods subnet goes through proxy (only first address checked)
- If external etcd is provided:
- [Error] if etcd version is older than the minimum required version
- [Error] if etcd certificates or keys are specified, but not provided
- If external etcd is NOT provided (and thus local etcd will be installed):
- [Error] if ports 2379 is used
- [Error] if Etcd.DataDir folder already exists and it is not empty
- If authorization mode is ABAC:
- [Error] if abac_policy.json does not exist
- If authorization mode is WebHook
- [Error] if webhook_authz.conf does not exist
Note:
Preflight checks can be invoked individually with thekubeadm init phase preflight
command.Generate the necessary certificates
Kubeadm generates certificate and private key pairs for different purposes:
-
A self signed certificate authority for the Kubernetes cluster saved into
ca.crt
file andca.key
private key file -
A serving certificate for the API server, generated using
ca.crt
as the CA, and saved intoapiserver.crt
file with its private keyapiserver.key
. This certificate should contain the following alternative names:- The Kubernetes service's internal clusterIP (the first address in the services CIDR, e.g.
10.96.0.1
if service subnet is10.96.0.0/12
) - Kubernetes DNS names, e.g.
kubernetes.default.svc.cluster.local
if--service-dns-domain
flag value iscluster.local
, plus default DNS nameskubernetes.default.svc
,kubernetes.default
,kubernetes
- The node-name
- The
--apiserver-advertise-address
- Additional alternative names specified by the user
- The Kubernetes service's internal clusterIP (the first address in the services CIDR, e.g.
-
A client certificate for the API server to connect to the kubelets securely, generated using
ca.crt
as the CA and saved intoapiserver-kubelet-client.crt
file with its private keyapiserver-kubelet-client.key
. This certificate should be in thesystem:masters
organization -
A private key for signing ServiceAccount Tokens saved into
sa.key
file along with its public keysa.pub
-
A certificate authority for the front proxy saved into
front-proxy-ca.crt
file with its keyfront-proxy-ca.key
-
A client certificate for the front proxy client, generated using
front-proxy-ca.crt
as the CA and saved intofront-proxy-client.crt
file with its private keyfront-proxy-client.key
Certificates are stored by default in /etc/kubernetes/pki
, but this directory is configurable
using the --cert-dir
flag.
Please note that:
- If a given certificate and private key pair both exist, and their content is evaluated to be compliant with the above specs, the existing files will
be used and the generation phase for the given certificate will be skipped. This means the user can, for example, copy an existing CA to
/etc/kubernetes/pki/ca.{crt,key}
, and then kubeadm will use those files for signing the rest of the certs. See also using custom certificates - For the CA, it is possible to provide the
ca.crt
file but not theca.key
file. If all other certificates and kubeconfig files are already in place, kubeadm recognizes this condition and activates the ExternalCA, which also implies thecsrsigner
controller in controller-manager won't be started - If kubeadm is running in external CA mode; all the certificates must be provided by the user, because kubeadm cannot generate them by itself
- In case kubeadm is executed in the
--dry-run
mode, certificate files are written in a temporary folder - Certificate generation can be invoked individually with the
kubeadm init phase certs all
command
Generate kubeconfig files for control plane components
Kubeadm generates kubeconfig files with identities for control plane components:
-
A kubeconfig file for the kubelet to use during TLS bootstrap -
/etc/kubernetes/bootstrap-kubelet.conf
. Inside this file, there is a bootstrap-token or embedded client certificates for authenticating this node with the cluster.This client certificate should:
- Be in the
system:nodes
organization, as required by the Node Authorization module - Have the Common Name (CN)
system:node:<hostname-lowercased>
- Be in the
-
A kubeconfig file for controller-manager,
/etc/kubernetes/controller-manager.conf
; inside this file is embedded a client certificate with controller-manager identity. This client certificate should have the CNsystem:kube-controller-manager
, as defined by default RBAC core components roles -
A kubeconfig file for scheduler,
/etc/kubernetes/scheduler.conf
; inside this file is embedded a client certificate with scheduler identity. This client certificate should have the CNsystem:kube-scheduler
, as defined by default RBAC core components roles
Additionally, a kubeconfig file for kubeadm as an administrative entity is generated and stored
in /etc/kubernetes/admin.conf
. This file includes a certificate with
Subject: O = kubeadm:cluster-admins, CN = kubernetes-admin
. kubeadm:cluster-admins
is a group managed by kubeadm. It is bound to the cluster-admin
ClusterRole during kubeadm init
,
by using the super-admin.conf
file, which does not require RBAC.
This admin.conf
file must remain on control plane nodes and should not be shared with additional users.
During kubeadm init
another kubeconfig file is generated and stored in /etc/kubernetes/super-admin.conf
.
This file includes a certificate with Subject: O = system:masters, CN = kubernetes-super-admin
.
system:masters
is a superuser group that bypasses RBAC and makes super-admin.conf
useful in case
of an emergency where a cluster is locked due to RBAC misconfiguration.
The super-admin.conf
file must be stored in a safe location and should not be shared with additional users.
See RBAC user facing role bindings for additional information on RBAC and built-in ClusterRoles and groups.
You can run kubeadm kubeconfig user
to generate kubeconfig files for additional users.
Caution:
The generated configuration files include an embedded authentication key, and you should treat them as confidential.Also note that:
ca.crt
certificate is embedded in all the kubeconfig files.- If a given kubeconfig file exists, and its content is evaluated as compliant with the above specs, the existing file will be used and the generation phase for the given kubeconfig will be skipped
- If kubeadm is running in ExternalCA mode, all the required kubeconfig must be provided by the user as well, because kubeadm cannot generate any of them by itself
- In case kubeadm is executed in the
--dry-run
mode, kubeconfig files are written in a temporary folder - Generation of kubeconfig files can be invoked individually with the
kubeadm init phase kubeconfig all
command
Generate static Pod manifests for control plane components
Kubeadm writes static Pod manifest files for control plane components to
/etc/kubernetes/manifests
. The kubelet watches this directory for Pods to be created on startup.
Static Pod manifests share a set of common properties:
-
All static Pods are deployed on
kube-system
namespace -
All static Pods get
tier:control-plane
andcomponent:{component-name}
labels -
All static Pods use the
system-node-critical
priority class -
hostNetwork: true
is set on all static Pods to allow control plane startup before a network is configured; as a consequence:- The
address
that the controller-manager and the scheduler use to refer to the API server is127.0.0.1
- If the etcd server is set up locally, the
etcd-server
address will be set to127.0.0.1:2379
- The
-
Leader election is enabled for both the controller-manager and the scheduler
-
Controller-manager and the scheduler will reference kubeconfig files with their respective, unique identities
-
All static Pods get any extra flags or patches that you specify, as described in passing custom arguments to control plane components
-
All static Pods get any extra Volumes specified by the user (Host path)
Please note that:
- All images will be pulled from registry.k8s.io by default. See using custom images for customizing the image repository
- In case kubeadm is executed in the
--dry-run
mode, static Pod files are written in a temporary folder - Static Pod manifest generation for control plane components can be invoked individually with
the
kubeadm init phase control-plane all
command
API server
The static Pod manifest for the API server is affected by the following parameters provided by the users:
- The
apiserver-advertise-address
andapiserver-bind-port
to bind to; if not provided, those values default to the IP address of the default network interface on the machine and port 6443 - The
service-cluster-ip-range
to use for services - If an external etcd server is specified, the
etcd-servers
address and related TLS settings (etcd-cafile
,etcd-certfile
,etcd-keyfile
); if an external etcd server is not provided, a local etcd will be used (via host network) - If a cloud provider is specified, the corresponding
--cloud-provider
parameter is configured together with the--cloud-config
path if such file exists (this is experimental, alpha and will be removed in a future version)
Other API server flags that are set unconditionally are:
-
--insecure-port=0
to avoid insecure connections to the api server -
--enable-bootstrap-token-auth=true
to enable theBootstrapTokenAuthenticator
authentication module. See TLS Bootstrapping for more details -
--allow-privileged
totrue
(required e.g. by kube proxy) -
--requestheader-client-ca-file
tofront-proxy-ca.crt
-
--enable-admission-plugins
to:NamespaceLifecycle
e.g. to avoid deletion of system reserved namespacesLimitRanger
andResourceQuota
to enforce limits on namespacesServiceAccount
to enforce service account automationPersistentVolumeLabel
attaches region or zone labels to PersistentVolumes as defined by the cloud provider (This admission controller is deprecated and will be removed in a future version. It is not deployed by kubeadm by default with v1.9 onwards when not explicitly opting into usinggce
oraws
as cloud providers)DefaultStorageClass
to enforce default storage class onPersistentVolumeClaim
objectsDefaultTolerationSeconds
NodeRestriction
to limit what a kubelet can modify (e.g. only pods on this node)
-
--kubelet-preferred-address-types
toInternalIP,ExternalIP,Hostname;
this makeskubectl logs
and other API server-kubelet communication work in environments where the hostnames of the nodes aren't resolvable -
Flags for using certificates generated in previous steps:
--client-ca-file
toca.crt
--tls-cert-file
toapiserver.crt
--tls-private-key-file
toapiserver.key
--kubelet-client-certificate
toapiserver-kubelet-client.crt
--kubelet-client-key
toapiserver-kubelet-client.key
--service-account-key-file
tosa.pub
--requestheader-client-ca-file
tofront-proxy-ca.crt
--proxy-client-cert-file
tofront-proxy-client.crt
--proxy-client-key-file
tofront-proxy-client.key
-
Other flags for securing the front proxy (API Aggregation) communications:
--requestheader-username-headers=X-Remote-User
--requestheader-group-headers=X-Remote-Group
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-allowed-names=front-proxy-client
Controller manager
The static Pod manifest for the controller manager is affected by following parameters provided by the users:
-
If kubeadm is invoked specifying a
--pod-network-cidr
, the subnet manager feature required for some CNI network plugins is enabled by setting:--allocate-node-cidrs=true
--cluster-cidr
and--node-cidr-mask-size
flags according to the given CIDR
Other flags that are set unconditionally are:
-
--controllers
enabling all the default controllers plusBootstrapSigner
andTokenCleaner
controllers for TLS bootstrap. See TLS Bootstrapping for more details. -
--use-service-account-credentials
totrue
-
Flags for using certificates generated in previous steps:
--root-ca-file
toca.crt
--cluster-signing-cert-file
toca.crt
, if External CA mode is disabled, otherwise to""
--cluster-signing-key-file
toca.key
, if External CA mode is disabled, otherwise to""
--service-account-private-key-file
tosa.key
Scheduler
The static Pod manifest for the scheduler is not affected by parameters provided by the user.
Generate static Pod manifest for local etcd
If you specified an external etcd, this step will be skipped, otherwise kubeadm generates a static Pod manifest file for creating a local etcd instance running in a Pod with following attributes:
- listen on
localhost:2379
and useHostNetwork=true
- make a
hostPath
mount out from thedataDir
to the host's filesystem - Any extra flags specified by the user
Please note that:
- The etcd container image will be pulled from
registry.gcr.io
by default. See using custom images for customizing the image repository. - If you run kubeadm in
--dry-run
mode, the etcd static Pod manifest is written into a temporary folder. - You can directly invoke static Pod manifest generation for local etcd, using the
kubeadm init phase etcd local
command.
Wait for the control plane to come up
On control plane nodes, kubeadm waits up to 4 minutes for the control plane components
and the kubelet to be available. It does that by performing a health check on the respective
component /healthz
or /livez
endpoints.
After the control plane is up, kubeadm completes the tasks described in following paragraphs.
Save the kubeadm ClusterConfiguration in a ConfigMap for later reference
kubeadm saves the configuration passed to kubeadm init
in a ConfigMap named kubeadm-config
under kube-system
namespace.
This will ensure that kubeadm actions executed in future (e.g kubeadm upgrade
) will be able to
determine the actual/current cluster state and make new decisions based on that data.
Please note that:
- Before saving the ClusterConfiguration, sensitive information like the token is stripped from the configuration
- Upload of control plane node configuration can be invoked individually with the command
kubeadm init phase upload-config
.
Mark the node as control-plane
As soon as the control plane is available, kubeadm executes the following actions:
- Labels the node as control-plane with
node-role.kubernetes.io/control-plane=""
- Taints the node with
node-role.kubernetes.io/control-plane:NoSchedule
Please note that the phase to mark the control-plane phase can be invoked
individually with the kubeadm init phase mark-control-plane
command.
Configure TLS-Bootstrapping for node joining
Kubeadm uses Authenticating with Bootstrap Tokens for joining new nodes to an existing cluster; for more details see also design proposal.
kubeadm init
ensures that everything is properly configured for this process, and this includes
following steps as well as setting API server and controller flags as already described in
previous paragraphs.
Note:
TLS bootstrapping for nodes can be configured with the commandkubeadm init phase bootstrap-token
,
executing all the configuration steps described in following paragraphs;
alternatively, each step can be invoked individually.Create a bootstrap token
kubeadm init
creates a first bootstrap token, either generated automatically or provided by the
user with the --token
flag; as documented in bootstrap token specification, token should be
saved as a secret with name bootstrap-token-<token-id>
under kube-system
namespace.
Please note that:
- The default token created by
kubeadm init
will be used to validate temporary user during TLS bootstrap process; those users will be member ofsystem:bootstrappers:kubeadm:default-node-token
group - The token has a limited validity, default 24 hours (the interval may be changed with the
—token-ttl
flag) - Additional tokens can be created with the
kubeadm token
command, that provide other useful functions for token management as well.
Allow joining nodes to call CSR API
Kubeadm ensures that users in system:bootstrappers:kubeadm:default-node-token
group are able to
access the certificate signing API.
This is implemented by creating a ClusterRoleBinding named kubeadm:kubelet-bootstrap
between the
group above and the default RBAC role system:node-bootstrapper
.
Set up auto approval for new bootstrap tokens
Kubeadm ensures that the Bootstrap Token will get its CSR request automatically approved by the csrapprover controller.
This is implemented by creating ClusterRoleBinding named kubeadm:node-autoapprove-bootstrap
between the system:bootstrappers:kubeadm:default-node-token
group and the default role
system:certificates.k8s.io:certificatesigningrequests:nodeclient
.
The role system:certificates.k8s.io:certificatesigningrequests:nodeclient
should be created as
well, granting POST permission to
/apis/certificates.k8s.io/certificatesigningrequests/nodeclient
.
Set up nodes certificate rotation with auto approval
Kubeadm ensures that certificate rotation is enabled for nodes, and that a new certificate request for nodes will get its CSR request automatically approved by the csrapprover controller.
This is implemented by creating ClusterRoleBinding named
kubeadm:node-autoapprove-certificate-rotation
between the system:nodes
group and the default
role system:certificates.k8s.io:certificatesigningrequests:selfnodeclient
.
Create the public cluster-info ConfigMap
This phase creates the cluster-info
ConfigMap in the kube-public
namespace.
Additionally, it creates a Role and a RoleBinding granting access to the ConfigMap for
unauthenticated users (i.e. users in RBAC group system:unauthenticated
).
Note:
The access to thecluster-info
ConfigMap is not rate-limited. This may or may not be a
problem if you expose your cluster's API server to the internet; worst-case scenario here is a
DoS attack where an attacker uses all the in-flight requests the kube-apiserver can handle to
serve the cluster-info
ConfigMap.Install addons
Kubeadm installs the internal DNS server and the kube-proxy addon components via the API server.
Note:
This phase can be invoked individually with the commandkubeadm init phase addon all
.proxy
A ServiceAccount for kube-proxy
is created in the kube-system
namespace; then kube-proxy is
deployed as a DaemonSet:
- The credentials (
ca.crt
andtoken
) to the control plane come from the ServiceAccount - The location (URL) of the API server comes from a ConfigMap
- The
kube-proxy
ServiceAccount is bound to the privileges in thesystem:node-proxier
ClusterRole
DNS
-
The CoreDNS service is named
kube-dns
for compatibility reasons with the legacykube-dns
addon. -
A ServiceAccount for CoreDNS is created in the
kube-system
namespace. -
The
coredns
ServiceAccount is bound to the privileges in thesystem:coredns
ClusterRole
In Kubernetes version 1.21, support for using kube-dns
with kubeadm was removed.
You can use CoreDNS with kubeadm even when the related Service is named kube-dns
.
kubeadm join phases internal design
Similarly to kubeadm init
, also kubeadm join
internal workflow consists of a sequence of
atomic work tasks to perform.
This is split into discovery (having the Node trust the Kubernetes API Server) and TLS bootstrap (having the Kubernetes API Server trust the Node).
see Authenticating with Bootstrap Tokens or the corresponding design proposal.
Preflight checks
kubeadm
executes a set of preflight checks before starting the join, with the aim to verify
preconditions and avoid common cluster startup problems.
Also note that:
kubeadm join
preflight checks are basically a subset ofkubeadm init
preflight checks- If you are joining a Windows node, Linux specific controls are skipped.
- In any case the user can skip specific preflight checks (or eventually all preflight checks)
with the
--ignore-preflight-errors
option.
Discovery cluster-info
There are 2 main schemes for discovery. The first is to use a shared token along with the IP address of the API server. The second is to provide a file (that is a subset of the standard kubeconfig file).
Shared token discovery
If kubeadm join
is invoked with --discovery-token
, token discovery is used; in this case the
node basically retrieves the cluster CA certificates from the cluster-info
ConfigMap in the
kube-public
namespace.
In order to prevent "man in the middle" attacks, several steps are taken:
-
First, the CA certificate is retrieved via insecure connection (this is possible because
kubeadm init
is granted access tocluster-info
users forsystem:unauthenticated
) -
Then the CA certificate goes through following validation steps:
- Basic validation: using the token ID against a JWT signature
- Pub key validation: using provided
--discovery-token-ca-cert-hash
. This value is available in the output ofkubeadm init
or can be calculated using standard tools (the hash is calculated over the bytes of the Subject Public Key Info (SPKI) object as in RFC7469). The--discovery-token-ca-cert-hash flag
may be repeated multiple times to allow more than one public key. - As an additional validation, the CA certificate is retrieved via secure connection and then compared with the CA retrieved initially
Note:
You can skip CA validation by passing the--discovery-token-unsafe-skip-ca-verification
flag on the command line.
This weakens the kubeadm security model since others can potentially impersonate the Kubernetes API server.File/https discovery
If kubeadm join
is invoked with --discovery-file
, file discovery is used; this file can be a
local file or downloaded via an HTTPS URL; in case of HTTPS, the host installed CA bundle is used
to verify the connection.
With file discovery, the cluster CA certificate is provided into the file itself; in fact, the
discovery file is a kubeconfig file with only server
and certificate-authority-data
attributes
set, as described in the kubeadm join
reference doc; when the connection with the cluster is established, kubeadm tries to access the
cluster-info
ConfigMap, and if available, uses it.
TLS Bootstrap
Once the cluster info is known, the file bootstrap-kubelet.conf
is written, thus allowing
kubelet to do TLS Bootstrapping.
The TLS bootstrap mechanism uses the shared token to temporarily authenticate with the Kubernetes API server to submit a certificate signing request (CSR) for a locally created key pair.
The request is then automatically approved and the operation completes saving ca.crt
file and
kubelet.conf
file to be used by the kubelet for joining the cluster, while bootstrap-kubelet.conf
is deleted.
Note:
- The temporary authentication is validated against the token saved during the
kubeadm init
process (or with additional tokens created withkubeadm token
command) - The temporary authentication resolves to a user member of
system:bootstrappers:kubeadm:default-node-token
group which was granted access to the CSR api during thekubeadm init
process - The automatic CSR approval is managed by the csrapprover controller, according to
the configuration present in the
kubeadm init
process
kubeadm upgrade workflow internal design
kubeadm upgrade
has sub-commands for handling the upgrade of the Kubernets cluster created by kubeadm.
You must run kubeadm upgrade apply
on a control plane node (you can choose which one);
this starts the upgrade process. You then run kubeadm upgrade node
on all remaining
nodes (both worker nodes and control plane nodes).
Both kubeadm upgrade apply
and kubeadm upgrade node
have a phase
subcommand which provides access
to the internal phases of the upgrade process.
See kubeadm upgrade phase
for more details.
Additional utility upgrade commands are kubeadm upgrade plan
and kubeadm upgrade diff
.
All upgrade sub-commands support passing a configuration file.
kubeadm upgrade plan
You can optionally run kubeadm upgrade plan
before you run kubeadm upgrade apply
.
The plan
subcommand checks which versions are available to upgrade
to and validates whether your current cluster is upgradeable.
kubeadm upgrade diff
This shows what differences would be applied to existing static pod manifests for control plane nodes.
A more verbose way to do the same thing is running kubeadm upgrade apply --dry-run
or
kubeadm upgrade node --dry-run
.
kubeadm upgrade apply
kubeadm upgrade apply
prepares the cluster for the upgrade of all nodes, and also
upgrades the control plane node where it's run. The steps it performs are:
- Runs preflight checks similarly to
kubeadm init
andkubeadm join
, ensuring container images are downloaded and the cluster is in a good state to be upgraded. - Upgrades the control plane manifest files on disk in
/etc/kubernetes/manifests
and waits for the kubelet to restart the components if the files have changed. - Uploads the updated kubeadm and kubelet configurations to the cluster in the
kubeadm-config
and thekubelet-config
ConfigMaps (both in thekube-system
namespace). - Writes updated kubelet configuration for this node in
/var/lib/kubelet/config.yaml
. - Configures bootstrap token and the
cluster-info
ConfigMap for RBAC rules. This is the same as in thekubeadm init
stage and ensures that the cluster continues to support nodes joining with bootstrap tokens. - Upgrades the kube-proxy and CoreDNS addons conditionally if all existing kube-apiservers in the cluster have already been upgraded to the target version.
- Performs any post-upgrade tasks, such as, cleaning up deprecated features which are release specific.
kubeadm upgrade node
kubeadm upgrade node
upgrades a single control plane or worker node after the cluster upgrade has
started (by running kubeadm upgrade apply
). The command detects if the node is a control plane node by checking
if the file /etc/kubernetes/manifests/kube-apiserver.yaml
exists. On finding that file, the kubeadm tool
infers that there is a running kube-apiserver Pod on this node.
- Runs preflight checks similarly to
kubeadm upgrade apply
. - For control plane nodes, upgrades the control plane manifest files on disk in
/etc/kubernetes/manifests
and waits for the kubelet to restart the components if the files have changed. - Writes the updated kubelet configuration for this node in
/var/lib/kubelet/config.yaml
. - (For control plane nodes) upgrades the kube-proxy and CoreDNS addons conditionally, provided that all existing API servers in the cluster have already been upgraded to the target version.
- Performs any post-upgrade tasks, such as cleaning up deprecated features which are release specific.
kubeadm reset workflow internal design
You can use the kubeadm reset
subcommand on a node where kubeadm commands previously executed.
This subcommand performs a best-effort cleanup of the node.
If certain actions fail you must intervene and perform manual cleanup.
The command supports phases.
See kubeadm reset phase
for more details.
The command supports a configuration file.
Additionally:
- IPVS, iptables and nftables rules are not cleaned up.
- CNI (network plugin) configuration is not cleaned up.
.kube/
in the user's home directory is not cleaned up.
The command has the following stages:
- Runs preflight checks on the node to determine if its healthy.
- For control plane nodes, removes any local etcd member data.
- Stops the kubelet.
- Stops running containers.
- Unmounts any mounted directories in
/var/lib/kubelet
. - Deletes any files and directories managed by kubeadm in
/var/lib/kubelet
and/etc/kubernetes
.