This is part 2 of a series about Suse’s Kubernetes offerings. See part 1 for RKE.
RKE2 (Rancher Kubernetes Engine 2 aka RKE for Government)
What is it?
- RKE2 is RKE “for government” and it’s based on K3s
- It’s a Kubernetes “manager”
- Single binary, contains containerd, kubelet functionality
- Supports different Kubernetes versions (1.25-1.29 as of March 2024)
- Uses etcd as backend
- Docker is not supported as a container runtime, but it can run in parallel because RKE2 uses a bundled containerd and a different socket
- SDN options: Canal (default), Cilium, Calico, Multus
- Contains a HelmChart controller that can be used with custom resources
- Add-ons include Nginx Ingress controller and can be any Helm chart
- RKE2 needs to be installed on every cluster node separately
- Hardened security, compliance for US government, FIPS 140-2; airgapped setup is possible because of the mirror registry feature
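As a rough sketch of how the HelmChart controller can be used, the snippet below writes a HelmChart custom resource to a manifest file. The chart name, repository URL, namespace and values are hypothetical placeholders, and the output directory defaults to a scratch directory so the snippet is safe to try.

```shell
#!/bin/sh
# Sketch only: write a HelmChart custom resource for the bundled Helm controller.
# MANIFEST_DIR defaults to a scratch directory; on a real server node it would
# be /var/lib/rancher/rke2/server/manifests.
MANIFEST_DIR="${MANIFEST_DIR:-$(mktemp -d)}"

# Chart name, repo URL and values below are hypothetical placeholders.
cat > "$MANIFEST_DIR/example-app.yaml" <<'EOF'
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: example-app
  namespace: kube-system
spec:
  repo: https://charts.example.com
  chart: example-app
  targetNamespace: example
  createNamespace: true
  valuesContent: |-
    replicaCount: 2
EOF

echo "wrote $MANIFEST_DIR/example-app.yaml"
```

Files placed in the server manifests directory are picked up by RKE2, so no local Helm CLI is needed for this.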
Configuration
- Supported operating systems: several distributions of Suse, Redhat, Ubuntu, Oracle, Amazon… Older k8s versions might not work on newer OSes
- Windows Worker nodes are supported
- Download rke2 install script on every node:
$ curl -sfL https://get.rke2.io --output install.sh
- or install with defaults on single node:
$ curl -sfL https://get.rke2.io | sh -
- Configuration is done either with environment variables, with command line options or in a config file /etc/rancher/rke2/config.yaml
Complete list: https://docs.rke2.io/install/configuration
- Consists of “server” and “agent” nodes
Server node: any control plane component and/or etcd node (and worker if configured)
Agent node: worker node
- Important environment variables:
INSTALL_RKE2_TYPE → "agent" or "server"
INSTALL_RKE2_CHANNEL → version from a specific channel, e.g. “stable”, see https://update.rke2.io/v1-release/channels
INSTALL_RKE2_VERSION → for a specific version, e.g. "v1.28.2+rke2r1"
INSTALL_RKE2_EXEC → adds flags for functions to be or not to be installed
Installation
Install a specific RKE2 version on a “server” node (control-plane, etcd) as root
# curl -sfL https://get.rke2.io | INSTALL_RKE2_VERSION=vX.Y.Z+rke2rN sh -
Install a specific RKE2 version on an “agent” node (worker)
# curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" INSTALL_RKE2_VERSION=vX.Y.Z+rke2rN sh -
Use a downloaded copy of the installer
# INSTALL_RKE2_TYPE="agent" INSTALL_RKE2_VERSION=vX.Y.Z+rke2rN sh install.sh
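When the installation is automated, it can help to build the installer call per node type in one place. The following is only a sketch: the helper name rke2_install_cmd is made up for illustration, and it prints the command instead of running it so it can be reviewed first.

```shell
#!/bin/sh
# Sketch only: print the installer invocation for a given node type and version.
# rke2_install_cmd is a hypothetical helper, not part of RKE2.
rke2_install_cmd() {
  role="$1"
  version="$2"
  case "$role" in
    server|agent) ;;
    *) echo "unknown node type: $role" >&2; return 1 ;;
  esac
  # Assumes the installer was downloaded to ./install.sh beforehand.
  printf 'INSTALL_RKE2_TYPE=%s INSTALL_RKE2_VERSION=%s sh install.sh\n' "$role" "$version"
}

rke2_install_cmd agent "v1.28.2+rke2r1"
```

The printed line can be executed as root on the node, or passed to whatever automation tool is in use.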
After successful installation no application is running yet. Two systemd services are created, regardless of the installation type. The services are neither enabled nor started. The binaries are placed into /usr/local/bin or into /opt/rke2/bin depending on the OS type. $PATH usually does not contain these directories. Make sure you extend your search path appropriately. Three binaries and scripts are available after installation:
rke2 → THE RKE2 binary
rke2-killall.sh → A script to kill all RKE2 related processes including the container runtime
rke2-uninstall.sh → A script to completely wipe RKE2 from the node. This includes downloaded images
Additional binaries can be found in /var/lib/rancher/rke2/bin:
kubectl → kubectl matching the installed k8s version
crictl → CLI from cri-tools to interact with the CRI
ctr → containerd CLI
crictl and ctr are not working correctly by default because the configuration does not set the correct path to the containerd socket. This is a known issue. Until it is solved, you need to set three environment variables:
export CONTAINER_RUNTIME_ENDPOINT=/run/k3s/containerd/containerd.sock
export CONTAINERD_ADDRESS=/run/k3s/containerd/containerd.sock
export CONTAINERD_NAMESPACE=k8s.io
# This is not a typo! The containerd socket is really in /run/k3s.
The usual workflow is now to generate at least a config.yaml for every node type in /etc/rancher/rke2 or multiple files in /etc/rancher/rke2/config.d/*, add additional files like containerd configurations e.g. /etc/rancher/rke2/registries.yaml or static manifests at /var/lib/rancher/rke2/server/manifests. The syntax of the config.yaml can be found here: https://docs.rke2.io/reference/server_config/ and https://docs.rke2.io/install/install_options/linux_agent_config/. The configuration needs to be done on EVERY node, server AND agent, and can (and should) be automated easily.
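To illustrate the containerd configuration mentioned above, here is a minimal sketch of a registries.yaml that points docker.io at a private mirror. The registry host, user name and password are hypothetical, and the file is written to a scratch directory by default instead of /etc/rancher/rke2.

```shell
#!/bin/sh
# Sketch only: generate a registries.yaml for a mirror registry.
# CONFIG_DIR defaults to a scratch directory; on a node it would be /etc/rancher/rke2.
CONFIG_DIR="${CONFIG_DIR:-$(mktemp -d)}"

# registry.example.com and the credentials are placeholders (assumptions).
cat > "$CONFIG_DIR/registries.yaml" <<'EOF'
mirrors:
  docker.io:
    endpoint:
      - "https://registry.example.com:5000"
configs:
  "registry.example.com:5000":
    auth:
      username: mirror-user
      password: change-me
EOF

echo "wrote $CONFIG_DIR/registries.yaml"
```

Like config.yaml, this file has to be present on every node that pulls images, and the RKE2 service has to be restarted after changing it.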
For a minimal configuration, config.yaml can be omitted completely. This will create a single-node cluster with Canal CNI and all options set to defaults. For a multi-node cluster, your config.yaml has to differ between the first server/master node, the other control plane nodes and the agent/worker nodes! Make sure to configure the cluster add-ons here as well, like Nginx Ingress, CNIs etc…, or disable add-ons you don’t need!
First master | node-name: …
Other masters | server: …
Workers | server: …
The cluster token can be any string. If omitted, one will be generated automatically on the first master and put into /var/lib/rancher/rke2/server/token. This token needs to be used on the other nodes to join the cluster. Any running master can be used in the server directive, e.g. when the first master was destroyed and recreated.
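The difference between the three node roles can be sketched as follows. This is an illustration under assumptions, not a complete configuration: host names, node names and the token are hypothetical, 9345 is the port RKE2 servers listen on for joining nodes, and the files are written to a scratch directory so the snippet can be tried without touching /etc/rancher/rke2.

```shell
#!/bin/sh
# Sketch only: generate one config.yaml variant per node role.
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"
TOKEN="example-shared-secret"        # hypothetical cluster token
FIRST_MASTER="master1.example.com"   # hypothetical host name of the first master

# First master: no "server" entry, this node bootstraps the cluster.
cat > "$OUT_DIR/first-master.yaml" <<EOF
node-name: master1
token: $TOKEN
tls-san:
  - $FIRST_MASTER
cni: canal
EOF

# Other masters: join through any running master.
cat > "$OUT_DIR/other-master.yaml" <<EOF
server: https://$FIRST_MASTER:9345
token: $TOKEN
node-name: master2
cni: canal
EOF

# Workers: only need to know where to join and with which token.
cat > "$OUT_DIR/worker.yaml" <<EOF
server: https://$FIRST_MASTER:9345
token: $TOKEN
node-name: worker1
EOF

ls "$OUT_DIR"
```

Each file would then be copied to /etc/rancher/rke2/config.yaml on a node of the matching type before the service is started.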
To start any node, execute (depending on the node type):
# systemctl enable --now rke2-server.service
or
# systemctl enable --now rke2-agent.service
Managing a cluster
The kubeconfig file can be found on all masters as /etc/rancher/rke2/rke2.yaml and is only accessible as root. This can be changed with the option --write-kubeconfig-mode 644. To use it on any master, export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
Reminder: kubectl is available in /var/lib/rancher/rke2/bin!
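A small shell profile snippet can take care of both the search path and the kubeconfig. The following is a sketch for root's shell on a master; where to put it (for example a file below /etc/profile.d) is an assumption and depends on the distribution.

```shell
#!/bin/sh
# Sketch only: extend PATH with the RKE2 binary directories and select the
# RKE2 kubeconfig. Directories that are already on PATH are not added twice.
for dir in /usr/local/bin /opt/rke2/bin /var/lib/rancher/rke2/bin; do
  case ":$PATH:" in
    *":$dir:"*) ;;              # already on the search path
    *) PATH="$PATH:$dir" ;;
  esac
done
export PATH
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
```

After sourcing it on a master, a plain kubectl get nodes should work without further options.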
After the cluster has been started for the first time, some configuration items cannot be changed anymore, e.g. cluster, node and service CIDRs. Other items can be changed by editing config.yaml and restarting the running systemd service on every node:
# systemctl restart rke2-server.service # on server nodes
or
# systemctl restart rke2-agent.service # on agent nodes
systemctl stop will not really stop the cluster services because it keeps all pods running. The rke2-killall.sh and rke2-uninstall.sh scripts in /usr/local/bin can be used to either kill any RKE2 service or remove RKE2 completely from a node. Make sure to back up the files from /etc/rancher/rke2 or have some automation to recreate them.
A manual or automatic cluster backup can be configured. Targets can be the local filesystem of the masters or any remote S3 bucket.
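As a sketch of an automatic backup to S3, the snippet below writes a drop-in file with etcd snapshot settings. The schedule, retention, endpoint, bucket and credentials are placeholder values, and the file goes to a scratch directory by default; on a server node it would live in /etc/rancher/rke2/config.d/.

```shell
#!/bin/sh
# Sketch only: etcd snapshot settings as a config.d drop-in for server nodes.
CONFIG_D="${CONFIG_D:-$(mktemp -d)}"

# All values below are placeholders (assumptions), adjust to your environment.
cat > "$CONFIG_D/etcd-snapshots.yaml" <<'EOF'
etcd-snapshot-schedule-cron: "0 */6 * * *"
etcd-snapshot-retention: 10
etcd-s3: true
etcd-s3-endpoint: s3.example.com
etcd-s3-bucket: rke2-backups
etcd-s3-access-key: EXAMPLE_ACCESS_KEY
etcd-s3-secret-key: EXAMPLE_SECRET_KEY
EOF

echo "wrote $CONFIG_D/etcd-snapshots.yaml"

# A manual snapshot needs a running server node, so it is only shown here:
# rke2 etcd-snapshot save --name pre-upgrade
```

Without the etcd-s3 settings the snapshots stay on the local filesystem of the masters.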
What do we get?
RKE2 consists only of a static systemd service comprising containerd and kubelet:
pstree 917
rke2─┬─containerd───141*[{containerd}]
     ├─kubelet───25*[{kubelet}]
     └─23*[{rke2}]
System pods (control plane uses static pods):
kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system cloud-controller-manager-localhorst 1/1 Running 0 12m
kube-system etcd-localhorst 1/1 Running 0 12m
kube-system helm-install-rke2-canal-bpsq7 0/1 Completed 0 12m
kube-system helm-install-rke2-coredns-4psk4 0/1 Completed 0 12m
kube-system helm-install-rke2-ingress-nginx-lq4zc 0/1 Completed 0 12m
kube-system helm-install-rke2-metrics-server-fxff7 0/1 Completed 0 12m
kube-system kube-apiserver-localhorst 1/1 Running 0 12m
kube-system kube-controller-manager-localhorst 1/1 Running 0 12m
kube-system kube-proxy-localhorst 1/1 Running 0 12m
kube-system kube-scheduler-localhorst 1/1 Running 0 12m
kube-system rke2-canal-5lx5g 2/2 Running 0 11m
kube-system rke2-coredns-rke2-coredns-58fd75f64b-m2kd4 1/1 Running 0 11m
kube-system rke2-coredns-rke2-coredns-autoscaler-768bfc5985-2zmsv 1/1 Running 0 11m
kube-system rke2-ingress-nginx-controller-8h6dj 1/1 Running 0 8m23s
kube-system rke2-metrics-server-67697454f8-pzlx7 1/1 Running 0 9m5s
No CSI storage class and no load balancer are installed by default. For a single-node cluster, the local path provisioner can be used: https://github.com/rancher/local-path-provisioner
The cluster is using these directories and filesystems on the nodes:
/var/lib/rancher/rke2/agent/containerd → images, rootfs
/var/log/pods → pod logs
/var/lib/kubelet → volumes, ephemeral storage
/var/lib/rancher/rke2/server/db → etcd db files
/var/lib/rancher/rke2/server/db/snapshots → etcd backups
Debugging
# journalctl -u rke2-server
or
# journalctl -u rke2-agent
# crictl ps
# crictl pods
# ctr container ls
# kubectl logs
Lifecycle
Updates are installed the same way as RKE2 is installed:
Upgrade to latest RKE2 and k8s version:
# curl -sfL https://get.rke2.io | sh -
Upgrade to a specific channel:
# curl -sfL https://get.rke2.io | INSTALL_RKE2_CHANNEL=latest sh -
Upgrade to a specific k8s version:
# curl -sfL https://get.rke2.io | INSTALL_RKE2_VERSION=vX.Y.Z+rke2rN sh -
followed by a systemd unit restart on all nodes:
# systemctl restart rke2-server.service # on server nodes
or
# systemctl restart rke2-agent.service # on agent nodes
Make sure to drain nodes before this action and perform this only on a limited number of nodes in parallel to avoid a noticeable degradation in performance and availability!
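The drain, upgrade, restart sequence for a single node could be scripted roughly like this. It is a sketch under assumptions: node names are hypothetical, nodes are assumed to be reachable as root via ssh, and by default the commands are only printed (DRY_RUN=1) instead of executed.

```shell
#!/bin/sh
# Sketch only: upgrade one node at a time. Set DRY_RUN=0 to really run the commands.
DRY_RUN="${DRY_RUN:-1}"
RKE2_VERSION="${RKE2_VERSION:-v1.28.2+rke2r1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# upgrade_node NODE UNIT, where UNIT is rke2-server.service or rke2-agent.service
upgrade_node() {
  node="$1"
  unit="$2"
  run kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
  run ssh "root@$node" "curl -sfL https://get.rke2.io | INSTALL_RKE2_VERSION=$RKE2_VERSION sh -"
  run ssh "root@$node" "systemctl restart $unit"
  run kubectl wait --for=condition=Ready "node/$node" --timeout=300s
  run kubectl uncordon "$node"
}

upgrade_node worker1 rke2-agent.service
```

Calling upgrade_node in a loop over the node list keeps the number of nodes upgraded in parallel at one.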
A cluster downgrade is not supported. You need to re-install the cluster from a backup!
Suse is working on a migration method from RKE to RKE2. Currently only an early version is available at https://github.com/rancher/migration-agent/releases/latest. Use at your own risk!
K3s
TL;DR: RKE2 is based on the Kubernetes distribution K3s, so it is no surprise that it’s mostly the same. But there are some subtle differences.
What is it?
- Feature-complete k8s, but legacy and alpha features, in-tree storage drivers, etc… are removed
- HA control plane and multi-node clusters are still possible
- Resource footprint reduced for small edge devices
- Can have different backend databases → Sqlite (default), etcd3, MySQL, Postgresql
- s/rke2/k3s/g → all path names, variables and systemd services were renamed
- Bundled with Flannel (CNI), Traefik (Ingress), Klipper-lb (Service LB), Canal and Calico
The differences
- Installation:
curl -sfL https://get.k3s.io | sh -
- All control plane components are located in one single systemd unit k3s.service. They are not visible in any kubectl output nor in any running container list. It’s just one process running on the node: k3s
- On worker nodes the systemd unit name is k3s-agent.service
- After executing the K3s installer, K3s will start instantly. This can be prevented by adding the options INSTALL_K3S_SKIP_START=true and INSTALL_K3S_SKIP_ENABLE=true
- The Local Path storage provisioner is included and installed by default.
- There is only one k8s version included because it’s one binary! Upgrade → new k3s binary
- Installs kubectl, crictl binaries in /usr/local/bin, there is no ctr binary available
- There is no configuration backup included. Use DB backup procedures or (simply) copy the Sqlite DB to a safe place
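Copying the Sqlite DB to a safe place can be sketched as a small helper that archives the database directory. The function name backup_k3s_sqlite is made up for illustration, and the assumption here is that the Sqlite files live in /var/lib/rancher/k3s/server/db; verify the path on your installation.

```shell
#!/bin/sh
# Sketch only: archive a K3s database directory into a timestamped tarball.
# backup_k3s_sqlite DB_DIR BACKUP_DIR prints the path of the created archive.
backup_k3s_sqlite() {
  db_dir="$1"
  backup_dir="$2"
  stamp="$(date +%Y%m%d-%H%M%S)"
  archive="$backup_dir/k3s-db-$stamp.tar.gz"
  mkdir -p "$backup_dir" || return 1
  # -C keeps the paths inside the archive relative to the database directory.
  tar -czf "$archive" -C "$db_dir" . || return 1
  echo "$archive"
}

# Example call (assumed default path, run as root with K3s stopped):
# systemctl stop k3s
# backup_k3s_sqlite /var/lib/rancher/k3s/server/db /root/k3s-backups
# systemctl start k3s
```

Stopping K3s before the copy avoids archiving a database that is being written to.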
Summary
RKE2 and K3s are the better way of running Suse’s Kubernetes. They are way more flexible than RKE, come with modern components like Cilium or Multus, are independent from Docker and are easier to handle as Infrastructure as Code (IaC). The development is in sync with upstream Kubernetes, releases are available shortly after official upstream. All images are hardened to comply with the CIS benchmark. There is a Helm controller included making a local Helm CLI installation obsolete. Airgapped setups are supported by default. If you need to run with low CPU and memory resources, choose K3s, otherwise RKE2 is the right thing for you.