An Introduction and Comparison of RKE, RKE2 and K3s from an Ops Guy’s Perspective, Part 2: RKE2 and K3s

This is part 2 of a series about SUSE’s Kubernetes offerings. See part 1 for RKE.


RKE2 (Rancher Kubernetes Engine 2 aka RKE for Government)

What is it?

  • RKE2 is RKE “for government” and it’s based on K3s
  • It’s a Kubernetes “manager”
  • Single binary, contains containerd, kubelet functionality
  • Supports different Kubernetes versions (1.25-1.29 as of March 2024)
  • Uses etcd as backend
  • Docker is not supported as a container runtime, but it can run in parallel because RKE2 uses its own bundled containerd with a different socket
  • SDN options: Canal (default), Cilium, Calico, Multus
  • Contains a HelmChart controller that can be used with custom resources
  • Add-ons include Nginx Ingress controller and can be any Helm chart
  • RKE2 needs to be installed on every cluster node separately
  • Hardened security, compliance with US government requirements (FIPS 140-2); airgapped setups are possible thanks to the mirror registry feature

Configuration

  • Supported operating systems: several distributions from SUSE, Red Hat, Ubuntu, Oracle, Amazon… Older k8s versions might not work on newer OSes
  • Windows Worker nodes are supported
  • Download rke2 install script on every node:
    $ curl -sfL https://get.rke2.io --output install.sh
  • or install with defaults on single node:
    $ curl -sfL https://get.rke2.io | sh -
  • Configuration is done either with environment variables, with command line options or in a config file /etc/rancher/rke2/config.yaml
    Complete list: https://docs.rke2.io/install/configuration
  • Consists of “server” and “agent” nodes
    Server node: any control plane component and/or etcd node (and worker if configured)
    Agent node: worker node
  • Important environment variables:
    INSTALL_RKE2_TYPE="agent" or "server"
    INSTALL_RKE2_CHANNEL version from specific channel e.g. “stable” see https://update.rke2.io/v1-release/channels
    INSTALL_RKE2_VERSION for a specific version, e.g. "v1.28.2+rke2r1"
  • INSTALL_RKE2_EXEC adds flags for functions to be or not to be installed

Installation

Install a specific RKE2 version on a “server” node (control-plane, etcd) as root

# curl -sfL https://get.rke2.io | INSTALL_RKE2_VERSION=vX.Y.Z+rke2rN sh -

Install a specific RKE2 version on an “agent” node (worker)

# curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" INSTALL_RKE2_VERSION=vX.Y.Z+rke2rN sh -

Use a downloaded copy of the installer

# INSTALL_RKE2_TYPE="agent" INSTALL_RKE2_VERSION=vX.Y.Z+rke2rN sh install.sh
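
When the same installer call is repeated on many nodes, a small wrapper keeps the role and version explicit. The following is only a sketch: rke2_install, DRY_RUN and INSTALLER are names made up for this example and are not part of RKE2, and it assumes the installer was downloaded as install.sh as shown earlier.

```shell
#!/bin/sh
# Sketch only: rke2_install, DRY_RUN and INSTALLER are hypothetical names,
# not part of RKE2. Assumes the installer was saved as ./install.sh.
rke2_install() {
    role="$1"         # "server" or "agent"
    version="${2:-}"  # e.g. v1.28.2+rke2r1; empty = let the installer pick from the channel
    installer="${INSTALLER:-./install.sh}"

    case "$role" in
        server|agent) ;;
        *) echo "unknown role: $role" >&2; return 1 ;;
    esac

    # Build the environment for the installer; only pin the version if one was given.
    set -- "INSTALL_RKE2_TYPE=$role"
    if [ -n "$version" ]; then
        set -- "$@" "INSTALL_RKE2_VERSION=$version"
    fi

    if [ -n "${DRY_RUN:-}" ]; then
        echo env "$@" sh "$installer"   # print instead of execute
    else
        env "$@" sh "$installer"
    fi
}
```

For example, DRY_RUN=1 rke2_install agent vX.Y.Z+rke2rN only prints the command that would be run, which is handy for checking automation before touching a node.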

After a successful installation nothing is running yet. Two systemd services (rke2-server.service and rke2-agent.service) are created regardless of the installation type, but they are neither enabled nor started. The binaries are placed in /usr/local/bin or /opt/rke2/bin, depending on the OS type. $PATH usually does not contain these directories, so extend your search path accordingly. Three binaries and scripts are available after installation:

  • rke2 → THE RKE2 binary
  • rke2-killall.sh → A script to kill all RKE2 related processes including the container runtime
  • rke2-uninstall.sh → A script to completely wipe RKE2 from the node. This includes downloaded images

Additional binaries can be found in /var/lib/rancher/rke2/bin:

  • kubectl → kubectl matching the installed k8s version
  • crictl → CLI for CRI-compatible container runtimes (from the cri-tools project)
  • ctr → containerd CLI

crictl and ctr do not work correctly by default because their configuration does not point to the containerd socket used by RKE2. This is a known issue. Until it is solved, you need to set three environment variables:

export CONTAINER_RUNTIME_ENDPOINT=unix:///run/k3s/containerd/containerd.sock
export CONTAINERD_ADDRESS=/run/k3s/containerd/containerd.sock
export CONTAINERD_NAMESPACE=k8s.io

# This is not a typo! The containerd socket is really in /run/k3s.
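
To avoid retyping these settings in every shell, the search path and the three variables can be written to a profile script. This is a minimal sketch: write_rke2_profile is a made-up helper, and the target file (for example /etc/profile.d/rke2.sh) is an assumption that may not fit every distribution.

```shell
#!/bin/sh
# Sketch only: write_rke2_profile is a hypothetical helper; the target file
# (for example /etc/profile.d/rke2.sh) is an assumption, not an RKE2 default.
write_rke2_profile() {
    profile_file="$1"
    cat > "$profile_file" <<'EOF'
# Directories that hold rke2, kubectl, crictl and ctr
export PATH="$PATH:/usr/local/bin:/opt/rke2/bin:/var/lib/rancher/rke2/bin"
# Point crictl (URL form) and ctr at the containerd socket used by RKE2
export CONTAINER_RUNTIME_ENDPOINT=unix:///run/k3s/containerd/containerd.sock
export CONTAINERD_ADDRESS=/run/k3s/containerd/containerd.sock
export CONTAINERD_NAMESPACE=k8s.io
EOF
}
```

Called as root with write_rke2_profile /etc/profile.d/rke2.sh, the settings would apply to new login shells; existing shells need to source the file once.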

The usual workflow is now to create at least a config.yaml in /etc/rancher/rke2 for every node type, or multiple files in /etc/rancher/rke2/config.yaml.d/*.yaml, and to add additional files such as the containerd registry configuration in /etc/rancher/rke2/registries.yaml or static manifests in /var/lib/rancher/rke2/server/manifests. The syntax of config.yaml is documented at https://docs.rke2.io/reference/server_config/ and https://docs.rke2.io/install/install_options/linux_agent_config/. The configuration needs to be done on EVERY node, server AND agent, and can (and should) be automated easily.

For a minimal configuration, config.yaml can be omitted completely. This creates a single-node cluster with the Canal CNI and all options set to their defaults. For a multi-node cluster, config.yaml has to differ between the first server/master node, the other control plane nodes and the agent/worker nodes! Make sure to configure the cluster add-ons here as well (Nginx Ingress, CNIs, etc.) or disable the add-ons you don’t need!

First master:

node-name: "master01"
node-taint:
  - "CriticalAddonsOnly=true:NoExecute"
tls-san:
  - loadbalancer.example.org

Other masters:

server: "https://master01:9345"
token: "K105bcadf5478689bce.."
node-name: "master0x"
node-taint:
  - "CriticalAddonsOnly=true:NoExecute"
tls-san:
  - loadbalancer.example.org

Workers:

server: "https://master01:9345"
token: "K105bcadf5478689bce.."
node-name: "worker0x"
node-label:
  - "node=worker"

config.yaml examples for a multi-node cluster

The cluster token can be any string. If omitted, one will be generated automatically on the first master and put into /var/lib/rancher/rke2/server/token. This token needs to be used on the other nodes to join the cluster. Any running master can be used in the server directive, e.g. when the first master was destroyed and recreated.
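
Because the per-node files differ only in a few keys, they are easy to template. The following is a minimal sketch of such automation: render_rke2_config, its arguments and the TLS_SAN variable are made up for this example, only the YAML keys and values come from the examples above. Single-valued keys are written as plain scalars here.

```shell
#!/bin/sh
# Sketch only: render_rke2_config and TLS_SAN are hypothetical names.
#   $1 role: first-server | server | agent
#   $2 node name
#   $3 output file (normally /etc/rancher/rke2/config.yaml)
#   $4 join URL, e.g. https://master01:9345 (ignored for first-server)
#   $5 cluster token (ignored for first-server)
render_rke2_config() {
    role="$1"; name="$2"; out="$3"; server="${4:-}"; token="${5:-}"

    case "$role" in
        first-server|server|agent) ;;
        *) echo "unknown role: $role" >&2; return 1 ;;
    esac

    {
        # Every node except the first master joins an existing server.
        if [ "$role" != "first-server" ]; then
            echo "server: \"$server\""
            echo "token: \"$token\""
        fi
        echo "node-name: \"$name\""
        if [ "$role" = "agent" ]; then
            echo "node-label:"
            echo "  - \"node=worker\""
        else
            echo "node-taint:"
            echo "  - \"CriticalAddonsOnly=true:NoExecute\""
            echo "tls-san:"
            echo "  - ${TLS_SAN:-loadbalancer.example.org}"
        fi
    } > "$out"
}
```

On the first master the token could be read from /var/lib/rancher/rke2/server/token after the first start and passed as the fifth argument when rendering the files for the remaining nodes.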

To start any node, execute (depending on the node type):

# systemctl enable --now rke2-server.service
or
# systemctl enable --now rke2-agent.service

Managing a cluster

The kubeconfig file can be found on all masters as /etc/rancher/rke2/rke2.yaml and is only readable by root. This can be changed with the option --write-kubeconfig-mode 644. To use it on any master, run export KUBECONFIG=/etc/rancher/rke2/rke2.yaml. Reminder: kubectl is available in /var/lib/rancher/rke2/bin!
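
The generated kubeconfig points at https://127.0.0.1:6443, so it only works on the master itself. To use it from a workstation, the server address has to be replaced with a name that is covered by tls-san. A minimal sketch, assuming the load balancer from the config.yaml examples forwards port 6443; export_kubeconfig is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: export_kubeconfig is a hypothetical helper. It assumes the
# target host is listed in tls-san and forwards port 6443 to the masters.
export_kubeconfig() {
    src="$1"    # e.g. a copy of /etc/rancher/rke2/rke2.yaml
    dest="$2"   # e.g. $HOME/.kube/config on the workstation
    host="$3"   # e.g. loadbalancer.example.org

    # Swap the loopback address for the externally reachable name.
    sed "s#https://127\.0\.0\.1:6443#https://${host}:6443#" "$src" > "$dest" || return 1
    # The file contains cluster-admin credentials, keep it private.
    chmod 600 "$dest"
}
```

After that, export KUBECONFIG pointing at the new file lets a locally installed kubectl talk to the cluster.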

After the cluster has been started for the first time, some configuration items cannot be changed anymore, e.g. cluster, node and service CIDRs. Other items can be changed by editing config.yaml and restarting the running systemd service on every node:

# systemctl restart rke2-server.service # on server nodes
or
# systemctl restart rke2-agent.service  # on agent nodes

systemctl stop does not really stop the cluster services because all pods keep running. The rke2-killall.sh and rke2-uninstall.sh scripts in /usr/local/bin can be used to either kill all RKE2 processes or remove RKE2 completely from a node. Make sure to back up the files from /etc/rancher/rke2 or have some automation to recreate them.

A manual or automatic cluster backup can be configured. Targets can be the local filesystem of the masters or any remote S3 bucket.
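
Automatic snapshots are configured with a few keys in the server's config.yaml. The sketch below prints such a fragment; etcd_backup_config is a made-up helper, and the schedule, retention, bucket and endpoint values are placeholders chosen for this example. The S3 credentials (etcd-s3-access-key, etcd-s3-secret-key) are deliberately left out.

```shell
#!/bin/sh
# Sketch only: etcd_backup_config is a hypothetical helper that prints a
# config.yaml fragment for server nodes. All values are example placeholders.
etcd_backup_config() {
    bucket="$1"     # name of the S3 bucket
    endpoint="$2"   # host name of the S3-compatible service
    cat <<EOF
# Take a snapshot every six hours and keep the ten most recent ones
etcd-snapshot-schedule-cron: "0 */6 * * *"
etcd-snapshot-retention: 10
# Additionally upload snapshots to an S3-compatible bucket
etcd-s3: true
etcd-s3-endpoint: "${endpoint}"
etcd-s3-bucket: "${bucket}"
EOF
}
```

The output can be appended to config.yaml on the masters, followed by a restart of rke2-server.service. A manual snapshot can be taken on a master with rke2 etcd-snapshot save --name <name>.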

What do we get?

RKE2 consists only of a static systemd service comprising containerd and kubelet:

pstree 917
rke2─┬─containerd───141*[{containerd}]
     ├─kubelet───25*[{kubelet}]
     └─23*[{rke2}]

System pods (control plane uses static pods):

kubectl get pod -A
NAMESPACE     NAME                                                    READY   STATUS      RESTARTS   AGE
kube-system   cloud-controller-manager-localhorst                     1/1     Running     0          12m
kube-system   etcd-localhorst                                         1/1     Running     0          12m
kube-system   helm-install-rke2-canal-bpsq7                           0/1     Completed   0          12m
kube-system   helm-install-rke2-coredns-4psk4                         0/1     Completed   0          12m
kube-system   helm-install-rke2-ingress-nginx-lq4zc                   0/1     Completed   0          12m
kube-system   helm-install-rke2-metrics-server-fxff7                  0/1     Completed   0          12m
kube-system   kube-apiserver-localhorst                               1/1     Running     0          12m
kube-system   kube-controller-manager-localhorst                      1/1     Running     0          12m
kube-system   kube-proxy-localhorst                                   1/1     Running     0          12m
kube-system   kube-scheduler-localhorst                               1/1     Running     0          12m
kube-system   rke2-canal-5lx5g                                        2/2     Running     0          11m
kube-system   rke2-coredns-rke2-coredns-58fd75f64b-m2kd4              1/1     Running     0          11m
kube-system   rke2-coredns-rke2-coredns-autoscaler-768bfc5985-2zmsv   1/1     Running     0          11m
kube-system   rke2-ingress-nginx-controller-8h6dj                     1/1     Running     0          8m23s
kube-system   rke2-metrics-server-67697454f8-pzlx7                    1/1     Running     0          9m5s

No CSI storage class and no load balancer are installed by default. The local path provisioner can be used for a single-node cluster: https://github.com/rancher/local-path-provisioner

The cluster is using these directories and filesystems on the nodes:

/var/lib/rancher/rke2/agent/containerd → images, rootfs

/var/log/pods → pod logs

/var/lib/kubelet → volumes, ephemeral storage

/var/lib/rancher/rke2/server/db → etcd db files

/var/lib/rancher/rke2/server/db/snapshots → etcd backups
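
Since these directories tend to grow over time, it can help to check their size regularly. A small sketch that reports the usage of the directories listed above; rke2_disk_usage is a made-up helper name, and directories that do not exist on a node (for example the etcd paths on an agent) are skipped.

```shell
#!/bin/sh
# Sketch only: rke2_disk_usage is a hypothetical helper.
rke2_disk_usage() {
    # Default to the directories from this article when no arguments are given.
    if [ "$#" -eq 0 ]; then
        set -- /var/lib/rancher/rke2/agent/containerd /var/log/pods \
               /var/lib/kubelet /var/lib/rancher/rke2/server/db
    fi
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            du -sh "$dir"
        else
            echo "skipped (not present): $dir"
        fi
    done
}
```

Run as root without arguments it walks the default list; other paths can be passed as arguments.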

Debugging

# journalctl -u rke2-server
or
# journalctl -u rke2-agent

# crictl ps
# crictl pods
# ctr container ls

# kubectl logs -n <namespace> <pod>

Lifecycle

Updates are installed the same way as RKE2 is installed:

Upgrade to the latest RKE2 and k8s version of the default (stable) channel:

# curl -sfL https://get.rke2.io | sh -

Upgrade to a specific channel:

# curl -sfL https://get.rke2.io | INSTALL_RKE2_CHANNEL=latest sh -

Upgrade to a specific k8s version:

# curl -sfL https://get.rke2.io | INSTALL_RKE2_VERSION=vX.Y.Z+rke2rN sh -

followed by a systemd unit restart on all nodes:

# systemctl restart rke2-server.service	# on server nodes
or
# systemctl restart rke2-agent.service	# on agent nodes

Make sure to drain nodes before this action and perform this only on a limited number of nodes in parallel to avoid a noticeable degradation in performance and availability!
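
The drain, upgrade, restart and uncordon sequence for a single node could be scripted roughly like this. It is a sketch under several assumptions: it runs as root on the node being upgraded, kubectl on that node has a working kubeconfig, the installer was downloaded as install.sh, and the names upgrade_node and RUN are made up for this example.

```shell
#!/bin/sh
# Sketch only: upgrade_node and RUN are hypothetical names.
# Set RUN=echo to print the commands instead of executing them.
upgrade_node() {
    node="$1"      # Kubernetes node name, e.g. worker01
    role="$2"      # "server" or "agent"
    version="$3"   # e.g. vX.Y.Z+rke2rN
    run="${RUN:-}"

    # 1. Move workloads away from the node
    $run kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data || return 1
    # 2. Install the requested version over the existing one
    $run env INSTALL_RKE2_TYPE="$role" INSTALL_RKE2_VERSION="$version" sh ./install.sh || return 1
    # 3. Restart the matching systemd unit so the new binary is used
    $run systemctl restart "rke2-${role}.service" || return 1
    # 4. Allow scheduling on the node again
    $run kubectl uncordon "$node"
}
```

RUN=echo upgrade_node worker01 agent vX.Y.Z+rke2rN prints the four commands as a dry run. In a real rollout it makes sense to wait until the node reports Ready again before moving on to the next one.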

A cluster downgrade is not supported. You need to re-install the cluster from a backup!

SUSE is working on a migration method from RKE to RKE2. Currently only an early version is available at https://github.com/rancher/migration-agent/releases/latest. Use at your own risk!


K3s

TL;DR: RKE2 is based on the Kubernetes distribution K3s, so it is no surprise that it’s mostly the same. But there are some subtle differences.

What is it?

  • Feature-complete k8s, but in-tree cloud providers and storage drivers, alpha resources, etc… are removed
  • HA control plane and multi-node clusters are still possible
  • Resource footprint reduced for small edge devices
  • Can have different backend databases → SQLite (default), etcd3, MySQL, PostgreSQL
  • s/rke2/k3s/g → all path names, variables and systemd services were renamed
  • Bundled with Flannel (CNI), Traefik (Ingress) and Klipper-lb (Service LB); Canal and Calico can be installed as alternative CNIs

The differences

  • Installation: curl -sfL https://get.k3s.io | sh -
  • All control plane components are located in one single systemd unit k3s.service. They are not visible in any kubectl output nor in any running container list. It’s just one process running on the node: k3s
  • On worker nodes the systemd unit name is k3s-agent.service
  • After executing the K3s installer, K3s will start instantly. This can be prevented by setting the environment variables INSTALL_K3S_SKIP_START=true and INSTALL_K3S_SKIP_ENABLE=true
  • The Local Path storage provisioner is included and installed by default.
  • There is only one k8s version included because it’s one binary! Upgrade → new k3s binary
  • Installs kubectl, crictl binaries in /usr/local/bin, there is no ctr binary available
  • There is no configuration backup included. Use DB backup procedures or (simply) copy the SQLite DB to a safe place
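
Copying the SQLite DB can be as simple as archiving the database directory. A minimal sketch, assuming the default K3s data directory /var/lib/rancher/k3s/server/db and that K3s is stopped while the copy is taken so the files are consistent; backup_k3s_sqlite is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: backup_k3s_sqlite is a hypothetical helper. It assumes the
# default K3s data directory and that K3s is stopped while the copy is taken.
backup_k3s_sqlite() {
    dest_dir="$1"                                # where to store the archive
    src="${2:-/var/lib/rancher/k3s/server/db}"   # directory holding the SQLite files
    archive="$dest_dir/k3s-db-$(date +%Y%m%d-%H%M%S).tar.gz"

    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1
    # Archive the whole directory so the database and its side files stay together.
    tar -czf "$archive" -C "$src" . || return 1
    echo "$archive"
}
```

A typical run would be systemctl stop k3s, then backup_k3s_sqlite with a backup directory of your choice, then systemctl start k3s. It is probably wise to keep a copy of the server token next to the archive as well.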

Summary

RKE2 and K3s are the better way of running SUSE’s Kubernetes. They are way more flexible than RKE, come with modern components like Cilium or Multus, are independent of Docker and are easier to handle as Infrastructure as Code (IaC). The development is in sync with upstream Kubernetes, and releases are available shortly after the official upstream release. All images are hardened to comply with the CIS benchmark. A Helm controller is included, making a local Helm CLI installation obsolete. Airgapped setups are supported by default. If you need to run with low CPU and memory resources, choose K3s, otherwise RKE2 is the right thing for you.
