Make sure that swap is disabled and not running. If swap was not removed from the installer's disk proposal, disable it now:
$ systemctl disable swap.target
$ swapoff -a
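To keep swap off across reboots, also comment out the swap entry in /etc/fstab. A quick sed sketch (double-check the pattern against your own fstab before running it):
$ sed -i '/\sswap\s/ s/^/#/' /etc/fstab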
Installing Containerd
# Configure persistent loading of modules
sudo tee /etc/modules-load.d/containerd.conf <<EOF
overlay
br_netfilter
EOF

# Load at runtime
sudo modprobe overlay
sudo modprobe br_netfilter

# Ensure sysctl params are set
sudo tee /etc/sysctl.d/kubernetes.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF

# Reload configs
sudo sysctl --system
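As a quick sanity check (not part of the original steps), verify that the modules are loaded and the sysctl settings are active:
lsmod | grep -E 'overlay|br_netfilter'
sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward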
# Install required packages
sudo apt install -y curl gnupg2 software-properties-common apt-transport-https ca-certificates
There are two ways to install containerd: either from the Docker repositories, or from the distribution's repositories.
# Add Docker repo
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"

# Install containerd
sudo apt update
From the Docker repositories
sudo apt install -y containerd.io
From the distribution repositories
sudo apt install containerd
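Whichever repository you use, you can confirm the installed version afterwards:
containerd --version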
Create a containerd configuration file
sudo mkdir -p /etc/containerd
sudo containerd config default | sudo tee /etc/containerd/config.toml
Set the cgroup driver for runc to systemd
This is required by the kubelet.
For more information on this config file, see the containerd configuration docs.
In /etc/containerd/config.toml, locate the following section:
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options] ...
Around line 112, change the value of SystemdCgroup from false to true.
SystemdCgroup = true
If you like, you can use sed to make the change without editing the file manually.
sudo sed -i 's/ SystemdCgroup = false/ SystemdCgroup = true/' /etc/containerd/config.toml
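Either way, a quick grep confirms the change took effect:
grep -n 'SystemdCgroup' /etc/containerd/config.toml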
Restart containerd with the new configuration
# restart containerd
sudo systemctl restart containerd
sudo systemctl enable containerd
systemctl status containerd
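Optionally, you can also query the daemon with the bundled ctr client to make sure it is responding on its socket:
sudo ctr version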
And that’s it, from here you can install and configure Kubernetes on top of this container runtime. In an upcoming post, I will bootstrap a cluster using containerd as the container runtime.
If the nodes are not labeled as workers:
kubectl label node worker001 node-role.kubernetes.io/worker=worker
kubectl label node worker002 node-role.kubernetes.io/worker=worker
To remove the worker label from node 2:
kubectl label node worker002 node-role.kubernetes.io/worker-
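To check the resulting roles and labels:
kubectl get nodes --show-labels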
RKE2 - HA Clusters
This section describes how to install a high availability (HA) RKE2 cluster. An HA RKE2 cluster consists of:
- A fixed registration address that is placed in front of server nodes to allow other nodes to register with the cluster
- An odd number (three recommended) of server nodes that will run etcd, the Kubernetes API, and other control plane services
- Zero or more agent nodes that are designated to run your apps and services
Agents register through the fixed registration address. However, when RKE2 launches the kubelet and it must connect to the Kubernetes api-server, it does so through the rke2 agent process, which acts as a client-side load balancer.
Setting up an HA cluster requires the following steps:
- Configure a fixed registration address
- Launch the first server node
- Join additional server nodes
- Join agent nodes
1. Configure the Fixed Registration Address
Server nodes beyond the first one and all agent nodes need a URL to register against. This can be the IP or hostname of any of the server nodes, but in many cases those may change over time as nodes are created and destroyed. Therefore, you should have a stable endpoint in front of the server nodes.
This endpoint can be set up using any number of approaches, such as:
- A layer 4 (TCP) load balancer
- Round-robin DNS
- Virtual or elastic IP addresses
This endpoint can also be used for accessing the Kubernetes API. So you can, for example, modify your kubeconfig file to point to it instead of a specific node.
Note that the rke2 server process listens on port 9345 for new nodes to register. The Kubernetes API is served on port 6443, as normal. Configure your load balancer accordingly.
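As an illustration, here is a minimal HAProxy sketch for such a layer 4 load balancer, forwarding both the registration and API ports to the three server nodes. The master002/master003 addresses are placeholders for this example, not values taken from the original setup:

frontend rke2-registration
    bind *:9345
    mode tcp
    default_backend rke2-servers-registration

frontend rke2-apiserver
    bind *:6443
    mode tcp
    default_backend rke2-servers-apiserver

backend rke2-servers-registration
    mode tcp
    balance roundrobin
    server master001 10.75.168.101:9345 check
    server master002 10.75.168.102:9345 check
    server master003 10.75.168.103:9345 check

backend rke2-servers-apiserver
    mode tcp
    balance roundrobin
    server master001 10.75.168.101:6443 check
    server master002 10.75.168.102:6443 check
    server master003 10.75.168.103:6443 check

Round-robin DNS or a virtual IP (for example with keepalived) works just as well; the only requirement is a stable name or address that always reaches a healthy server node.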
2. Launch the first server node
root@master001:~# curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="server" sh -
[INFO] finding release for channel stable
[INFO] using v1.22.7+rke2r1 as release
[INFO] downloading checksums at https://github.com/rancher/rke2/releases/download/v1.22.7+rke2r1/sha256sum-amd64.txt
[INFO] downloading tarball at https://github.com/rancher/rke2/releases/download/v1.22.7+rke2r1/rke2.linux-amd64.tar.gz
[INFO] verifying tarball
[INFO] unpacking tarball file to /usr/local
3. Configure the server nodes
The first server node establishes the secret token that other server or agent nodes will register with when connecting to the cluster.
To specify your own pre-shared secret as the token, set the token argument on startup.
If you do not specify a pre-shared secret, RKE2 will generate one and place it at /var/lib/rancher/rke2/server/node-token.
To avoid certificate errors with the fixed registration address, you should launch the server with the tls-san parameter set. This option adds an additional hostname or IP as a Subject Alternative Name in the server's TLS cert, and it can be specified as a list if you would like to access via both the IP and the hostname.
Here is an example of what the RKE2 config file (at /etc/rancher/rke2/config.yaml) would look like if you are following this guide.
# mkdir -p /etc/rancher/rke2
Note: The RKE2 config file needs to be created manually. You can do that by running touch /etc/rancher/rke2/config.yaml as a privileged user.
token: my-shared-secret
tls-san:
  - my-kubernetes-domain.com
  - another-kubernetes-domain.com
mkdir -p /etc/rancher/rke2
cat << 'EOF' > /etc/rancher/rke2/config.yaml
write-kubeconfig-mode: "0644"
tls-san:
  - "oowy.lan"
# (db) Set the base name of etcd snapshots. Default: etcd-snapshot-<unix-timestamp> (default: "etcd-snapshot")
etcd-snapshot-name: "etcd-snapshot"
# (db) Snapshot interval time in cron spec. eg. every 6 hours '* */6 * * *' (default: "0 */12 * * *")
etcd-snapshot-schedule-cron: "* */6 * * *"
# (db) Number of snapshots to retain Default: 5 (default: 5)
etcd-snapshot-retention: "5"
# (db) Directory to save db snapshots. (Default location: ${data-dir}/db/snapshots)
# etcd-snapshot-dir: "${data-dir}/db/snapshots"
# (networking) IPv4/IPv6 network CIDRs to use for pod IPs (default: 10.42.0.0/16)
cluster-cidr: "10.42.0.0/16"
# (networking) IPv4/IPv6 network CIDRs to use for service IPs (default: 10.43.0.0/16)
service-cidr: "10.43.0.0/16"
# (networking) Port range to reserve for services with NodePort visibility (default: "30000-32767")
service-node-port-range: "30000-32767"
# (networking) IPv4 Cluster IP for coredns service. Should be in your service-cidr range (default: 10.43.0.10)
cluster-dns: "10.43.0.10"
# (networking) Cluster Domain (default: "cluster.local")
cluster-domain: "cluster.local"
cni:
  - calico
disable:
  - rke2-canal
  - rke2-kube-proxy
EOF
Start the RKE2 server:
root@master001:~# systemctl enable rke2-server.service
Created symlink /etc/systemd/system/multi-user.target.wants/rke2-server.service → /usr/local/lib/systemd/system/rke2-server.service.
root@master001:~# systemctl start rke2-server.service
root@master001:~# journalctl -u rke2-server -f
-- Logs begin at Thu 2022-02-24 10:18:19 UTC. --
Feb 25 13:53:24 master001 rke2[2878]: time="2022-02-25T13:53:24Z" level=info msg="Event(v1.ObjectReference{Kind:\"Addon\", Namespace:\"kube-system\", Name:\"rke2-multus\", UID:\"\", APIVersion:\"k3s.cattle.io/v1\", ResourceVersion:\"\", FieldPath:\"\"}): type: 'Normal' reason: 'DeletingManifest' Deleting manifest at \"/var/lib/rancher/rke2/server/manifests/rke2-multus.yaml\""
Feb 25 13:53:24 master001 rke2[2878]: time="2022-02-25T13:53:24Z" level=info msg="Stopped tunnel to 127.0.0.1:9345"
Feb 25 13:53:24 master001 rke2[2878]: time="2022-02-25T13:53:24Z" level=info msg="Connecting to proxy" url="wss://10.75.168.101:9345/v1-rke2/connect"
Feb 25 13:53:24 master001 rke2[2878]: time="2022-02-25T13:53:24Z" level=info msg="Proxy done" err="context canceled" url="wss://127.0.0.1:9345/v1-rke2/connect"
Feb 25 13:53:24 master001 rke2[2878]: time="2022-02-25T13:53:24Z" level=info msg="error in remotedialer server [400]: websocket: close 1006 (abnormal closure): unexpected EOF"
Feb 25 13:53:24 master001 rke2[2878]: time="2022-02-25T13:53:24Z" level=info msg="Updating TLS secret for rke2-serving (count: 10): map[listener.cattle.io/cn-10.43.0.1:10.43.0.1 listener.cattle.io/cn-10.75.168.101:10.75.168.101 listener.cattle.io/cn-127.0.0.1:127.0.0.1 listener.cattle.io/cn-kubernetes:kubernetes listener.cattle.io/cn-kubernetes.default:kubernetes.default listener.cattle.io/cn-kubernetes.default.svc:kubernetes.default.svc listener.cattle.io/cn-kubernetes.default.svc.cluster.local:kubernetes.default.svc.cluster.local listener.cattle.io/cn-localhost:localhost listener.cattle.io/cn-master001:master001 listener.cattle.io/cn-oowy.lan:oowy.lan listener.cattle.io/fingerprint:SHA1=70BC8C7B219CFEB57118179109D5CB7EAA0F0460]"
Feb 25 13:53:25 master001 rke2[2878]: time="2022-02-25T13:53:25Z" level=info msg="Active TLS secret rke2-serving (ver=371) (count 10): map[listener.cattle.io/cn-10.43.0.1:10.43.0.1 listener.cattle.io/cn-10.75.168.101:10.75.168.101 listener.cattle.io/cn-127.0.0.1:127.0.0.1 listener.cattle.io/cn-kubernetes:kubernetes listener.cattle.io/cn-kubernetes.default:kubernetes.default listener.cattle.io/cn-kubernetes.default.svc:kubernetes.default.svc listener.cattle.io/cn-kubernetes.default.svc.cluster.local:kubernetes.default.svc.cluster.local listener.cattle.io/cn-localhost:localhost listener.cattle.io/cn-master001:master001 listener.cattle.io/cn-oowy.lan:oowy.lan listener.cattle.io/fingerprint:SHA1=70BC8C7B219CFEB57118179109D5CB7EAA0F0460]"
Feb 25 13:53:25 master001 rke2[2878]: time="2022-02-25T13:53:25Z" level=info msg="Handling backend connection request [master001]"
Feb 25 13:53:25 master001 rke2[2878]: time="2022-02-25T13:53:25Z" level=info msg="Running kube-proxy --cluster-cidr=10.42.0.0/16 --conntrack-max-per-core=0 --conntrack-tcp-timeout-close-wait=0s --conntrack-tcp-timeout-established=0s --healthz-bind-address=127.0.0.1 --hostname-override=master001 --kubeconfig=/var/lib/rancher/rke2/agent/kubeproxy.kubeconfig --proxy-mode=iptables"
Feb 25 13:53:26 master001 rke2[2878]: time="2022-02-25T13:53:26Z" level=info msg="Labels and annotations have been set successfully on node: master001"
Check that the cluster responds:
/var/lib/rancher/rke2/bin/kubectl \
  --kubeconfig /etc/rancher/rke2/rke2.yaml get nodes
Output:
root@master001:~# /var/lib/rancher/rke2/bin/kubectl \
> --kubeconfig /etc/rancher/rke2/rke2.yaml get nodes
NAME        STATUS   ROLES                       AGE     VERSION
master001   Ready    control-plane,etcd,master   9m35s   v1.22.7+rke2r1
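To avoid typing the full paths every time, you can add the RKE2 binaries to your PATH and export the kubeconfig (a convenience step, not required):

export PATH=$PATH:/var/lib/rancher/rke2/bin
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
kubectl get nodes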
For the other server nodes of the cluster, we additionally add the first server's details to the config file:
server: https://my-kubernetes-domain.com:9345
token: my-shared-secret
The token is stored on master001 at /var/lib/rancher/rke2/server/token. Example:
server: https://master001.oowy.lan:9345
token: K10b93dee6011a468fe9ea43d59ae8064c12ccefbe34a6b446bfd07ff7902d6d88e::server:eb92b677c234026da5db07826b243ca4
write-kubeconfig-mode: "0644"
tls-san:
  - "oowy.lan"
# (db) Set the base name of etcd snapshots. Default: etcd-snapshot-<unix-timestamp> (default: "etcd-snapshot")
etcd-snapshot-name: "etcd-snapshot"
# (db) Snapshot interval time in cron spec. eg. every 6 hours '* */6 * * *' (default: "0 */12 * * *")
etcd-snapshot-schedule-cron: "* */6 * * *"
# (db) Number of snapshots to retain Default: 5 (default: 5)
etcd-snapshot-retention: "5"
# (db) Directory to save db snapshots. (Default location: ${data-dir}/db/snapshots)
# etcd-snapshot-dir: "${data-dir}/db/snapshots"
# (networking) IPv4/IPv6 network CIDRs to use for pod IPs (default: 10.42.0.0/16)
cluster-cidr: "10.42.0.0/16"
# (networking) IPv4/IPv6 network CIDRs to use for service IPs (default: 10.43.0.0/16)
service-cidr: "10.43.0.0/16"
# (networking) Port range to reserve for services with NodePort visibility (default: "30000-32767")
service-node-port-range: "30000-32767"
# (networking) IPv4 Cluster IP for coredns service. Should be in your service-cidr range (default: 10.43.0.10)
cluster-dns: "10.43.0.10"
# (networking) Cluster Domain (default: "cluster.local")
cluster-domain: "cluster.local"
cni:
  - calico
disable:
  - rke2-canal
  - rke2-kube-proxy
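On each additional server node, run the installer with INSTALL_RKE2_TYPE="server" and start the service just as on the first node; the node registers through the server and token values above:

curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="server" sh -
systemctl enable rke2-server.service
systemctl start rke2-server.service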
Adding agent nodes
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" sh -
root@worker001:~# curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" sh -
[INFO] finding release for channel stable
[INFO] using v1.22.7+rke2r1 as release
[INFO] downloading checksums at https://github.com/rancher/rke2/releases/download/v1.22.7+rke2r1/sha256sum-amd64.txt
[INFO] downloading tarball at https://github.com/rancher/rke2/releases/download/v1.22.7+rke2r1/rke2.linux-amd64.tar.gz
[INFO] verifying tarball
[INFO] unpacking tarball file to /usr/local
Create a config file:
mkdir -p /etc/rancher/rke2/
nano /etc/rancher/rke2/config.yaml
Only the server and token entries are needed (a worker node label is added here as well):
server: https://master001.oowy.lan:9345
token: K10b93dee6011a468fe9ea43d59ae8064c12ccefbe34a6b446bfd07ff7902d6d88e::server:eb92b677c234026da5db07826b243ca4
node-label:
  - "node-role.kubernetes.io/worker=true"
Start the agent:
systemctl enable rke2-agent.service
systemctl start rke2-agent.service
Check the agent logs:
root@worker001:~# journalctl -u rke2-agent -f
-- Logs begin at Thu 2022-02-24 10:18:25 UTC. --
Feb 25 14:26:07 worker001 rke2[4626]: Flag --tls-cert-file has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Feb 25 14:26:07 worker001 rke2[4626]: Flag --tls-private-key-file has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Feb 25 14:26:14 worker001 rke2[4599]: time="2022-02-25T14:26:14Z" level=info msg="Failed to update node worker001: nodes \"worker001\" is forbidden: node \"worker001\" is not allowed to modify taints"
Feb 25 14:26:14 worker001 rke2[4599]: time="2022-02-25T14:26:14Z" level=info msg="Failed to update node worker001: nodes \"worker001\" is forbidden: node \"worker001\" is not allowed to modify taints"
Feb 25 14:26:14 worker001 rke2[4599]: time="2022-02-25T14:26:14Z" level=info msg="Failed to update node worker001: nodes \"worker001\" is forbidden: node \"worker001\" is not allowed to modify taints"
Feb 25 14:26:14 worker001 rke2[4599]: time="2022-02-25T14:26:14Z" level=info msg="Failed to update node worker001: nodes \"worker001\" is forbidden: node \"worker001\" is not allowed to modify taints"
Feb 25 14:26:14 worker001 rke2[4599]: time="2022-02-25T14:26:14Z" level=info msg="Failed to update node worker001: nodes \"worker001\" is forbidden: node \"worker001\" is not allowed to modify taints"
Feb 25 14:26:14 worker001 rke2[4599]: time="2022-02-25T14:26:14Z" level=info msg="Failed to update node worker001: Operation cannot be fulfilled on nodes \"worker001\": the object has been modified; please apply your changes to the latest version and try again"
Feb 25 14:26:14 worker001 rke2[4599]: time="2022-02-25T14:26:14Z" level=info msg="labels have been set successfully on node: worker001"
Feb 25 14:26:14 worker001 systemd[1]: Started Rancher Kubernetes Engine v2 (agent).
Verify that the node shows up in the cluster:
root@master001:~# /var/lib/rancher/rke2/bin/kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml get nodes
NAME        STATUS   ROLES                       AGE   VERSION
master001   Ready    control-plane,etcd,master   34m   v1.22.7+rke2r1
master002   Ready    control-plane,etcd,master   21m   v1.22.7+rke2r1
master003   Ready    control-plane,etcd,master   18m   v1.22.7+rke2r1
worker001   Ready    <none>                      70s   v1.22.7+rke2r1
Removing a node
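Optionally, drain the node first so its workloads are rescheduled elsewhere (a standard kubectl step, not in the original notes):

kubectl drain worker002 --ignore-daemonsets --delete-emptydir-data

Then run the uninstall script on the node being removed and delete the node object from the cluster: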
/usr/local/bin/rke2-uninstall.sh
kubectl delete nodes worker002
kubectl get secret -n kube-system
If the node's secret has not been deleted:
kubectl delete secret <node> -n kube-system