blob: f1998c8267080b33cab05a47e630d117e05c870a [file] [log] [blame]
Mohammed Naser90128aa2024-04-29 13:21:58 -04001#################
2Maintenance Guide
3#################
4
5This guide provides instructions for regular maintenance tasks necessary to
6ensure the smooth and secure operation of the system.
7
Mohammed Naser484bb982024-05-10 16:11:43 +01008********************************
9Evacuating Nodes for Maintenance
10********************************
11
12When you need to perform maintenance on a node, you will need to evacuate the
13node to ensure that no workloads are running on it. Depending on the type of
14node you are evacuating, you will need to use different commands.
15
16Control Plane Node
17==================
18
19To evacuate a control plane node, you will need to drain the node. This will
20cause all the control plane components to be moved to other nodes in the
21cluster. To drain a control plane node, run the following command against
22the node you want to drain:
23
24.. code-block:: console
25
26 $ kubectl drain <node-name> --ignore-daemonsets --delete-local-data <node-name>
27
28In the example above, you would replace ``<node-name>`` with the name of the
29node you want to drain. Once this process is complete, you can safely perform
30maintenance on the node.
31
32When you are done with the maintenance, you can uncordon the node by running
33the following command:
34
35.. code-block:: console
36
37 $ kubectl uncordon <node-name>
38
39Compute Node
40============
41
42In order to evacuate a compute node, you will need to start by disabling the
43OpenStack compute service on the node. This will prevent new workloads from
44being scheduled on the node. To disable the OpenStack compute service, run
45the following command against the node you want to evacuate:
46
47.. code-block:: console
48
49 $ openstack compute service set --disable <node-name> nova-compute
50
51In the example above, you would replace ``<node-name>`` with the name of the
52node you want to evacuate. Once the OpenStack compute service has been
53disabled, you will need to evacuate all the virtual machines running on the
54node. To do this, run the following command:
55
56.. code-block:: console
57
58 $ nova host-evacuate-live <node-name>
59
60In the example above, you would replace ``<node-name>`` with the name of the
61node you want to evacuate. This command will live migrate all the virtual
62machines running on the node to other nodes in the cluster.
63
64.. admonition:: Note
65
66 It is generally not recommended to use the ``nova`` client however the
67 ``nova host-evacuate-live`` command is not available in the ``openstack``
68 client (see `bug 2055552 <https://bugs.launchpad.net/python-openstackclient/+bug/2055552>`_).
69
70You can monitor the progress of this operation by seeing if there are any VMs
71left on the node by running the following command:
72
73.. code-block:: console
74
75 $ openstack server list --host <node-name>
76
77Once all the virtual machines have been evacuated, you can safely perform
78maintenance on the node. When you are done with the maintenance, you can
79reenable the OpenStack compute service by running the following command:
80
81.. code-block:: console
82
83 $ openstack compute service set --enable <node-name> nova-compute
84
85.. admonition:: Note
86
87 Once you enable the compute service, the node will start accepting new
88 VMs but it will not automatically move the VMs back to the node. You will
89 need to manually move the VMs back to the node if you want them to run
90 there.
91
Mohammed Naser90128aa2024-04-29 13:21:58 -040092*********************
93Renewing Certificates
94*********************
95
96The certificates used by the Kubernetes cluster are valid for one year. They
97are automatically renewed when the cluster is upgraded to a new version of
98Kubernetes. However, if you are running the same version of Kubernetes for
99more than a year, you will need to manually renew the certificates.
100
101To renew the certificates, run the following command on each one of your
102control plane nodes:
103
104.. code-block:: console
105
106 $ sudo kubeadm certs renew all
107
108Once the certificates have been renewed, you will need to restart the
109Kubernetes control plane components to pick up the new certificates. You need
110to do this on each one of your control plane nodes by running the following
111command one at a time on each node:
112
113.. code-block:: console
114
115 $ ps auxf | egrep '(kube-(apiserver|controller-manager|scheduler)|etcd)' | awk '{ print $2 }' | xargs sudo kill