#################
Maintenance Guide
#################

This guide provides instructions for regular maintenance tasks necessary to
ensure the smooth and secure operation of the system.

********************************
Evacuating Nodes for Maintenance
********************************

When you need to perform maintenance on a node, you will need to evacuate the
node to ensure that no workloads are running on it. Depending on the type of
node you are evacuating, you will need to use different commands.

Control Plane Node
==================

To evacuate a control plane node, you will need to drain the node. Draining
evicts all evictable pods running on the node and reschedules them onto other
nodes in the cluster. To drain a control plane node, run the following
command against the node you want to drain:

.. code-block:: console

   $ kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

In the example above, you would replace ``<node-name>`` with the name of the
node you want to drain. (On older versions of ``kubectl``, the
``--delete-emptydir-data`` flag is named ``--delete-local-data``.) Once this
process is complete, you can safely perform maintenance on the node.

When you are done with the maintenance, you can uncordon the node by running
the following command:

.. code-block:: console

   $ kubectl uncordon <node-name>
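
Before starting maintenance, you can confirm that the node has actually been
cordoned by checking its status; a drained node reports
``SchedulingDisabled`` in the ``STATUS`` column:

.. code-block:: console

   $ kubectl get node <node-name>
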

Compute Node
============

In order to evacuate a compute node, you will need to start by disabling the
OpenStack compute service on the node. This prevents new workloads from being
scheduled on the node. To disable the OpenStack compute service, run the
following command against the node you want to evacuate:

.. code-block:: console

   $ openstack compute service set --disable <node-name> nova-compute

In the example above, you would replace ``<node-name>`` with the name of the
node you want to evacuate. Once the OpenStack compute service has been
disabled, you will need to evacuate all the virtual machines running on the
node. To do this, run the following command:

.. code-block:: console

   $ nova host-evacuate-live <node-name>

In the example above, you would replace ``<node-name>`` with the name of the
node you want to evacuate. This command live migrates all the virtual
machines running on the node to other nodes in the cluster.

.. admonition:: Note

   It is generally not recommended to use the ``nova`` client; however, the
   ``nova host-evacuate-live`` command is not available in the ``openstack``
   client (see `bug 2055552 <https://bugs.launchpad.net/python-openstackclient/+bug/2055552>`_).

You can monitor the progress of this operation by checking whether any
virtual machines are left on the node:

.. code-block:: console

   $ openstack server list --host <node-name>
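
Rather than re-running the command by hand, one option is to poll the list
until it comes back empty, for example with ``watch`` (the 10-second interval
here is an arbitrary choice):

.. code-block:: console

   $ watch -n 10 openstack server list --host <node-name>
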

Once all the virtual machines have been evacuated, you can safely perform
maintenance on the node. When you are done with the maintenance, you can
re-enable the OpenStack compute service by running the following command:

.. code-block:: console

   $ openstack compute service set --enable <node-name> nova-compute

.. admonition:: Note

   Once you enable the compute service, the node will start accepting new
   VMs, but it will not automatically move the evacuated VMs back to the
   node. You will need to manually migrate the VMs back to the node if you
   want them to run there.
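
If you do want to move a particular virtual machine back onto the node, one
option is a targeted live migration. The command below is a sketch; the exact
flags accepted (for example ``--host``) can vary between versions of the
``openstack`` client:

.. code-block:: console

   $ openstack server migrate --live-migration --host <node-name> <server-id>
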

*********************
Renewing Certificates
*********************

The certificates used by the Kubernetes cluster are valid for one year. They
are automatically renewed when the cluster is upgraded to a new version of
Kubernetes. However, if you are running the same version of Kubernetes for
more than a year, you will need to renew the certificates manually.

To renew the certificates, run the following command on each of your control
plane nodes:

.. code-block:: console

   $ sudo kubeadm certs renew all

Once the certificates have been renewed, you will need to restart the
Kubernetes control plane components so that they pick up the new
certificates. Run the following command on each of your control plane nodes,
one node at a time:

.. code-block:: console

   $ ps auxf | egrep '(kube-(apiserver|controller-manager|scheduler)|etcd)' | awk '{ print $2 }' | xargs sudo kill
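
After the components restart, you can confirm that the new certificates are
in place by checking their expiration dates with ``kubeadm``:

.. code-block:: console

   $ sudo kubeadm certs check-expiration
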