#################
Maintenance Guide
#################

This guide provides instructions for regular maintenance tasks necessary to
ensure the smooth and secure operation of the system.

********************************
Evacuating Nodes for Maintenance
********************************

When you need to perform maintenance on a node, you must first evacuate it to
ensure that no workloads are running on it. The commands you use depend on the
type of node you are evacuating.

Control Plane Node
==================

To evacuate a control plane node, you will need to drain it. Draining evicts
the pods running on the node and marks it as unschedulable so that no new
workloads are placed on it. To drain a control plane node, run the following
command against the node you want to drain:

.. code-block:: console

   $ kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

Replace ``<node-name>`` with the name of the node you want to drain. Once this
process is complete, you can safely perform maintenance on the node.
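
You can confirm that the node has been cordoned by checking its status, which
should show ``SchedulingDisabled``:

.. code-block:: console

   $ kubectl get node <node-name>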

When you are done with the maintenance, you can uncordon the node by running
the following command:

.. code-block:: console

   $ kubectl uncordon <node-name>

Compute Node
============

To evacuate a compute node, start by disabling the OpenStack compute service
on the node. This prevents new workloads from being scheduled on it. To
disable the OpenStack compute service, run the following command against the
node you want to evacuate:

.. code-block:: console

   $ openstack compute service set --disable <node-name> nova-compute


Replace ``<node-name>`` with the name of the node you want to evacuate. Once
the OpenStack compute service has been disabled, you will need to evacuate all
the virtual machines running on the node. To do this, run the following
command:

.. code-block:: console

   $ nova host-evacuate-live <node-name>

This command live migrates all the virtual machines running on the node to
other nodes in the cluster.
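
Depending on your client and compute API versions, you may also be able to
watch the in-progress migrations directly:

.. code-block:: console

   $ openstack server migration list --host <node-name>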

.. admonition:: Note

   It is generally not recommended to use the ``nova`` client; however, the
   ``nova host-evacuate-live`` command is not available in the ``openstack``
   client (see `bug 2055552 <https://bugs.launchpad.net/python-openstackclient/+bug/2055552>`_).

You can monitor the progress of the evacuation by checking whether any virtual
machines remain on the node:

.. code-block:: console

   $ openstack server list --host <node-name>

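If you prefer not to re-run that command by hand, a simple poll (assuming the
standard ``watch`` utility is available on your workstation) works as well:

.. code-block:: console

   $ watch -n 10 openstack server list --host <node-name>
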
Once all the virtual machines have been evacuated, you can safely perform
maintenance on the node. When you are done with the maintenance, you can
re-enable the OpenStack compute service by running the following command:

.. code-block:: console

   $ openstack compute service set --enable <node-name> nova-compute

.. admonition:: Note

   Once you enable the compute service, the node will start accepting new
   VMs, but it will not automatically move the VMs back to the node. You
   will need to manually move the VMs back if you want them to run there.

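To move a specific VM back, you can live migrate it individually. A minimal
example (the server ID is a placeholder; targeting a specific host with
``--host`` requires administrative privileges and a sufficiently recent
compute API microversion):

.. code-block:: console

   $ openstack server migrate --live-migration --host <node-name> <server-id>
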
*********************
Renewing Certificates
*********************

The certificates used by the Kubernetes cluster are valid for one year. They
are automatically renewed when the cluster is upgraded to a new version of
Kubernetes. However, if you run the same version of Kubernetes for more than
a year, you will need to renew the certificates manually.
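
You can check how much validity the current certificates have left by running
the following command on a control plane node:

.. code-block:: console

   $ sudo kubeadm certs check-expiration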

To renew the certificates, run the following command on each one of your
control plane nodes:

.. code-block:: console

   $ sudo kubeadm certs renew all

Once the certificates have been renewed, you will need to restart the
Kubernetes control plane components so that they pick up the new certificates.
Do this on each of your control plane nodes, one node at a time, by running:

.. code-block:: console

   $ ps auxf | grep -E '(kube-(apiserver|controller-manager|scheduler)|etcd)' | grep -v grep | awk '{ print $2 }' | xargs sudo kill
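
The kubelet restarts the control plane static pods automatically after they
are killed. You can verify that the components came back up, and inspect the
validity dates of the certificate the API server is now presenting (replace
``<node-ip>`` with the address of the control plane node you restarted):

.. code-block:: console

   $ kubectl get pods -n kube-system
   $ echo | openssl s_client -connect <node-ip>:6443 2>/dev/null | openssl x509 -noout -dates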