#################
Maintenance Guide
#################

This guide provides instructions for regular maintenance tasks necessary to
ensure the smooth and secure operation of the system.

********************************
Evacuating Nodes for Maintenance
********************************

When you need to perform maintenance on a node, you must first evacuate it to
ensure that no workloads are running on it. The commands you use depend on the
type of node you are evacuating.

Control Plane Node
==================

To evacuate a control plane node, you will need to drain it. Draining evicts
the pods running on the node and marks it as unschedulable so that no new
workloads are placed on it. To drain a control plane node, run the following
command against the node you want to drain:

.. code-block:: console

   $ kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

Replace ``<node-name>`` with the name of the node you want to drain. Once this
process is complete, you can safely perform maintenance on the node.
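
You can confirm that the node has been cordoned by checking its status, which
should show ``SchedulingDisabled``:

.. code-block:: console

   $ kubectl get node <node-name>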

When you are done with the maintenance, you can uncordon the node by running
the following command:

.. code-block:: console

   $ kubectl uncordon <node-name>

Compute Node
============

To evacuate a compute node, start by disabling the OpenStack compute service
on the node. This prevents new workloads from being scheduled on it. To
disable the OpenStack compute service, run the following command against the
node you want to evacuate:

.. code-block:: console

   $ openstack compute service set --disable <node-name> nova-compute


Replace ``<node-name>`` with the name of the node you want to evacuate. Once
the OpenStack compute service has been disabled, you will need to evacuate all
the virtual machines running on the node. To do this, run the following
command:

.. code-block:: console

   $ nova host-evacuate-live <node-name>

This command live migrates all the virtual machines running on the node to
other nodes in the cluster.
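
Depending on your client and compute API versions, you may also be able to
watch the in-progress migrations directly:

.. code-block:: console

   $ openstack server migration list --host <node-name>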

.. admonition:: Note

   It is generally not recommended to use the ``nova`` client; however, the
   ``nova host-evacuate-live`` command is not available in the ``openstack``
   client (see `bug 2055552 <https://bugs.launchpad.net/python-openstackclient/+bug/2055552>`_).

You can monitor the progress of the evacuation by checking whether any virtual
machines remain on the node:

.. code-block:: console

   $ openstack server list --host <node-name>

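If you prefer not to re-run that command by hand, a simple poll (assuming the
standard ``watch`` utility is available on your workstation) works as well:

.. code-block:: console

   $ watch -n 10 openstack server list --host <node-name>
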
Once all the virtual machines have been evacuated, you can safely perform
maintenance on the node. When you are done with the maintenance, you can
re-enable the OpenStack compute service by running the following command:

.. code-block:: console

   $ openstack compute service set --enable <node-name> nova-compute

.. admonition:: Note

   Once you enable the compute service, the node will start accepting new
   VMs, but it will not automatically move the VMs back to the node. You
   will need to manually move the VMs back if you want them to run there.

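To move a specific VM back, you can live migrate it individually. A minimal
example (the server ID is a placeholder; targeting a specific host with
``--host`` requires administrative privileges and a sufficiently recent
compute API microversion):

.. code-block:: console

   $ openstack server migrate --live-migration --host <node-name> <server-id>
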
*********************
Renewing Certificates
*********************

The certificates used by the Kubernetes cluster are valid for one year. They
are automatically renewed when the cluster is upgraded to a new version of
Kubernetes. However, if you run the same version of Kubernetes for more than
a year, you will need to renew the certificates manually.
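
You can check how much validity the current certificates have left by running
the following command on a control plane node:

.. code-block:: console

   $ sudo kubeadm certs check-expiration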

To renew the certificates, run the following command on each one of your
control plane nodes:

.. code-block:: console

   $ sudo kubeadm certs renew all

Once the certificates have been renewed, you will need to restart the
Kubernetes control plane components so that they pick up the new certificates.
Do this on each of your control plane nodes, one node at a time, by running:

.. code-block:: console

   $ ps auxf | grep -E '(kube-(apiserver|controller-manager|scheduler)|etcd)' | grep -v grep | awk '{ print $2 }' | xargs sudo kill
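
The kubelet restarts the control plane static pods automatically after they
are killed. You can verify that the components came back up, and inspect the
validity dates of the certificate the API server is now presenting (replace
``<node-ip>`` with the address of the control plane node you restarted):

.. code-block:: console

   $ kubectl get pods -n kube-system
   $ echo | openssl s_client -connect <node-ip>:6443 2>/dev/null | openssl x509 -noout -dates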