{"id":133,"date":"2022-12-26T00:46:06","date_gmt":"2022-12-26T00:46:06","guid":{"rendered":"https:\/\/devopsopen.com\/?p=133"},"modified":"2023-01-01T13:39:30","modified_gmt":"2023-01-01T13:39:30","slug":"cluster-maintenance","status":"publish","type":"post","link":"https:\/\/devopsopen.com\/index.php\/2022\/12\/26\/cluster-maintenance\/","title":{"rendered":"Cluster Maintenance"},"content":{"rendered":"<h1>Cluster Maintenance<\/h1>\n<h4 id=\"Summary\">Summary<\/h4>\n<ul class=\"ez-toc-page-1 ez-toc-heading-level-2\">\n<li><a title=\"Os Upgrades\" href=\"#Os Upgrades\">OS Upgrades<\/a><\/li>\n<li><a title=\"Kub Versions and Working with ETCDCTL\" href=\"#Kub Versions and Working with ETCDCTL\">Kubernetes Versions and Working with ETCDCTL<\/a><\/li>\n<li><a title=\"Backup and Restore\" href=\"#Backup and Restore\">Backup and Restore<\/a><\/li>\n<\/ul>\n<h2 id=\"Os Upgrades\"><a title=\"Summary\" href=\"#Summary\">OS Upgrades<\/a><\/h2>\n<p>Flag to know: kube-controller-manager --pod-eviction-timeout=5m0s. When a node stays down longer than this timeout, its pods are evicted, and if the node comes back it is empty. You can also lower the timeout value to have a node's pods evicted sooner.<\/p>\n<p>kubectl cordon node-1<br \/>\nKubernetes cordon is an operation that marks or taints a node in your existing node pool as unschedulable. By using it on a node, you can be sure that no new pods will be scheduled on this node. The command prevents the Kubernetes scheduler from placing new pods onto that node, but it doesn\u2019t affect pods already running there.<\/p>\n<p>To empty the node of its remaining pods, in other words to move them to other nodes for maintenance, use the drain command. 
The node then stays unschedulable until you remove the restriction.<br \/>\nUse 'kubectl drain node-1', or 'kubectl drain node-12 --grace-period 0' to drain without waiting for graceful termination, or 'kubectl drain node-12 --force' to force the drain.<\/p>\n<p>To make a node schedulable again, use the uncordon command:<br \/>\nkubectl uncordon node-1<\/p>\n<p>Note that drain cordons the node first, so afterwards only uncordon is needed to make the node schedulable again.<br \/>\nIf the node runs a standalone pod that is not managed by a ReplicaSet, DaemonSet, or another controller, the drain is refused. You can force it with the --force flag, but that pod is then deleted and not recreated.<\/p>\n<h2 id=\"Kub Versions and Working with ETCDCTL\"><a title=\"Summary\" href=\"#Summary\">Kubernetes Versions and Working with ETCDCTL<\/a><\/h2>\n<p>Versions matter a great deal for maintenance. See the links below for more information:<\/p>\n<blockquote>\n<pre>https:\/\/kubernetes.io\/docs\/concepts\/overview\/kubernetes-api\/\r\nHere are links to the Kubernetes community documentation if you want to learn more about this topic (you don't need them for the exam, though):\r\nhttps:\/\/github.com\/kubernetes\/community\/blob\/master\/contributors\/devel\/sig-architecture\/api-conventions.md\r\nhttps:\/\/github.com\/kubernetes\/community\/blob\/master\/contributors\/devel\/sig-architecture\/api_changes.md<\/pre>\n<\/blockquote>\n<p>No cluster component may run a higher version than the kube-apiserver, because control plane components do not talk to each other directly; they delegate their calls to the API server. kubectl is the exception: it may be one minor version newer or older than the API server.<br \/>\nA gap of more than two minor versions is not supported; in that case, upgrade the lagging Kubernetes components. 
The recommended approach is to upgrade one minor version at a time. For example, to go from v1.10 to v1.13, upgrade to v1.11, then v1.12, and finally v1.13.<\/p>\n<p>If you use kubeadm, run these commands:<\/p>\n<blockquote>\n<pre>kubeadm upgrade plan (to get information)\r\nkubeadm upgrade apply<\/pre>\n<\/blockquote>\n<p>Upgrades must begin with the master (control plane) nodes. Applications are not impacted because they run on the workers. Then upgrade the workers, using one of three strategies:<\/p>\n<blockquote><p>- Drain all workers at once and upgrade them together; application downtime must be accepted<br \/>\n- Upgrade the workers one at a time, so availability is preserved<br \/>\n- Add new, already upgraded nodes and move pods from the old nodes to the new ones. This strategy is common in the cloud (it needs more VMs, but it is safe and preserves application availability)<\/p><\/blockquote>\n<p>Upgrade steps:<\/p>\n<p>Upgrade the masters first. Sometimes kubeadm upgrade plan suggests a command that upgrades directly to 1.13 when the cluster is at 1.11, for example. You should still upgrade minor versions in order, so upgrade to 1.12 first.<\/p>\n<p>Important: the kubeadm upgrade command does not upgrade kubeadm itself or the kubelet, so upgrade kubeadm before running it and the kubelet afterwards:<\/p>\n<blockquote>\n<pre>apt-get install -y kubeadm=1.12.0-00\r\nkubeadm upgrade apply v1.12.0\r\napt-get install -y kubelet=1.12.0-00\r\nsystemctl restart kubelet<\/pre>\n<\/blockquote>\n<p>Follow the same steps on each worker, draining it first (on workers the kubeadm step is kubeadm upgrade node rather than kubeadm upgrade apply). The last command is kubectl uncordon node-1.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-148\" src=\"https:\/\/devopsopen.com\/wp-content\/uploads\/2022\/12\/upgradekube.png\" alt=\"\" width=\"666\" height=\"373\" srcset=\"https:\/\/devopsopen.com\/wp-content\/uploads\/2022\/12\/upgradekube.png 1019w, https:\/\/devopsopen.com\/wp-content\/uploads\/2022\/12\/upgradekube-300x168.png 300w, 
https:\/\/devopsopen.com\/wp-content\/uploads\/2022\/12\/upgradekube-768x430.png 768w\" sizes=\"(max-width: 666px) 100vw, 666px\" \/><\/p>\n<p>Use this command to watch the status of the nodes as the upgrade progresses:<\/p>\n<p>watch kubectl get nodes<\/p>\n<p>&nbsp;<\/p>\n<h2 id=\"Backup and Restore\"><a title=\"Summary\" href=\"#Summary\">Backup and Restore<\/a><\/h2>\n<p>- Back up resource definitions with this command (note that get all covers only a subset of resource types):<\/p>\n<blockquote>\n<pre>kubectl get all --all-namespaces -o yaml &gt; all-deploy-services.yaml<\/pre>\n<\/blockquote>\n<p>Tools such as Velero (formerly Ark, by Heptio) can then be used to back up and restore these resources.<\/p>\n<p>- Back up ETCD with the etcdctl snapshot save command. You need additional flags to connect to the ETCD server. Their values can be retrieved by describing the etcd pod.<\/p>\n<blockquote>\n<pre>ETCDCTL_API=3 etcdctl --endpoints=https:\/\/[127.0.0.1]:2379 \\\r\n--cacert=\/etc\/kubernetes\/pki\/etcd\/ca.crt \\\r\n--cert=\/etc\/kubernetes\/pki\/etcd\/server.crt \\\r\n--key=\/etc\/kubernetes\/pki\/etcd\/server.key \\\r\nsnapshot save \/opt\/snapshot-pre-boot.db<\/pre>\n<\/blockquote>\n<p>--endpoints: optional flag, points to the address where ETCD is running (127.0.0.1:2379)<br \/>\n--cacert: mandatory flag (absolute path to the CA certificate file)<br \/>\n--cert: mandatory flag (absolute path to the server certificate file)<br \/>\n--key: mandatory flag (absolute path to the key file)<\/p>\n<p>- Check the status of the backup:<\/p>\n<blockquote>\n<pre>ETCDCTL_API=3 etcdctl snapshot status \/opt\/snapshot-pre-boot.db<\/pre>\n<\/blockquote>\n<p>- Restore ETCD:<\/p>\n<blockquote>\n<pre># If the components run as services, stop the API server first so it stops sending requests: service kube-apiserver stop\r\n# Restore the ETCD snapshot to a new directory\r\nETCDCTL_API=3 etcdctl --endpoints=https:\/\/[127.0.0.1]:2379 --cacert=\/etc\/kubernetes\/pki\/etcd\/ca.crt \\\r\n--name=master \\\r\n--cert=\/etc\/kubernetes\/pki\/etcd\/server.crt --key=\/etc\/kubernetes\/pki\/etcd\/server.key 
\\\r\n--data-dir \/var\/lib\/etcd-from-backup \\\r\n--initial-cluster=master=https:\/\/127.0.0.1:2380 \\\r\n--initial-cluster-token etcd-cluster-1 \\\r\n--initial-advertise-peer-urls=https:\/\/127.0.0.1:2380 \\\r\nsnapshot restore \/opt\/snapshot-pre-boot.db\r\n\r\n# Modify \/etc\/kubernetes\/manifests\/etcd.yaml\r\n# Update --data-dir to use the new target location\r\n--data-dir=\/var\/lib\/etcd-from-backup\r\n\r\n# Update initial-cluster-token to identify the new cluster\r\n--initial-cluster-token=etcd-cluster-1\r\n\r\n# Update volumes and volume mounts to point to the new path\r\n    volumeMounts:\r\n    - mountPath: \/var\/lib\/etcd-from-backup\r\n      name: etcd-data\r\n    - mountPath: \/etc\/kubernetes\/pki\/etcd\r\n      name: etcd-certs\r\n  hostNetwork: true\r\n  priorityClassName: system-cluster-critical\r\n  volumes:\r\n  - hostPath:\r\n      path: \/var\/lib\/etcd-from-backup\r\n      type: DirectoryOrCreate\r\n    name: etcd-data\r\n  - hostPath:\r\n      path: \/etc\/kubernetes\/pki\/etcd\r\n      type: DirectoryOrCreate\r\n    name: etcd-certs<\/pre>\n<\/blockquote>\n<p>Important:<\/p>\n<p>Note 1: Because the ETCD pod manifest has changed, the pod restarts automatically, and so do <code>kube-controller-manager<\/code> and <code>kube-scheduler<\/code>. Wait 1-2 minutes for these pods to restart. You can run the command <code>watch \"crictl ps | grep etcd\"<\/code> to see when the ETCD pod has restarted.<\/p>\n<p>Note 2: If the etcd pod does not reach <code>Ready 1\/1<\/code>, restart it with <code>kubectl delete pod -n kube-system etcd-controlplane<\/code> and wait 1 minute.<\/p>\n<p>Note 3: This is the simplest way to make sure that ETCD uses the restored data after the ETCD pod is recreated. 
You <b>don't<\/b> have to change anything else.<\/p>\n<p>If ETCD is installed as a service, then run:<\/p>\n<blockquote>\n<pre>systemctl daemon-reload\r\nservice etcd restart\r\nservice kube-apiserver start<\/pre>\n<\/blockquote>\n<p>&nbsp;<\/p>\n<p>To find out which version of etcd is installed, check the image of the etcd pod.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-150\" src=\"https:\/\/devopsopen.com\/wp-content\/uploads\/2022\/12\/backupetcd.png\" alt=\"\" width=\"457\" height=\"268\" srcset=\"https:\/\/devopsopen.com\/wp-content\/uploads\/2022\/12\/backupetcd.png 955w, https:\/\/devopsopen.com\/wp-content\/uploads\/2022\/12\/backupetcd-300x176.png 300w, https:\/\/devopsopen.com\/wp-content\/uploads\/2022\/12\/backupetcd-768x451.png 768w\" sizes=\"(max-width: 457px) 100vw, 457px\" \/><\/p>\n<p>Ref:<\/p>\n<p>https:\/\/kubernetes.io\/docs\/tasks\/administer-cluster\/configure-upgrade-etcd\/#backing-up-an-etcd-cluster<\/p>\n<p>https:\/\/github.com\/etcd-io\/website\/blob\/main\/content\/en\/docs\/v3.5\/op-guide\/recovery.md<\/p>\n<p><iframe loading=\"lazy\" title=\"Disaster Recovery for your Kubernetes Clusters [I] - Andy Goldstein &amp; Steve Kriss, Heptio\" width=\"1290\" height=\"726\" src=\"https:\/\/www.youtube.com\/embed\/qRPNuT080Hk?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen><\/iframe><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Cluster Maintenance Summary OS Upgrades Kubernetes Versions and Working with ETCDCTL Backup and Restore OS Upgrades Flag to know: kube-controller-manager --pod-eviction-timeout=5m0s. When a node stays down longer than this timeout, its pods are evicted, and if the node comes back it is empty. You can also lower the timeout value to have a node's pods evicted sooner. kubectl cordon node-1 Kubernetes cordon is an operation that marks or taints a node in your existing node pool as unschedulable. 
By using it on a node, you can be sure that no new pods will be scheduled on this node. The command prevents the Kubernetes scheduler from placing\u2026<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_uag_custom_page_level_css":""},"categories":[12],"tags":[],"blocksy_meta":{"styles_descriptor":{"styles":{"desktop":"","tablet":"","mobile":""},"google_fonts":[],"version":5}},"uagb_featured_image_src":{"full":false,"thumbnail":false,"medium":false,"medium_large":false,"large":false,"1536x1536":false,"2048x2048":false},"uagb_author_info":{"display_name":"admin","author_link":"https:\/\/devopsopen.com\/index.php\/author\/admin_bak\/"},"uagb_comment_info":64,"uagb_excerpt":"Cluster Maintenance Summary OS Upgrades Kubernetes Versions and Working with ETCDCTL Backup and Restore OS Upgrades Flag to know: kube-controller-manager --pod-eviction-timeout=5m0s. When a node stays down longer than this timeout, its pods are evicted, and if the node comes back it is empty. 
You can also lower the timeout value to have a node's pods evicted sooner. kubectl cordon node-1 Kubernetes&hellip;","_links":{"self":[{"href":"https:\/\/devopsopen.com\/index.php\/wp-json\/wp\/v2\/posts\/133"}],"collection":[{"href":"https:\/\/devopsopen.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devopsopen.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devopsopen.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/devopsopen.com\/index.php\/wp-json\/wp\/v2\/comments?post=133"}],"version-history":[{"count":18,"href":"https:\/\/devopsopen.com\/index.php\/wp-json\/wp\/v2\/posts\/133\/revisions"}],"predecessor-version":[{"id":154,"href":"https:\/\/devopsopen.com\/index.php\/wp-json\/wp\/v2\/posts\/133\/revisions\/154"}],"wp:attachment":[{"href":"https:\/\/devopsopen.com\/index.php\/wp-json\/wp\/v2\/media?parent=133"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devopsopen.com\/index.php\/wp-json\/wp\/v2\/categories?post=133"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devopsopen.com\/index.php\/wp-json\/wp\/v2\/tags?post=133"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}