Backing up and restoring etcd on a kubeadm-installed Kubernetes cluster
Published: 2019-06-19



[TOC]

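1. Background

After the typhoon on September 16, 2018, etcd failed to start on one of my Kubernetes test clusters. Half a day of rescue attempts got nowhere (all three masters showed the error below), so I had to spend another half day rebuilding the environment. Even when etcd runs as a cluster, backups are essential: once the data is gone, everything is gone. Fortunately the problem surfaced early, on a test system. Had it happened in production, I would probably have been packing my things. Hence this look at backing up Kubernetes.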

    2018-09-17 00:11:55.781279 I | etcdmain: etcd Version: 3.2.18
    2018-09-17 00:11:55.781457 I | etcdmain: Git SHA: eddf599c6
    2018-09-17 00:11:55.781477 I | etcdmain: Go Version: go1.8.7
    2018-09-17 00:11:55.781503 I | etcdmain: Go OS/Arch: linux/amd64
    2018-09-17 00:11:55.781519 I | etcdmain: setting maximum number of CPUs to 32, total number of available CPUs is 32
    2018-09-17 00:11:55.781634 N | etcdmain: the server is already initialized as member before, starting as etcd member...
    2018-09-17 00:11:55.781702 I | embed: peerTLS: cert = /etc/kubernetes/pki/etcd/peer.crt, key = /etc/kubernetes/pki/etcd/peer.key, ca = , trusted-ca = /etc/kubernetes/pki/etcd/ca.crt, client-cert-auth = true
    2018-09-17 00:11:55.783073 I | embed: listening for peers on https://192.168.105.92:2380
    2018-09-17 00:11:55.783182 I | embed: listening for client requests on 127.0.0.1:2379
    2018-09-17 00:11:55.783281 I | embed: listening for client requests on 192.168.105.92:2379
    2018-09-17 00:11:55.791474 I | etcdserver: recovered store from snapshot at index 16471696
    2018-09-17 00:11:55.792633 I | mvcc: restore compact to 13683366
    2018-09-17 00:11:55.849153 C | mvcc: store.keyindex: put with unexpected smaller revision [{13685569 0} / {13685569 0}]
    panic: store.keyindex: put with unexpected smaller revision [{13685569 0} / {13685569 0}]

    goroutine 89 [running]:
    github.com/coreos/etcd/cmd/vendor/github.com/coreos/pkg/capnslog.(*PackageLogger).Panicf(0xc42018c160, 0xfa564e, 0x3e, 0xc420062cb0, 0x2, 0x2)
            /tmp/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/pkg/capnslog/pkg_logger.go:75 +0x15c
    github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/mvcc.(*keyIndex).put(0xc4207fd7c0, 0xd0d341, 0x0)
            /tmp/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/mvcc/key_index.go:80 +0x3ec
    github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/mvcc.restoreIntoIndex.func1(0xc42029e460, 0xc4202a0600, 0x14bef40, 0xc420285640)
            /tmp/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/mvcc/kvstore.go:367 +0x3e3
    created by github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/mvcc.restoreIntoIndex
            /tmp/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/mvcc/kvstore.go:374 +0xa5

2. Environment

Kubernetes 1.11, installed with kubeadm.

3. Inspecting the etcd cluster

    # List the members (v2 API flags)
    etcdctl --endpoints=https://192.168.105.92:2379,https://192.168.105.93:2379,https://192.168.105.94:2379 \
        --cert-file=/etc/kubernetes/pki/etcd/server.crt \
        --key-file=/etc/kubernetes/pki/etcd/server.key \
        --ca-file=/etc/kubernetes/pki/etcd/ca.crt \
        member list

    # List the Kubernetes keys (v3 API)
    export ETCDCTL_API=3
    etcdctl get / --prefix --keys-only \
        --cert=/etc/kubernetes/pki/etcd/server.crt \
        --key=/etc/kubernetes/pki/etcd/server.key \
        --cacert=/etc/kubernetes/pki/etcd/ca.crt
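Besides listing members and keys, the v3 API can report the health of each member. The sketch below assumes the three endpoints and certificate paths used in this article. `build_endpoints` and `check_etcd_health` are hypothetical helper names introduced here, not part of etcdctl.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a list of member IPs into the comma-separated
# --endpoints value that etcdctl expects (client port 2379 assumed).
build_endpoints() {
    local out="" ip
    for ip in "$@"; do
        out="${out:+${out},}https://${ip}:2379"
    done
    printf '%s\n' "$out"
}

# Ask every listed member for its health through the v3 API.
check_etcd_health() {
    ETCDCTL_API=3 etcdctl \
        --endpoints="$(build_endpoints "$@")" \
        --cert=/etc/kubernetes/pki/etcd/server.crt \
        --key=/etc/kubernetes/pki/etcd/server.key \
        --cacert=/etc/kubernetes/pki/etcd/ca.crt \
        endpoint health
}

# Example (run on a master node):
#   check_etcd_health 192.168.105.92 192.168.105.93 192.168.105.94
```

Passing all three endpoints at once means an unhealthy member is reported even when the local one is fine.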

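4. Backing up etcd data

  • Back up everything under /etc/kubernetes/ (certificates and manifest files)
  • Back up everything under /var/lib/kubelet/ (plugin and container connection credentials)
  • Back up the etcd data through the v3 API

Add the script below to a scheduled task (cron) so that it runs every day.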
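One way to schedule the daily run is an entry under /etc/cron.d. The sketch below only renders the entry so it can be reviewed before it is installed. The helper name `render_cron_entry`, the install path /usr/local/bin/ut_backup_k8s.sh and the 02:30 schedule are all assumptions, not taken from the original setup.

```shell
#!/usr/bin/env bash
# Render a cron.d-style entry: minute hour day-of-month month day-of-week user command.
# Arguments: script path, hour (0-23), minute (0-59). All values are
# illustrative assumptions; adjust them to your environment.
render_cron_entry() {
    local script=$1 hour=$2 minute=$3
    printf '%s %s * * * root /usr/bin/env bash %s >/dev/null 2>&1\n' \
        "$minute" "$hour" "$script"
}

# Example: review the line, then install it as root.
#   render_cron_entry /usr/local/bin/ut_backup_k8s.sh 2 30 > /etc/cron.d/k8s-backup
```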

    #!/usr/bin/env bash
    ##############################################################
    # File Name: ut_backup_k8s.sh
    # Version: V1.0
    # Author: Chinge_Yang
    # Blog: http://blog.csdn.net/ygqygq2
    # Created Time : 2018-09-18 09:13:55
    # Description:
    ##############################################################

    # Directory containing this script
    cd `dirname $0`
    bash_path=`pwd`

    # Script name
    me=$(basename $0)
    # delete dir and keep days
    delete_dirs=("/data/backup/kubernetes:7")
    backup_dir=/data/backup/kubernetes
    files_dir=("/etc/kubernetes" "/var/lib/kubelet")
    log_dir=$backup_dir/log
    shell_log=$log_dir/${USER}_${me}.log
    ssh_port="22"
    ssh_parameters="-o StrictHostKeyChecking=no -o ConnectTimeout=60"
    ssh_command="ssh ${ssh_parameters} -p ${ssh_port}"
    scp_command="scp ${ssh_parameters} -P ${ssh_port}"
    DATE=$(date +%F)
    BACK_SERVER="127.0.0.1"  # IP of the remote backup server
    BACK_SERVER_BASE_DIR="/data/backup"
    BACK_SERVER_DIR="$BACK_SERVER_BASE_DIR/kubernetes/${HOSTNAME}"  # directory on the remote backup server
    BACK_SERVER_LOG_DIR="$BACK_SERVER_BASE_DIR/kubernetes/logs"

    # Logging function
    function save_log () {
        echo -e "`date +%F\ %T` $*" >> $shell_log
    }

    # Create the log directory before the first log line is written
    [ ! -d $log_dir ] && mkdir -p $log_dir
    save_log "start backup kubernetes"

    # Colored output functions
    function red_echo () {
        # Usage: red_echo "text"
        local what=$*
        echo -e "\e[1;31m ${what} \e[0m"
    }

    function green_echo () {
        # Usage: green_echo "text"
        local what=$*
        echo -e "\e[1;32m ${what} \e[0m"
    }

    function yellow_echo () {
        # Usage: yellow_echo "text"
        local what=$*
        echo -e "\e[1;33m ${what} \e[0m"
    }

    function twinkle_echo () {
        # Usage: twinkle_echo $(red_echo "text")  (this example blinks in red)
        local twinkle='\e[05m'
        local what="${twinkle} $*"
        echo -e "${what}"
    }

    function return_echo () {
        [ $? -eq 0 ] && green_echo "$* succeeded" || red_echo "$* failed"
    }

    function return_error_exit () {
        [ $? -eq 0 ] && REVAL="0"
        local what=$*
        if [ "$REVAL" = "0" ];then
            [ ! -z "$what" ] && green_echo "$what succeeded"
        else
            red_echo "$* failed, exiting"
            exit 1
        fi
    }

    # Confirmation prompt: exits the script on "no"
    function user_verify_function () {
        while true;do
            echo ""
            read -p "Confirm? [Y/N]:" Y
            case $Y in
                [yY]|[yY][eE][sS])
                    echo -e "answer:  \\033[20G [ \e[1;32myes\e[0m ] \033[0m"
                    break
                    ;;
                [nN]|[nN][oO])
                    echo -e "answer:  \\033[20G [ \e[1;32mno\e[0m ] \033[0m"
                    exit 1
                    ;;
                *)
                    continue
                    ;;
            esac
        done
    }

    # Skip prompt: returns 1 on "no" instead of exiting
    function user_pass_function () {
        while true;do
            echo ""
            read -p "Confirm? [Y/N]:" Y
            case $Y in
                [yY]|[yY][eE][sS])
                    echo -e "answer:  \\033[20G [ \e[1;32myes\e[0m ] \033[0m"
                    break
                    ;;
                [nN]|[nN][oO])
                    echo -e "answer:  \\033[20G [ \e[1;32mno\e[0m ] \033[0m"
                    return 1
                    ;;
                *)
                    continue
                    ;;
            esac
        done
    }

    function backup () {
        # Archive the configuration directories
        for f_d in ${files_dir[@]}; do
            f_name=$(basename ${f_d})
            d_name=$(dirname $f_d)
            cd $d_name
            tar -cjf ${f_name}.tar.bz $f_name
            if [ $? -eq 0 ]; then
                file_size=$(du ${f_name}.tar.bz|awk '{print $1}')
                save_log "$file_size ${f_name}.tar.bz"
                save_log "finish tar ${f_name}.tar.bz"
            else
                file_size=0
                save_log "failed tar ${f_name}.tar.bz"
            fi
            rsync -avzP ${f_name}.tar.bz  $backup_dir/$(date +%F)-${f_name}.tar.bz
            rm -f ${f_name}.tar.bz
        done

        # Snapshot etcd through the v3 API, then compress the snapshot
        export ETCDCTL_API=3
        snapshot_name=$(date +%F)-k8s-snapshot
        etcdctl --cert=/etc/kubernetes/pki/etcd/server.crt \
            --key=/etc/kubernetes/pki/etcd/server.key \
            --cacert=/etc/kubernetes/pki/etcd/ca.crt \
            snapshot save $backup_dir/${snapshot_name}.db
        cd $backup_dir
        tar -cjf ${snapshot_name}.tar.bz ${snapshot_name}.db
        if [ $? -eq 0 ]; then
            file_size=$(du ${snapshot_name}.tar.bz|awk '{print $1}')
            save_log "$file_size ${snapshot_name}.tar.bz"
            save_log "finish tar ${snapshot_name}.tar.bz"
        else
            file_size=0
            save_log "failed tar ${snapshot_name}.tar.bz"
        fi
        rm -f ${snapshot_name}.db
    }

    function rsync_backup_files () {
        # Copy the archives to the remote backup server
        # (requires passwordless SSH authentication)
        $ssh_command root@${BACK_SERVER} "mkdir -p ${BACK_SERVER_DIR}/${DATE}/"
        rsync -avz --bwlimit=5000 -e "${ssh_command}" $backup_dir/*.bz \
        root@${BACK_SERVER}:${BACK_SERVER_DIR}/${DATE}/
        [ $? -eq 0 ] && save_log "success rsync" || \
          save_log "failed rsync"
    }

    function delete_old_files () {
        # Each entry has the form "directory:days_to_keep"
        for delete_dir_keep_days in ${delete_dirs[@]}; do
            delete_dir=$(echo $delete_dir_keep_days|awk -F':' '{print $1}')
            keep_days=$(echo $delete_dir_keep_days|awk -F':' '{print $2}')
            [ -n "$delete_dir" ] && cd ${delete_dir}
            [ $? -eq 0 ] && find -L ${delete_dir} -mindepth 1 -mtime +$keep_days -exec rm -rf {} \;
        done
    }

    backup
    delete_old_files
    #rsync_backup_files

    save_log "finish $0\n"

    exit 0
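A backup is only useful if it can be read back, so it can help to check each snapshot right after it is written. The sketch below is an optional addition, not part of the original script: `verify_snapshot` is a hypothetical helper that rejects missing or empty files and then asks `etcdctl snapshot status` to read the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check an etcd v3 snapshot file.
# Returns 0 only if the file exists, is non-empty, and etcdctl can read it.
verify_snapshot() {
    local snapshot=$1
    if [ ! -s "$snapshot" ]; then
        echo "snapshot missing or empty: $snapshot" >&2
        return 1
    fi
    # "snapshot status" prints the hash, revision, total keys and size
    # of the snapshot; a non-zero exit status is treated as a bad file.
    ETCDCTL_API=3 etcdctl snapshot status "$snapshot" --write-out=table
}

# Example, before the snapshot is compressed in backup():
#   verify_snapshot "$backup_dir/$(date +%F)-k8s-snapshot.db" || save_log "bad snapshot"
```

The check runs against the local file only, so it needs neither certificates nor a running etcd.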

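5. Restoring etcd data

Warning

The restore operation interrupts all application state and all access to the cluster!

First stop kube-apiserver on each of the three master machines, and confirm that it has actually stopped.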

    mv /etc/kubernetes/manifests /etc/kubernetes/manifests.bak
    docker ps|grep k8s_  # check whether etcd and the API server are still up; wait until all have stopped
    mv /var/lib/etcd /var/lib/etcd.bak
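Instead of rerunning `docker ps` by hand, the wait can be scripted. The sketch below assumes the Docker runtime used in this article and the `k8s_etcd_` and `k8s_kube-apiserver_` name prefixes that the kubelet gives the static pod containers. The function name and the timeout values are illustrative.

```shell
#!/usr/bin/env bash
# Poll until no etcd or kube-apiserver container is running, or give up
# after a timeout (in seconds, default 120). Returns 0 once both are gone.
wait_for_control_plane_stop() {
    local timeout=${1:-120} waited=0
    while docker ps --format '{{.Names}}' | grep -Eq '^k8s_(etcd|kube-apiserver)_'; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "containers still running after ${timeout}s" >&2
            return 1
        fi
        sleep 2
        waited=$((waited + 2))
    done
}

# Example, between the two mv commands above:
#   wait_for_control_plane_stop 180 && mv /var/lib/etcd /var/lib/etcd.bak
```

Moving the data directory only after the function returns 0 avoids renaming /var/lib/etcd while etcd is still writing to it.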

Restore every member of the etcd cluster from the same snapshot.

    # Prepare the restore file
    cd /tmp
    tar -jxvf /data/backup/kubernetes/2018-09-18-k8s-snapshot.tar.bz
    rsync -avz 2018-09-18-k8s-snapshot.db 192.168.105.93:/tmp/
    rsync -avz 2018-09-18-k8s-snapshot.db 192.168.105.94:/tmp/

Run on lab1:

    cd /tmp/
    export ETCDCTL_API=3
    etcdctl snapshot restore 2018-09-18-k8s-snapshot.db \
        --endpoints=192.168.105.92:2379 \
        --name=lab1 \
        --cert=/etc/kubernetes/pki/etcd/server.crt \
        --key=/etc/kubernetes/pki/etcd/server.key \
        --cacert=/etc/kubernetes/pki/etcd/ca.crt \
        --initial-advertise-peer-urls=https://192.168.105.92:2380 \
        --initial-cluster-token=etcd-cluster-0 \
        --initial-cluster=lab1=https://192.168.105.92:2380,lab2=https://192.168.105.93:2380,lab3=https://192.168.105.94:2380 \
        --data-dir=/var/lib/etcd

Run on lab2:

    cd /tmp/
    export ETCDCTL_API=3
    etcdctl snapshot restore 2018-09-18-k8s-snapshot.db \
        --endpoints=192.168.105.93:2379 \
        --name=lab2 \
        --cert=/etc/kubernetes/pki/etcd/server.crt \
        --key=/etc/kubernetes/pki/etcd/server.key \
        --cacert=/etc/kubernetes/pki/etcd/ca.crt \
        --initial-advertise-peer-urls=https://192.168.105.93:2380 \
        --initial-cluster-token=etcd-cluster-0 \
        --initial-cluster=lab1=https://192.168.105.92:2380,lab2=https://192.168.105.93:2380,lab3=https://192.168.105.94:2380 \
        --data-dir=/var/lib/etcd

Run on lab3:

    cd /tmp/
    export ETCDCTL_API=3
    etcdctl snapshot restore 2018-09-18-k8s-snapshot.db \
        --endpoints=192.168.105.94:2379 \
        --name=lab3 \
        --cert=/etc/kubernetes/pki/etcd/server.crt \
        --key=/etc/kubernetes/pki/etcd/server.key \
        --cacert=/etc/kubernetes/pki/etcd/ca.crt \
        --initial-advertise-peer-urls=https://192.168.105.94:2380 \
        --initial-cluster-token=etcd-cluster-0 \
        --initial-cluster=lab1=https://192.168.105.92:2380,lab2=https://192.168.105.93:2380,lab3=https://192.168.105.94:2380 \
        --data-dir=/var/lib/etcd
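The three restore commands differ only in the member name and IP, so they can be derived from a single member list. The sketch below is a hedged variant rather than the exact commands used in this article: `initial_cluster` and `restore_member` are hypothetical helpers, and the sketch leaves out `--endpoints` and the certificate flags on the assumption that `snapshot restore` only reads the local snapshot file and writes a new data directory without contacting the cluster.

```shell
#!/usr/bin/env bash
# Member list taken from this article, as space-separated name=IP pairs.
MEMBERS="lab1=192.168.105.92 lab2=192.168.105.93 lab3=192.168.105.94"

# Build the --initial-cluster value: name=https://IP:2380, comma-separated.
initial_cluster() {
    local out="" m
    for m in $MEMBERS; do
        out="${out:+${out},}${m%%=*}=https://${m#*=}:2380"
    done
    printf '%s\n' "$out"
}

# Restore the given snapshot as the named member into /var/lib/etcd.
restore_member() {
    local name=$1 snapshot=$2 m ip=""
    for m in $MEMBERS; do
        [ "${m%%=*}" = "$name" ] && ip=${m#*=}
    done
    [ -n "$ip" ] || { echo "unknown member: $name" >&2; return 1; }
    ETCDCTL_API=3 etcdctl snapshot restore "$snapshot" \
        --name="$name" \
        --initial-advertise-peer-urls="https://${ip}:2380" \
        --initial-cluster-token=etcd-cluster-0 \
        --initial-cluster="$(initial_cluster)" \
        --data-dir=/var/lib/etcd
}

# Example, on lab2:
#   restore_member lab2 /tmp/2018-09-18-k8s-snapshot.db
```

Keeping the member list in one place makes it harder for the `--initial-cluster` value to drift between nodes, and it must be identical on all three.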

Once the restore has finished on all three nodes, put the manifests back on the three master machines.

    mv /etc/kubernetes/manifests.bak /etc/kubernetes/manifests

Finally, verify:

    # List the keys again
    [root@lab1 kubernetes]# etcdctl get / --prefix --keys-only --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt
    /registry/apiextensions.k8s.io/customresourcedefinitions/apprepositories.kubeapps.com
    /registry/apiregistration.k8s.io/apiservices/v1.
    /registry/apiregistration.k8s.io/apiservices/v1.apps
    /registry/apiregistration.k8s.io/apiservices/v1.authentication.k8s.io
    ........ (output omitted) ..........
    [root@lab1 kubernetes]# kubectl get pod -n kube-system
    NAME                                              READY     STATUS    RESTARTS   AGE
    coredns-777d78ff6f-m5chm                          1/1       Running   1          18h
    coredns-777d78ff6f-xm7q8                          1/1       Running   1          18h
    dashboard-kubernetes-dashboard-7cfc6c7bf5-hr96q   1/1       Running   0          13h
    dashboard-kubernetes-dashboard-7cfc6c7bf5-x9p7j   1/1       Running   0          13h
    etcd-lab1                                         1/1       Running   0          18h
    etcd-lab2                                         1/1       Running   0          1m
    etcd-lab3                                         1/1       Running   0          18h
    kube-apiserver-lab1                               1/1       Running   0          18h
    kube-apiserver-lab2                               1/1       Running   0          1m
    kube-apiserver-lab3                               1/1       Running   0          18h
    kube-controller-manager-lab1                      1/1       Running   0          18h
    kube-controller-manager-lab2                      1/1       Running   0          1m
    kube-controller-manager-lab3                      1/1       Running   0          18h
    kube-flannel-ds-7w6rl                             1/1       Running   2          18h
    kube-flannel-ds-b9pkf                             1/1       Running   2          18h
    kube-flannel-ds-fck8t                             1/1       Running   1          18h
    kube-flannel-ds-kklxs                             1/1       Running   1          18h
    kube-flannel-ds-lxxx9                             1/1       Running   2          18h
    kube-flannel-ds-q7lpg                             1/1       Running   1          18h
    kube-flannel-ds-tlqqn                             1/1       Running   1          18h
    kube-proxy-85j7g                                  1/1       Running   1          18h
    kube-proxy-gdvkk                                  1/1       Running   1          18h
    kube-proxy-jw5gh                                  1/1       Running   1          18h
    kube-proxy-pgfxf                                  1/1       Running   1          18h
    kube-proxy-qx62g                                  1/1       Running   1          18h
    kube-proxy-rlbdb                                  1/1       Running   1          18h
    kube-proxy-whhcv                                  1/1       Running   1          18h
    kube-scheduler-lab1                               1/1       Running   0          18h
    kube-scheduler-lab2                               1/1       Running   0          1m
    kube-scheduler-lab3                               1/1       Running   0          18h
    kubernetes-dashboard-754f4d5f69-7npk5             1/1       Running   0          13h
    kubernetes-dashboard-754f4d5f69-whtg9             1/1       Running   0          13h
    tiller-deploy-98f7f7564-59hcs                     1/1       Running   0          13h

I then checked inside each of the installed applications and confirmed that all data was intact.

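6. Summary

Whether Kubernetes was installed from binaries or with kubeadm, backing it up mostly comes down to backing up etcd. When restoring, what matters most is the order of operations: stop kube-apiserver, stop etcd, restore the data, start etcd, start kube-apiserver.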


Reposted from: https://blog.51cto.com/ygqygq2/2176492
