Kubernetes Cluster Installation (kubeadm)
I. Environment Preparation
1. Server Preparation
Five machines in total: 3 masters and 2 worker nodes.
OS: CentOS Linux release 7.5; Memory: 8 GB; Disk: 50 GB
Minimal installation
10.103.22.231 master01 haproxy keepalived
10.103.22.232 master02 haproxy keepalived
10.103.22.233 master03 haproxy keepalived
10.103.22.234 node04
10.103.22.235 node05
2. System Settings
Set the hostname on each node, for example:
hostnamectl set-hostname master01
Update the hosts file on every node; the full command is sketched below.
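The heredoc body was cut off above; based on the server list in section 1, the full command is:
cat >> /etc/hosts <<EOF
10.103.22.231 master01
10.103.22.232 master02
10.103.22.233 master03
10.103.22.234 node04
10.103.22.235 node05
EOF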
Install dependency packages
yum install -y conntrack ipvsadm ipset jq iptables curl sysstat libseccomp wget vim yum-utils device-mapper-persistent-data lvm2 net-tools ntpdate telnet
Switch the firewall to iptables and flush the rules
systemctl stop firewalld && systemctl disable firewalld
yum -y install iptables-services && systemctl start iptables && systemctl enable iptables && iptables -F && service iptables save
Disable swap
swapoff -a && sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
Disable SELinux
setenforce 0 && sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
Tune kernel parameters for Kubernetes; the sysctl file is sketched below.
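The heredoc body is not shown above; the values below are a commonly used set for kubeadm clusters, not taken from the original, so adjust them as needed:
# Assumed typical values; the original file body was not captured
modprobe br_netfilter
cat > /etc/sysctl.d/kubernetes.conf <<EOF
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.ipv4.ip_forward=1
vm.swappiness=0
EOF
sysctl -p /etc/sysctl.d/kubernetes.conf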
Load the IPVS kernel modules; the module script is sketched below.
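The script body is not shown above; a typical module list for IPVS on CentOS 7 (an assumption, not from the original) is:
# Assumed typical module list; the original script body was not captured
cat > /etc/sysconfig/modules/ipvs.modules <<EOF
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
EOF
chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules
lsmod | grep -e ip_vs -e nf_conntrack_ipv4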
Make sure ipset is also installed; if not, install it with yum install -y ipset.
Adjust the system time zone
# Set the system time zone to Asia/Shanghai
timedatectl set-timezone Asia/Shanghai
# Keep the hardware clock in UTC
timedatectl set-local-rtc 0
# Restart services that depend on the system time
systemctl restart rsyslog
systemctl restart crond
Disable services the system does not need
systemctl stop postfix && systemctl disable postfix
Add the executable path /opt/kubernetes/bin to the PATH variable
echo 'PATH=/opt/kubernetes/bin:$PATH' >> /etc/profile.d/kubernetes.sh
source /etc/profile.d/kubernetes.sh
mkdir -p /opt/kubernetes/{bin,cert,script}
3. Configure Passwordless SSH
Generate a new key pair
ssh-keygen -t rsa
Distribute the public key to each node
# Copy the contents of id_rsa.pub into the authorized_keys file on the other machines
cat ~/.ssh/id_rsa.pub
# Run the following on the other nodes (including the worker nodes)
mkdir -p ~/.ssh/
echo "" >> ~/.ssh/authorized_keys
4. Prepare the Docker Environment
Remove any existing versions
yum remove docker \
docker-client \
docker-client-latest \
docker-common \
docker-latest \
docker-latest-logrotate \
docker-logrotate \
docker-engine
Add the Docker yum repository
yum-config-manager \
--add-repo \
https://download.docker.com/linux/centos/docker-ce.repo
Install Docker
Per the Kubernetes container-runtime documentation, install the following versions:
yum install -y \
containerd.io-1.2.10 \
docker-ce-19.03.4 \
docker-ce-cli-19.03.4
docker-ce: the Docker daemon
docker-ce-cli: the Docker client
containerd.io: manages the complete container lifecycle of its host, from image transfer and storage to container execution and supervision, down to low-level storage, network attachments, and more
Configure Docker's daemon.json; a sketch of the file follows below.
mkdir /etc/docker
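The daemon.json body is not shown above; a plausible configuration for this setup is sketched below. The /data/docker path matches the data directory created in the next step, while the systemd cgroup driver and log options are common choices, not values from the original:
# Assumed example; the original daemon.json body was not captured
cat > /etc/docker/daemon.json <<EOF
{
  "data-root": "/data/docker",
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": { "max-size": "100m", "max-file": "3" }
}
EOF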
Create the Docker data directory
NODE_IPS=("master01" "master02" "master03" "node04" "node05")
for node_ip in ${NODE_IPS[@]};do
ssh root@${node_ip} "mkdir -p /data/docker/"
done
Start Docker
systemctl enable docker
systemctl daemon-reload
systemctl start docker
II. Installing the Kubernetes Cluster
1. Download the etcd Binary Package
Download the etcd package:
cd /root/software
wget https://github.com/etcd-io/etcd/releases/download/v3.4.9/etcd-v3.4.9-linux-amd64.tar.gz
Package name:
etcd-v3.4.9-linux-amd64.tar.gz
Unpack the archive:
mkdir -p /root/software/{master,worker}
mv etcd-v3.4.9-linux-amd64.tar.gz /root/software
cd /root/software
tar zxvf etcd-v3.4.9-linux-amd64.tar.gz
cp etcd-v3.4.9-linux-amd64/{etcd,etcdctl} master/
2. Create the CA Certificate and Key
Install the cfssl toolset
mkdir -p /opt/kubernetes/{bin,cert} && cd /opt/kubernetes
mkdir -p /etc/kubernetes/pki/etcd
wget https://pkg.cfssl.org/R1.2/cfssl_linux-amd64
mv cfssl_linux-amd64 /opt/kubernetes/bin/cfssl
wget https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
mv cfssljson_linux-amd64 /opt/kubernetes/bin/cfssljson
wget https://pkg.cfssl.org/R1.2/cfssl-certinfo_linux-amd64
mv cfssl-certinfo_linux-amd64 /opt/kubernetes/bin/cfssl-certinfo
chmod +x /opt/kubernetes/bin/*
Create the cfssl signing configuration file (sketched below).
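The ca-config.json body is not shown above; a standard cfssl signing configuration with a "kubernetes" profile (matching the -profile=kubernetes used when signing the etcd certificate later) would be:
# Assumed standard cfssl config; the original body was not captured
cat > /etc/kubernetes/pki/etcd/ca-config.json <<EOF
{
  "signing": {
    "default": { "expiry": "87600h" },
    "profiles": {
      "kubernetes": {
        "usages": ["signing", "key encipherment", "server auth", "client auth"],
        "expiry": "87600h"
      }
    }
  }
}
EOF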
Create the CA certificate signing request file (sketched below).
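The ca-csr.json body is not shown above; a typical CSR for an etcd CA looks like the sketch below (the CN and names fields are assumptions):
# Assumed example; the original body was not captured
cat > /etc/kubernetes/pki/etcd/ca-csr.json <<EOF
{
  "CN": "etcd-ca",
  "key": { "algo": "rsa", "size": 2048 },
  "names": [
    { "C": "CN", "ST": "BeiJing", "L": "BeiJing", "O": "k8s", "OU": "System" }
  ]
}
EOF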
Generate the CA certificate and private key
cd /etc/kubernetes/pki/etcd
cfssl gencert -initca ca-csr.json | cfssljson -bare ca
Distribute the root certificate files
NODE_IPS=("master01" "master02" "master03")
for node_ip in ${NODE_IPS[@]};do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p mkdir -p /etc/kubernetes/pki/etcd"
scp /etc/kubernetes/pki/etcd/ca*.pem /etc/kubernetes/pki/etcd/ca-config.json root@${node_ip}:/etc/kubernetes/pki/etcd
done
3. Deploy the etcd Cluster
Create the etcd certificate and private key
Create the certificate signing request file (sketched below).
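The etcd-csr.json body is not shown above. The hosts list must contain the three etcd node IPs; the remaining fields below are assumptions:
# Assumed example; the hosts list comes from the etcd node IPs used in this guide
cat > /etc/kubernetes/pki/etcd/etcd-csr.json <<EOF
{
  "CN": "etcd",
  "hosts": [
    "127.0.0.1",
    "10.103.22.231",
    "10.103.22.232",
    "10.103.22.233"
  ],
  "key": { "algo": "rsa", "size": 2048 },
  "names": [
    { "C": "CN", "ST": "BeiJing", "L": "BeiJing", "O": "k8s", "OU": "System" }
  ]
}
EOF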
Generate the certificate and private key
cd /etc/kubernetes/pki/etcd
cfssl gencert -ca=/etc/kubernetes/pki/etcd/ca.pem \
-ca-key=/etc/kubernetes/pki/etcd/ca-key.pem \
-config=/etc/kubernetes/pki/etcd/ca-config.json \
-profile=kubernetes etcd-csr.json | cfssljson -bare etcd
Distribute the generated certificate and private key to each etcd node
NODE_IPS=("master01" "master02" "master03")
for node_ip in ${NODE_IPS[@]};do
echo ">>> ${node_ip}"
scp /root/software/master/etcd* root@${node_ip}:/opt/kubernetes/bin
ssh root@${node_ip} "chmod +x /opt/kubernetes/bin/*"
ssh root@${node_ip} "mkdir -p /etc/kubernetes/pki/etcd"
scp /etc/kubernetes/pki/etcd/etcd*.pem root@${node_ip}:/etc/kubernetes/pki/etcd
done
Create the etcd systemd unit template; a sketch follows the notes below.
Notes:
User: the account the service runs as (a dedicated k8s account in the original);
WorkingDirectory, --data-dir: set the working and data directory to /opt/lib/etcd, which must be created before the service is started;
--name: the node name; when --initial-cluster-state is new, the value of --name must appear in the --initial-cluster list;
--cert-file, --key-file: certificate and private key used for communication between the etcd server and its clients;
--trusted-ca-file: the CA certificate that signed the client certificates, used to verify them;
--peer-cert-file, --peer-key-file: certificate and private key used for etcd peer communication;
--peer-trusted-ca-file: the CA certificate that signed the peer certificates, used to verify them;
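Putting the notes above together, a working template looks roughly like the sketch below. The ##NODE_NAME## and ##NODE_IP## placeholders are the ones replaced by sed in the next step; the exact flags of the original were not captured, and this sketch runs etcd as root rather than a dedicated k8s account, so adjust it to your needs:
# Sketch of the etcd unit template; flags are typical, not copied from the original
cat > /etc/kubernetes/pki/etcd/etcd.service.template <<"EOF"
[Unit]
Description=Etcd Server
Documentation=https://github.com/etcd-io/etcd
After=network.target

[Service]
Type=notify
WorkingDirectory=/opt/lib/etcd/
ExecStart=/opt/kubernetes/bin/etcd \
  --name=##NODE_NAME## \
  --data-dir=/opt/lib/etcd \
  --cert-file=/etc/kubernetes/pki/etcd/etcd.pem \
  --key-file=/etc/kubernetes/pki/etcd/etcd-key.pem \
  --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem \
  --client-cert-auth \
  --peer-cert-file=/etc/kubernetes/pki/etcd/etcd.pem \
  --peer-key-file=/etc/kubernetes/pki/etcd/etcd-key.pem \
  --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem \
  --peer-client-cert-auth \
  --listen-peer-urls=https://##NODE_IP##:2380 \
  --initial-advertise-peer-urls=https://##NODE_IP##:2380 \
  --listen-client-urls=https://##NODE_IP##:2379,https://127.0.0.1:2379 \
  --advertise-client-urls=https://##NODE_IP##:2379 \
  --initial-cluster-token=etcd-cluster-0 \
  --initial-cluster=etcd0=https://10.103.22.231:2380,etcd1=https://10.103.22.232:2380,etcd2=https://10.103.22.233:2380 \
  --initial-cluster-state=new
Restart=on-failure
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF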
Create and distribute the etcd systemd unit files and the etcd data directory for each node
# Substitute the variables in the template to create a systemd unit file for each node
NODE_NAMES=("etcd0" "etcd1" "etcd2")
NODE_IPS=("10.103.22.231" "10.103.22.232" "10.103.22.233")
for (( i=0; i < 3; i++ ));do
sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/g" -e "s/##NODE_IP##/${NODE_IPS[i]}/g" /etc/kubernetes/pki/etcd/etcd.service.template > /etc/kubernetes/pki/etcd/etcd-${NODE_IPS[i]}.service
done
# Distribute the generated systemd unit files:
for node_ip in ${NODE_IPS[@]};do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p /opt/lib/etcd"
scp /etc/kubernetes/pki/etcd/etcd-${node_ip}.service root@${node_ip}:/etc/systemd/system/etcd.service
done
Start the etcd service
vim /opt/kubernetes/script/etcd.sh
NODE_IPS=("10.103.22.231" "10.103.22.232" "10.103.22.233")
# Start the etcd service
for node_ip in ${NODE_IPS[@]};do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl daemon-reload && systemctl enable etcd && systemctl restart etcd"
done
# Check the result; make sure the status is active (running)
for node_ip in ${NODE_IPS[@]};do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl status etcd|grep Active"
done
# Verify the service status; the cluster is healthy when every endpoint reports healthy
for node_ip in ${NODE_IPS[@]};do
echo ">>> ${node_ip}"
ETCDCTL_API=3 /opt/kubernetes/bin/etcdctl \
--endpoints=https://${node_ip}:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.pem \
--cert=/etc/kubernetes/pki/etcd/etcd.pem \
--key=/etc/kubernetes/pki/etcd/etcd-key.pem endpoint health
done
Output:
>>> 10.103.22.231
https://10.103.22.231:2379 is healthy: successfully committed proposal: took = 14.831695ms
>>> 10.103.22.232
https://10.103.22.232:2379 is healthy: successfully committed proposal: took = 21.961696ms
>>> 10.103.22.233
https://10.103.22.233:2379 is healthy: successfully committed proposal: took = 20.714393ms
4. Install the Load Balancer
Install keepalived and haproxy on the three master nodes
yum install -y keepalived haproxy
Configure the haproxy configuration file
vim /etc/haproxy/haproxy.cfg
global
log /dev/log local0
log /dev/log local1 notice
chroot /var/lib/haproxy
stats socket /var/run/haproxy-admin.sock mode 660 level admin
stats timeout 30s
user haproxy
group haproxy
daemon
nbproc 1
defaults
log global
timeout connect 5000
timeout client 10m
timeout server 10m
listen admin_stats
bind 0.0.0.0:10080
mode http
log 127.0.0.1 local0 err
stats refresh 30s
stats uri /status
stats realm welcome login\ Haproxy
stats auth along:along123
stats hide-version
stats admin if TRUE
listen kube-master
bind 0.0.0.0:8443
mode tcp
option tcplog
balance source
server master01 10.103.22.231:6443 check inter 2000 fall 2 rise 2 weight 1
server master02 10.103.22.232:6443 check inter 2000 fall 2 rise 2 weight 1
server master03 10.103.22.233:6443 check inter 2000 fall 2 rise 2 weight 1
Notes:
haproxy serves its status page on port 10080;
haproxy listens on port 8443 on all interfaces; this port must match the one specified by the ${KUBE_APISERVER} environment variable (the controlPlaneEndpoint used later);
the server lines list the IPs and ports of all kube-apiservers;
Distribute the haproxy configuration file, then start and check the haproxy service
vim /opt/kubernetes/script/haproxy.sh
NODE_IPS=("master01" "master02" "master03")
for node_ip in ${NODE_IPS[@]};do
echo ">>> ${node_ip}"
# Distribute the configuration file
scp /etc/haproxy/haproxy.cfg root@${node_ip}:/etc/haproxy
# Start and check the haproxy service
ssh root@${node_ip} "systemctl restart haproxy"
ssh root@${node_ip} "systemctl enable haproxy.service"
ssh root@${node_ip} "systemctl status haproxy|grep Active"
# Check that haproxy is listening on port 8443
ssh root@${node_ip} "netstat -lnpt|grep haproxy"
done
The output should look similar to:
tcp 0 0 0.0.0.0:10080 0.0.0.0:* LISTEN 20027/haproxy
tcp 0 0 0.0.0.0:8443 0.0.0.0:* LISTEN 20027/haproxy
Configure and start the keepalived service
keepalived runs with every node as BACKUP, in non-preempt mode.
This prevents a recovered master from immediately reclaiming the VIP (the apiserver needs time to come up, so requests are not routed to it before it can serve them).
backup: 10.103.22.231, 10.103.22.232, 10.103.22.233
Edit the keepalived configuration file:
vim /etc/keepalived/keepalived.conf
global_defs {
router_id keepalived_hap
}
vrrp_script check-haproxy {
script "killall -0 haproxy"
interval 5
weight -5
}
vrrp_instance VI-kube-master {
state BACKUP
nopreempt
priority 200
dont_track_primary
interface ens160
virtual_router_id 68
advert_int 3
track_script {
check-haproxy
}
virtual_ipaddress {
10.103.22.236
}
}
Notes:
The interface holding the VIP is ens160 here; change it to match your own NIC.
killall -0 haproxy checks whether the haproxy process on the node is still alive; if not, the priority is reduced by the weight (-5 here), which triggers a new master election.
router_id and virtual_router_id identify the keepalived instances belonging to this HA group; if you run multiple keepalived HA groups, they must all differ.
Key points:
1. state must be BACKUP on all three nodes.
2. nopreempt must be set on all three nodes.
3. One node's priority must be higher than the other two.
4. On master02 and master03, simply change priority 200 to 150 and 100 respectively.
Start the keepalived service
NODE_IPS=("master01" "master02" "master03")
VIP="10.103.22.236"
for node_ip in ${NODE_IPS[@]};do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl restart keepalived && systemctl enable keepalived"
ssh root@${node_ip} "systemctl status keepalived|grep Active"
ssh ${node_ip} "ping -c 3 ${VIP}"
done
Check the NIC addresses: ip a show ens160
2: ens160: mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:a0:ed:1f brd ff:ff:ff:ff:ff:ff
    inet 10.103.22.231/24 brd 10.103.22.255 scope global noprefixroute ens160
       valid_lft forever preferred_lft forever
    inet 10.103.22.236/32 scope global ens160
       valid_lft forever preferred_lft forever
5. Install kubeadm and kubelet
Note: perform the following steps on the master and worker nodes.
cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes Repository
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg
EOF
yum install -y kubelet-1.17.8 kubeadm-1.17.8 kubectl-1.17.8
# Enable kubelet to start at boot
systemctl enable kubelet
kubelet communicates with the rest of the cluster and manages the lifecycle of the Pods and containers on its node. kubeadm is Kubernetes' automated deployment tool, which lowers the deployment effort and speeds things up. kubectl is the command-line client for managing the cluster.
kubectl only needs to be installed on the machines from which you will actually manage the cluster.
6. Deploy the Kubernetes Masters
The configuration below is based on the output of kubeadm config print init-defaults, edited by hand to fit this environment.
kubeadm config view
kubeadm config print init-defaults > kubeadm-init.yaml
cat kubeadm-init.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.17.8
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
controlPlaneEndpoint: 10.103.22.236:8443
apiServer:
  certSANs:
  - 10.103.22.231
  - 10.103.22.232
  - 10.103.22.233
  - 10.103.22.236
  - 127.0.0.1
etcd:
  external:
    endpoints:
    - https://10.103.22.231:2379
    - https://10.103.22.232:2379
    - https://10.103.22.233:2379
    caFile: /etc/kubernetes/pki/etcd/ca.pem
    certFile: /etc/kubernetes/pki/etcd/etcd.pem
    keyFile: /etc/kubernetes/pki/etcd/etcd-key.pem
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
The certSANs list contains every address that will be used to reach the apiserver, including the VIP. The etcd section configures the external etcd cluster, including its certificate paths, which is why the etcd certificates were copied to every Kubernetes master node earlier.
If your network environment requires it, configure an HTTP proxy for Docker:
mkdir -p /etc/systemd/system/docker.service.d
vim /etc/systemd/system/docker.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=http://IP:port" "HTTPS_PROXY=http://IP:port" "NO_PROXY=localhost,127.0.0.1,registry.aliyuncs.com/google_containers"
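After adding the drop-in, reload systemd and restart Docker so the proxy settings take effect:
systemctl daemon-reload
systemctl restart docker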
# You can pre-pull the images with the following command before initializing
kubeadm config images pull --config kubeadm-init.yaml
# Initialize
kubeadm init --config=kubeadm-init.yaml --upload-certs | tee kubeadm-init.log
# If you hit the following errors, it is because etcd was deployed from binaries and kubeadm's preflight check finds its ports already in use
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Port-2379]: Port 2379 is in use
[ERROR Port-2380]: Port 2380 is in use
# In that case, initialize with the following command instead
kubeadm init --config=kubeadm-init.yaml --upload-certs --ignore-preflight-errors=Port-2379 --ignore-preflight-errors=Port-2380 | tee kubeadm-init.log
# Output log in the normal case:
[init] Using Kubernetes version: v1.17.8
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [master01 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.103.22.231 10.103.22.236 10.103.22.231 10.103.22.232 10.103.22.233 10.103.22.236 127.0.0.1]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] External etcd mode: Skipping etcd/ca certificate authority generation
[certs] External etcd mode: Skipping etcd/server certificate generation
[certs] External etcd mode: Skipping etcd/peer certificate generation
[certs] External etcd mode: Skipping etcd/healthcheck-client certificate generation
[certs] External etcd mode: Skipping apiserver-etcd-client certificate generation
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "admin.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
W0713 10:00:44.006478 22455 manifests.go:214] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[control-plane] Creating static Pod manifest for "kube-scheduler"
W0713 10:00:44.008302 22455 manifests.go:214] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 39.022851 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.17" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[kubelet-check] Initial timeout of 40s passed.
[upload-certs] Using certificate key:
d5f08ec8a45b09cc4d8c7122064503f57ae1b76fa179499a22111ce667c466cd
[mark-control-plane] Marking the node master01 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node master01 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: ixbybk.ybid1swqmqjvs9w4
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of the control-plane node running the following command on each as root:
kubeadm join 10.103.22.236:8443 --token ixbybk.ybid1swqmqjvs9w4 \
--discovery-token-ca-cert-hash sha256:17b90f70dd3ea6bc935080f5a6648e3eff32c94cef1182c1c28592b2222691cf \
--control-plane --certificate-key d5f08ec8a45b09cc4d8c7122064503f57ae1b76fa179499a22111ce667c466cd
Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 10.103.22.236:8443 --token ixbybk.ybid1swqmqjvs9w4 \
--discovery-token-ca-cert-hash sha256:17b90f70dd3ea6bc935080f5a6648e3eff32c94cef1182c1c28592b2222691cf
If you cannot reach k8s.gcr.io directly, you can pull the images manually from another registry instead, which achieves the same result. Proceed as follows:
# List the required image names
kubeadm config images list
k8s.gcr.io/kube-apiserver:v1.17.8
k8s.gcr.io/kube-controller-manager:v1.17.8
k8s.gcr.io/kube-scheduler:v1.17.8
k8s.gcr.io/kube-proxy:v1.17.8
k8s.gcr.io/pause:3.1
k8s.gcr.io/etcd:3.4.3-0
k8s.gcr.io/coredns:1.6.5
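One way to get these without direct access to k8s.gcr.io (a sketch using the same Aliyun google_containers mirror referenced in kubeadm-init.yaml) is to pull each image from the mirror and re-tag it with the name kubeadm expects:
# Assumed helper loop; image names and tags match the kubeadm list above
MIRROR=registry.cn-hangzhou.aliyuncs.com/google_containers
for img in kube-apiserver:v1.17.8 kube-controller-manager:v1.17.8 kube-scheduler:v1.17.8 \
           kube-proxy:v1.17.8 pause:3.1 etcd:3.4.3-0 coredns:1.6.5; do
  docker pull ${MIRROR}/${img}
  docker tag ${MIRROR}/${img} k8s.gcr.io/${img}
  docker rmi ${MIRROR}/${img}
done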
Configure the user that runs kubectl
# Kubernetes recommends running kubectl as a non-root user to access the cluster
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Install the Calico network plugin
Install Calico using the etcd datastore
1. Download the Calico manifest for etcd.
curl https://docs.projectcalico.org/manifests/calico-etcd.yaml -o calico.yaml
2. Adjust the configuration for your environment
vim calico.yaml
# Adjust the Secret object: fill in the values produced by the commands below
#etcd-key: (cat /etc/kubernetes/pki/etcd/etcd-key.pem | base64 | tr -d '\n')
#etcd-cert: (cat /etc/kubernetes/pki/etcd/etcd.pem | base64 | tr -d '\n')
#etcd-ca: (cat /etc/kubernetes/pki/etcd/ca.pem | base64 | tr -d '\n')
# Adjust the ConfigMap object
data:
# Configure this with the location of your etcd cluster.
# Configure the addresses of the etcd cluster
etcd_endpoints: "https://10.103.22.231:2379,https://10.103.22.232:2379,https://10.103.22.233:2379"
# Still in the ConfigMap: uncomment the following keys
etcd_ca: "/calico-secrets/etcd-ca" # "/calico-secrets/etcd-ca"
etcd_cert: "/calico-secrets/etcd-cert" # "/calico-secrets/etcd-cert"
etcd_key: "/calico-secrets/etcd-key" # "/calico-secrets/etcd-key"
# Adjust the DaemonSet object
- name: CALICO_IPV4POOL_CIDR
value: "10.98.0.0/16"
# By default this picks up the cluster IP of the kubernetes apiserver Service
# You can also use the apiserver VIP instead (the keepalived VIP proxied by haproxy)
#- name: KUBERNETES_SERVICE_HOST
# value: "10.103.22.236"
#- name: KUBERNETES_SERVICE_PORT
# value: "8443"
#- name: KUBERNETES_SERVICE_PORT_HTTPS
# value: "8443"
# Install Calico
kubectl apply -f calico.yaml
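Once the manifest is applied, check that the Calico pods reach Running and the nodes turn Ready:
kubectl get pods -n kube-system -o wide | grep calico
kubectl get nodes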
7. Join master02 and master03 to the Cluster
Distribute the Kubernetes certificates
Copy the certificates generated on master01 to master02 and master03; Kubernetes secures the cluster with these certificates, so the other control-plane nodes must use the same set.
# Run on master01
# The etcd certificates were already copied over when etcd was set up, so they do not need to be copied again here.
cd /etc/kubernetes/pki/
scp ca.* sa.* front-proxy-ca.* master02:/etc/kubernetes/pki/
scp ca.* sa.* front-proxy-ca.* master03:/etc/kubernetes/pki/
scp /etc/kubernetes/admin.conf master02:/etc/kubernetes/
scp /etc/kubernetes/admin.conf master03:/etc/kubernetes/
Load the component images manually
To save installation time, we manually load the component images from master01 onto master02 and master03. This step is optional; skipping it just makes the join take longer. And if you use the k8s.gcr.io registry, remember that you need a way to reach it.
docker save k8s.gcr.io/kube-proxy:v1.17.8 k8s.gcr.io/kube-controller-manager:v1.17.8 k8s.gcr.io/kube-apiserver:v1.17.8 k8s.gcr.io/kube-scheduler:v1.17.8 k8s.gcr.io/coredns:1.6.5 k8s.gcr.io/pause:3.1 calico/node:v3.15.1 calico/pod2daemon-flexvol:v3.15.1 calico/cni:v3.15.1 calico/kube-controllers:v3.15.1 -o k8s-masterimages-v1.17.8.tar
scp k8s-masterimages-v1.17.8.tar master02:/root
scp k8s-masterimages-v1.17.8.tar master03:/root
# Load the images manually on master02 and master03
docker load -i k8s-masterimages-v1.17.8.tar
docker images
Join the cluster
With a kubeadm-installed cluster, having a new node join is quite simple:
kubeadm join 10.103.22.236:8443 --token ixbybk.ybid1swqmqjvs9w4 \
--discovery-token-ca-cert-hash sha256:17b90f70dd3ea6bc935080f5a6648e3eff32c94cef1182c1c28592b2222691cf \
--control-plane --certificate-key d5f08ec8a45b09cc4d8c7122064503f57ae1b76fa179499a22111ce667c466cd
# Error: error execution phase control-plane-prepare/download-certs: error downloading certs: error downloading the secret: Secret "kubeadm-certs" was not found in the "kube-system" Namespace. This Secret might have expired. Please, run `kubeadm init phase upload-certs --upload-certs` on a control plane to generate a new one
# Re-upload the certificates
kubeadm init phase upload-certs --upload-certs
210c2cbf748afcbc17d72c0f2d0a57fd407f21e1f453b14f6f6e29ed9099eb71
#再次加入
kubeadm join 10.103.22.236:8443 --token ixbybk.ybid1swqmqjvs9w4 \
--discovery-token-ca-cert-hash sha256:17b90f70dd3ea6bc935080f5a6648e3eff32c94cef1182c1c28592b2222691cf \
--control-plane --certificate-key 210c2cbf748afcbc17d72c0f2d0a57fd407f21e1f453b14f6f6e29ed9099eb71
# It fails again
error execution phase control-plane-prepare/download-certs: error downloading certs: the Secret does not include the required certificate or key - name: external-etcd-ca.crt, path: /etc/kubernetes/pki/etcd/ca.pem
# Analysis: because etcd is deployed externally, the certificate upload must be run with the kubeadm-init.yaml config
kubeadm init phase upload-certs --upload-certs --config kubeadm-init.yaml
b206fb1b860569770acb33f0eb717d9cc0b95304d398023399b4e2e4ec588350
# Join once more; this time it succeeds
kubeadm join 10.103.22.236:8443 --token ixbybk.ybid1swqmqjvs9w4 \
--discovery-token-ca-cert-hash sha256:17b90f70dd3ea6bc935080f5a6648e3eff32c94cef1182c1c28592b2222691cf \
--control-plane --certificate-key b206fb1b860569770acb33f0eb717d9cc0b95304d398023399b4e2e4ec588350
This node has joined the cluster and a new control plane instance was created:
* Certificate signing request was sent to apiserver and approval was received.
* The Kubelet was informed of the new secure connection details.
* Control plane (master) label and taint were applied to the new node.
* The Kubernetes control plane instances scaled up.
To start administering your cluster from this node, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Run 'kubectl get nodes' to see this node join the cluster.
At this point master02 and master03 have joined the cluster as control-plane nodes; the --control-plane and --certificate-key parameters are what make this a control-plane join.
Configure the user that runs kubectl
# Kubernetes recommends running kubectl as a non-root user to access the cluster
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
8. Join the Worker Nodes to the Cluster
kubeadm join 10.103.22.236:8443 --token ixbybk.ybid1swqmqjvs9w4 \
--discovery-token-ca-cert-hash sha256:17b90f70dd3ea6bc935080f5a6648e3eff32c94cef1182c1c28592b2222691cf
Check the number of nodes and their status
kubectl get node
NAME       STATUS   ROLES    AGE    VERSION
master01   Ready    master   25h    v1.17.8
master02   Ready    master   17h    v1.17.8
master03   Ready    master   17h    v1.17.8
node04     Ready    <none>   101m   v1.17.8
node05     Ready    <none>   105m   v1.17.8
Check the cluster component status
kubectl get cs
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-1 Healthy {"health":"true"}
etcd-0 Healthy {"health":"true"}
etcd-2 Healthy {"health":"true"}
9. Verify the Cluster
With the steps above, the cluster is installed and configured. Now we verify its high-availability behavior. The things to verify are:
Stop the currently elected master to check that the components re-elect a leader.
Stop one etcd member to check that the etcd cluster stays available.
Stop the host that currently holds the VIP to check that the VIP fails over to another HA node and the cluster remains usable.
Verify master leader election
To verify that the control plane is highly available, first check which master node currently holds each component's leader lease.
Check the kube-controller-manager leader
kubectl get endpoints kube-controller-manager -n kube-system -o yaml
apiVersion: v1
kind: Endpoints
metadata:
annotations:
control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"master02_e4a4ca3d-a5f3-4531-b01c-37ab4758e5dc","leaseDurationSeconds":15,"acquireTime":"2020-07-14T01:25:37Z","renewTime":"2020-07-14T03:11:53Z","leaderTransitions":5}'
creationTimestamp: "2020-07-13T02:01:22Z"
name: kube-controller-manager
namespace: kube-system
resourceVersion: "223470"
selfLink: /api/v1/namespaces/kube-system/endpoints/kube-controller-manager
uid: 5d3c1653-919d-4765-ab2f-82e47b623416
# The leader is currently on master02
Check the kube-scheduler leader
kubectl get endpoints kube-scheduler -n kube-system -o yaml
apiVersion: v1
kind: Endpoints
metadata:
annotations:
control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"master02_d0fd3a81-3b13-421a-81a8-3eb0220ea67c","leaseDurationSeconds":15,"acquireTime":"2020-07-14T01:25:36Z","renewTime":"2020-07-14T03:17:12Z","leaderTransitions":5}'
creationTimestamp: "2020-07-13T02:01:21Z"
name: kube-scheduler
namespace: kube-system
resourceVersion: "224353"
selfLink: /api/v1/namespaces/kube-system/endpoints/kube-scheduler
uid: f345b2a4-970e-46cb-93b8-790d678d0201
# The leader is currently on master02
Both kube-controller-manager and kube-scheduler currently have their leader on master02, so shut master02 down and check again.
After the shutdown, check the kube-controller-manager leader
kubectl get endpoints kube-controller-manager -n kube-system -o yaml
apiVersion: v1
kind: Endpoints
metadata:
annotations:
control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"master03_369df996-06d1-4bf1-a6a4-38e5dd4a404f","leaseDurationSeconds":15,"acquireTime":"2020-07-14T03:21:44Z","renewTime":"2020-07-14T03:22:04Z","leaderTransitions":6}'
creationTimestamp: "2020-07-13T02:01:22Z"
name: kube-controller-manager
namespace: kube-system
resourceVersion: "225130"
selfLink: /api/v1/namespaces/kube-system/endpoints/kube-controller-manager
uid: 5d3c1653-919d-4765-ab2f-82e47b623416
# The leader is now on master03
After the shutdown, check the kube-scheduler leader
kubectl get endpoints kube-scheduler -n kube-system -o yaml
apiVersion: v1
kind: Endpoints
metadata:
annotations:
control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"master03_94056bae-7383-4104-ab54-d80ea4971a92","leaseDurationSeconds":15,"acquireTime":"2020-07-14T03:21:41Z","renewTime":"2020-07-14T03:21:54Z","leaderTransitions":6}'
creationTimestamp: "2020-07-13T02:01:21Z"
name: kube-scheduler
namespace: kube-system
resourceVersion: "225100"
selfLink: /api/v1/namespaces/kube-system/endpoints/kube-scheduler
uid: f345b2a4-970e-46cb-93b8-790d678d0201
# The leader is now on master03
As shown above, both kube-controller-manager and kube-scheduler switched their leader to master03, so the Kubernetes control-plane high availability checks out.
Verify etcd cluster high availability
As we know, all of Kubernetes' configuration and the state maintained by its components are stored in etcd; if etcd becomes unavailable, the whole Kubernetes cluster stops working properly. We test this by taking down one etcd member (since master02 was just shut down, we use its etcd member for the check).
Check the current component health:
Check the cluster component status
kubectl get cs
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-1 Healthy {"health":"true"}
etcd-0 Healthy {"health":"true"}
etcd-2 Healthy {"health":"true"}
With master02 shut down, its etcd member stops as well; check the etcd status
/opt/kubernetes/bin/etcdctl \
--endpoints=https://10.103.22.232:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.pem \
--cert=/etc/kubernetes/pki/etcd/etcd.pem \
--key=/etc/kubernetes/pki/etcd/etcd-key.pem endpoint health
{"level":"warn","ts":"2020-07-14T13:20:52.166+0800","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-38b567a9-b35b-4069-9d54-ba130f5c1cf1/10.103.22.232:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.103.22.232:2379: connect: no route to host\""}
https://10.103.22.232:2379 is unhealthy: failed to commit proposal: context deadline exceeded
Error: unhealthy cluster
# The endpoint is unreachable
It is now obvious that this etcd member can no longer be reached.
Back on the Kubernetes master01 node, check the component health again
kubectl get cs
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health":"true"}
etcd-2 Healthy {"health":"true"}
etcd-1 Unhealthy Get https://10.103.22.232:2379/health: dial tcp 10.103.22.232:2379: connect: no route to host
Kubernetes also shows that etcd-1 can no longer be reached.
Now check whether the Kubernetes cluster itself is affected
kubectl get nodes
NAME       STATUS     ROLES    AGE    VERSION
master01   Ready      master   27h    v1.17.8
master02   NotReady   master   19h    v1.17.8
master03   Ready      master   19h    v1.17.8
node04     Ready      <none>   4h2m   v1.17.8
node05     Ready      <none>   4h6m   v1.17.8
Even with one etcd member down, Kubernetes keeps working, so the etcd cluster is highly available as well.
Verify VIP failover
The cluster's apiserver is reverse-proxied through the haproxy VIP, which you can confirm with the following command
kubectl cluster-info
Kubernetes master is running at https://10.103.22.236:8443
KubeDNS is running at https://10.103.22.236:8443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
Check which node holds the VIP
ip addr
2: ens160: mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:a0:ed:1f brd ff:ff:ff:ff:ff:ff
inet 10.103.22.231/24 brd 10.103.22.255 scope global noprefixroute ens160
valid_lft forever preferred_lft forever
inet 10.103.22.236/32 scope global ens160
valid_lft forever preferred_lft forever
The VIP (10.103.22.236) is on master01 (10.103.22.231); shut the master01 server down.
Verify that the VIP (the haproxy VIP) fails over and that the Kubernetes cluster remains usable.
The VIP has now moved to master02 (10.103.22.232):
ip add
2: ens160: mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:a0:e3:b1 brd ff:ff:ff:ff:ff:ff
inet 10.103.22.232/24 brd 10.103.22.255 scope global noprefixroute ens160
valid_lft forever preferred_lft forever
inet 10.103.22.236/32 scope global ens160
valid_lft forever preferred_lft forever
Verify that the Kubernetes cluster is still usable
kubectl get nodes
NAME       STATUS     ROLES    AGE     VERSION
master01   NotReady   master   27h     v1.17.8
master02   Ready      master   20h     v1.17.8
master03   Ready      master   20h     v1.17.8
node04     Ready      <none>   4h10m   v1.17.8
node05     Ready      <none>   4h14m   v1.17.8
Our kubectl client can still reach the Kubernetes cluster through the VIP, so the HA setup is verified.