[kube 022] Chaos Testing Framework: Litmus
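Litmus is an open-source toolset for cloud-native chaos engineering. It provides tools to orchestrate chaos on Kubernetes and so helps uncover weaknesses in deployments. Teams typically run chaos experiments with Litmus in a staging environment first and eventually in production, to find bugs and vulnerabilities. Fixing those weaknesses improves the resilience of the system.
Litmus takes a cloud-native approach to creating, managing, and monitoring chaos. Chaos is orchestrated using the following Kubernetes Custom Resource Definitions (CRDs):
ChaosEngine: a resource that links a Kubernetes application or node to a ChaosExperiment. The Litmus chaos operator watches ChaosEngine resources and then invokes the referenced experiments.
ChaosExperiment: a resource that groups the configuration parameters of a chaos experiment. When a ChaosEngine invokes an experiment, the operator creates the corresponding ChaosExperiment run.
ChaosResult: a resource that holds the results of a chaos experiment. The chaos exporter reads the results and exports metrics to a configured Prometheus server.
Chaos experiments are hosted on the ChaosHub (hub.litmuschaos.io). It is a central hub where application developers and cloud vendors share chaos experiments, so that their users can use them to improve the resilience of applications in production.
In this article, we will run a chaos experiment to verify the resilience of a system.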
Prerequisites
Prepare a Kubernetes cluster, along with kubectl and any other client tools configured to connect to it.
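Walkthrough
The process breaks down into the following steps:
Install the Litmus Operator
Use Chaos Charts
Create a pod-delete chaos experiment
Review the chaos experiment results
Review the chaos experiment events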
Install the Litmus Operator
Let's run the following steps to install Litmus in the cluster:
❯ kubectl apply -f "https://litmuschaos.github.io/pages/litmus-operator-latest.yaml"
namespace/litmus created
serviceaccount/litmus created
clusterrole.rbac.authorization.k8s.io/litmus created
clusterrolebinding.rbac.authorization.k8s.io/litmus created
deployment.apps/chaos-operator-ce created
customresourcedefinition.apiextensions.k8s.io/chaosengines.litmuschaos.io created
customresourcedefinition.apiextensions.k8s.io/chaosexperiments.litmuschaos.io created
customresourcedefinition.apiextensions.k8s.io/chaosresults.litmuschaos.io created
❯ kubectl get pods -n litmus
NAME READY STATUS RESTARTS AGE
chaos-operator-ce-7c76fc797f-7nm42 1/1 Running 0 67s
❯ kubectl get crds -n litmus
chaosengines.litmuschaos.io 2020-06-05T13:08:05Z
chaosexperiments.litmuschaos.io 2020-06-05T13:08:05Z
chaosresults.litmuschaos.io 2020-06-05T13:08:05Z
❯ kubectl api-resources | grep chaos
chaosengines litmuschaos.io true ChaosEngine
chaosexperiments litmuschaos.io true ChaosExperiment
chaosresults litmuschaos.io true ChaosResult
❯ kubectl get clusterroles,clusterrolebinding | grep "litmus\|chaos"
clusterrole.rbac.authorization.k8s.io/litmus 2020-06-05T13:08:05Z
clusterrolebinding.rbac.authorization.k8s.io/litmus ClusterRole/litmus 6m39s
Litmus is now up and running in our cluster. Next, we need to deploy chaos experiments to test the resilience of cluster resources.
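Checking `kubectl get pods` by hand works, but the readiness check can also be scripted. The sketch below wraps `kubectl rollout status` in a small helper. The function name `wait_for_litmus_operator` and the 120s default timeout are illustrative choices; the Deployment name `chaos-operator-ce` and the `litmus` namespace come from the install output above.

```shell
# Block until the Litmus operator Deployment has finished rolling out.
# Assumes the names printed by the install step:
# Deployment chaos-operator-ce in namespace litmus.
# The function name and default timeout are illustrative, not part of Litmus.
wait_for_litmus_operator() {
  local timeout="${1:-120s}"
  kubectl rollout status deployment/chaos-operator-ce -n litmus --timeout="$timeout"
}
```

Calling `wait_for_litmus_operator 60s` returns a non-zero exit status if the rollout does not finish within the timeout, so it can gate the later steps in a script.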
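Use Chaos Charts
Chaos Charts are used to install bundles of chaos experiments. A chaos experiment contains the actual chaos details. Let's run the following steps to install the generic chaos experiments into the nginx namespace: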
❯ kubectl create namespace nginx
namespace/nginx created
❯ kubectl apply -f "https://hub.litmuschaos.io/api/chaos/1.4.0?file=charts/generic/experiments.yaml" -n nginx
chaosexperiment.litmuschaos.io/node-drain created
chaosexperiment.litmuschaos.io/disk-fill created
chaosexperiment.litmuschaos.io/pod-cpu-hog created
chaosexperiment.litmuschaos.io/pod-memory-hog created
chaosexperiment.litmuschaos.io/pod-network-corruption created
chaosexperiment.litmuschaos.io/pod-delete created
chaosexperiment.litmuschaos.io/pod-network-loss created
chaosexperiment.litmuschaos.io/disk-loss created
chaosexperiment.litmuschaos.io/pod-network-latency created
chaosexperiment.litmuschaos.io/node-cpu-hog created
chaosexperiment.litmuschaos.io/node-memory-hog created
chaosexperiment.litmuschaos.io/container-kill created
❯ kubectl get chaosexperiments -n nginx
NAME AGE
container-kill 4m6s
disk-fill 4m6s
disk-loss 4m6s
node-cpu-hog 4m6s
node-drain 4m6s
node-memory-hog 4m6s
pod-cpu-hog 4m6s
pod-delete 4m6s
pod-memory-hog 4m6s
pod-network-corruption 4m6s
pod-network-latency 4m6s
pod-network-loss 4m6s
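The generic chaos chart provides experiments such as pod delete, network latency, network loss, and container kill. You can also install or build your own application-specific chaos charts to run application-specific chaos.
Create a Pod Delete Chaos Experiment
We will deploy a sample application and run a chaos experiment against it. Let's run the following steps to test the impact of pod deletion on the cluster: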
❯ cat nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx-deployment
  namespace: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        imagePullPolicy: IfNotPresent
        name: nginx
        ports:
        - containerPort: 80
          protocol: TCP
❯ kubectl apply -f nginx.yaml
deployment.apps/nginx-deployment created
❯ kubectl get pod -n nginx
NAME READY STATUS RESTARTS AGE
nginx-deployment-558fc78868-269v5 1/1 Running 0 99s
nginx-deployment-558fc78868-cblpc 1/1 Running 0 99s
❯ kubectl annotate deploy nginx-deployment litmuschaos.io/chaos="true" -n nginx
deployment.apps/nginx-deployment annotated
Note:
Litmus supports chaos against Deployments, StatefulSets, and DaemonSets.
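Because the annotation is what opts a workload in to chaos, it is worth confirming that it was applied before starting an experiment. A minimal sketch, assuming the Deployment and namespace used in this article; the helper name `chaos_annotation` is illustrative:

```shell
# Print the value of the litmuschaos.io/chaos annotation on a Deployment.
# The dot inside the annotation key is escaped so that JSONPath treats
# "litmuschaos.io/chaos" as a single key rather than a nested path.
# The function name is illustrative, not part of Litmus or kubectl.
chaos_annotation() {
  local deploy="$1" ns="$2"
  kubectl get deployment "$deploy" -n "$ns" \
    -o jsonpath='{.metadata.annotations.litmuschaos\.io/chaos}'
}
```

For example, `chaos_annotation nginx-deployment nginx` prints the annotation value; an empty result means the annotation is missing.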
$ cat <
❯ kubectl get chaosexperiment pod-delete -o yaml -n nginx
apiVersion: litmuschaos.io/v1alpha1
description:
  message: |
    Deletes a pod belonging to a deployment/statefulset/daemonset
kind: ChaosExperiment
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"litmuschaos.io/v1alpha1","description":{"message":"Deletes a pod belonging to a deployment/statefulset/daemonset\n"},"kind":"ChaosExperiment","metadata":{"annotations":{},"name":"pod-delete","namespace":"nginx","version":"0.1.13"},"spec":{"definition":{"args":["-c","ansible-playbook ./experiments/generic/pod_delete/pod_delete_ansible_logic.yml -i /etc/ansible/hosts -vv; exit 0"],"command":["/bin/bash"],"env":[{"name":"ANSIBLE_STDOUT_CALLBACK","value":"default"},{"name":"TOTAL_CHAOS_DURATION","value":"15"},{"name":"RAMP_TIME","value":""},{"name":"KILL_COUNT","value":""},{"name":"FORCE","value":"true"},{"name":"CHAOS_INTERVAL","value":"5"},{"name":"LIB","value":""}],"image":"litmuschaos/ansible-runner:1.4.0","labels":{"name":"pod-delete"},"permissions":[{"apiGroups":["","apps","batch","litmuschaos.io"],"resources":["deployments","jobs","pods","pods/log","events","configmaps","chaosengines","chaosexperiments","chaosresults"],"verbs":["create","list","get","patch","update","delete"]},{"apiGroups":[""],"resources":["nodes"],"verbs":["get","list"]}],"scope":"Namespaced"}}}
  creationTimestamp: "2020-06-05T13:22:17Z"
  generation: 1
  managedFields:
  - apiVersion: litmuschaos.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:description:
        .: {}
        f:message: {}
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
      f:spec:
        .: {}
        f:definition:
          .: {}
          f:args: {}
          f:command: {}
          f:env: {}
          f:image: {}
          f:labels:
            .: {}
            f:name: {}
          f:permissions: {}
          f:scope: {}
    manager: kubectl
    operation: Update
    time: "2020-06-05T13:22:17Z"
  name: pod-delete
  namespace: nginx
  resourceVersion: "3465"
  selfLink: /apis/litmuschaos.io/v1alpha1/namespaces/nginx/chaosexperiments/pod-delete
  uid: 1ea49dc0-2e58-41ff-9953-2e4844702aaa
spec:
  definition:
    args:
    - -c
    - ansible-playbook ./experiments/generic/pod_delete/pod_delete_ansible_logic.yml
      -i /etc/ansible/hosts -vv; exit 0
    command:
    - /bin/bash
    env:
    - name: ANSIBLE_STDOUT_CALLBACK
      value: default
    - name: TOTAL_CHAOS_DURATION
      value: "15"
    - name: RAMP_TIME
      value: ""
    - name: KILL_COUNT
      value: ""
    - name: FORCE
      value: "true"
    - name: CHAOS_INTERVAL
      value: "5"
    - name: LIB
      value: ""
    image: litmuschaos/ansible-runner:1.4.0
    labels:
      name: pod-delete
    permissions:
    - apiGroups:
      - ""
      - apps
      - batch
      - litmuschaos.io
      resources:
      - deployments
      - jobs
      - pods
      - pods/log
      - events
      - configmaps
      - chaosengines
      - chaosexperiments
      - chaosresults
      verbs:
      - create
      - list
      - get
      - patch
      - update
      - delete
    - apiGroups:
      - ""
      resources:
      - nodes
      verbs:
      - get
      - list
    scope: Namespaced
cat <
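The step that actually starts the experiment is creating a ChaosEngine that links the annotated Deployment to the pod-delete ChaosExperiment. The sketch below shows roughly what such a manifest can look like, written as a shell helper so it can be piped to `kubectl apply`. The engine name `nginx-chaos`, the `nginx` namespace, the `app=nginx` label, and the `pod-delete` experiment name are taken from the output shown elsewhere in this article. The field layout follows the Litmus 1.x ChaosEngine schema as understood here and should be checked against the Litmus documentation for your version. The service account name `pod-delete-sa` is a hypothetical placeholder: it has to refer to a ServiceAccount in the `nginx` namespace that is bound to the permissions listed in the ChaosExperiment above. The env values simply restate the experiment defaults.

```shell
# Print a ChaosEngine manifest for the pod-delete experiment to stdout.
# This is an illustrative sketch, not the exact manifest used in this article.
render_chaosengine() {
  cat <<'EOF'
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-chaos
  namespace: nginx
spec:
  # Target application: the annotated nginx Deployment.
  appinfo:
    appns: nginx
    applabel: app=nginx
    appkind: deployment
  # Only act on workloads annotated with litmuschaos.io/chaos="true".
  annotationCheck: "true"
  engineState: active
  # Hypothetical ServiceAccount name; create it with suitable RBAC first.
  chaosServiceAccount: pod-delete-sa
  jobCleanUpPolicy: delete
  experiments:
  - name: pod-delete
    spec:
      components:
        env:
        - name: TOTAL_CHAOS_DURATION
          value: "15"
        - name: CHAOS_INTERVAL
          value: "5"
        - name: FORCE
          value: "true"
EOF
}
```

With the service account in place, `render_chaosengine | kubectl apply -f -` would submit the engine. Keeping the manifest in a function makes it easy to review or diff before applying it.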
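Review the Chaos Experiment Results
Chaos experiments run as Kubernetes Jobs, and the chaos executor deletes the affected pods according to the experiment definition.
Let's run the following steps to review the results of our chaos experiment: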
$ watch -n 1 kubectl get pods -n nginx
Every 1.0s: kubectl get pods -n nginx 192.168.1.102: Sat Jun 6 01:35:22 2020
NAME READY STATUS RESTARTS AGE
nginx-chaos-runner 1/1 Running 0 31s
nginx-deployment-558fc78868-f4tcd 0/1 Terminating 0 3m54s
nginx-deployment-558fc78868-g6wjm 0/1 ContainerCreating 0 1s
nginx-deployment-558fc78868-wbzd2 1/1 Running 0 3m38s
pod-delete-xb472u-rvjc8 1/1 Running 0 24s
❯ kubectl get chaosresults -n nginx
NAME AGE
nginx-chaos-pod-delete 11m
❯ kubectl describe chaosresults nginx-chaos-pod-delete -n nginx
Name:         nginx-chaos-pod-delete
Namespace:    nginx
Labels:       chaosUID=7181dd32-dcd2-44c8-b9a1-62f76b4426d4
              type=ChaosResult
Annotations:  <none>
API Version:  litmuschaos.io/v1alpha1
Kind:         ChaosResult
Metadata:
  Creation Timestamp:  2020-06-05T17:25:49Z
  Generation:          6
  Managed Fields:
    API Version:  litmuschaos.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
        f:labels:
          .:
          f:chaosUID:
          f:type:
      f:spec:
        .:
        f:engine:
        f:experiment:
      f:status:
        .:
        f:experimentstatus:
          .:
          f:failStep:
          f:phase:
          f:verdict:
    Manager:         kubectl
    Operation:       Update
    Time:            2020-06-05T17:37:39Z
  Resource Version:  8339
  Self Link:         /apis/litmuschaos.io/v1alpha1/namespaces/nginx/chaosresults/nginx-chaos-pod-delete
  UID:               bffa195d-4bf5-47a3-9a6c-7f2287107ea5
Spec:
  Engine:      nginx-chaos
  Experiment:  pod-delete
Status:
  Experimentstatus:
    Fail Step:  N/A
    Phase:      Completed
    Verdict:    Pass
Events:         <none>
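The describe output is convenient for reading, but in a script it is usually enough to extract the verdict alone. A minimal sketch, assuming the field path `.status.experimentstatus.verdict` that the managed fields above suggest; the helper names `chaos_verdict` and `assert_chaos_passed` are illustrative:

```shell
# Print only the verdict of a ChaosResult.
# The JSONPath mirrors the Status > Experimentstatus > Verdict fields
# shown in the describe output; both function names are illustrative.
chaos_verdict() {
  local result="$1" ns="$2"
  kubectl get chaosresult "$result" -n "$ns" \
    -o jsonpath='{.status.experimentstatus.verdict}'
}

# Succeed only if the verdict is exactly "Pass".
assert_chaos_passed() {
  local verdict
  verdict="$(chaos_verdict "$1" "$2")"
  if [ "$verdict" = "Pass" ]; then
    echo "chaos experiment passed"
  else
    echo "chaos experiment verdict: ${verdict:-unknown}" >&2
    return 1
  fi
}
```

For the run above this would be invoked as `assert_chaos_passed nginx-chaos-pod-delete nginx`. Because any verdict other than `Pass` yields a non-zero exit status, the check can be used as a step in a CI pipeline.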
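Review the Chaos Experiment Events
You can inspect the events in the target namespace to understand and reconstruct what happened during the chaos experiment: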
❯ kubectl get events -n nginx --sort-by='{.lastTimestamp}'
LAST SEEN TYPE REASON OBJECT MESSAGE
13m Normal ChaosInject chaosengine/nginx-chaos Injecting pod-delete chaos on nginx-deployment-558fc78868-s26cl pod
13m Normal Scheduled pod/nginx-deployment-558fc78868-sgswp Successfully assigned nginx/nginx-deployment-558fc78868-sgswp to minikube
13m Normal Killing pod/nginx-deployment-558fc78868-s26cl Stopping container nginx
13m Normal SuccessfulCreate replicaset/nginx-deployment-558fc78868 (combined from similar events): Created pod: nginx-deployment-558fc78868-sgswp
13m Normal Pulled pod/nginx-deployment-558fc78868-sgswp Container image "nginx" already present on machine
13m Normal Started pod/nginx-deployment-558fc78868-sgswp Started container nginx
13m Normal Created pod/nginx-deployment-558fc78868-sgswp Created container nginx
12m Normal PostChaosCheck chaosengine/nginx-chaos AUT is Running successfully
12m Normal Summary chaosengine/nginx-chaos pod-delete Experiment Passed!
12m Normal Completed job/pod-delete-xb472u Job completed
12m Normal ExperimentJobCleanUp chaosengine/nginx-chaos Experiment Job 'pod-delete-xb472u' is deleted
12m Normal Killing pod/nginx-chaos-runner Stopping container chaos-runner
12m Normal ChaosEngineCompleted chaosengine/nginx-chaos Chaos Engine completed, will delete or retain the resources according to jobCleanUpPolicy
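The namespace-wide listing mixes chaos events with ordinary pod and ReplicaSet events. To see only what the chaos engine reported, the events can be narrowed with a field selector. A small sketch follows; the helper name `chaos_events` is illustrative, and `involvedObject.kind` is one of the standard field selectors for events:

```shell
# List only the events whose involved object is a ChaosEngine,
# oldest first. The function name is illustrative.
chaos_events() {
  local ns="$1"
  kubectl get events -n "$ns" \
    --field-selector involvedObject.kind=ChaosEngine \
    --sort-by='{.lastTimestamp}'
}
```

Running `chaos_events nginx` should reduce the list to the `chaosengine/nginx-chaos` rows, such as ChaosInject, PostChaosCheck, Summary, and ChaosEngineCompleted.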