[kube 022] Chaos testing framework: Litmus

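Litmus is an open source toolset for cloud-native chaos engineering. It provides tools to orchestrate chaos on Kubernetes and so helps SREs find weaknesses in their deployments. SREs first use Litmus to run chaos experiments in a staging environment and eventually run them in production to find bugs and vulnerabilities. Fixing those weaknesses increases the resilience of the system.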

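Litmus takes a cloud-native approach to creating, managing, and monitoring chaos. Chaos is orchestrated with the following Kubernetes Custom Resource Definitions (CRDs):

  • ChaosEngine: a resource that links an application or a node to a ChaosExperiment. The Litmus Chaos Operator watches ChaosEngine resources and then invokes the referenced experiments.

  • ChaosExperiment: a resource that groups the configuration parameters of a chaos experiment. When a ChaosEngine invokes an experiment, the operator creates the corresponding experiment run.

  • ChaosResult: a resource that holds the results of a chaos experiment. The chaos exporter reads the results and exports the metrics to a configured Prometheus server.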

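Chaos experiments are hosted on the Chaos Hub (hub.litmuschaos.io). It is a central hub where application developers and cloud vendors share chaos experiments, so that their users can use them to improve the resilience of applications in production.

In this article, we will run a chaos experiment to verify the resilience of a system.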

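Prerequisites

Have a Kubernetes cluster ready, along with the client tools (kubectl, at minimum) configured to connect to it.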

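Walkthrough

The process consists of the following steps:

  • Installing the Litmus Operator

  • Using Chaos Charts

  • Creating a Pod Delete chaos experiment

  • Viewing the chaos experiment results

  • Viewing the chaos experiment events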

Installing the Litmus Operator

Run the following commands to install Litmus in the cluster and verify the installation:

❯ kubectl apply -f "https://litmuschaos.github.io/pages/litmus-operator-latest.yaml"
namespace/litmus created
serviceaccount/litmus created
clusterrole.rbac.authorization.k8s.io/litmus created
clusterrolebinding.rbac.authorization.k8s.io/litmus created
deployment.apps/chaos-operator-ce created
customresourcedefinition.apiextensions.k8s.io/chaosengines.litmuschaos.io created
customresourcedefinition.apiextensions.k8s.io/chaosexperiments.litmuschaos.io created
customresourcedefinition.apiextensions.k8s.io/chaosresults.litmuschaos.io created
❯ kubectl get pods -n litmus
NAME                                 READY   STATUS    RESTARTS   AGE
chaos-operator-ce-7c76fc797f-7nm42   1/1     Running   0          67s
❯ kubectl get crds -n litmus
chaosengines.litmuschaos.io       2020-06-05T13:08:05Z
chaosexperiments.litmuschaos.io   2020-06-05T13:08:05Z
chaosresults.litmuschaos.io       2020-06-05T13:08:05Z
❯ kubectl api-resources | grep chaos
chaosengines                                   litmuschaos.io                 true         ChaosEngine
chaosexperiments                               litmuschaos.io                 true         ChaosExperiment
chaosresults                                   litmuschaos.io                 true         ChaosResult
❯ kubectl get clusterroles,clusterrolebinding  | grep "litmus\|chaos"
clusterrole.rbac.authorization.k8s.io/litmus                                                                 2020-06-05T13:08:05Z
clusterrolebinding.rbac.authorization.k8s.io/litmus                                                 ClusterRole/litmus                                                                 6m39s

Litmus is now up and running in the cluster. Next, we need to deploy chaos experiments to test the resilience of cluster resources.
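
The manual checks in the transcript above can also be scripted, which is convenient in CI. The following is a minimal sketch and not part of the original walkthrough: the function name `check_litmus_crds` is hypothetical, and the commented usage assumes the operator Deployment is named `chaos-operator-ce` in the `litmus` namespace, as in the output above.

```shell
# Reads resource names on stdin (one per line, for example from
# `kubectl get crds -o name`) and returns non-zero if any of the
# three Litmus CRDs shown in the transcript is missing.
# NOTE: check_litmus_crds is a hypothetical helper, not a Litmus tool.
check_litmus_crds() {
  input=$(cat)
  missing=0
  for crd in chaosengines chaosexperiments chaosresults; do
    if ! printf '%s\n' "$input" | grep -q "${crd}\.litmuschaos\.io"; then
      echo "missing CRD: ${crd}.litmuschaos.io" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage against a live cluster (requires kubectl and cluster access):
#   kubectl rollout status deployment/chaos-operator-ce -n litmus --timeout=120s
#   kubectl get crds -o name | check_litmus_crds && echo "Litmus CRDs present"
```

Keeping the check as a filter over stdin means it can be exercised without a cluster by piping in a saved list of names.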

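Using Chaos Charts

Chaos Charts are used to install bundles of chaos experiments. The chaos experiments contain the actual chaos details. Run the following commands to install the generic Chaos Charts into a new nginx namespace: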

❯ kubectl create namespace nginx
namespace/nginx created
❯ kubectl apply -f "https://hub.litmuschaos.io/api/chaos/1.4.0?file=charts/generic/experiments.yaml" -n nginx
chaosexperiment.litmuschaos.io/node-drain created
chaosexperiment.litmuschaos.io/disk-fill created
chaosexperiment.litmuschaos.io/pod-cpu-hog created
chaosexperiment.litmuschaos.io/pod-memory-hog created
chaosexperiment.litmuschaos.io/pod-network-corruption created
chaosexperiment.litmuschaos.io/pod-delete created
chaosexperiment.litmuschaos.io/pod-network-loss created
chaosexperiment.litmuschaos.io/disk-loss created
chaosexperiment.litmuschaos.io/pod-network-latency created
chaosexperiment.litmuschaos.io/node-cpu-hog created
chaosexperiment.litmuschaos.io/node-memory-hog created
chaosexperiment.litmuschaos.io/container-kill created
❯ kubectl get chaosexperiments -n nginx
NAME                     AGE
container-kill           4m6s
disk-fill                4m6s
disk-loss                4m6s
node-cpu-hog             4m6s
node-drain               4m6s
node-memory-hog          4m6s
pod-cpu-hog              4m6s
pod-delete               4m6s
pod-memory-hog           4m6s
pod-network-corruption   4m6s
pod-network-latency      4m6s
pod-network-loss         4m6s

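The generic Chaos Chart provides chaos experiments such as pod delete, network latency, network loss, and container kill. You can also install or build your own application-specific Chaos Charts to run application-specific chaos.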
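
Because the download URL contains `?` and `=`, it is easy to get the shell quoting wrong. As a small sketch, the URL can be assembled by a helper and passed to kubectl inside double quotes. The helper name `chaos_chart_url` is hypothetical; the URL layout is simply copied from the command used in this section, and other file paths on the hub are not verified here.

```shell
# Builds a Chaos Hub download URL from a chart version and a file path.
# NOTE: chaos_chart_url is a hypothetical helper; the URL layout is taken
# from the kubectl apply command used earlier in this section.
chaos_chart_url() {
  version=$1
  file=$2
  printf 'https://hub.litmuschaos.io/api/chaos/%s?file=%s\n' "$version" "$file"
}

# Usage against a live cluster (the quotes keep the shell from touching ? and =):
#   kubectl apply -n nginx -f "$(chaos_chart_url 1.4.0 charts/generic/experiments.yaml)"
```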

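Creating a Pod Delete chaos experiment

We will deploy a sample application and run a chaos experiment against it. Run the following steps to test the impact of pod deletion on the cluster: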

❯ cat nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx-deployment
  namespace: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        imagePullPolicy: IfNotPresent
        name: nginx
        ports:
        - containerPort: 80
          protocol: TCP
❯ kubectl apply -f nginx.yaml
deployment.apps/nginx-deployment created
❯ kubectl get pod -n nginx
NAME                                READY   STATUS    RESTARTS   AGE
nginx-deployment-558fc78868-269v5   1/1     Running   0          99s
nginx-deployment-558fc78868-cblpc   1/1     Running   0          99s
❯ kubectl annotate deploy nginx-deployment litmuschaos.io/chaos="true" -n nginx
deployment.apps/nginx-deployment annotated

Note:

Litmus supports running chaos against Deployments, StatefulSets, and DaemonSets.
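
The experiment also needs a service account to run under. The sketch below writes a namespaced ServiceAccount, Role, and RoleBinding to a file. It is an illustration under assumptions, not the author's manifest: the name `pod-delete-sa` and the file name are made up for this example, and the rule simply mirrors the namespaced permissions listed in the pod-delete ChaosExperiment definition in this section. The experiment's `nodes` rule is left out because node access cannot be granted by a namespaced Role, and your Litmus version may require additional verbs.

```shell
# Writes RBAC objects for the pod-delete experiment to a local file.
# NOTE: pod-delete-sa and pod-delete-rbac.yaml are assumed names.
cat > pod-delete-rbac.yaml <<'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pod-delete-sa
  namespace: nginx
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-delete-sa
  namespace: nginx
rules:
# Mirrors the namespaced permissions of the pod-delete ChaosExperiment.
- apiGroups: ["", "apps", "batch", "litmuschaos.io"]
  resources: ["deployments", "jobs", "pods", "pods/log", "events", "configmaps",
              "chaosengines", "chaosexperiments", "chaosresults"]
  verbs: ["create", "list", "get", "patch", "update", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-delete-sa
  namespace: nginx
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-delete-sa
subjects:
- kind: ServiceAccount
  name: pod-delete-sa
  namespace: nginx
EOF

# Apply it with: kubectl apply -f pod-delete-rbac.yaml
```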

$ cat <
❯ kubectl get chaosexperiment pod-delete -o yaml -n nginx
apiVersion: litmuschaos.io/v1alpha1
description:
  message: |
    Deletes a pod belonging to a deployment/statefulset/daemonset
kind: ChaosExperiment
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"litmuschaos.io/v1alpha1","description":{"message":"Deletes a pod belonging to a deployment/statefulset/daemonset\n"},"kind":"ChaosExperiment","metadata":{"annotations":{},"name":"pod-delete","namespace":"nginx","version":"0.1.13"},"spec":{"definition":{"args":["-c","ansible-playbook ./experiments/generic/pod_delete/pod_delete_ansible_logic.yml -i /etc/ansible/hosts -vv; exit 0"],"command":["/bin/bash"],"env":[{"name":"ANSIBLE_STDOUT_CALLBACK","value":"default"},{"name":"TOTAL_CHAOS_DURATION","value":"15"},{"name":"RAMP_TIME","value":""},{"name":"KILL_COUNT","value":""},{"name":"FORCE","value":"true"},{"name":"CHAOS_INTERVAL","value":"5"},{"name":"LIB","value":""}],"image":"litmuschaos/ansible-runner:1.4.0","labels":{"name":"pod-delete"},"permissions":[{"apiGroups":["","apps","batch","litmuschaos.io"],"resources":["deployments","jobs","pods","pods/log","events","configmaps","chaosengines","chaosexperiments","chaosresults"],"verbs":["create","list","get","patch","update","delete"]},{"apiGroups":[""],"resources":["nodes"],"verbs":["get","list"]}],"scope":"Namespaced"}}}
  creationTimestamp: "2020-06-05T13:22:17Z"
  generation: 1
  managedFields:
  - apiVersion: litmuschaos.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:description:
        .: {}
        f:message: {}
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
      f:spec:
        .: {}
        f:definition:
          .: {}
          f:args: {}
          f:command: {}
          f:env: {}
          f:image: {}
          f:labels:
            .: {}
            f:name: {}
          f:permissions: {}
          f:scope: {}
    manager: kubectl
    operation: Update
    time: "2020-06-05T13:22:17Z"
  name: pod-delete
  namespace: nginx
  resourceVersion: "3465"
  selfLink: /apis/litmuschaos.io/v1alpha1/namespaces/nginx/chaosexperiments/pod-delete
  uid: 1ea49dc0-2e58-41ff-9953-2e4844702aaa
spec:
  definition:
    args:
    - -c
    - ansible-playbook ./experiments/generic/pod_delete/pod_delete_ansible_logic.yml
      -i /etc/ansible/hosts -vv; exit 0
    command:
    - /bin/bash
    env:
    - name: ANSIBLE_STDOUT_CALLBACK
      value: default
    - name: TOTAL_CHAOS_DURATION
      value: "15"
    - name: RAMP_TIME
      value: ""
    - name: KILL_COUNT
      value: ""
    - name: FORCE
      value: "true"
    - name: CHAOS_INTERVAL
      value: "5"
    - name: LIB
      value: ""
    image: litmuschaos/ansible-runner:1.4.0
    labels:
      name: pod-delete
    permissions:
    - apiGroups:
      - ""
      - apps
      - batch
      - litmuschaos.io
      resources:
      - deployments
      - jobs
      - pods
      - pods/log
      - events
      - configmaps
      - chaosengines
      - chaosexperiments
      - chaosresults
      verbs:
      - create
      - list
      - get
      - patch
      - update
      - delete
    - apiGroups:
      - ""
      resources:
      - nodes
      verbs:
      - get
      - list
    scope: Namespaced
cat <
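
The manifest applied at this step is truncated in the source. Judging by the names that appear in the results later (engine `nginx-chaos`, experiment `pod-delete`), it was a ChaosEngine for the nginx Deployment. The following is a sketch of what such a manifest could look like, not a reconstruction of the original: the field names follow the `litmuschaos.io/v1alpha1` ChaosEngine layout as I understand it for the 1.x releases and should be checked against the documentation for your version, `pod-delete-sa` is an assumed service account with permission to run the experiment, and the env values are the defaults from the ChaosExperiment shown above.

```shell
# Writes a ChaosEngine that targets the annotated nginx Deployment.
# NOTE: a sketch under assumptions; verify field names for your Litmus
# version. pod-delete-sa is an assumed service account name.
cat > chaosengine.yaml <<'EOF'
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-chaos
  namespace: nginx
spec:
  appinfo:
    appns: nginx
    applabel: app=nginx
    appkind: deployment
  # Only target workloads annotated with litmuschaos.io/chaos="true".
  annotationCheck: "true"
  engineState: active
  chaosServiceAccount: pod-delete-sa
  jobCleanUpPolicy: delete
  experiments:
  - name: pod-delete
    spec:
      components:
        env:
        - name: TOTAL_CHAOS_DURATION
          value: "15"
        - name: CHAOS_INTERVAL
          value: "5"
        - name: FORCE
          value: "true"
EOF

# Apply it with: kubectl apply -f chaosengine.yaml
```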

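Viewing the chaos experiment results

Chaos experiments run as Kubernetes Jobs, and the affected pods are deleted by the chaos executor according to the experiment definition.

Run the following steps to review the results of the chaos experiment: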

$ watch -n 1 kubectl get pods -n nginx
Every 1.0s: kubectl get pods -n nginx            192.168.1.102: Sat Jun  6 01:35:22 2020

NAME                                READY   STATUS              RESTARTS   AGE
nginx-chaos-runner                  1/1     Running             0          31s
nginx-deployment-558fc78868-f4tcd   0/1     Terminating         0          3m54s
nginx-deployment-558fc78868-g6wjm   0/1     ContainerCreating   0          1s
nginx-deployment-558fc78868-wbzd2   1/1     Running             0          3m38s
pod-delete-xb472u-rvjc8             1/1     Running             0          24s

❯ kubectl get chaosresults -n nginx
NAME                     AGE
nginx-chaos-pod-delete   11m
❯ kubectl describe chaosresults nginx-chaos-pod-delete -n nginx
Name:         nginx-chaos-pod-delete
Namespace:    nginx
Labels:       chaosUID=7181dd32-dcd2-44c8-b9a1-62f76b4426d4
              type=ChaosResult
Annotations:  API Version:  litmuschaos.io/v1alpha1
Kind:         ChaosResult
Metadata:
  Creation Timestamp:  2020-06-05T17:25:49Z
  Generation:          6
  Managed Fields:
    API Version:  litmuschaos.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
        f:labels:
          .:
          f:chaosUID:
          f:type:
      f:spec:
        .:
        f:engine:
        f:experiment:
      f:status:
        .:
        f:experimentstatus:
          .:
          f:failStep:
          f:phase:
          f:verdict:
    Manager:         kubectl
    Operation:       Update
    Time:            2020-06-05T17:37:39Z
  Resource Version:  8339
  Self Link:         /apis/litmuschaos.io/v1alpha1/namespaces/nginx/chaosresults/nginx-chaos-pod-delete
  UID:               bffa195d-4bf5-47a3-9a6c-7f2287107ea5
Spec:
  Engine:      nginx-chaos
  Experiment:  pod-delete
Status:
  Experimentstatus:
    Fail Step:  N/A
    Phase:      Completed
    Verdict:    Pass
Events:         <none>
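
For automation, the verdict can be read directly instead of scanning the describe output. This is a minimal sketch: the JSONPath follows the `status.experimentstatus.verdict` field visible in the output above, and `verdict_passed` is a hypothetical helper name. While an experiment is still running, the verdict is not yet final, so a pipeline should wait for the Completed phase before checking it.

```shell
# Returns 0 only when the given verdict string is exactly "Pass".
# NOTE: verdict_passed is a hypothetical helper, not a Litmus tool.
verdict_passed() {
  [ "$1" = "Pass" ]
}

# Usage against a live cluster (requires kubectl and cluster access):
#   verdict=$(kubectl get chaosresult nginx-chaos-pod-delete -n nginx \
#     -o jsonpath='{.status.experimentstatus.verdict}')
#   verdict_passed "$verdict" || { echo "chaos verdict: $verdict" >&2; exit 1; }
```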

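Viewing the chaos experiment events

You can list the Events in the relevant namespace to understand and reconstruct what happened during the chaos experiment: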

❯ kubectl get events -n nginx --sort-by='{.lastTimestamp}'
LAST SEEN   TYPE      REASON                          OBJECT                                   MESSAGE
13m         Normal    ChaosInject                     chaosengine/nginx-chaos                  Injecting pod-delete chaos on nginx-deployment-558fc78868-s26cl pod
13m         Normal    Scheduled                       pod/nginx-deployment-558fc78868-sgswp    Successfully assigned nginx/nginx-deployment-558fc78868-sgswp to minikube
13m         Normal    Killing                         pod/nginx-deployment-558fc78868-s26cl    Stopping container nginx
13m         Normal    SuccessfulCreate                replicaset/nginx-deployment-558fc78868   (combined from similar events): Created pod: nginx-deployment-558fc78868-sgswp
13m         Normal    Pulled                          pod/nginx-deployment-558fc78868-sgswp    Container image "nginx" already present on machine
13m         Normal    Started                         pod/nginx-deployment-558fc78868-sgswp    Started container nginx
13m         Normal    Created                         pod/nginx-deployment-558fc78868-sgswp    Created container nginx
12m         Normal    PostChaosCheck                  chaosengine/nginx-chaos                  AUT is Running successfully
12m         Normal    Summary                         chaosengine/nginx-chaos                  pod-delete Experiment Passed!
12m         Normal    Completed                       job/pod-delete-xb472u                    Job completed
12m         Normal    ExperimentJobCleanUp            chaosengine/nginx-chaos                  Experiment Job 'pod-delete-xb472u' is deleted
12m         Normal    Killing                         pod/nginx-chaos-runner                   Stopping container chaos-runner
12m         Normal    ChaosEngineCompleted            chaosengine/nginx-chaos                  Chaos Engine completed, will delete or retain the resources according to jobCleanUpPolicy
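
In a busy namespace the chaos-related events are easy to lose among ordinary pod events. As a small sketch, the listing can be narrowed to events whose object is a ChaosEngine. The helper name `chaos_events` is hypothetical, and it relies on the default column layout shown above, where the object is the fourth whitespace-separated field of each data row.

```shell
# Keeps the header line plus any event row whose OBJECT column
# (fourth field in the default table output) is a ChaosEngine.
# NOTE: chaos_events is a hypothetical helper name.
chaos_events() {
  awk 'NR == 1 || $4 ~ /^chaosengine\//'
}

# Usage against a live cluster:
#   kubectl get events -n nginx --sort-by='{.lastTimestamp}' | chaos_events
# A server-side alternative using an event field selector:
#   kubectl get events -n nginx --field-selector involvedObject.kind=ChaosEngine
```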

See also

  • Litmus documentation:

  • Chaos Charts for Kubernetes:

  • Chaoskube project:

  • Pumba project: