Kubernetes Notes

Posted by Zeusro on November 20, 2018

Some useful tools

  1. kompose

Converts docker-compose files into Kubernetes resources. Very helpful for people just starting out with Kubernetes.

Installation tools

  1. kubeadm

References:

  1. Certificate rotation

Advanced scheduling

Each kind of affinity comes in two flavors: preferred and required. preferred expresses a preference (a soft rule), while required is a hard constraint.

Use affinity to make sure pods run on the target nodes

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              # only nodes carrying the elasticsearch-test-ready label (any value) qualify
              - key: elasticsearch-test-ready
                operator: Exists

References:

  1. advanced-scheduling-in-kubernetes
  2. kubernetes-scheulder-affinity

Use anti-affinity to make sure each node runs at most one pod of the same application

      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: 'app'
                operator: In
                values:
                - nginx-test2
            topologyKey: "kubernetes.io/hostname"
            namespaces:
            - test
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              topologyKey: "kubernetes.io/hostname"
              namespaces:
              - test
              labelSelector:
                matchExpressions:
                - key: 'app'
                  operator: In
                  values:
                   - "nginx-test2"

tolerations and taint

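Tolerations and taints always come in pairs. A taint is like a woman who says, "Sure, I'm brash, I smoke and I spend my whole paycheck every month, but I'm still a good woman." A blemish (taint) like that keeps most men (pods) away, but there are always a few honest guys who can tolerate it (tolerations).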

taint

kubectl taint nodes xx elasticsearch-test-ready=true:NoSchedule   # add the taint
kubectl taint nodes xx elasticsearch-test-ready:NoSchedule-       # remove it (note the trailing "-")

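Master nodes come with a taint out of the box, which is why the containers we deploy never run on them. Be careful with custom taints, though! Every DaemonSet and everything in kube-system needs the matching tolerations. Otherwise, pods without that toleration will be kept off the node (or, with NoExecute, evicted from it). That includes even the network plugin and kube-proxy, and the consequences are severe.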

Taints and tolerations must be set up as matching pairs, and the operator cannot be chosen arbitrarily.

tolerations

NoExecute
      tolerations:
        - key: "elasticsearch-exclusive"
          operator: "Equal"
          value: "true"
          effect: "NoExecute"

kubectl taint node cn-shenzhen.xxxx elasticsearch-exclusive=true:NoExecute

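NoExecute immediately evicts every pod that does not tolerate the taint. This is very dangerous, so make sure the system components have the corresponding tolerations configured first.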

Note in particular: in my tests the Exists operator did not take effect here, and Equal had to be used.

NoSchedule
      tolerations:
        - key: "elasticsearch-exclusive"
          operator: "Exists"
          effect: "NoSchedule"
        - key: "elasticsearch-exclusive"
          operator: "Equal"
          value: "true"
          effect: "NoExecute"

kubectl taint node cn-shenzhen.xxxx elasticsearch-exclusive=true:NoSchedule

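NoSchedule stops new pods without a matching toleration from being scheduled onto the node. Pods already running there are not evicted, so in practice some pods still run on it.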

Here Exists and Equal can be used interchangeably; it makes little difference.

It's worth mentioning that the same key can carry several effects at the same time:

Taints:             elasticsearch-exclusive=true:NoExecute
                    elasticsearch-exclusive=true:NoSchedule

Other references:

  1. Taint and Toleration in Kubernetes
  2. The Kubernetes scheduling mechanism

Container orchestration tips

wait-for-it

k8s currently has no startup-dependency mechanism like docker-compose's depends_on. I recommend using wait-for-it to rewrite the image's command.
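Here is a minimal sketch of this pattern. It assumes a hypothetical image my-web-app:1.0 that ships wait-for-it.sh at /wait-for-it.sh (the script needs bash). It also assumes the dependency is exposed as a Service named mysql on port 3306. All of these names are illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web                    # hypothetical pod name
spec:
  containers:
    - name: web
      image: my-web-app:1.0    # hypothetical image that bundles /wait-for-it.sh
      # Override the image's command: wait up to 60s for mysql:3306 to accept
      # TCP connections, then run the real process given after "--".
      command:
        - "/wait-for-it.sh"
        - "mysql:3306"
        - "-t"
        - "60"
        - "--"
        - "node"
        - "server.js"
```

Without -s (--strict), wait-for-it.sh still runs the command after the timeout expires. Add -s if the container should exit instead, so that the restart policy retries it.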

Using double quotes inside a container command

             command:
               - "/bin/sh"
               - "-ec"
               # a literal block scalar (|) keeps the double quotes in the JSON body intact
               - |
                  curl -X POST --connect-timeout 5 -H 'Content-Type: application/json' \
                  'elasticsearch-logs:9200/logs,tracing,tracing-test/_delete_by_query?conflicts=proceed' \
                  -d '{"query":{"range":{"@timestamp":{"lt":"now-90d","format": "epoch_millis"}}}}'

The k8s master-cluster architecture

master (control plane)

  • etcd distributed persistent storage

    Consistent and highly-available key value store used as Kubernetes’ backing store for all cluster data.

  • kube-apiserver

    front-end for the Kubernetes control plane.

  • kube-scheduler

    Component on the master that watches newly created pods that have no node assigned, and selects a node for them to run on.

  • Controller Manager
    • Node Controller

      Responsible for noticing and responding when nodes go down.

    • Replication Controller

      Responsible for maintaining the correct number of pods for every replication controller object in the system.

    • Endpoints Controller

      Populates the Endpoints object (that is, joins Services & Pods).

    • Service Account & Token Controllers

      Create default accounts and API access tokens for new namespaces.

  • cloud-controller-manager (alpha feature)
    • Node Controller

      For checking the cloud provider to determine if a node has been deleted in the cloud after it stops responding

    • Route Controller

      For setting up routes in the underlying cloud infrastructure

    • Service Controller

      For creating, updating and deleting cloud provider load balancers

    • Volume Controller

      For creating, attaching, and mounting volumes, and interacting with the cloud provider to orchestrate volumes

References:

  1. Kubernetes Core Principles (Part 2): Controller Manager
  2. Kubernetes Components

worker nodes

  • Kubelet

    The kubelet is the primary “node agent” that runs on each node.

  • Kubernetes Proxy

    kube-proxy enables the Kubernetes service abstraction by maintaining network rules on the host and performing connection forwarding.

  • Container Runtime (Docker, rkt, or others)

    The container runtime is the software that is responsible for running containers. Kubernetes supports several runtimes: Docker, rkt, runc and any OCI runtime-spec implementation.

Kubernetes resources

  • spec

The spec, which you must provide, describes your desired state for the object–the characteristics that you want the object to have.

  • status

The status describes the actual state of the object, and is supplied and updated by the Kubernetes system.
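To illustrate, here is a trimmed, hypothetical Deployment as kubectl get deployment nginx -o yaml might return it. The values are made up; the split between the two sections is the point.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:                       # desired state: written by you
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.15
status:                     # actual state: maintained by Kubernetes, not set by you
  observedGeneration: 1
  replicas: 3
  updatedReplicas: 3
  readyReplicas: 2          # one pod is not ready yet, so actual != desired
  availableReplicas: 2
```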

image

pod

A pod is a group of one or more tightly related containers that will always run together on the same worker node and in the same Linux namespace(s).

Each pod is like a separate logical machine with its own IP, hostname, processes, etc., running a single application.
  • liveness

The kubelet uses liveness probes to know when to restart a Container.

  • readiness

The kubelet uses readiness probes to know when a Container is ready to start accepting traffic.

  • Question: when a pod is deleted, is its IP removed from the Endpoints first, or is the pod deleted first?

My take:

What happens inside k8s when a pod is deleted:

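  1. The user deletes the pod.
  2. The apiserver marks the pod as 'dead' (Terminating).
  3. The kubelet deletes the pod. If the pod is still running after the default 30s grace period, the kubelet forcibly shuts it down:
     3.1 the kubelet waits for the preStop hooks of the pod's containers to finish;
     3.2 it sends SIGTERM to tell the containers to shut down;
     3.3 once the 30s grace period is exceeded, it sends SIGKILL to force the pod to stop.
  4. The endpoints controller (part of the controller manager) removes the pod from the Endpoints object.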

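Steps 3 and 4 run concurrently. Normally step 4 finishes before step 3, but their order is not guaranteed. In extreme cases the kubelet may already have deleted the pod while the Endpoints still list it. Service requests are then forwarded to a pod that no longer exists, and calls to the Service fail.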

Reference: https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods
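A commonly used mitigation for this race is a preStop hook that sleeps briefly. This is not part of the flow described above, and the 5-second delay is an arbitrary assumption. The sleep gives the endpoints controller and kube-proxy time to stop routing traffic to the pod before the container receives SIGTERM. The sleep must fit inside terminationGracePeriodSeconds, and the image must contain /bin/sh.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web                     # illustrative name
spec:
  # total time the kubelet allows (preStop + SIGTERM handling) before SIGKILL; 30 is the default
  terminationGracePeriodSeconds: 30
  containers:
    - name: web
      image: nginx:1.15
      lifecycle:
        preStop:
          exec:
            # delay SIGTERM so the Endpoints update can propagate first
            command: ["/bin/sh", "-c", "sleep 5"]
```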

References:

  1. Using pod data inside containers
  2. Accessing the API Server from a Kubernetes Pod with a Service Account
  3. Gracefully stopping pods

Deployment

A Deployment controller provides declarative updates for Pods and ReplicaSets.
  • Rolling Update
    # NAME is a ReplicationController; this only applies when its pod contains a single container
    kubectl rolling-update NAME [NEW_NAME] --image=IMAGE:TAG
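kubectl rolling-update is the older, imperative approach. With a Deployment, the rolling update is declared in spec.strategy and is triggered by any change to the pod template. Here is a sketch with illustrative values:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most 1 pod above the desired count during the update
      maxUnavailable: 0    # never drop below the desired count
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.15   # changing this field starts a new rollout
```

For example, kubectl set image deployment/nginx nginx=nginx:1.16 changes the image and starts a rollout, and kubectl rollout status deployment/nginx follows its progress. With maxUnavailable: 0, a new pod must be scheduled before an old one is removed. That is exactly where an "Insufficient memory" scheduling failure can stall the update.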

Init Containers: containers used to initialize the environment
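As an illustration, an init container can prepare files in a shared emptyDir before the main container starts. The names and images below are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: init-demo
spec:
  initContainers:
    # runs to completion before any app container is started
    - name: prepare-html
      image: busybox:1.28
      command: ["sh", "-c", "echo 'generated by init container' > /work-dir/index.html"]
      volumeMounts:
        - name: workdir
          mountPath: /work-dir
  containers:
    - name: nginx
      image: nginx:1.15
      volumeMounts:
        # the app container sees the file the init container wrote
        - name: workdir
          mountPath: /usr/share/nginx/html
  volumes:
    - name: workdir
      emptyDir: {}
```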

References:

  1. Assign CPU Resources to Containers and Pods
  2. Kubernetes deployment strategies
  3. Autoscaling based on CPU/Memory in Kubernetes — Part II
  4. Assigning Pods to Nodes
  • A Deployment cannot be updated when resources are insufficient, for example:

0/6 nodes are available: 3 Insufficient memory, 3 node(s) had taints that the pod didn’t tolerate.

Replication Controller

A replication controller is a Kubernetes resource that ensures a pod is always up and running.

It keeps track of the pods it manages through their labels.

ReplicaSet

ReplicaSet is the successor to the Replication Controller. The two differ in how they select pods:

  Component                Pod selector
  Replication Controller   label
  ReplicaSet               label, or pods that include a certain label key
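The difference shows up in the selector syntax. Here is a minimal sketch of a ReplicaSet that selects every pod carrying an app label key, whatever its value. The names are illustrative.

```yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: nginx-rs
spec:
  replicas: 3
  selector:
    # set-based selector: match any pod that has the "app" label key,
    # regardless of its value (a ReplicationController can only match key=value)
    matchExpressions:
      - key: app
        operator: Exists
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.15
```

Such a broad selector can adopt unrelated pods that also carry an app label, so it is shown only to illustrate the syntax.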

References:

  1. The Kubernetes Deployment rolling update mechanism you may have misunderstood

DaemonSet

A DaemonSet makes sure it creates as many pods as there are nodes and deploys each one on its own node
  • Health checks
    1. liveness probe
    2. HTTP-based liveness probe
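Here is a sketch of a DaemonSet that combines an HTTP-based liveness probe with a catch-all toleration. The toleration lets it also run on tainted nodes, as the tolerations and taint section above warns. The image, port and /healthz path are hypothetical.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent              # hypothetical name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: node-agent
  template:
    metadata:
      labels:
        app: node-agent
    spec:
      tolerations:
        # an empty key with operator Exists tolerates every taint
        - operator: "Exists"
      containers:
        - name: agent
          image: my-node-agent:1.0   # hypothetical image serving /healthz on 8080
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10  # give the process time to start
            periodSeconds: 15
            failureThreshold: 3      # kubelet restarts the container after 3 consecutive failures
```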

StatefulSet

Manages the deployment and scaling of a set of Pods, and provides guarantees about the ordering and uniqueness of these Pods.
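Here is a minimal sketch, assuming the cluster has a default StorageClass for dynamic provisioning. Names and sizes are illustrative. The headless Service gives each pod a stable DNS name such as web-0.web.<namespace>.svc.cluster.local. volumeClaimTemplates gives each pod its own PVC.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  clusterIP: None          # headless: provides per-pod DNS records
  selector:
    app: web
  ports:
    - port: 80
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web         # the headless Service above
  replicas: 3              # created in order: web-0, web-1, web-2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: nginx
          image: nginx:1.15
          volumeMounts:
            - name: data
              mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
    # one PVC per pod (data-web-0, data-web-1, ...), kept when the pod is rescheduled
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```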

References:

  1. StatefulSet

volumes

Volumes come in two modes:

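In-tree volumes are part of standard Kubernetes and are built into the Kubernetes code. Out-of-tree volumes are implemented through the Flexvolume interface, which lets users write their own drivers or add support for their own volume types in Kubernetes.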

  1. emptyDir – a simple empty directory used for storing transient data,
  2. hostPath – for mounting directories from the worker node’s filesystem into the pod,
  3. gitRepo – a volume initialized by checking out the contents of a Git repository,
  4. nfs – an NFS share mounted into the pod,
  5. gcePersistentDisk (Google Compute Engine Persistent Disk), awsElasticBlockStore (Amazon Web Services Elastic Block Store Volume), azureDisk (Microsoft Azure Disk Volume) – for mounting cloud provider specific storage,
  6. cinder, cephfs, iscsi, flocker, glusterfs, quobyte, rbd, flexVolume, vsphereVolume, photonPersistentDisk, scaleIO – for mounting other types of network storage,
  7. configMap, secret, downwardAPI – special types of volumes used to expose certain Kubernetes resources and cluster info to the pod,
  8. persistentVolumeClaim – a way to use a pre- or dynamically provisioned persistent storage.
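  • Persistent Volume: places data in a reliable external storage system and makes it available to Pods/containers directly, without first mounting the external storage on the host. Its key feature is that its lifecycle is independent of the Pod: it still exists when the Pod dies, and it is re-attached automatically when the Pod comes back.

  • Persistent Volume Claim: declares a request for a given amount of storage, to be obtained from a PV or a StorageClass.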
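To show how the two fit together, here is a sketch with three parts: a statically created NFS-backed PV, a claim that binds to it, and a pod that mounts the claim. The NFS server address and export path are hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain   # keep the data after the claim is deleted
  nfs:
    server: 10.0.0.10        # hypothetical NFS server
    path: /exports/data      # hypothetical export
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""       # bind to a pre-created PV instead of dynamic provisioning
  resources:
    requests:
      storage: 5Gi           # any matching PV with at least this capacity can satisfy it
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: nginx:1.15
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-claim   # the pod refers to the claim, never to the PV directly
```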

References:

  1. An introduction to Volumes in Kubernetes

ConfigMap

A ConfigMap is a Kubernetes resource object for storing configuration. All of its content is stored in etcd.

In practice, modifying a ConfigMap does not update environment variables that have already been injected into a container.
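Here is a sketch contrasting the two ways of consuming a ConfigMap; the names are hypothetical. A value injected as an environment variable is read once, when the container starts, which matches the observation above. A ConfigMap mounted as a volume is refreshed by the kubelet after some delay. This refresh does not apply to subPath mounts.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: "info"
  app.properties: |
    timeout=30
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: busybox:1.28
      command: ["sh", "-c", "while true; do echo $LOG_LEVEL; cat /etc/config/app.properties; sleep 10; done"]
      env:
        # copied once at container start: later ConfigMap edits are NOT reflected here
        - name: LOG_LEVEL
          valueFrom:
            configMapKeyRef:
              name: app-config
              key: LOG_LEVEL
      volumeMounts:
        # mounted as files: updated in place some time after the ConfigMap changes
        - name: config
          mountPath: /etc/config
  volumes:
    - name: config
      configMap:
        name: app-config
```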

References:

  1. Kubernetes ConfigMap hot-update test

service

A Kubernetes service is a resource you create to get a single, constant point of entry to a group of pods providing the same service.

Each service has an IP address and port that never change while the service exists.

The resources will be created in the order they appear in the file. Therefore, it’s best to specify the service first, since that will ensure the scheduler can spread the pods associated with the service as they are created by the controller(s), such as Deployment.

  • ClusterIP

For access from inside the cluster; it is not directly reachable from outside.

This is the type of Service you get when type is not specified.

clusterIP: None creates a special headless Service, which has no cluster IP.

  • NodePort

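Every node opens the same port, hence the name NodePort. The number of available ports is limited (the default range is 30000-32767). Directly accessible from outside.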

  • LoadBalancer

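A cloud-provider-specific service. On Alibaba Cloud, it simply builds on NodePort and automatically binds the backend servers of a load balancer for you.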

  • ExternalName
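Here is a sketch of three of these types side by side. The selectors, ports and external host name are hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-nodeport
spec:
  type: NodePort           # omit type entirely to get a ClusterIP Service
  selector:
    app: nginx
  ports:
    - port: 80             # the Service's cluster-internal port
      targetPort: 80       # the container port on the selected pods
      nodePort: 30080      # opened on every node; must lie in the node port range
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-headless
spec:
  clusterIP: None          # headless: DNS returns the pod IPs directly
  selector:
    app: nginx
  ports:
    - port: 80
---
apiVersion: v1
kind: Service
metadata:
  name: external-db
spec:
  type: ExternalName
  externalName: db.example.com   # hypothetical host; cluster DNS answers with a CNAME, no proxying
```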

References:

  1. IPVS-Based In-Cluster Load Balancing Deep Dive

Horizontal Pod Autoscaler

The Horizontal Pod Autoscaler automatically scales the number of pods in a replication controller, deployment or replica set based on observed CPU utilization (or, with custom metrics support, on some other application-provided metrics).

It works together with the metrics APIs and the resource requests declared in the pod spec to decide how to scale.
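Here is a minimal sketch using the autoscaling/v1 API. It assumes a Deployment named nginx exists and a metrics source such as metrics-server is installed. It also assumes the pod template sets resources.requests.cpu; without a request the utilization percentage cannot be computed.

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 2
  maxReplicas: 10
  # target average CPU usage, as a percentage of the pods' CPU request
  targetCPUUtilizationPercentage: 60
```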

Kubernetes Downward API

It allows us to pass metadata about the pod and its environment through environment variables or files (in a so-called downwardAPI volume)
  • environment variables
  • downwardAPI volume
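Here is a sketch showing both mechanisms in one pod; the names are illustrative. Metadata and resource fields are exposed as environment variables, and the pod's labels are exposed as a file through a downwardAPI volume.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: downward-demo
  labels:
    app: demo
spec:
  containers:
    - name: main
      image: busybox:1.28
      command: ["sh", "-c", "echo $POD_NAME $POD_IP $MEMORY_LIMIT; cat /etc/podinfo/labels; sleep 3600"]
      resources:
        limits:
          memory: "64Mi"
      env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: MEMORY_LIMIT            # the container's memory limit, in bytes
          valueFrom:
            resourceFieldRef:
              containerName: main
              resource: limits.memory
      volumeMounts:
        - name: podinfo
          mountPath: /etc/podinfo
  volumes:
    - name: podinfo
      downwardAPI:
        items:
          # all labels written to /etc/podinfo/labels, one key="value" per line
          - path: "labels"
            fieldRef:
              fieldPath: metadata.labels
```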

Resource Quotas

A way to limit pod resources on a per-namespace basis.
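Here is a sketch of a quota for a namespace named test; the numbers are arbitrary.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: test          # the quota applies only to this namespace
spec:
  hard:
    pods: "20"             # max number of pods
    requests.cpu: "4"      # sum of CPU requests across all pods
    requests.memory: 8Gi
    limits.cpu: "8"        # sum of CPU limits across all pods
    limits.memory: 16Gi
```

Once a quota covers cpu or memory, pods in that namespace that do not specify the corresponding requests or limits are rejected, unless a LimitRange supplies defaults.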

Network model

How the Kubernetes network model works

Command references:

  1. kubectl command guide
  2. Kubernetes vs. Docker: basic concepts and common commands compared
  3. kubectl Cheat Sheet
  4. K8S resource configuration guide
  5. Introducing Container Runtime Interface (CRI) in Kubernetes

Reference e-book: Kubernetes Handbook (Kubernetes Chinese Guide / Cloud Native Application Architecture Practice Handbook)