NodeSelector is the simplest mechanism: it schedules a pod onto the Node(s) we specify. There are two steps:
(1) Add a specific label to the target Node.
(2) Specify that label in the pod spec when creating the pod.
Below is a simple example:
1. Add a label to a node and verify it:
kubectl label nodes <node-name> <label-key>=<label-value>
eg:
[root@node-1 ~]# kubectl get node
NAME       STATUS    ROLES     AGE       VERSION
10.0.0.2   Ready     <none>    9d        v1.10.4
10.0.0.3   Ready     <none>    9d        v1.10.4
[root@node-1 ~]# kubectl label nodes 10.0.0.2 disk=ssd
node "10.0.0.2" labeled
[root@node-1 ~]# kubectl get nodes --show-labels
NAME       STATUS    ROLES     AGE       VERSION   LABELS
10.0.0.2   Ready     <none>    9d        v1.10.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disk=ssd,kubernetes.io/hostname=10.0.0.2
10.0.0.3   Ready     <none>    9d        v1.10.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=10.0.0.3
2. Create a pod that selects this label via nodeSelector:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent   # pull the image only if it is not already present on the node
  nodeSelector:
    disk: ssd

Note: if the specified label cannot be matched on any node, the Pod stays unscheduled and the scheduler reports a FailedScheduling event:
Warning FailedScheduling 7s (x6 over 22s) default-scheduler 0/2 nodes are available: 2 node(s) didn't match node selector.
After successful scheduling, kubectl describe shows:
# kubectl describe pod nginx
...
Node-Selectors:  disk=ssd
Tolerations:     <none>
Events:
  Type    Reason                 Age   From               Message
  ----    ------                 ----  ----               -------
  Normal  Scheduled              4m    default-scheduler  Successfully assigned nginx to 10.0.0.2
  Normal  SuccessfulMountVolume  4m    kubelet, 10.0.0.2  MountVolume.SetUp succeeded for volume "default-token-hmvnc"
  Normal  Pulling                4m    kubelet, 10.0.0.2  pulling image "nginx"
  Normal  Pulled                 3m    kubelet, 10.0.0.2  Successfully pulled image "nginx"
  Normal  Created                3m    kubelet, 10.0.0.2  Created container
  Normal  Started                3m    kubelet, 10.0.0.2  Started container
By default, Kubernetes also attaches some built-in labels to every node:
# kubectl get nodes --show-labels
NAME       STATUS    ROLES     AGE       VERSION   LABELS
10.0.0.2   Ready     <none>    9d        v1.10.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disk=ssd,kubernetes.io/hostname=10.0.0.2
10.0.0.3   Ready     <none>    9d        v1.10.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=10.0.0.3
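These built-in labels can be used in a nodeSelector just like custom ones. For instance, a pod can be pinned to a single node via the hostname label (a minimal sketch; the pod name is illustrative and the hostname is taken from the listing above):

```yaml
# Sketch: pin a pod to one specific node using the built-in hostname label
apiVersion: v1
kind: Pod
metadata:
  name: nginx-on-node2   # hypothetical name
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    kubernetes.io/hostname: 10.0.0.2   # built-in label shown in the listing above
```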
Node affinity is a newer, more expressive scheduling mechanism intended to supersede nodeSelector. There are two kinds of node affinity rules:
requiredDuringSchedulingIgnoredDuringExecution: a hard requirement; the pod is only scheduled onto nodes that match.
preferredDuringSchedulingIgnoredDuringExecution: a soft preference; the scheduler favors matching nodes but will still place the pod elsewhere if none match.
The following example uses both:
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: with-node-affinity
    image: k8s.gcr.io/pause:2.0

This node affinity rule says the pod may only be placed on a node carrying the label key kubernetes.io/e2e-az-name with a value of either e2e-az1 or e2e-az2. In addition, among the nodes that satisfy that requirement, nodes labeled another-node-label-key=another-node-label-value are preferred.
The example above uses the In operator. NodeAffinity supports the operators In, NotIn, Exists, DoesNotExist, Gt, and Lt. NotIn and DoesNotExist provide node anti-affinity behavior.
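For example, NotIn can keep pods away from nodes carrying a given label value; a sketch reusing the disk label from the first section:

```yaml
# Sketch: hard node-affinity rule that excludes the SSD nodes,
# i.e. schedule only onto nodes whose disk label is NOT ssd
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: disk
          operator: NotIn
          values:
          - ssd
```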
Pod affinity also takes the pods already running on nodes into account. The rule can be phrased as:
If one or more pods matching condition Y are already running in a topology domain labeled X, then this pod should (or should not) run there.
Here X refers to a topology domain in the cluster, such as a node, rack, or zone, identified through the node label named by topologyKey.
Condition Y is a label selector over pods, evaluated in the pod's own namespace or in an explicit list of namespaces.
Pod affinity and anti-affinity use the same requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution settings as node affinity.
In the example below, podAffinity and podAntiAffinity are both defined under the affinity section:
apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        topologyKey: failure-domain.beta.kubernetes.io/zone
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: kubernetes.io/hostname
  containers:
  - name: with-pod-affinity
    image: k8s.gcr.io/pause:2.0

This example expresses the following scheduling requirements for the pod:
It must be placed in a zone (topologyKey failure-domain.beta.kubernetes.io/zone) that already runs at least one pod labeled security=S1.
It should preferably not be placed on a node (topologyKey kubernetes.io/hostname) that runs a pod labeled security=S2.
Pod affinity and anti-affinity support the operators In, NotIn, Exists, and DoesNotExist.
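A common use of podAntiAffinity is spreading replicas of one application across nodes; a sketch (the app label is illustrative):

```yaml
# Sketch: each replica refuses nodes that already run a pod labeled app=web,
# so replicas spread across distinct hosts
podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: app
        operator: In
        values:
        - web
    topologyKey: kubernetes.io/hostname
```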
In principle, topologyKey can be any legal label key, but for performance and security reasons Kubernetes restricts it: it may not be empty for affinity rules or for required anti-affinity rules, and for required pod anti-affinity the LimitPodHardAntiAffinityTopology admission controller (if enabled) limits topologyKey to kubernetes.io/hostname.
Taints work in the opposite direction from the node affinity described above: a taint lets a Node repel Pods.
Taints are used together with tolerations to keep Pods away from unsuitable Nodes. Once one or more taints are set on a Node, no Pod can run there unless it explicitly declares that it tolerates those taints. A toleration is a Pod attribute that allows the Pod to run on a tainted Node.
1. Use kubectl taint to set taint information on a Node:
kubectl taint nodes 10.0.0.3 key=value:NoSchedule
This adds a taint to node 10.0.0.3 with key "key", value "value", and effect NoSchedule, meaning no Pod can be scheduled onto this node unless it has a matching toleration.
To remove the taint from the node:
kubectl taint nodes 10.0.0.3 key:NoSchedule-
2. Define a toleration for that taint in the Pod so it can be scheduled onto the node:
tolerations:
- key: "key"
  operator: "Equal"
  value: "value"
  effect: "NoSchedule"
or:
tolerations:
- key: "key"
  operator: "Exists"
  effect: "NoSchedule"
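Embedded in a full manifest, a Pod that tolerates the taint set earlier might look like this sketch (the pod name is illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-tolerated   # hypothetical name
spec:
  containers:
  - name: nginx
    image: nginx
  tolerations:            # matches the taint key=value:NoSchedule set above
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"
```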
The key and effect in a Pod's toleration must match the taint, and one of the following must hold:
The operator is Exists (in which case no value should be specified), or the operator is Equal and the values are equal.
If operator is not specified, it defaults to Equal. There are also two special cases:
An empty key together with operator Exists matches all taint keys, so the Pod tolerates every taint; an empty effect matches all effects for the given key.
The effect can be NoSchedule or PreferNoSchedule, a softer "try not to schedule" version of NoSchedule; there is also NoExecute, which additionally evicts already-running Pods that do not tolerate the taint.
The system allows multiple taints on the same Node and multiple tolerations on the same Pod. They are processed like a filter: taints matched by a toleration are ignored, and the remaining unmatched taints take effect on the Pod.
For NoExecute taints, tolerationSeconds specifies how long the Pod may keep running after the taint is added, in seconds:
tolerations:
- key: "node.alpha.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 6000

If the taint is removed within this grace period (6000s in this example), the Pod is not evicted.
Taints and tolerations are typically used in scenarios such as dedicated nodes reserved for particular users, nodes with special hardware, and taint-based evictions when a node has problems.
A DaemonSet manages exactly one replica of a Pod on each node in the cluster. It is typically suited for applications such as log collection daemons (like the fluentd example below), node monitoring agents, and cluster storage daemons.
The following example creates a DaemonSet that runs a Pod on every node:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
spec:
  selector:
    matchLabels:
      name: fluentd-elasticsearch
  template:
    metadata:
      labels:
        name: fluentd-elasticsearch
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd-elasticsearch
        image: k8s.gcr.io/fluentd-elasticsearch:1.20
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers

Because the example above lives in the kube-system namespace, you must specify the namespace when querying the DaemonSet (e.g. kubectl get daemonset --namespace=kube-system).
A DaemonSet's Pod scheduling is similar to an RC's: besides the built-in per-node placement, you can use nodeSelector or nodeAffinity to restrict it to a subset of qualifying Nodes.
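For example, adding a nodeSelector to the DaemonSet's pod template restricts the daemon to the SSD nodes labeled in the first section (a sketch showing only the relevant part of the spec):

```yaml
# Sketch: run the DaemonSet's pods only on nodes labeled disk=ssd
spec:
  template:
    spec:
      nodeSelector:
        disk: ssd
```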
A batch job usually starts multiple processes, in parallel or serially, to work through a set of work items; when all items are processed, the job as a whole is finished. Batch jobs fall into the following types:
1. Non-parallel Jobs: a single Pod runs once to completion.
2. Parallel Jobs with a fixed completion count: the Job finishes when .spec.completions Pods have completed successfully.
3. Parallel Jobs with a work queue: work items are stored in an external queue, and the Job's .spec.completions must not be set. In this mode each Pod works independently off the queue; once at least one Pod has terminated successfully and all Pods have terminated, the Job completes.
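A fixed-completion-count Job (type 2 above) could be sketched as follows; the name and command are illustrative:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: work-items         # hypothetical name
spec:
  completions: 5           # the Job succeeds after 5 pods run to completion
  parallelism: 2           # run at most 2 pods at a time
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "echo processing one work item"]
      restartPolicy: Never
```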
Kubernetes CronJobs use a schedule syntax similar to Linux cron. Here is an example:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
You can see that it has run once every minute:
# kubectl get cronjob hello -o wide
NAME      SCHEDULE      SUSPEND   ACTIVE    LAST SCHEDULE   AGE       CONTAINERS   IMAGES    SELECTOR
hello     */1 * * * *   False     0         33s             7m        hello        busybox   <none>
# kubectl get jobs --watch
NAME               DESIRED   SUCCESSFUL   AGE
hello-1529412060   1         1            3m
hello-1529412120   1         1            2m
hello-1529412180   1         1            1m
hello-1529412240   1         0            6s
View the log output:
# kubectl get pods --selector=job-name=hello-1529412600
NAME                     READY     STATUS      RESTARTS   AGE
hello-1529412600-slcps   0/1       Completed   0          20s
# kubectl  get pod hello-1529412600-slcps
NAME                     READY     STATUS      RESTARTS   AGE
hello-1529412600-slcps   0/1       Completed   0          51s
# kubectl  logs hello-1529412600-slcps
Tue Jun 19 12:50:17 UTC 2018
Hello from the Kubernetes cluster
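The CronJob spec above can also be extended with knobs for concurrency and history retention; a sketch with illustrative values:

```yaml
spec:
  schedule: "*/1 * * * *"
  concurrencyPolicy: Forbid        # skip a run while the previous one is still active
  successfulJobsHistoryLimit: 3    # keep the 3 most recent successful Jobs
  failedJobsHistoryLimit: 1        # keep only the most recent failed Job
```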
Delete the CronJob:
# kubectl delete cronjob hello

Beyond the scheduling policies described above, Kubernetes also supports custom schedulers. You can implement the scheduling rules you need in any language and run the custom scheduler (for example, behind kubectl proxy); a Pod then selects it through spec.schedulerName:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  schedulerName: my-scheduler
  containers:
  - name: nginx
    image: nginx

Source: http://blog.51cto.com/tryingstuff/2130716