Part 2
Topics:
POD
- Single Container pod
- MultiContainer pod
- Login to a pod
- Copy to a pod and from a pod(container)
- How to check logs from a container
- Environment Variables of pod
- initContainer
- Command and Argument of a pod
- pullSecret of a pod
- pod restart policy
- imagePull policy
- How to delete a pod
- Pod priority
- Pod Resources
- Pod Quality of Services(QOS)
NameSpace
- Create a namespace
- Switch from one ns to another
- Resource Quota
- Resource Limits
Advanced Scheduling of Pods
- Scheduler
- Nodename
- nodeSelector
- Node Affinity
- Taints and Tolerations
- Pod affinity and anti affinity
- Priority and PriorityClass
- Preemption
- Disruption Budget
- Topology and Constraints
- Descheduler
POD
- A pod is the smallest deployable unit in Kubernetes that represents a single instance of an application.
- For example, if you want to run the Nginx application, you run it in a pod.
- A container is a single unit. However, a pod can contain more than one container. You can think of pods as a box that can hold one or more containers together.
- Each pod gets a single unique IP address; pods communicate with each other using these IP addresses (you can verify a pod's IP with the commands after this list).
- Containers inside a pod connect to each other over localhost on different ports.
- Containers running inside a pod should use different port numbers to avoid port clashes.
- You can set CPU and memory resources for each container running inside the pod.
- Containers inside a pod share the same volume mount.
- All the containers inside a pod are scheduled on the same node; It cannot span multiple nodes.
- If there is more than one container, all the main (app) containers start in parallel during pod startup, whereas the init containers run in sequence before them.
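For instance, you can check the IP assigned to a pod and the node it runs on (the pod name is a placeholder):
kubectl get pod <pod-name> -o jsonpath='{.status.podIP}'
kubectl get pods -o wide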
Internal Process of Pod Creation
Submit Pod Specification
- User Action: You provide a Pod specification (YAML or JSON) using the kubectl apply -f command or through an API request.
- Kubernetes API Server: The specification is sent to the Kubernetes API server, which is the central management component of Kubernetes.
API Server Processing
- Validation: The API server validates the Pod specification to ensure it conforms to the Kubernetes schema.
- Persistence: If valid, the Pod specification is stored in etcd, the cluster's key-value store. This acts as the source of truth for all cluster data.
Scheduler
- Pod Scheduling: The Kubernetes Scheduler continuously looks for newly created Pods that don't have a Node assigned and need scheduling.
- Resource Evaluation: The Scheduler evaluates available Nodes based on resource requirements (CPU, memory), constraints, and other scheduling policies.
- Binding: Once a suitable Node is found, the Scheduler binds the Pod to that Node by updating the Pod's status in etcd.
Kubelet
- Node-Level Management: Each Node runs a Kubelet, an agent responsible for managing Pods on that Node.
- Pod Fetching: The Kubelet watches the API server for updates and finds out that a Pod has been scheduled to its Node.
- Container Runtime Interaction: The Kubelet interacts with the container runtime (like Docker or containerd) to pull container images and create containers based on the Pod specification.
Container Creation
- Image Pulling: The container runtime pulls the required container images from the specified container registry if they are not already cached on the Node.
- Container Start: The container runtime creates and starts the containers according to the Pod's configuration (e.g., environment variables, volume mounts).
Pod Initialization
- Lifecycle Hooks: Any defined lifecycle hooks (like initContainers) are executed. Init containers run before the main containers and must complete successfully for the Pod to start.
- Readiness and Liveness Probes: The Kubelet performs readiness and liveness checks as defined in the Pod specification to ensure containers are running properly and are ready to accept traffic.
Pod Running
- Status Update: Once the containers are running, the Kubelet updates the Pod's status in the API server.
- Communication: The Pod is now part of the cluster network, and other Pods or services can communicate with it based on defined network policies and service configurations.
Detailed Lifecycle of a Pod
- Pending: When the Pod is first created, it is in the Pending state until the Scheduler assigns it to a Node.
- Running: Once the containers are started, the Pod transitions to the Running state. This state means the Pod is actively running on the assigned Node.
- Succeeded: If the Pod’s containers complete their tasks and exit successfully (for example, a Job or a single-run task), the Pod moves to the Succeeded state.
- Failed: If the Pod’s containers exit with an error or fail to start, it transitions to the Failed state.
- Unknown: If the Kubernetes system cannot determine the Pod’s state (for example, due to communication issues with the Node), the Pod status may be marked as Unknown.
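To check which phase a pod is currently in (the pod name is a placeholder):
kubectl get pod <pod-name> -o jsonpath='{.status.phase}'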
Summary
The creation and management of a Pod in Kubernetes involve several key components: the API server, the Scheduler, the Kubelet, and the container runtime. Each plays a role in ensuring that the Pod is properly scheduled, created, and maintained according to the specifications provided.
Purpose of the Pause Container in Kubernetes
Overview
In Kubernetes, a pause container is a special container used primarily for managing the network namespace of a Pod. It serves as a “parent” container, ensuring that the Pod’s network namespace remains active and stable.
Key Purposes
- Network Namespace Management
  - Network Namespace: Each Pod in Kubernetes is assigned a network namespace, which isolates its network resources.
  - Pause Container Role: The pause container holds the network namespace open. Without it, the namespace could be terminated if the main container(s) in the Pod stop running. The pause container ensures that the network namespace persists as long as the Pod is alive.
- Pod Lifecycle Stability
  - Main Containers: Pods can have one or more main containers. When these containers complete their tasks or exit, the pause container ensures the Pod's network namespace is not destroyed prematurely.
  - Pod Deletion: The Pod itself is only deleted when the pause container is removed. This ensures that network cleanup and other associated resources are handled correctly.
- Efficient Resource Management
  - Minimal Resource Usage: The pause container does not perform any significant work. It typically runs an idle process that uses minimal resources, which helps in maintaining an efficient resource footprint.
  - Pod Reuse: By keeping the network namespace alive, the pause container allows Kubernetes to manage and reassign the network resources efficiently when Pods are scaled or recreated.
Implementation
- Container Image: The pause container usually uses a lightweight image that does nothing but keep the namespace active. A common image used is k8s.gcr.io/pause.
- Pod Specification: The pause container is automatically added by Kubernetes when a Pod is created. Users typically do not interact with or configure the pause container directly.
Summary
The pause container is a crucial component in Kubernetes that maintains the network namespace of a Pod, ensuring stability and efficient resource management. It supports the lifecycle of Pods by keeping network resources active and ready for use, even if the main containers in the Pod stop running.
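If you have access to a node, you can see the pause containers directly through the container runtime. For example (which commands work depends on the runtime your cluster uses):
# Docker runtime: one pause container per pod
docker ps | grep pause
# containerd/CRI-O: each pod sandbox is backed by a pause container
crictl pods
crictl images | grep pause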
- Create a pod
kubectl run pod --image=nginx
- Check newly created pod
kubectl get pods
- Check more info about a pod, like where it is scheduled and its IP address
kubectl get pods -o wide
- Check pod details like events and resources
kubectl describe pod <podname>
- Check the name of all pods
kubectl get pods -o name
- Create a pod using yaml file
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
name: example-pod
labels:
app: example
spec:
containers:
- name: my-container
image: nginx:latest
EOF
- Check Pod Logs
kubectl logs <pod-name>
- Follow Pod Logs in Real-time
kubectl logs -f <pod-name>
- How to check logs for a specific container in a multi-container pod
kubectl logs -c <containername> <podname>
- Check the logs for all containers
kubectl logs <podname> --all-containers=true
- Execute Command in a Pod:
kubectl exec -it <pod-name> -- <command>
- Copy Files to/from Pod:
kubectl cp <local-path> <pod-name>:<pod-path>
kubectl cp <pod-name>:<pod-path> <local-path>
- Delete a Pod:
kubectl delete pod <pod-name>
- How to delete a pod forcefully
kubectl delete pod <pod-name> --force --grace-period=0
- Port Forwarding to Pod:
kubectl port-forward <pod-name> <local-port>:<pod-port>
- Port forwarding on ip address not on the localhost
kubectl port-forward --address 0.0.0.0 pod/mypod 8888:5000
- Get YAML Definition of a Running Pod:
kubectl get pod <pod-name> -o yaml
Some useful commands in real-world (production) scenarios
- Find out all the images of all the pods
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.spec.containers[*].image}{"\n"}{end}'
- Get all container names
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace} {.metadata.name}: {.spec.containers[*].name}{"\n"}{end}'
Define Environment Variables for a Container
A Kubernetes environment variable is a dynamic value that configures some aspect of the environment in which a Kubernetes-based application runs.
env:
- name: SERVICE_PORT
value: "8080"
- name: SERVICE_IP
value: "192.168.100.1"
Problem statement: the pod is failing because of a missing environment variable
- Deploy one mysql pod and see if the pod is failing
kubectl run mysql --image=mysql:5.6
- Check the logs and fix the issue
Environment Variables
- Apply required variables using below yaml file
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
name: pod-with-env
spec:
containers:
- name: my-container
image: mysql:5.6
env:
- name: MYSQL_ROOT_PASSWORD
value: "root"
EOF
Command and Arguments with Kubernetes Pod
Here’s a table summarizing the field names used by Docker and Kubernetes:
Description | Docker field name | Kubernetes field name |
---|---|---|
The command run by the container | Entrypoint | command |
The arguments passed to the command | Cmd | args |
When you override the default Entrypoint and Cmd, these rules apply:
- If you do not supply command or args for a Container, the defaults defined in the Docker image are used.
- If you supply a command but no args for a Container, only the supplied command is used. The default EntryPoint and the default Cmd defined in the Docker image are ignored.
- If you supply only args for a Container, the default Entrypoint defined in the Docker image is run with the args that you supplied.
- If you supply a command and args, the default Entrypoint and the default Cmd defined in the Docker image are ignored. Your command is run with your args.
Example 1: Command Override
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
name: pod-with-command
spec:
containers:
- name: my-container
image: nginx:latest
command: ["echo"]
args: ["Hello, Kubernetes!"]
EOF
Example 2: Command and Arguments
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
name: pod-with-command-args
spec:
containers:
- name: my-container
image: busybox:latest
command: ["sh", "-c"]
args: ["echo Hello from Kubernetes! && sleep 3600"]
EOF
Example 3: Passing Environment Variables to Commands
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
name: pod-with-env-vars
spec:
containers:
- name: my-container
image: alpine:latest
command: ["/bin/sh", "-c"]
args: ["echo \$GREETING"]
env:
- name: GREETING
value: "Hello, Kubernetes!"
EOF
Example 4: Passing Arguments to Docker Entrypoint
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
name: pod-with-entrypoint-args
spec:
containers:
- name: my-container
image: ubuntu:latest
command: ["/bin/echo"]
args: ["Hello", "Kubernetes!"]
EOF
MultiContainer pod
Use Cases for Multi-Container Pods
- Pods that run multiple containers that need to work together.
- A Pod can encapsulate an application composed of multiple co-located containers that are tightly coupled and need to share resources.
- These co-located containers form a single cohesive unit.
- Here is an example of a multi-container pod.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
name: nginx-redis-pod
spec:
containers:
- name: nginx-container
image: nginx:latest
ports:
- containerPort: 80
- name: redis-container
image: redis:latest
ports:
- containerPort: 6379
EOF
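A quick way to confirm that both containers share the pod's network namespace is to reach Redis over localhost from inside the pod (this assumes redis-cli is present in the redis image, which it normally is):
kubectl exec nginx-redis-pod -c redis-container -- redis-cli -h 127.0.0.1 ping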
InitContainer
- A Pod can have multiple containers running apps within it, but it can also have one or more init containers, which are run before the app containers are started. Init containers are exactly like regular containers, except:
- Init containers always run to completion.
- Each init container must complete successfully before the next one starts.
- If a Pod's init container fails, the kubelet repeatedly restarts that init container until it succeeds.
- Regular init containers (in other words: excluding sidecar containers) do not support the lifecycle, livenessProbe, readinessProbe, or startupProbe fields.
- Init containers must run to completion before the Pod can be ready.
- If you specify multiple init containers for a Pod, the kubelet runs each init container sequentially; each init container must succeed before the next can run.
Here is an example of an initContainer:
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
name: init-container-pod
spec:
containers:
- name: main-container
image: nginx:latest
ports:
- containerPort: 80
initContainers:
- name: init-wait-nginx
image: busybox:latest
command: ["sh", "-c", "until nc -zv nginx 80; do echo 'Waiting for nginx to be ready'; sleep 1; done"]
EOF
- Make a test by creating a pod and service
kubectl run nginx --image=nginx
kubectl expose pod/nginx --port 80
- Now check the pod status; the initContainer should complete successfully
- Also check the logs for the initContainer (see the commands below)
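For example, using the names from the manifest above:
kubectl get pod init-container-pod -w
kubectl logs init-container-pod -c init-wait-nginx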
Sidecar container
- A Sidecar container extends and enhances the functionality of a preexisting container without changing it.
- This pattern is one of the fundamental container patterns that allows single-purpose containers to cooperate closely together.
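A minimal sketch of the sidecar pattern, with illustrative names: the main container writes a log file to a shared emptyDir volume and a sidecar container streams it.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: sidecar-demo
spec:
  volumes:
  - name: logs
    emptyDir: {}
  containers:
  - name: app
    image: busybox:latest
    # the "application": appends a timestamp to a log file every 5 seconds
    command: ["sh", "-c", "while true; do date >> /var/log/app.log; sleep 5; done"]
    volumeMounts:
    - name: logs
      mountPath: /var/log
  - name: log-sidecar
    image: busybox:latest
    # the sidecar: streams the shared log file to its own stdout
    command: ["sh", "-c", "touch /var/log/app.log; tail -f /var/log/app.log"]
    volumeMounts:
    - name: logs
      mountPath: /var/log
EOF
kubectl logs sidecar-demo -c log-sidecar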
Adapter Container
- The Adapter pattern takes a heterogeneous containerized system and makes it conform to a consistent, unified interface with a standardized and normalized format that can be consumed by the outside world.
- The Adapter pattern inherits all its characteristics from the Sidecar pattern.
Resources definition in pod
Burstable QoS
- Kubernetes assigns the Burstable class to a Pod when at least one container has a resource request or limit, but the Pod does not meet the Guaranteed criteria (for example, the limit is higher than the request).
- A pod in this category has the following characteristics:
  - The Pod has not met the criteria for the Guaranteed QoS class.
  - A container in the Pod has an unequal memory or CPU request and limit.
An example resources block is given below:
resources:
  limits:
    memory: "300Mi"
    cpu: "800m"
  requests:
    memory: "100Mi"
    cpu: "600m"
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: burstable-pod
spec:
  containers:
  - name: my-container
    image: nginx:latest
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
EOF
- Check whether the pod is running
- Check the QoS class of the pod
kubectl describe pod <podname> | grep -i qos
Guaranteed QoS
- Kubernetes considers Pods classified as Guaranteed a top priority. It won't evict them unless they exceed their limits.
- A Pod with the Guaranteed class has the following characteristics:
- All containers in the Pod have a memory limit and request.
- All containers in the Pod have a memory limit equal to the memory request.
- All containers in the Pod have a CPU limit and a CPU request.
- All containers in the Pod have a CPU limit equal to the CPU request.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
name: guaranteed
spec:
containers:
- name: my-container
image: nginx:latest
resources:
requests:
memory: "64Mi"
cpu: "250m"
limits:
memory: "64Mi"
cpu: "250m"
EOF
Best Effort QoS
- Kubernetes assigns the BestEffort class to a Pod when none of its containers have any resource requests or limits.
- A pod in this category has the following characteristics:
  - No container in the Pod defines memory or CPU requests or limits.
- Pods without a resources section are considered BestEffort.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
name: best-efforts
spec:
containers:
- name: my-container
image: nginx:latest
EOF
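To see which QoS class Kubernetes assigned to each pod in the current namespace:
kubectl get pods -o custom-columns=NAME:.metadata.name,QOS:.status.qosClass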
imagePullSecret
- A Pod can use a Secret to pull an image from a private container image registry or repository.
- There are many private registries in use.
- This task uses Docker Hub as an example registry.
Steps to Use a Pull Secret in a Pod YAML:
- Create a Docker config JSON file:
  - Create a Docker configuration file (~/.docker/config.json) with the credentials for your private registry.
  - You can use the docker login command to generate this file.
docker login
- Create a Kubernetes Secret:
kubectl create secret generic my-pull-secret --from-file=.dockerconfigjson=$HOME/.docker/config.json --type=kubernetes.io/dockerconfigjson
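Alternatively, a pull secret can be created directly from registry credentials using the docker-registry secret type (the values below are placeholders):
kubectl create secret docker-registry my-pull-secret \
  --docker-server=<registry-server> \
  --docker-username=<username> \
  --docker-password=<password> \
  --docker-email=<email>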
- Use the YAML below to create a pod that pulls from the private registry
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
name: pod-with-pull-secret
spec:
containers:
- name: my-container
image: myregistry.example.com/my-image:latest
imagePullSecrets:
- name: my-pull-secret
EOF
- Add the pull secret to the default service account so that you do not have to specify imagePullSecrets in every Pod spec
kubectl patch serviceaccount default -p '{"imagePullSecrets": [{"name": "my-pull-secret"}]}'
imagePullPolicy Examples
1. IfNotPresent
- Description: Pull the image only if it does not already exist locally.
- Usage: Use this policy when you want to minimize image pulls and rely on locally cached images.
- Example:
spec:
  containers:
  - name: my-container
    image: my-image:latest
    imagePullPolicy: IfNotPresent
2. Always
- Description : Always pull the latest version of the image, even if it already exists locally.
- Usage: Use this policy when you want to ensure that the container runs the latest version of the image.
- Example:
spec:
containers:
- name: my-container
image: my-image:latest
imagePullPolicy: Always
3. Never
- Description: Never pull the image; only use the locally cached version if available.
- Usage: Use this policy when you want to prevent Kubernetes from pulling the image; the container only starts if the image is already present on the node.
- Example:
spec:
containers:
- name: my-container
image: my-image:latest
imagePullPolicy: Never
Note: Always, IfNotPresent, and Never are the only valid values for imagePullPolicy.
4. Default (when imagePullPolicy is not set)
- Description: If imagePullPolicy is not explicitly set, Kubernetes defaults to IfNotPresent, except when the image tag is :latest or no tag is given, in which case the default is Always.
- Usage: The default behavior is suitable for many use cases where you want to minimize image pulls. Example:
spec:
containers:
- name: my-container
image: my-image:latest
Pod restartPolicy (applies to all containers in the Pod)
1. Always
- Description: Always restart the container regardless of the exit status or reason.
- Usage: Use this policy for critical services that should always be running.
- Example:
spec:
restartPolicy: Always
2. OnFailure
- Description: Restart the container only if it exits with a non-zero status.
- Usage: Use this policy for jobs or batch processes that should be retried on failure.
Example:
spec:
restartPolicy: OnFailure
3. Never
- Description: Never restart the container, regardless of the exit status or reason.
- Usage: Use this policy for containers that are expected to run to completion and not be restarted.
Example:
spec:
restartPolicy: Never
4. Default (Always)
- Description: Kubernetes default behavior if restartPolicy is not explicitly set. Always restart the container.
- Usage: This is the default behavior and is suitable for many long-running services.
Example
spec:
restartPolicy: Always   # default behavior if not specified
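A small experiment (pod and container names are just illustrative) to observe restartPolicy: the container exits with a non-zero status, so with OnFailure the kubelet keeps restarting it and the RESTARTS count grows.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: restart-demo
spec:
  restartPolicy: OnFailure
  containers:
  - name: failing-container
    image: busybox:latest
    # exits with status 1 after 5 seconds, triggering a restart under OnFailure
    command: ["sh", "-c", "echo failing; sleep 5; exit 1"]
EOF
kubectl get pod restart-demo -w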
Advanced Scheduling of Pods
Using nodeName
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
name: pod-scheduled-node
spec:
nodeName: <node-name>
containers:
- name: my-container
image: nginx:latest
EOF
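List the node names in your cluster so you can substitute one into nodeName:
kubectl get nodes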
LAB 1
- Pod Management
- Create a pod
- Login to a pod
- Check logs
- Create multi container pod
- Login to a specific container
- Check logs for specific container
- Describe a pod
- Check the events
- Set the resources for a container
- requests and limits
- Configure QoS for a pod
- Best effort
- Burstable
- Guaranteed
- Set environment variables
- Use a pullSecret to download the image
Namespace
- Namespaces are logical divisions of a Kubernetes cluster.
- Namespaces provide a mechanism for isolating groups of resources within a single cluster.
- Names of resources need to be unique within a namespace, but not across namespaces.
- Combined with NetworkPolicy, namespaces help achieve multitenancy.
- Namespaces are also a scope for configuring RBAC.
- Check all namespaces
kubectl get ns
- Create a new namespace
kubectl create ns <name>
- Switch from one namespace to another (using the kubens tool)
kubens
kubens <ns name>
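If the kubens tool is not installed, you can change the default namespace of the current context with kubectl itself:
kubectl config set-context --current --namespace=<ns-name>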
To see which Kubernetes resources are and aren’t in a namespace:
- In a namespace
kubectl api-resources --namespaced=true
- Not in a namespace
kubectl api-resources --namespaced=false
What is a Resource Quota?
A Resource Quota in Kubernetes is a way to limit the amount of resources that can be consumed by a namespace. It helps ensure fair usage of resources among different teams or applications and prevents any single namespace from consuming all available resources in a cluster.
Key Concepts
- Namespace: Resource quotas are applied on a per-namespace basis.
- Resource Limits: Quotas can specify limits on various resources such as CPU, memory, and storage.
- Usage Tracking: Quotas help track and control resource usage within namespaces.
Types of Quotas
- Resource Quotas: Limits on CPU, memory, and storage.
- Limit Ranges: Define minimum and maximum resource limits for individual pods or containers.
Common Resources Managed by Quotas
- CPU: Maximum amount of CPU that can be requested.
- Memory: Maximum amount of memory that can be requested.
- Storage: Maximum amount of persistent volume storage that can be requested.
- Pods: Maximum number of pods that can be created in a namespace.
- Services: Maximum number of services that can be created in a namespace.
- ConfigMaps: Maximum number of ConfigMaps that can be created.
- If a ResourceQuota is defined for CPU or memory, Pods cannot be created without resource requests and limits. The solution is to create a LimitRange that provides defaults.
Creating a Resource Quota
- Configure memory and CPU quotas for a namespace
kubectl apply -f - <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
name: mem-cpu-demo
spec:
hard:
requests.cpu: "1"
requests.memory: 1Gi
limits.cpu: "2"
limits.memory: 2Gi
EOF
As per the above example:
- The ResourceQuota places these requirements on the namespace it is created in:
  - For every Pod in the namespace, each container must have a memory request, memory limit, CPU request, and CPU limit.
  - The memory request total for all Pods in that namespace must not exceed 1 GiB.
  - The memory limit total for all Pods in that namespace must not exceed 2 GiB.
  - The CPU request total for all Pods in that namespace must not exceed 1 CPU.
  - The CPU limit total for all Pods in that namespace must not exceed 2 CPU.
- Check the applied quota
kubectl get quota
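kubectl describe shows current usage against the hard limits (quota name taken from the example above):
kubectl describe quota mem-cpu-demo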
- Create namespace using a yaml file
kubectl apply -f - <<EOF
kind: Namespace
apiVersion: v1
metadata:
name: test
labels:
name: test
EOF
Resource Limits
A Kubernetes cluster can be divided into namespaces. Once you have a namespace that has a default memory limit, and you then try to create a Pod with a container that does not specify its own memory limit, then the control plane assigns the default memory limit to that container.
Kubernetes Resource Limits
What are Resource Limits?
Resource Limits in Kubernetes define the maximum amount of resources (CPU and memory) that a container can use. They help manage and constrain resource usage to ensure fair sharing of resources across multiple containers and prevent a single container from consuming excessive resources.
Key Concepts
- Requests: The amount of CPU or memory that a container is guaranteed to have.
- Limits: The maximum amount of CPU or memory a container can use. If a container tries to use more than its limit, it may be throttled (for CPU) or terminated and potentially restarted (for memory).
Why Set Resource Limits?
- Prevent Resource Exhaustion: Ensure that no single container can consume all available resources, affecting other containers.
- Improve Stability: Avoid situations where containers cause resource contention and degrade overall cluster performance.
- Optimize Resource Usage: Better manage and allocate resources based on the actual needs of applications.
Specifying Resource Limits
You can specify resource limits in the container specification of a Pod's manifest. Resource requests and limits are defined in the resources field of the container.
Example
Here’s an example of a Pod configuration that sets both resource requests and limits for CPU and memory:
apiVersion: v1
kind: Pod
metadata:
name: example-pod
spec:
containers:
- name: example-container
image: nginx
resources:
requests:
memory: "64Mi"
cpu: "250m"
limits:
memory: "128Mi"
cpu: "500m"
- Create a LimitRange
kubectl apply -f - <<EOF
apiVersion: v1
kind: LimitRange
metadata:
name: example-limits
spec:
limits:
- max:
cpu: "2" # Maximum CPU limit for a container (2 cores)
memory: "1Gi" # Maximum memory limit for a container (1 GiB)
min:
cpu: "200m" # Minimum CPU request for a container (200 millicores, or 0.2 cores)
memory: "256Mi" # Minimum memory request for a container (256 MiB)
default:
cpu: "500m" # Default CPU request for a container (500 millicores, or 0.5 cores)
memory: "512Mi" # Default memory request for a container (512 MiB)
defaultRequest:
cpu: "300m" # Default CPU request if not specified (300 millicores, or 0.3 cores)
memory: "384Mi" # Default memory request if not specified (384 MiB)
type: Container
EOF
A simpler LimitRange that only sets default requests and limits:
apiVersion: v1
kind: LimitRange
metadata:
name: default-limits
spec:
limits:
- default:
cpu: "500m"
memory: "256Mi"
defaultRequest:
cpu: "250m"
memory: "128Mi"
type: Container
- Now create a pod with no resources section and check whether the default requests and limits have been applied
kubectl describe pod <podname>
LAB 2
- Create a namespace
- Create a resource quota
- Check if you can create a pod
- Create resource limits (a LimitRange)
- Create a pod now
- Check whether the default CPU and memory have been applied
Pod Advanced Scheduling
Using nodeName
apiVersion: v1
kind: Pod
metadata:
name: example-pod
spec:
nodeName: kmaster
containers:
- name: example-container
image: nginx
Steps to Use nodeSelector:
- Label the node
kubectl label nodes specific-node-name diskType=ssd
- Use this label to schedule the pod
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-node-selector
spec:
  containers:
  - name: my-container
    image: nginx:latest
  nodeSelector:
    diskType: ssd
- Check where this pod has been scheduled
kubectl get pods -o wide
Internal Process of the Kubernetes Scheduler
The Kubernetes Scheduler is a core component responsible for assigning Pods to Nodes in a Kubernetes cluster. It ensures that Pods are placed on Nodes based on resource requirements and constraints. Here’s an overview of its internal process:
1. Pod Submission and API Server
- Pod Creation: When a new Pod is created, its specification is submitted to the Kubernetes API server.
- Storage: The API server stores the Pod’s configuration in etcd, the cluster’s key-value store.
2. Scheduler Activation
- Scheduling Loop: The Scheduler continuously watches for new or unscheduled Pods by querying the API server for Pods that do not have a Node assigned.
- Notification: When it detects an unscheduled Pod, it begins the scheduling process.
3. Filtering and Scoring
- Filtering: The Scheduler filters out Nodes that are not suitable for the Pod based on several criteria:
- Resource Requests: Checks if the Node has enough CPU, memory, and other resources to meet the Pod’s requests.
- Node Affinity: Considers Node affinity rules specified in the Pod’s configuration.
- Taints and Tolerations: Ensures that the Pod can tolerate any taints present on the Node.
- Pod Affinity/Anti-Affinity: Evaluates if the Pod should or should not be placed near other Pods based on affinity rules.
- Scoring: After filtering, the Scheduler scores the remaining Nodes to determine which is the best fit for the Pod. Scoring is based on various factors such as:
- Resource Utilization: Prefers Nodes with more available resources or balanced resource usage.
- Inter-Pod Affinity/Anti-Affinity: Considers how well the Node meets the Pod’s affinity/anti-affinity rules.
- Custom Scoring: Some scheduling plugins can apply additional scoring rules.
4. Binding
- Select Node: The Scheduler selects the best Node based on the highest score.
- Update API Server: It updates the Pod’s status in the API server with the chosen Node’s name.
- Binding Object: The Scheduler creates a binding object in etcd to record the Node assignment.
5. Kubelet Notification
- Node Update: The Kubelet on the selected Node detects the updated Pod specification (through periodic polling or a watch mechanism).
- Container Creation: The Kubelet pulls the necessary container images and creates containers based on the Pod’s specification.
6. Pod Initialization
- Lifecycle Hooks: Any defined lifecycle hooks, such as initContainers, are executed.
- Health Checks: The Kubelet performs readiness and liveness checks to ensure that the Pod and its containers are running properly.
7. Pod Status Update
- API Server Update: The Kubelet updates the Pod status in the API server, reflecting its current state (e.g., Running, Pending, Failed).
8. Monitoring and Re-scheduling
- Continuous Monitoring: The Scheduler continues to monitor the cluster for changes that might require re-scheduling, such as node failures or resource constraints.
- Re-scheduling: If necessary, the Scheduler can reassign Pods to different Nodes to maintain optimal resource utilization and availability.
Summary
The Kubernetes Scheduler plays a vital role in managing Pod placement within a cluster. It:
- Watches for unscheduled Pods.
- Filters Nodes based on the Pod’s requirements and constraints.
- Scores and selects the most suitable Node.
- Binds the Pod to the selected Node by updating the Pod’s status in the API server.
- Informs the Kubelet on the chosen Node to create and run the containers.
This process ensures efficient distribution of Pods across the cluster, meeting both resource and policy requirements.
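You can observe these scheduling decisions through events; for example:
kubectl get events --field-selector reason=Scheduled
kubectl describe pod <pod-name> | grep -A 5 Events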
Taint and Toleration
- A taint allows a node to refuse pods to be scheduled on it unless those pods have a matching toleration.
- You apply taints to a node through the node specification (NodeSpec) and apply tolerations to a pod through the pod specification (PodSpec). A taint on a node instructs the node to repel all pods that do not tolerate the taint.
- Taints and tolerations consist of a key, value, and effect. The operator allows you to leave the value parameter empty (when using Exists).
Taint and Toleration key points
Parameter | Description |
---|---|
key | Any string, up to 253 characters. Must begin with a letter or number, and may contain letters, numbers, hyphens, dots, and underscores. |
value | Any string, up to 63 characters. Must begin with a letter or number, and may contain letters, numbers, hyphens, dots, and underscores. |
effect | One of: NoSchedule, PreferNoSchedule, NoExecute |
operator | One of: Equal, Exists |
- How to apply taint on a node
kubectl taint nodes <node-name> <key>=<value>:<effect>
kubectl taint nodes specific-node-name disktype=ssd:NoSchedule
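To view the taints on a node, and to remove a taint again (note the trailing dash):
kubectl describe node <node-name> | grep -i taint
kubectl taint nodes <node-name> disktype=ssd:NoSchedule-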
- Now use a matching toleration in the pod YAML
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
name: toleration-pod
spec:
containers:
- name: my-container
image: nginx:latest
tolerations:
- key: disktype
operator: Equal
value: ssd
effect: NoSchedule
EOF
Example: Exists operator without a key (tolerates any taint)
Note: the toleration operator only supports Equal and Exists; there is no NotEqual operator.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: toleration-exists-any-pod
spec:
  containers:
  - name: my-container
    image: nginx:latest
  tolerations:
  - operator: Exists
    effect: NoSchedule
EOF
Example Exists Operator with Key
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
name: toleration-exists-key-pod
spec:
containers:
- name: my-container
image: nginx:latest
tolerations:
- key: disktype
operator: Exists
effect: NoSchedule
EOF
- Check where the pods have been scheduled
kubectl get pod -o wide
LAB 3
- Create a pod to schedule on a specific node
- Set a label on the node
- Test using nodeName and nodeSelector
Node Affinity
- Label Nodes
kubectl label nodes kworker1 example-label=value1
kubectl label nodes kworker2 example-label=value2
- Yaml for pod creation
cat <<EOF > pod-with-node-affinity.yaml
apiVersion: v1
kind: Pod
metadata:
name: pod-with-node-affinity
spec:
containers:
- name: nginx-container
image: nginx
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: example-label
operator: In
values:
- value1
- value2
EOF
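Apply the generated manifest and check which node the pod lands on:
kubectl apply -f pod-with-node-affinity.yaml
kubectl get pod pod-with-node-affinity -o wide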
Example 2 with the NotIn operator
cat <<EOF > pod-with-node-affinity.yaml
apiVersion: v1
kind: Pod
metadata:
name: pod-with-node-affinity
spec:
containers:
- name: nginx-container
image: nginx
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: example-label
operator: NotIn
values:
- value3
- value4
EOF
Example Using Exists Operator
cat <<EOF > pod-with-node-affinity.yaml
apiVersion: v1
kind: Pod
metadata:
name: pod-with-node-affinity
spec:
containers:
- name: nginx-container
image: nginx
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: example-label
operator: Exists
EOF
- Check the pods status
Example with Preferred Affinity
cat <<EOF > pod-with-preferred-node-affinity.yaml
apiVersion: v1
kind: Pod
metadata:
name: pod-with-preferred-node-affinity
spec:
containers:
- name: nginx-container
image: nginx
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
preference:
matchExpressions:
- key: example-label
operator: Exists
EOF
Pod Affinity and Anti Affinity
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
name: pod-with-pod-affinity
labels:
app: web
spec:
containers:
- name: nginx-container
image: nginx
---
apiVersion: v1
kind: Pod
metadata:
name: pod-with-pod-affinity-rule
spec:
affinity:
podAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- web
topologyKey: kubernetes.io/hostname
containers:
- name: nginx-container
image: nginx
EOF
Another Example with weight
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
name: pod-with-preferred-pod-affinity
labels:
app: database
spec:
containers:
- name: postgres-container
image: postgres
---
apiVersion: v1
kind: Pod
metadata:
name: pod-with-preferred-pod-affinity-rule
spec:
affinity:
podAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- database
topologyKey: kubernetes.io/hostname
containers:
- name: nginx-container
image: nginx
EOF
Pod antiAffinity
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
name: pod-with-anti-affinity
labels:
app: web
spec:
containers:
- name: nginx-container
image: nginx
---
apiVersion: v1
kind: Pod
metadata:
name: pod-with-anti-affinity-rule
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- web
topologyKey: kubernetes.io/hostname
containers:
- name: nginx-container
image: nginx
EOF
Soft AntiAffinity
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
name: pod-with-preferred-anti-affinity
labels:
app: database
spec:
containers:
- name: postgres-container
image: postgres
---
apiVersion: v1
kind: Pod
metadata:
name: pod-with-preferred-anti-affinity-rule
spec:
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- database
topologyKey: kubernetes.io/hostname
containers:
- name: nginx-container
image: nginx
EOF
Pod Priority
Pod priority classes
- You can assign pods a priority class, which is a non-namespaced object that defines a mapping from a name to the integer value of the priority. The higher the value, the higher the priority.
- A priority class object can take any 32-bit integer value smaller than or equal to 1000000000 (one billion). Numbers larger than one billion are reserved for critical system pods that should not be preempted or evicted. By default, Kubernetes ships with two reserved priority classes to guarantee scheduling of critical system pods:
  - system-node-critical: This priority class has a value of 2000001000 and is used for pods that should never be evicted from a node, such as node-level networking daemons.
  - system-cluster-critical: This priority class has a value of 2000000000 (two billion) and is used for pods that are important to the cluster as a whole. Pods with this priority class can be evicted from a node in certain circumstances (for example, system-node-critical pods can take precedence), but the class still ensures guaranteed scheduling. Examples include logging agents and add-on components like the descheduler.
- Sample PriorityClass object
kubectl apply -f - <<EOF
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: high-priority
value: 1000000
globalDefault: false
description: "This priority class should be used for XYZ service pods only."
EOF
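Verify the priority classes in the cluster (the two system-* classes ship with Kubernetes by default):
kubectl get priorityclass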
- Sample pod specification with priority class name
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
name: nginx
labels:
env: test
spec:
containers:
- name: nginx
image: nginx
imagePullPolicy: IfNotPresent
priorityClassName: high-priority
EOF
Pod Disruption Budget:
kubectl apply -f - <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: example-pdb
spec:
minAvailable: 2
selector:
matchLabels:
app: example-app
EOF
Using a PDB with a Deployment
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: example-deployment
spec:
replicas: 3
selector:
matchLabels:
app: example-app
template:
metadata:
labels:
app: example-app
spec:
containers:
- name: nginx-container
image: nginx:latest
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: example-pdb
spec:
minAvailable: 2
selector:
matchLabels:
app: example-app
EOF
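Check the PDB status; the ALLOWED DISRUPTIONS column shows how many pods may be evicted voluntarily right now:
kubectl get pdb example-pdb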
Pod Preemption and Pod Disruption Budget
- If you enable pod priority and preemption, consider how they interact with your other scheduler settings:
Pod priority and pod disruption budget
- A pod disruption budget specifies the minimum number or percentage of replicas that must be up at a time. If you specify pod disruption budgets, Kubernetes respects them when preempting pods on a best-effort basis. The scheduler attempts to preempt pods without violating the pod disruption budget. If no such pods are found, lower-priority pods might be preempted despite their pod disruption budget requirements.
Pod priority and pod affinity
Pod affinity requires a new pod to be scheduled on the same node as other pods with the same label.
Pointer Regarding Pod Scheduling
- Preemption removes existing Pods from a cluster under resource pressure to make room for higher-priority pending Pods.
- The default priority for all pods is zero (0).
- Supported operators for affinity:
  - The operator represents the relationship between the label on the node and the set of values in the matchExpressions parameters in the pod specification.
  - The value can be one of: In, NotIn, Exists, DoesNotExist, Lt, Gt.
- For preferred affinity, specify a weight for the node between 1 and 100. The node with the highest weight is preferred.
- A taint on a node instructs the node to repel all pods that do not tolerate the taint.
- Taint effects are given below. The effect is one of the following:
NoSchedule
New pods that do not match the taint are not scheduled onto that node. Existing pods on the node remain.
PreferNoSchedule
New pods that do not match the taint might be scheduled onto that node, but the scheduler tries not to.
Existing pods on the node remain.
NoExecute
New pods that do not match the taint cannot be scheduled onto that node.
Existing pods on the node that do not have a matching toleration are removed.
operator
Equal
The key/value/effect parameters must match. This is the default.
Exists
The key/effect parameters must match. You must leave a blank value parameter, which matches any.
LAB 4
- Create a pod with node affinity
- Create a pod with pod affinity and pod anti-affinity
- Create a pod with the highest priority
- Also list all the pods in your cluster that use the highest priority class