Part 5
Topics
- Rancher
- Network Policies
- Metrics Server
- Configure HPA
- Logging with Grafana Loki
- Monitoring with Prometheus
- Enable Audit logs
Rancher (Kubernetes Management Tool)
- Rancher is a Kubernetes management tool to deploy and run clusters anywhere and on any provider.
- Rancher can provision Kubernetes from a hosted provider, provision compute nodes and then install Kubernetes onto them, or import existing Kubernetes clusters running anywhere.
- Rancher adds significant value on top of Kubernetes, first by centralizing authentication and role-based access control (RBAC) for all of the clusters, giving global admins the ability to control cluster access from one location.
- It then enables detailed monitoring and alerting for clusters and their resources, ships logs to external providers, and integrates directly with Helm via the Application Catalog. If you have an external CI/CD system, you can plug it into Rancher, but if you don't, Rancher even includes Fleet to help you automatically deploy and upgrade workloads.
- Rancher is a complete container management platform for Kubernetes, giving you the tools to successfully run Kubernetes anywhere.
Rancher Installation on Docker
sudo docker run --privileged -d --restart=unless-stopped -p 80:80 -p 443:443 rancher/rancher
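After the container starts, recent Rancher versions generate a bootstrap password that is required for the first login; a quick way to retrieve it (the container lookup via grep is just one approach):
docker ps | grep rancher/rancher
docker logs <container-id> 2>&1 | grep "Bootstrap Password:"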
LAB 1
- Create a VM, then install Docker and Rancher on the newly created VM.
- Install Docker
- Log in to Rancher
- Wait for at least 4 to 5 minutes
- Monitor all node pods
- Try to deploy a pod/app
- Perform other actions as well
- Run commands from Rancher itself
- Download the kubeconfig file from Rancher
- Explore the RBAC options
- Install apps from Rancher (e.g. Longhorn, NeuVector)
Get started with Kubernetes network policy
- Kubernetes network policy lets administrators and developers enforce which network traffic is allowed using rules.
- Kubernetes network policy lets developers secure access to and from their applications using the same simple language they use to deploy them. Developers can focus on their applications without understanding low-level networking concepts. Enabling developers to easily secure their applications using network policies supports a shift left DevOps environment.
- The Kubernetes Network Policy API supports the following features (illustrated by the example policy after this list):
- Policies are namespace scoped
- Policies are applied to pods using label selectors
- Policy rules can specify the traffic that is allowed to/from pods, namespaces, or CIDRs
- Policy rules can specify protocols (TCP, UDP, SCTP), named ports or port numbers
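As a reference for these features, here is a minimal sketch of a NetworkPolicy; the name, namespace, labels, CIDR, and port below are illustrative assumptions, not part of the lab:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend   # hypothetical name
  namespace: demo                   # policies are namespace scoped
spec:
  podSelector:
    matchLabels:
      app: backend                  # applied to pods via label selectors
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend             # allow traffic from these pods
    - ipBlock:
        cidr: 10.0.0.0/16           # ...or from a CIDR range
    ports:
    - protocol: TCP
      port: 8080                    # protocols and port numbers
  policyTypes:
  - Ingress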
Ingress and egress
- Ingress is incoming traffic to the pod, and egress is outgoing traffic from the pod.
- In Kubernetes network policy, you create ingress and egress "allow" rules independently (egress, ingress, or both).
Default deny/allow behavior
- Default allow means all traffic is allowed by default, unless otherwise specified.
- Default deny means all traffic is denied by default, unless explicitly allowed.
Network Policy
- Create the base setup (the default behaviour is that all traffic is allowed)
- Create a namespace
kubectl create ns external
- Test the default behaviour by creating pods and trying to ping between them
kubectl run pod-1 --image=praqma/network-multitool
kubectl run pod-2 --image=praqma/network-multitool
kubectl run pod-3 --image=praqma/network-multitool -n external
- Get the pod IP addresses
kubectl get pods -o wide
kubectl get pods -o wide -n external
- Check connectivity from pod-1 to pod-2, pod-3, and the internet
kubectl exec -it pod-1 -- ping [pod-2-ip]
kubectl exec -it pod-1 -- ping [pod-3-ip]
kubectl exec -it pod-1 -- ping google.com
- Rule 1: Deny all Ingress traffic
kubectl apply -f - <<EOF
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}
  policyTypes:
  - Ingress
EOF
- Check newly applied network policy
kubectl get netpol
- Test that incoming traffic is no longer allowed
kubectl exec -it pod-1 -- ping [pod2-ip]
kubectl exec -it pod-1 -- ping [pod3-ip]
kubectl exec -it pod-1 -- ping google.com
kubectl exec -n external -it pod-3 -- ping [pod1-ip]
kubectl exec -n external -it pod-3 -- ping [pod2-ip]
- Rule 2: Allow Ingress
kubectl apply -f - <<EOF
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-ingress
spec:
  podSelector: {}
  ingress:
  - {}
  policyTypes:
  - Ingress
EOF
- Test again
kubectl exec -n external -it pod-3 -- ping [pod1-ip]
kubectl exec -n external -it pod-3 -- ping [pod2-ip]
- Rule 3: Deny All Egress:
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
spec:
  podSelector: {}
  policyTypes:
  - Egress
EOF
- Test again
kubectl exec -it pod-2 -- ping google.com
kubectl exec -n external -it pod-3 -- ping [pod1-ip]
kubectl exec -n external -it pod-3 -- ping [pod2-ip]
- Rule 4 - PodSelector:
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: podselector-suspicious
spec:
  podSelector:
    matchLabels:
      role: suspicious
  policyTypes:
  - Ingress
  - Egress
EOF
- Rule 5 - Ingress From:
kubectl label pod pod-1 role=secure
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ingress-from-ips
spec:
  podSelector:
    matchLabels:
      role: secure
  ingress:
  - from:
    - ipBlock:
        cidr: 192.168.0.0/16
        except:
        - 192.168.137.70/32
  policyTypes:
  - Ingress
EOF
kubectl exec -n external -it pod-3 -- ping [pod-1]
kubectl exec -it pod-2 -- ping [pod-1]
kubectl label pod pod-1 role-
kubectl delete netpol ingress-from-ips
- Rule 6 - Egress To:
kubectl label pod pod-1 role=secure
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: egress-to-ips
spec:
  podSelector:
    matchLabels:
      role: secure
  egress:
  - to:
    - ipBlock:
        cidr: 192.168.137.70/32
  policyTypes:
  - Egress
EOF
kubectl exec -it pod-1 -- ping [pod-2]
kubectl exec -it pod-1 -- ping google.com
kubectl delete netpol egress-to-ips
- Rule 7 - Namespace Selector:
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: namespace-selector
spec:
  podSelector:
    matchLabels:
      role: secure
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          role: app
      podSelector:
        matchLabels:
          role: reconcile
  policyTypes:
  - Ingress
EOF
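One way to exercise Rule 7 (the labels match the policy above; reusing pod-1, pod-3, and the external namespace from the earlier steps is an assumption):
kubectl label pod pod-1 role=secure
kubectl label ns external role=app
kubectl label pod pod-3 -n external role=reconcile
kubectl exec -n external -it pod-3 -- ping [pod-1-ip]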
Create ingress policies with pod selector color=blue on port 80
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-same-namespace
  namespace: default
spec:
  podSelector:
    matchLabels:
      color: blue
  ingress:
  - from:
    - podSelector:
        matchLabels:
          color: red
    ports:
    - port: 80
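To try this policy out, you could create two labelled pods and test access on port 80; the pod names and images below are assumptions, not part of the original lab:
kubectl run blue-pod --image=nginx --labels=color=blue
kubectl run red-pod --image=praqma/network-multitool --labels=color=red
kubectl get pod blue-pod -o wide
kubectl exec -it red-pod -- curl [blue-pod-ip]:80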
Allow ingress traffic from pods in a different namespace using namespaceSelector
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-different-namespace
  namespace: default
spec:
  podSelector:
    matchLabels:
      color: blue
  ingress:
  - from:
    - podSelector:
        matchLabels:
          color: red
      namespaceSelector:
        matchLabels:
          shape: square
    ports:
    - port: 80
The policy above allows incoming traffic on port 80 only from pods labelled color=red that run in a namespace labelled shape=square.
Create egress policies
In the following example, outbound traffic is allowed only if it goes to a pod with the label color=red, on port 80.
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-egress-same-namespace
  namespace: default
spec:
  podSelector:
    matchLabels:
      color: blue
  egress:
  - to:
    - podSelector:
        matchLabels:
          color: red
    ports:
    - port: 80
Allow a CIDR range
- The following policy allows egress traffic to the CIDR range 172.18.0.0/24.
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-egress-external
  namespace: default
spec:
  podSelector:
    matchLabels:
      color: red
  egress:
  - to:
    - ipBlock:
        cidr: 172.18.0.0/24
Create deny-all default ingress and egress network policy
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: default-deny
  namespace: policy-demo
spec:
  podSelector:
    matchLabels: {}
  policyTypes:
  - Ingress
  - Egress
LAB 2
- Complete all the steps above
Kubernetes Monitoring
What is Prometheus?
- Prometheus is an open-source monitoring tool mainly used for metrics collection, event monitoring, alert management, etc. Prometheus has changed the way systems are monitored, which is why it became a top-level, graduated project of the Cloud Native Computing Foundation (CNCF).
- Prometheus uses a powerful query language, PromQL.
- Prometheus scales to handle hundreds of services and microservices.
- Prometheus offers multiple modes for graphing and dashboarding support.
Prometheus Architecture
- Prometheus port: 9090
- Node Exporter port: 9100
Prometheus Components
1. Prometheus Server
The Prometheus server is the first component of the Prometheus architecture. It is the core of the architecture and is divided into several parts such as storage, PromQL, and the HTTP server. The Prometheus server scrapes data from the target nodes and then stores it in its database.
1.1 Storage
Storage in the Prometheus server is local, on-disk storage. Prometheus also has interfaces that allow integration with remote storage systems.
1.2 PromQL
Prometheus uses its own query language, PromQL, which is a very powerful querying language. PromQL allows the user to select and aggregate the data.
2. Service Discovery
The next and very important component of the Prometheus server is service discovery. With the help of service discovery, the services that need to be scraped are identified. To pull metrics, identifying the services and finding the targets is required. Through service discovery we monitor the entities and can also locate their targets.
3. Scrape Target
Once the services are identified and the targets are ready, we can pull metrics from them and scrape the targets. We can export endpoint data using exporters such as node exporter. Once the metrics or other data are pulled, Prometheus stores them in local storage.
4. Alert Manager
Alertmanager handles the alerts that may occur during a session. It handles all the alerts sent by the Prometheus server and is one of the most useful components of the Prometheus stack. If a big error or any other issue occurs, Alertmanager manages those alerts and contacts a human via e-mail, text message, on-call paging, or another chat application service.
5. User Interface
The user interface is also an important component as it builds a bridge between the user and the system. The built-in Prometheus UI is not that user friendly and is mostly limited to graphing queries. For good, exclusive dashboards, Prometheus works together with Grafana (a visualization tool). Using Grafana on top of Prometheus we can build custom dashboards to visualize data properly. Grafana dashboards display pie charts, line charts, tables, and data graphs of CPU usage, RAM utilization, network load, etc. with indicators. Grafana integrates with Prometheus through its query language, PromQL; PromQL is used to fetch data from Prometheus and display the results on Grafana dashboards.
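To see how these pieces fit together, here is a minimal standalone Prometheus scrape configuration; the job names and target addresses are illustrative assumptions, and the Helm chart used later generates an equivalent config for you:
# prometheus.yml (illustrative)
global:
  scrape_interval: 15s                   # how often Prometheus scrapes targets
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']      # Prometheus scraping itself
  - job_name: node-exporter
    static_configs:
      - targets: ['192.168.1.10:9100']   # a node exporter endpoint (default port 9100)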
What is Grafana?
Grafana is a free and open-source visualization tool mostly used with Prometheus to visualize metrics. Grafana provides various dashboards, charts, graphs, and alerts for a given data source. Grafana allows us to query, visualize, and explore metrics and set alerts for a data source, which can be a system, server, node, cluster, etc. We can also create our own dynamic dashboards for visualization and monitoring. We can save a dashboard and even share it with our team members, which is one of the main advantages of Grafana.
What is Node Exporter?
Node exporter is one of the Prometheus exporters; it exposes server and OS-level metrics.
With the help of node exporter we can expose various system resources such as CPU utilization, memory utilization, and disk space.
Node exporter runs as a system service that gathers the metrics of your system, and those gathered metrics can then be displayed with the help of the Grafana visualization tool.
Prerequisites
- Access to Kubernetes Cluster
- Helm cli installed
Add Prometheus Helm Repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
Search for Prometheus helm chart
helm search repo prometheus-community
Install Prometheus Helm Chart on Kubernetes Cluster
helm install prometheus prometheus-community/prometheus
Check all services, deployments, pods, and secrets
kubectl get svc,deployment,pods,secret
Exposing the prometheus-server Kubernetes Service
kubectl expose service prometheus-server --type=LoadBalancer --target-port=9090 --name=prometheus-server-ext
Access the Prometheus UI using the LoadBalancer IP
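If the cluster has no LoadBalancer provider, port forwarding is an alternative (this assumes the default release name prometheus, so the service is prometheus-server listening on port 80):
kubectl port-forward svc/prometheus-server 9090:80
# then browse to http://localhost:9090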
PromQL most-used queries
To list all nodes:
node_uname_info
Node CPU Utilization (%)
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
Node Memory Utilization (%)
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
Node Disk Utilization
sum(node_filesystem_size_bytes - node_filesystem_free_bytes) by (instance) / sum(node_filesystem_size_bytes) by (instance) * 100
Node Network Utilization
sum(rate(node_network_receive_bytes_total[5m])) by (instance) / 1024 / 1024
Pod Resource Utilization (CPU/Memory)
sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)
sum(container_memory_usage_bytes) by (pod)
Total Memory Requests for a Specific Namespace
sum(kube_pod_container_resource_requests_memory_bytes{namespace="namespace1"})
Total Memory Requests for Multiple Namespaces
sum(kube_pod_container_resource_requests_memory_bytes{namespace=~"namespace1|namespace2|namespace3"})
Total Memory Requests for All Namespaces
sum(kube_pod_container_resource_requests_memory_bytes)
Total Memory Usage for a Specific Namespace
sum(avg_over_time(container_memory_usage_bytes{namespace="namespace1"}[2h]))
Check targets, rules, etc. in the Prometheus UI (Status -> Targets and Status -> Rules)
AlertManager Configuration
- Add alerting rules to Prometheus
- Create a values file with any name
vi prometheus.yaml
serverFiles:
  alerting_rules.yml:
    groups:
      - name: NodeDown
        rules:
          # Alert for any instance that is unreachable for more than 2 minutes.
          - alert: InstanceDown
            expr: up{job="kubernetes-nodes"} == 0
            for: 2m
            labels:
              severity: page
            annotations:
              host: "{{ $labels.kubernetes_io_hostname }}"
              summary: "Instance down"
              description: "Node {{ $labels.kubernetes_io_hostname }} has been down for more than 2 minutes."
      - name: low_memory_alert
        rules:
          - alert: LowMemory
            expr: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 < 85
            for: 2m
            labels:
              severity: warning
            annotations:
              host: "{{ $labels.kubernetes_node }}"
              summary: "{{ $labels.kubernetes_node }} Host is low on memory. Only {{ $value }}% left"
              description: "{{ $labels.kubernetes_node }} node is low on memory. Only {{ $value }}% left"
          - alert: KubePersistentVolumeErrors
            expr: kube_persistentvolume_status_phase{job="kubernetes-service-endpoints",phase=~"Failed|Pending"} > 0
            for: 2m
            labels:
              severity: critical
            annotations:
              description: The persistent volume {{ $labels.persistentvolume }} has status {{ $labels.phase }}.
              summary: PersistentVolume is having issues with provisioning.
          - alert: KubePodCrashLooping
            expr: rate(kube_pod_container_status_restarts_total{job="kubernetes-service-endpoints",namespace=~".*"}[5m]) * 60 * 5 > 0
            for: 2m
            labels:
              severity: warning
            annotations:
              description: Pod {{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) is restarting {{ printf "%.2f" $value }} times / 5 minutes.
              summary: Pod is crash looping.
          - alert: KubePodNotReady
            expr: sum by(namespace, pod) (max by(namespace, pod) (kube_pod_status_phase{job="kubernetes-service-endpoints",namespace=~".*",phase=~"Pending|Unknown"}) * on(namespace, pod) group_left(owner_kind) topk by(namespace, pod) (1, max by(namespace, pod, owner_kind) (kube_pod_owner{owner_kind!="Job"}))) > 0
            for: 2m
            labels:
              severity: warning
            annotations:
              description: Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-ready state for longer than 2 minutes.
              summary: Pod has been in a non-ready state for more than 2 minutes.
Update Prometheus with the new rules
helm upgrade prometheus prometheus-community/prometheus -f prometheus.yaml
- Check that all the rules were added in the Prometheus UI
- Also check Alertmanager using port forwarding (the Alertmanager port is 9093)
export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=alertmanager,app.kubernetes.io/instance=prometheus" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace default port-forward $POD_NAME 9093
Configure Alert Manager for Email Notifications
- Get the configmap being used by Alert Manager
kubectl get configmaps
- Edit the configmap
kubectl edit configmap prometheus-alertmanager
- Add the contents
apiVersion: v1
data:
  alertmanager.yml: |
    global:
      resolve_timeout: 1m
      # slack_api_url: ''
    receivers:
    - name: 'gmail-notifications'
      email_configs:
      - to: khushiramsingh680@gmail.com
        from: khushiramsingh680@gmail.com            # Update your from mail id here
        smarthost: smtp.gmail.com:587
        auth_username: khushiramsingh680@gmail.com   # Update your from mail id here
        auth_identity: khushiramsingh680@gmail.com   # Update your from mail id here
        auth_password: ******                        # Update your app-password here
        send_resolved: true
        headers:
          subject: "Prometheus - Alert"
        text: "{{ range .Alerts }} Hi, \n{{ .Annotations.summary }} \n{{ .Annotations.description }} {{end}}"
      # slack_configs:
      # - channel: '@you'
      #   send_resolved: true
    route:
      group_wait: 10s
      group_interval: 2m
      receiver: 'gmail-notifications'
      repeat_interval: 2m
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: prometheus
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2024-08-30T15:07:40Z"
  labels:
    app.kubernetes.io/instance: prometheus
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/version: v0.27.0
    helm.sh/chart: alertmanager-1.12.0
  name: prometheus-alertmanager
  namespace: default
  resourceVersion: "233182"
  uid: c1acda6d-df1f-430d-8c3d-903f568b7c22
- Delete the Alertmanager pod so it picks up the new configuration
kubectl delete pod prometheus-alertmanager-0
- Now check the Prometheus and Alertmanager UIs, and also check your mail for any alerts
LAB 3
- Add prometheus helm repo
- Install prometheus using helm
- Configure port forward to access prometheus
- Check the whole configuration
- Run all the PromQL queries given in this doc
- Create an app password in Gmail
- Update the Alertmanager ConfigMap
- Test the Alertmanager config
- Check whether you have received any alerts
Install Grafana
- Search for Grafana Chart
helm search hub grafana
- Add and update Grafana repo
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
- Install Grafana Helm Chart on Kubernetes Cluster
helm install grafana grafana/grafana
- Check for Services and other details
kubectl get service
- Exposing the grafana Kubernetes Service
kubectl expose service grafana --type=LoadBalancer --target-port=3000 --name=grafana-ext
- Retrieve the password for the Grafana admin user
kubectl get secret --namespace default grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
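If no LoadBalancer IP is available, port forwarding works for Grafana as well (assuming the default grafana service, which listens on port 80):
kubectl port-forward svc/grafana 3000:80
# then browse to http://localhost:3000 and log in as admin with the password above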
- Log in to Grafana and add a data source
- Select Prometheus as the data source
- Click on "Save & test" to save your changes
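When filling in the data source URL, the in-cluster service address is usually the simplest choice (this assumes the Prometheus release was installed as prometheus in the default namespace):
http://prometheus-server.default.svc.cluster.local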
Grafana Dashboard
- To import a Grafana dashboard, go to Dashboards -> Import, enter a community dashboard ID (for example 1860, "Node Exporter Full"), load it, and select the Prometheus data source.
LAB 4
- Add helm repo
- Install Grafana using helm
- Configure port forwarding to access Grafana
- Fetch the admin password and log in
- Create a dashboard for your Kubernetes node exporter.
Kubernetes Logging with Grafana Loki
source https://artifacthub.io/packages/helm/grafana/loki-stack
Grafana Loki
Loki-Stack Helm Chart
Get Repo Info
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
- Deploy Loki, Promtail, and Grafana to your cluster
helm upgrade --install loki grafana/loki-stack \
--set fluent-bit.enabled=false,promtail.enabled=true,grafana.enabled=true
- To get the admin password for the Grafana pod, run the following command:
kubectl get secret --namespace <YOUR-NAMESPACE> loki-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
- To access Grafana you can use port forwarding or Service LoadBalancer
kubectl port-forward --namespace <YOUR-NAMESPACE> service/loki-grafana 3000:80
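After logging in, this chart typically pre-configures Loki as a Grafana data source; here are a few starter LogQL queries for the Explore view (the label values are assumptions based on the pods used earlier in this doc):
{namespace="default"}                  # all logs from the default namespace
{namespace="default", pod="pod-1"}     # logs from a single pod
{namespace="kube-system"} |= "error"   # kube-system logs containing the word "error"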
LAB 5
- Complete all above steps
Etcd Backup and Restore on Kubernetes Cluster
- Login to control plane
ssh vagrant@172.16.16.100
- Change user to root
sudo -s
- Install the etcd client
sudo apt install etcd-client
We need to pass the following four pieces of information to etcdctl to take an etcd snapshot:
- etcd endpoint (--endpoints)
- CA certificate (--cacert)
- server certificate (--cert)
- server key (--key)
- Check the following file for the above information
cat /etc/kubernetes/manifests/etcd.yaml
- Take an etcd snapshot backup using the following command.
ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=<ca-file> \
--cert=<cert-file> \
--key=<key-file> \
snapshot save <backup-file-location>
ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
snapshot save /opt/backup/etcd.db
- You can verify the snapshot using the following command.
ETCDCTL_API=3 etcdctl --write-out=table snapshot status /opt/backup/etcd.db
Kubernetes etcd Restore Using Snapshot Backup
- Use the command below
ETCDCTL_API=3 etcdctl snapshot restore <backup-file-location>
ETCDCTL_API=3 etcdctl snapshot restore /opt/backup/etcd.db
ETCDCTL_API=3 etcdctl --data-dir /opt/etcd snapshot restore /opt/backup/etcd.db
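Restoring only writes the data to the new directory; the static etcd pod still has to be pointed at it. A hedged sketch of the usual follow-up on a kubeadm cluster (paths match the restore command above):
# Edit the static pod manifest and change the etcd-data hostPath
# from /var/lib/etcd to the restored directory /opt/etcd:
vi /etc/kubernetes/manifests/etcd.yaml
# The kubelet notices the change and recreates the etcd pod; verify with:
crictl ps | grep etcd
kubectl get pods -n kube-system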
LAB 6
- Complete all the steps above
Kubernetes API-related commands
- List all namespaced resources
kubectl api-resources --namespaced=true
- List all cluster-scoped (non-namespaced) resources
kubectl api-resources --namespaced=false
- Print resource names only
kubectl api-resources -o name
- Show extra columns (API version, kind, shortnames)
kubectl api-resources -o wide
- Resources that support the list and get verbs
kubectl api-resources --verbs=list,get
- Resources in the extensions API group
kubectl api-resources --api-group=extensions
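To look at these resources through the raw API, kubectl can proxy authenticated requests to the API server (the namespace and resource paths below are just examples):
kubectl proxy --port=8001 &
curl http://localhost:8001/api/v1/namespaces/default/pods
curl http://localhost:8001/apis/apps/v1/namespaces/default/deployments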