Part 5

Topics

Rancher (Kubernetes Management Tool)

Rancher Installation Steps

  • Rancher is a Kubernetes management tool to deploy and run clusters anywhere and on any provider.

  • Rancher can provision Kubernetes from a hosted provider, provision compute nodes and then install Kubernetes onto them, or import existing Kubernetes clusters running anywhere.

  • Rancher adds significant value on top of Kubernetes, first by centralizing authentication and role-based access control (RBAC) for all of the clusters, giving global admins the ability to control cluster access from one location.

  • It then enables detailed monitoring and alerting for clusters and their resources, ships logs to external providers, and integrates directly with Helm via the Application Catalog. If you have an external CI/CD system, you can plug it into Rancher, but if you don’t, Rancher even includes Fleet to help you automatically deploy and upgrade workloads.

  • Rancher is a complete container management platform for Kubernetes, giving you the tools to successfully run Kubernetes anywhere.

  • Rancher installation on Docker:

sudo docker run --privileged -d --restart=unless-stopped -p 80:80 -p 443:443 rancher/rancher
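In recent Rancher versions the first login uses a bootstrap password printed in the container logs; a typical way to retrieve it (the container ID placeholder is whatever docker ps reports for the rancher/rancher container) is:

sudo docker ps
sudo docker logs <container-id> 2>&1 | grep "Bootstrap Password:"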

LAB 1

  • Create a VM, then install Docker and Rancher on the newly created VM.
  • Install Docker
  • Log in to Rancher
  • Wait for at least 4 to 5 minutes
  • Monitor all node pods
  • Try to deploy a pod/app
  • Perform other actions as well:
    • Running commands from Rancher itself
    • Downloading the kubeconfig file from Rancher
    • RBAC options
    • Installing apps from Rancher (e.g. Longhorn, NeuVector)

Get started with Kubernetes network policy

  • Kubernetes network policy lets administrators and developers enforce which network traffic is allowed using rules.
  • Kubernetes network policy lets developers secure access to and from their applications using the same simple language they use to deploy them. Developers can focus on their applications without understanding low-level networking concepts. Enabling developers to easily secure their applications using network policies supports a shift left DevOps environment.
  • The Kubernetes Network Policy API supports the following features (illustrated in the sketch after this list):
    • Policies are namespace scoped
    • Policies are applied to pods using label selectors
    • Policy rules can specify the traffic that is allowed to/from pods, namespaces, or CIDRs
    • Policy rules can specify protocols (TCP, UDP, SCTP), named ports or port numbers
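As a quick illustration of these features, here is a minimal sketch (all names and labels are hypothetical) that allows TCP 80 and UDP 53 into pods labelled app=web, but only from pods labelled role=frontend in the same namespace:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: example-allow-frontend
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: web
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: frontend
    ports:
    - protocol: TCP
      port: 80
    - protocol: UDP
      port: 53
  policyTypes:
  - Ingress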

Ingress and egress

  • Ingress is incoming traffic to the pod, and egress is outgoing traffic from the pod.
  • In Kubernetes network policy, you create ingress and egress “allow” rules independently (egress, ingress, or both).

Default deny/allow behavior

  • Default allow means all traffic is allowed by default, unless otherwise specified.
  • Default deny means all traffic is denied by default, unless explicitly allowed.

Network Policy

  • Create the base setup (the default behaviour is that all traffic is allowed)
  • Create a namespace
kubectl create ns external
  • Test the default behaviour by creating pods and trying to ping between them
kubectl run pod-1 --image=praqma/network-multitool
kubectl run pod-2 --image=praqma/network-multitool
kubectl run pod-3 --image=praqma/network-multitool -n external
  • Get the IP addresses
kubectl get pods -o wide
kubectl get pods -o wide -n external
  • Check connectivity from pod-1 to pod-2, pod-3, and the internet
kubectl exec -it pod-1 -- ping [pod-2-ip]
kubectl exec -it pod-1 -- ping [pod-3-ip]
kubectl exec -it pod-1 -- ping google.com
  • Rule 1: Deny all Ingress traffic
kubectl apply -f - <<EOF
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}
  policyTypes:
  - Ingress
EOF
  • Check newly applied network policy
kubectl get netpol
  • Test that incoming traffic is no longer allowed
kubectl exec -it pod-1 -- ping [pod-2-ip]
kubectl exec -it pod-1 -- ping [pod-3-ip]
kubectl exec -it pod-1 -- ping google.com
kubectl exec -n external -it pod-3 -- ping [pod-1-ip]
kubectl exec -n external -it pod-3 -- ping [pod-2-ip]
  • Rule 2: Allow Ingress
kubectl apply -f - <<EOF
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-ingress
spec:
  podSelector: {}
  ingress:
  - {}
  policyTypes:
  - Ingress
EOF
  • Test again
kubectl exec -n external -it pod-3 -- ping [pod-1-ip]
kubectl exec -n external -it pod-3 -- ping [pod-2-ip]
  • Rule 3: Deny All Egress:
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
spec:
  podSelector: {}
  policyTypes:
  - Egress
EOF
  • Test again
kubectl exec -it pod-2 -- ping google.com
kubectl exec -n external -it pod-3 -- ping [pod-1-ip]
kubectl exec -n external -it pod-3 -- ping [pod-2-ip]

Rule 4 - PodSelector:

This policy selects pods with the label role=suspicious; because it defines no ingress or egress rules, it denies all traffic to and from those pods.

kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: podselector-suspicious
spec:
  podSelector:
    matchLabels:
      role: suspicious
  policyTypes:
  - Ingress
  - Egress
EOF
  • Rule 5 - Ingress From:
kubectl label pod pod-1 role=secure
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ingress-from-ips
spec:
  podSelector:
    matchLabels:
      role: secure
  ingress:
  - from:
    - ipBlock:
        cidr: 192.168.0.0/16
        except:
        - 192.168.137.70/32
  policyTypes:
  - Ingress
EOF
kubectl exec -n external -it pod-3 -- ping [pod-1-ip]
kubectl exec -it pod-2 -- ping [pod-1-ip]
kubectl label pod pod-1 role-
kubectl delete netpol ingress-from-ips
  • Rule 6 - Egress To:
kubectl label pod pod-1 role=secure
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: egress-to-ips
spec:
  podSelector:
    matchLabels:
      role: secure
  egress:
  - to:
     - ipBlock:
        cidr: 192.168.137.70/32
  policyTypes:
  - Egress
EOF
kubectl exec -it pod-1 -- ping [pod-2-ip]
kubectl exec -it pod-1 -- ping google.com
kubectl delete netpol egress-to-ips
  • Rule 7 - Namespace Selector: allow ingress to pods labelled role=secure only from pods labelled role=reconcile running in namespaces labelled role=app
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: namespace-selector
spec:
  podSelector:
    matchLabels:
      role: secure
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          role: app
      podSelector:
        matchLabels:
          role: reconcile
  policyTypes:
  - Ingress
EOF

Create ingress policies with pod selector color=blue on port 80

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-same-namespace
  namespace: default
spec:
  podSelector:
    matchLabels:
      color: blue
  ingress:
    - from:
        - podSelector:
            matchLabels:
              color: red
      ports:
        - port: 80

Allow ingress traffic from pods in a different namespace using namespaceSelector

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-different-namespace
  namespace: default
spec:
  podSelector:
    matchLabels:
      color: blue
  ingress:
    - from:
        - podSelector:
            matchLabels:
              color: red
          namespaceSelector:
            matchLabels:
              shape: square
      ports:
        - port: 80

The policy above allows incoming traffic only from pods with the label color=red in a namespace with the label shape=square, on port 80.

Create egress policies

In the following example, outbound traffic is allowed only if it goes to a pod with label color=red, on port 80.

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-egress-same-namespace
  namespace: default
spec:
  podSelector:
    matchLabels:
      color: blue
  egress:
    - to:
        - podSelector:
            matchLabels:
              color: red
      ports:
        - port: 80

Allow a CIDR range

  • The following policy allows egress traffic to the CIDR range 172.18.0.0/24.
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-egress-external
  namespace: default
spec:
  podSelector:
    matchLabels:
      color: red
  egress:
    - to:
        - ipBlock:
            cidr: 172.18.0.0/24

Create deny-all default ingress and egress network policy

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: default-deny
  namespace: policy-demo
spec:
  podSelector:
    matchLabels: {}
  policyTypes:
    - Ingress
    - Egress
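A quick way to see the default-deny policy take effect (the pod names and image below are just for illustration, reusing the multitool image from earlier):

kubectl create ns policy-demo
kubectl run nginx --image=nginx -n policy-demo
kubectl run client --image=praqma/network-multitool -n policy-demo
kubectl get pods -n policy-demo -o wide
kubectl exec -n policy-demo -it client -- curl -m 5 [nginx-pod-ip]
# apply the default-deny manifest above (saved to a file), then repeat the curl: it should now time out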

LAB 2

  • Complete All the steps above

Kubernetes Monitoring

What is Prometheus ?

  • Prometheus is an open source monitoring tool mainly used for metrics monitoring, event monitoring, alert management, etc. Prometheus has changed the way systems are monitored, which is why it became a top-level project of the Cloud Native Computing Foundation (CNCF).
  • Prometheus uses a powerful query language, “PromQL”.
  • Prometheus scales to handle hundreds of services and microservices.
  • Prometheus offers multiple modes for graphing and dashboarding support.

Prometheus Architecture

Prometheus port: 9090
Node Exporter port: 9100

Prometheus Components

  1. Prometheus Server

    The Prometheus server is the first component of the Prometheus architecture. It is the core of Prometheus, divided into several parts such as storage, PromQL and the HTTP server. The Prometheus server scrapes data from the target nodes and stores it in its database.

1.1 Storage

Storage in the Prometheus server is local, on-disk storage. Prometheus also provides interfaces for integrating with remote storage systems.

1.2. PromQL

Prometheus uses its own query language, PromQL, which is a powerful query language that allows the user to select and aggregate time series data.

  2. Service Discovery

    The next and very important component of the Prometheus server is service discovery. Service discovery identifies the services that need to be scraped; to pull metrics, the targets must first be identified and located. Through service discovery Prometheus can monitor entities and locate their targets.

  3. Scrape Target

    Once the services are identified and the targets are ready, Prometheus can pull metrics from them and scrape the targets. Endpoint data can be exposed using exporters such as the node exporter. Once the metrics or other data are pulled, Prometheus stores them in its local storage.

  4. Alert Manager

    Alertmanager handles the alerts that may occur during operation. It handles all the alerts sent by the Prometheus server and is one of the most useful components of the Prometheus stack. If a significant error or issue occurs, Alertmanager manages those alerts and contacts humans via e-mail, text messages, on-call systems, or other chat applications.

  5. User Interface

    The user interface is also an important component, as it builds a bridge between the user and the system. The built-in Prometheus UI is not very user friendly and is mostly limited to graphing queries. For rich dashboards, Prometheus works together with Grafana (a visualization tool), where custom dashboards can be built. Grafana dashboards display CPU usage, RAM utilization, network load, etc. via pie charts, line charts, tables and other panels with indicators. Grafana works with Prometheus through its query language, PromQL, which is used to fetch data from Prometheus and display the results on Grafana dashboards.

    What is Grafana ?

    Grafana is a free and open source visualization tool mostly used with Prometheus to monitor metrics. Grafana provides dashboards, charts, graphs and alerts for a given data source. It allows us to query, visualize and explore metrics and to set alerts for data sources, which can be systems, servers, nodes, clusters, etc. We can also create our own dynamic dashboards for visualization and monitoring, save them, and share them with team members, which is one of the main advantages of Grafana.

What is Node Exporter ?

The node exporter is one of the Prometheus exporters; it is used to expose server/OS-level metrics.

With the help of the node exporter we can expose various system resources such as CPU utilization, memory utilization and disk space.

The node exporter runs as a system service that gathers the metrics of your system; those metrics are scraped by Prometheus and can then be visualized with Grafana.
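Once the node exporter is running (the Prometheus Helm chart below typically deploys it as a DaemonSet), you can confirm it is exposing metrics by querying its endpoint directly; the node IP placeholder is assumed to be reachable from wherever you run curl:

curl -s http://[node-ip]:9100/metrics | grep node_memory_MemAvailable_bytes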

Prerequisites

  • Access to Kubernetes Cluster
  • Helm cli installed

Add Prometheus Helm Repo

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

Search for Prometheus helm chart

helm search repo prometheus-community

Install Prometheus Helm Chart on Kubernetes Cluster

helm install prometheus prometheus-community/prometheus

Check all services, deployments, pods and secrets

kubectl get svc,deployment,pods,secret

Exposing the prometheus-server Kubernetes Service

kubectl expose service prometheus-server --type=LoadBalancer --target-port=9090 --name=prometheus-server-ext

Access Prometheus from UI

Open the LoadBalancer IP of the prometheus-server-ext service in a browser.
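If your cluster has no LoadBalancer provider, port-forwarding works as an alternative (assuming the chart's prometheus-server service listens on port 80, which is the chart default):

kubectl port-forward svc/prometheus-server 9090:80

Then open http://localhost:9090 in a browser.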

Most-used PromQL queries

To list all nodes:

node_uname_info

Node CPU Utilization

100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

Node Memory Utilization

(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

Node Disk Utilization

sum(node_filesystem_size_bytes - node_filesystem_free_bytes) by (instance) / sum(node_filesystem_size_bytes) by (instance) * 100

Node Network Utilization (receive, MB/s)

sum(rate(node_network_receive_bytes_total[5m])) by (instance) / 1024 / 1024

Pod Resource Utilization (CPU/Memory)

sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)

sum(container_memory_usage_bytes) by (pod)

Total Memory Requests for a Specific Namespace

sum(kube_pod_container_resource_requests_memory_bytes{namespace="namespace1"})

Total Memory Requests for Multiple Namespaces

sum(kube_pod_container_resource_requests_memory_bytes{namespace=~"namespace1|namespace2|namespace3"})

Total Memory Requests for All Namespaces

sum(kube_pod_container_resource_requests_memory_bytes)

Total Memory Usage for a Specific Namespace

sum(container_memory_usage_bytes{namespace="namespace1"})
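Any of these queries can also be run outside the UI through the Prometheus HTTP API, for example (assuming Prometheus is reachable on localhost:9090 via a port-forward):

curl -s 'http://localhost:9090/api/v1/query?query=up'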

Check targets, Rules etc

AlertManager Configuration

  • Add rules to prometheus
  • Create a file with any name
vi prometheus.yaml
serverFiles:
  alerting_rules.yml:
      groups:
      - name: NodeDown
        rules:
        # Alert for any instance that is unreachable for >2 minutes.
        - alert: InstanceDown
          expr: up{job="kubernetes-nodes"} == 0
          for: 2m
          labels:
            severity: page
          annotations:
            host: "{{ $labels.kubernetes_io_hostname }}"
            summary: "Instance down"
            description: "Node {{ $labels.kubernetes_io_hostname  }}has been down for more than 5 minutes."
      - name: low_memory_alert
        rules:
        - alert: LowMemory
          expr: (node_memory_MemAvailable_bytes /  node_memory_MemTotal_bytes) * 100 < 85
          for: 2m
          labels:
            severity: warning
          annotations:
            host: "{{ $labels.kubernetes_node  }}"
            summary: "{{ $labels.kubernetes_node }} Host is low on memory.  Only {{ $value }}% left"
            description: "{{ $labels.kubernetes_node }}  node is low on memory.  Only {{ $value }}% left"
        - alert: KubePersistentVolumeErrors
          expr: kube_persistentvolume_status_phase{job="kubernetes-service-endpoints",phase=~"Failed|Pending"} > 0
          for: 2m
          labels:
            severity: critical
          annotations:
            description: The persistent volume {{ $labels.persistentvolume }} has status {{ $labels.phase }}.
            summary: PersistentVolume is having issues with provisioning.
        - alert: KubePodCrashLooping
          expr: rate(kube_pod_container_status_restarts_total{job="kubernetes-service-endpoints",namespace=~".*"}[5m]) * 60 * 5 > 0
          for: 2m
          labels:
            severity: warning
          annotations:
            description: Pod {{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) is restarting {{ printf "%.2f" $value }} times / 5 minutes.
            summary: Pod is crash looping.
        - alert: KubePodNotReady
          expr: sum by(namespace, pod) (max by(namespace, pod) (kube_pod_status_phase{job="kubernetes-service-endpoints",namespace=~".*",phase=~"Pending|Unknown"}) * on(namespace, pod)    group_left(owner_kind) topk by(namespace, pod) (1, max by(namespace, pod, owner_kind) (kube_pod_owner{owner_kind!="Job"}))) > 0
          for: 2m
          labels:
            severity: warning
          annotations:
            description: Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-ready state for longer than 2 minutes.
            summary: Pod has been in a non-ready state for more than 2 minutes.

Update Prometheus with the new rules

helm upgrade prometheus prometheus-community/prometheus -f prometheus.yaml
  • Check that all the rules have been added in the Prometheus UI
  • Also check Alertmanager using port forwarding; the Alertmanager port is 9093
export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=alertmanager,app.kubernetes.io/instance=prometheus" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace default port-forward $POD_NAME 9093

Configure Alert Manager for Email Notifications

  • Get the configmap being used by Alert Manager
kubectl get configmaps
  • Edit the configmap
kubectl edit configmap prometheus-alertmanager
  • Add the contents
apiVersion: v1
data:
  alertmanager.yml: |
    global:
      resolve_timeout: 1m
      # slack_api_url: ''

    receivers:
    - name: 'gmail-notifications'
      email_configs:
      - to: khushiramsingh680@gmail.com
        from: khushiramsingh680@gmail.com # Update your from mail id here
        smarthost: smtp.gmail.com:587
        auth_username: khushiramsingh680@gmail.com # Update your from mail id here
        auth_identity: khushiramsingh680@gmail.com # Update your from mail id here
        auth_password: ****** # Update your app-password here
        send_resolved: true
        headers:
          subject: "Prometheus - Alert"
        text: "{{ range .Alerts }} Hi, \n{{ .Annotations.summary }}  \n{{ .Annotations.description }} {{end}}"
        # slack_configs:
        #  - channel: '@you'
        #    send_resolved: true

    route:
      group_wait: 10s
      group_interval: 2m
      receiver: 'gmail-notifications'
      repeat_interval: 2m

kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: prometheus
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2024-08-30T15:07:40Z"
  labels:
    app.kubernetes.io/instance: prometheus
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/version: v0.27.0
    helm.sh/chart: alertmanager-1.12.0
  name: prometheus-alertmanager
  namespace: default
  resourceVersion: "233182"
  uid: c1acda6d-df1f-430d-8c3d-903f568b7c22
  • Delete the alert manager pod
kubectl delete pod prometheus-alertmanager-0
  • Now check the Prometheus and Alertmanager UIs, and check your mail for any alerts (a way to fire a test alert is sketched below)
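To verify the e-mail route without waiting for a rule to fire, you can also push a synthetic alert straight to the Alertmanager API (assuming the port-forward on 9093 from earlier is still running; the alert name here is arbitrary):

curl -XPOST http://localhost:9093/api/v2/alerts -H "Content-Type: application/json" -d '[{"labels":{"alertname":"TestAlert","severity":"warning"},"annotations":{"summary":"Test alert sent with curl"}}]'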

LAB 3

  • Add the Prometheus Helm repo
  • Install Prometheus using Helm
  • Configure port forwarding to access Prometheus
  • Check all the configuration
  • Run all the PromQL queries given in this doc
  • Create app credentials in Gmail
  • Update the Alertmanager ConfigMap
  • Test the Alertmanager config
  • Check whether you have received any alert

Install Grafana

  • Search for Grafana Chart
helm search hub grafana
  • Add and update Grafana repo
helm repo add grafana https://grafana.github.io/helm-charts 
helm repo update
  • Install Grafana Helm Chart on Kubernetes Cluster
helm install grafana grafana/grafana
  • Check for Services and other details
kubectl get service
  • Exposing the grafana Kubernetes Service
kubectl expose service grafana --type=LoadBalancer --target-port=3000 --name=grafana-ext
  • Fetch the password for the Grafana admin user
kubectl get secret --namespace default grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
  • Log in to Grafana and add a data source

  • Select Prometheus as the data source (see the URL note below)

  • Click on “Save & test” to save your changes.

Grafana Dashboard

  • To import a Grafana dashboard, follow these steps:
    • In Grafana, go to Dashboards → Import
    • Enter a dashboard ID from grafana.com (for example, 1860 for the “Node Exporter Full” dashboard) or paste the dashboard JSON
    • Select the Prometheus data source and click Import

LAB 4

  • Add the Grafana Helm repo
  • Install Grafana using Helm
  • Configure port forwarding to access Grafana
  • Fetch the admin password and log in
  • Create a dashboard for your Kubernetes node exporter.

Kubernetes Monitoring with Grafana Loki

Kubernetes Monitoring

source https://artifacthub.io/packages/helm/grafana/loki-stack

Grafana Loki

Loki-Stack Helm Chart

Get Repo Info

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
  • Deploy Loki, Promtail and Grafana to your cluster
helm upgrade --install loki grafana/loki-stack \
    --set fluent-bit.enabled=false,promtail.enabled=true,grafana.enabled=true
  • To get the admin password for the Grafana pod, run the following command:
kubectl get secret --namespace <YOUR-NAMESPACE> loki-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
  • To access Grafana you can use port forwarding or Service LoadBalancer
kubectl port-forward --namespace <YOUR-NAMESPACE> service/loki-grafana 3000:80
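With grafana.enabled=true the loki-stack chart pre-configures Loki as a Grafana data source, so after logging in you can go to Explore and try a simple LogQL query; the label values below are only examples:

{namespace="kube-system"}
{namespace="default", pod="pod-1"} |= "error"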

LAB 5

  • Complete all above steps

Etcd Backup and Restore on Kubernetes Cluster

  • Login to control plane
ssh vagrant@172.16.16.100
  • Change user to root
sudo -s
  • Install etcd cli
sudo apt install etcd-client

We need to pass the following pieces of information to etcdctl to take an etcd snapshot:

  • etcd endpoint (--endpoints)

  • CA certificate (--cacert)

  • server certificate (--cert)

  • server key (--key)

  • Check the following file for the above information

cat /etc/kubernetes/manifests/etcd.yaml


  • Take an etcd snapshot backup using the following command.
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=<ca-file> \
  --cert=<cert-file> \
  --key=<key-file> \
  snapshot save <backup-file-location>

ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /opt/backup/etcd.db
  • You can verify the snapshot using the following command.
ETCDCTL_API=3 etcdctl --write-out=table snapshot status /opt/backup/etcd.db
  • Kubernetes etcd Restore Using Snapshot Backup

  • Use the below command

ETCDCTL_API=3 etcdctl snapshot restore <backup-file-location>

ETCDCTL_API=3 etcdctl snapshot restore /opt/backup/etcd.db

ETCDCTL_API=3 etcdctl --data-dir /opt/etcd snapshot restore /opt/backup/etcd.db
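Restoring only writes the snapshot data into the new directory; on a kubeadm-provisioned control plane you then point etcd at it, typically by editing the etcd static pod manifest so that the data directory and the etcd-data hostPath volume use /opt/etcd instead of /var/lib/etcd:

vi /etc/kubernetes/manifests/etcd.yaml
# change --data-dir=/var/lib/etcd to --data-dir=/opt/etcd
# change the etcd-data hostPath from /var/lib/etcd to /opt/etcd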

LAB 6

  • Complete all the steps above
  • Explore Kubernetes API resources
kubectl api-resources --namespaced=true       # namespaced resources only
kubectl api-resources --namespaced=false      # cluster-scoped resources only
kubectl api-resources -o name                 # resource names only
kubectl api-resources -o wide                 # extra detail, including supported verbs
kubectl api-resources --verbs=list,get        # resources that support both list and get
kubectl api-resources --api-group=extensions  # resources in the "extensions" API group