DevOps

Topics

Overview of DevOps
Overview of SDLC model
- Waterfall
- Agile
VM management
- VirtualBox
- Vagrant
- Google Cloud
Linux Basics Overview
- Linux File Structure
- Linux file management
- Linux Network Management
- Linux Process Management
- Linux User Management
- Linux Service Management
Web Server
- HTTP with SSL
- Reverse Proxy
- Let's Encrypt for SSL certificates
- SSL related commands
Application Server
- Tomcat
- Securing Tomcat with SSL
DNS Overview and testing utilities
- nslookup
- dig
Tools Covered
- Code Management
  - Git
- CI/CD Tools
  - Jenkins
  - GitLab
  - GitHub
- Infrastructure as Code
  - Terraform
- Configuration Management
  - Ansible
- Containerization
  - Docker
  - Kubernetes
- Static Code Testing
  - SonarQube
- Monitoring & Logging
  - Prometheus
  - Grafana
  - Grafana Loki

DevOps Overview

DevOps is a set of practices, tools, and a cultural philosophy that automate and integrate the processes between software development and IT teams. It emphasizes team empowerment, cross-team communication and collaboration, and technology automation.

DevOps combines development and operations to increase the efficiency, speed, and security of software development and delivery compared to traditional processes. A more nimble software development lifecycle results in a competitive advantage for businesses and their customers.

The DevOps lifecycle and how DevOps works

The DevOps lifecycle stretches from the beginning of software development through to delivery, maintenance, and security. The stages of the DevOps lifecycle are:

Plan

Organize the work that needs to be done, prioritize it, and track its completion.

Create

Write, design, develop and securely manage code and project data with your team.

Verify

Ensure that your code works correctly and adheres to your quality standards — ideally with automated testing.

Package

Package your applications and dependencies, manage containers, and build artifacts.

Secure

Check for vulnerabilities through static and dynamic tests, fuzz testing, and dependency scanning.

Release

Deploy the software to end users.

Configure

Manage and configure the infrastructure required to support your applications.

Monitor

Track performance metrics and errors to help reduce the severity and frequency of incidents.

Govern

Manage security vulnerabilities, policies, and compliance across your organization.

What is CI/CD?

Monitoring

Prometheus
Node Exporter
WMI Exporter
Blackbox Exporter
Alertmanager

Prometheus

Prometheus is a free and open-source toolkit for time-series monitoring and alerting. It was originally developed at SoundCloud in 2012 and has since been adopted by many companies and gained an active developer community. In 2016, it joined the Cloud Native Computing Foundation as the second hosted project after Kubernetes.

Features:

Time-series metrics collection which happens via a pull model over HTTP
It uses PromQL, a flexible query language that takes advantage of the multi-dimensional data model
It doesn’t rely on distributed storage; single server nodes are autonomous
It uses a multi-dimensional data model where time series data is identified by metric name and key/value pairs
Targets are discovered via service discovery or static configuration
It supports multiple modes of graphing and dashboarding.
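
To make the data model in the list above more concrete, here is a minimal sketch that parses a few lines in the text format a target serves on its /metrics endpoint. The sample values and the helper names (sample_metrics, metric_names, series_value) are made up for illustration; a real target exposes far more series.

```shell
#!/bin/sh
# Sample of the text format a target serves at /metrics (values are made up).
sample_metrics() {
cat <<'EOF'
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
node_cpu_seconds_total{cpu="0",mode="user"} 56.7
node_memory_MemFree_bytes 1048576
EOF
}

# Print the distinct metric names, skipping the # HELP and # TYPE comments.
metric_names() {
  grep -v '^#' | sed 's/[{ ].*//' | sort -u
}

# Print the value of one series, selected by its full name{labels} identifier.
series_value() {
  awk -v id="$1" '$1 == id { print $2 }'
}

# Usage:
#   sample_metrics | metric_names
#   sample_metrics | series_value 'node_cpu_seconds_total{cpu="0",mode="idle"}'
```

Each line is one time series: the metric name plus its key/value labels identify the series, and the number after it is the current sample that Prometheus pulls over HTTP.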

The Prometheus ecosystem encompasses several components. They include:

Prometheus server – scrapes and stores time series data
Client libraries – used to instrument application code
Pushgateway – to support short-lived jobs
Special-purpose exporters – for services such as HAProxy, StatsD, Graphite, etc.
Alertmanager – for alert handling

Prerequisites for Prometheus Installation

Virtualbox
Prometheus
Linux server with 1 CPU and 2 GB of memory
Create 2 Linux VMs

Update the Linux OS

sudo dnf update -y

Add Prometheus Repositories

Enable the EPEL repo

sudo dnf -y install epel-release vim

Add the Prometheus repo

curl -s https://packagecloud.io/install/repositories/prometheus-rpm/release/script.rpm.sh | sudo bash

Install Prometheus

sudo dnf install prometheus -y

Once installed, Prometheus stores its data at /var/lib/prometheus and config files at /etc/prometheus/.
The default configuration file is at /etc/prometheus/prometheus.yml
Start and enable the service using the commands below:

sudo systemctl start prometheus
sudo systemctl enable prometheus

Check the status of the Prometheus service

systemctl status prometheus

Allow the port through the firewall if a firewall is in use

sudo firewall-cmd --add-port=9090/tcp --permanent
sudo firewall-cmd --reload

You can access Prometheus at http://IP_Address:9090

In the web UI, use the Status menu to check all targets, the loaded configuration file, and other details.
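
The same reachability check can be done from the command line. The sketch below assumes curl is installed and uses the /-/healthy and /-/ready management endpoints that Prometheus serves. The function names prom_url and prom_healthy are hypothetical helpers, and IP_Address stays a placeholder.

```shell
#!/bin/sh
# prom_url HOST [PORT] [PATH]
# Build the URL of a Prometheus management endpoint.
# PORT defaults to 9090 and PATH to /-/healthy.
prom_url() {
  printf 'http://%s:%s%s\n' "$1" "${2:-9090}" "${3:-/-/healthy}"
}

# prom_healthy HOST [PORT]
# Succeed if the server answers the health check within 5 seconds.
prom_healthy() {
  curl -fsS --max-time 5 "$(prom_url "$1" "${2:-9090}")" >/dev/null
}

# Usage (IP_Address is a placeholder for the server address):
#   prom_healthy IP_Address && echo "Prometheus is up"
```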

Install and Configure Node Exporter

Add the Prometheus repo on the node to be monitored

curl -s https://packagecloud.io/install/repositories/prometheus-rpm/release/script.rpm.sh | sudo bash

Install node_exporter

sudo dnf -y install node_exporter

Start and enable the service after installation:

sudo systemctl enable --now node_exporter

Check the node_exporter service

systemctl status node_exporter

Check the port used

sudo ss -aplnt | grep node

Allow the port through the firewall if it is already running

sudo firewall-cmd --add-port=9100/tcp --permanent
sudo firewall-cmd --reload

Now, on the Prometheus server, add an entry for the newly added node under scrape_configs:

sudo vim /etc/prometheus/prometheus.yml

  - job_name: 'node_exporter_metrics'
    scrape_interval: 5s
    static_configs:
      - targets: ['SERVER-IP:9100']
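
When several nodes run node_exporter, the job entry can be generated rather than typed by hand. This is only a sketch: node_job is a hypothetical helper, and the addresses in the usage line are documentation placeholders, not hosts from this setup. Paste the output under scrape_configs in prometheus.yml.

```shell
#!/bin/sh
# node_job TARGET...
# Print a scrape job for the given host:port targets, indented so that it
# fits under the scrape_configs key of prometheus.yml.
node_job() {
  printf "  - job_name: 'node_exporter_metrics'\n"
  printf "    scrape_interval: 5s\n"
  printf "    static_configs:\n"
  printf "      - targets:\n"
  for target in "$@"; do
    printf "          - '%s'\n" "$target"
  done
}

# Usage (placeholder addresses):
#   node_job 192.0.2.11:9100 192.0.2.12:9100
```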

Restart the Prometheus service

sudo systemctl restart prometheus

Install and Configure Blackbox exporter

The Blackbox exporter enables probing of endpoints over HTTP, HTTPS, DNS, TCP, and ICMP. You can install it from the package repository or manually.

Install Blackbox exporter from Prometheus repository

sudo dnf -y install blackbox_exporter

The settings of the Blackbox exporter are in the file /etc/prometheus/blackbox.yml. You can modify it, or just start and enable the service with the defaults.

Start and enable the blackbox_exporter service

sudo systemctl enable --now blackbox_exporter

Allow the port through the firewall if it is running

sudo firewall-cmd --add-port=9115/tcp --permanent
sudo firewall-cmd --reload

Check the Prometheus configuration file

promtool check config /etc/prometheus/prometheus.yml
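
A common pattern is to run a follow-up command, such as a restart, only when this check passes, so that a broken config never takes the service down. The sketch below assumes promtool is on the PATH. check_then is a hypothetical helper, and the PROMTOOL variable is an assumption added so that the checker can be swapped out.

```shell
#!/bin/sh
# check_then CONFIG COMMAND...
# Run COMMAND only if "promtool check config CONFIG" succeeds.
# Set PROMTOOL to use a different checker binary; it defaults to promtool.
check_then() {
  config=$1
  shift
  if "${PROMTOOL:-promtool}" check config "$config"; then
    "$@"
  else
    echo "config check failed, not running: $*" >&2
    return 1
  fi
}

# Usage:
#   check_then /etc/prometheus/prometheus.yml sudo systemctl restart prometheus
```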

Add an entry to the prometheus.yml file

  - job_name: 'Blackbox_ssh'
    metrics_path: /probe
    params:
      module: [ssh_banner]
    static_configs:
      - targets:
        - 172.16.16.120:26          # host:port to probe
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 172.16.16.120:9115   # address of the Blackbox exporter
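
The relabeling in this job moves the probed address into the target query parameter, copies it to the instance label, and then points the scrape at the Blackbox exporter itself. The sketch below prints the URL that Prometheus ends up requesting, shown without URL encoding for readability. probe_url is a hypothetical helper, not part of any tool.

```shell
#!/bin/sh
# probe_url EXPORTER MODULE TARGET
# Print the /probe URL that results from the relabel_configs of the job.
probe_url() {
  printf 'http://%s/probe?module=%s&target=%s\n' "$1" "$2" "$3"
}

# Usage, with the addresses from the job definition:
#   probe_url 172.16.16.120:9115 ssh_banner 172.16.16.120:26
# Fetching that URL with curl on a host that can reach the exporter returns
# the probe metrics, including probe_success (1 on success, 0 on failure).
```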

Restart the Prometheus service

sudo systemctl restart prometheus

Configuring Alert Manager

Alertmanager is the application that handles alerts sent by client applications such as Prometheus and notifies the user via e-mail, Slack, or other tools. Prometheus evaluates the alert rules defined in its configuration against the scraped metrics at every evaluation interval. When the condition of a rule holds, Prometheus pushes the alert to Alertmanager.

The Alertmanager can handle grouping, deduplication, and routing of alerts to the correct receiver. It manages alerts through its pipelines, which are:

Silencing: mutes alerts for a given period
Grouping: groups alerts of similar nature into a single notification to avoid sending multiple notifications.
Inhibition: suppresses specific alerts if certain other alerts are already firing.

Install Alertmanager

sudo dnf install alertmanager -y

Alertmanager config file

Depending on the package, the config file is at one of these paths:

/etc/alertmanager/alertmanager.yml
/etc/prometheus/alertmanager.yml

The main settings used in the configuration below:

repeat_interval tells Alertmanager how long to wait before re-sending a notification for an alert that is still firing. The configuration below sets it to 1 hour, but you can adjust it as desired.
receiver: 'email' sets the default receiver. For this tutorial, the default receiver is email.
receivers: lists the available receivers with their configurations, for example web.hook or email.

Modify the Alertmanager config

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'email'
receivers:
  - name: 'email'
    email_configs:
    - to: 'Test@gmail.com'
      from: 'test@gmail.com'
      smarthost: smtp.gmail.com:587
      auth_username: 'test@gmail.com'
      auth_password: 'YOUR_APP_PASSWORD'   # placeholder; never store a real password in shared notes
      send_resolved: true

Check the Alertmanager configuration

amtool check-config /etc/alertmanager/alertmanager.yml
amtool check-config /etc/prometheus/alertmanager.yml
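
Once the configuration passes the check and Alertmanager is running, the e-mail route can be exercised by posting a hand-made alert to the Alertmanager v2 API. This is a sketch that assumes curl is installed and Alertmanager listens on localhost:9093. The function names, the alert name TestAlert, and the severity label are made up for illustration.

```shell
#!/bin/sh
# alert_payload NAME SEVERITY
# Print a JSON array with a single alert, as accepted by POST /api/v2/alerts.
alert_payload() {
  printf '[{"labels":{"alertname":"%s","severity":"%s"}}]\n' "$1" "$2"
}

# send_test_alert [HOST:PORT]
# Post the test alert to Alertmanager; the address defaults to localhost:9093.
send_test_alert() {
  alert_payload TestAlert warning |
    curl -fsS -H 'Content-Type: application/json' -d @- \
      "http://${1:-localhost:9093}/api/v2/alerts"
}

# Usage:
#   send_test_alert
```

With the route settings used here, the notification should go out after group_wait (30s) has elapsed.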

Modify the Prometheus configuration settings

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
           - localhost:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "alert_rules.yml"
  # - "second_rules.yml"

Create the alert rules file

sudo vi /etc/prometheus/alert_rules.yml

groups:
- name: alert_rules
  rules:
    - alert: InstanceDown
      expr: up == 0   # up is 0 when the last scrape of a target failed
      for: 1m
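
The for: clause means the alert does not fire on the first failed evaluation. It stays pending until the condition has held for the whole duration. The sketch below simulates this for a rule that alerts when up is 0, with one sample per evaluation. It is a simplified model for illustration, not the Prometheus implementation, and alert_states is a hypothetical helper.

```shell
#!/bin/sh
# alert_states FOR_SECONDS STEP_SECONDS SAMPLE...
# Print one alert state per evaluation. Each SAMPLE is the value of "up" at
# that evaluation, so a SAMPLE of 0 means the alert condition is true.
alert_states() (
  hold=$1
  step=$2
  shift 2
  t=0
  active_since=-1   # -1 means the condition is not currently true
  for sample in "$@"; do
    if [ "$sample" -eq 0 ]; then
      # Remember when the condition first became true.
      [ "$active_since" -lt 0 ] && active_since=$t
      if [ $((t - active_since)) -ge "$hold" ]; then
        echo firing
      else
        echo pending
      fi
    else
      active_since=-1
      echo inactive
    fi
    t=$((t + step))
  done
)

# Usage: for: 1m with a 15s evaluation interval, target down for 5 evaluations.
#   alert_states 60 15 1 0 0 0 0 0 1
# prints inactive, then pending four times, then firing, then inactive.
```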

PromQL (WIP)

PromQL: Basics

Prometheus Metric and Data Types

Prometheus has four metric types:

Counters
Gauges
Histograms
Summaries

https://logz.io/blog/promql-examples-introduction/

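Matchers and selectors: label matchers inside curly braces filter the selected time series, for example {job="prometheus"} or {job="prometheus",node="node1"}

Equality matcher: = as in {job="prometheus"}
Negative equality matcher: != as in {job!="node2"}
Regular expression matcher: =~ as in prometheus_http_requests_total{handler=~"/api.*"}
Negative regular expression matcher: !~ as in prometheus_http_requests_total{handler!~"/api.*"}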
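
One detail of the regular expression matchers is easy to miss: PromQL anchors the pattern at both ends, so it has to match the whole label value. The sketch below imitates this with grep -E. PromQL uses RE2 syntax, which differs from POSIX ERE in some details, so this is an approximation, and promql_regex_match is a hypothetical helper.

```shell
#!/bin/sh
# promql_regex_match VALUE REGEX
# Succeed if REGEX matches the whole VALUE, the way =~ does in PromQL.
promql_regex_match() {
  printf '%s\n' "$1" | grep -Eq "^($2)\$"
}

# Usage:
#   promql_regex_match /api/v1/query '/api.*' && echo match     # matches
#   promql_regex_match /api/v1/query 'api'    || echo no-match  # substring only
```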

Range Vector: [1d] [1m]

Binary Operator: Arithmetic

Aggregator:

Find the number of pods per namespace

sum by (namespace) (kube_pod_info)

Find CPU overcommit


sum(kube_pod_container_resource_limits{resource="cpu"}) - sum(kube_node_status_capacity{resource="cpu"})

Memory overcommit


sum(kube_pod_container_resource_limits{resource="memory"}) - sum(kube_node_status_capacity{resource="memory"})

Find unhealthy Kubernetes pods


min_over_time(sum by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"})[15m:1m]) > 0

Find Kubernetes pods CrashLooping


increase(kube_pod_container_status_restarts_total[15m]) > 3
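
To see what increase() measures on a restart counter, the sketch below sums the growth across a list of samples and treats any drop in value as a counter reset. It is a simplification: the real function also extrapolates to the edges of the time window, so its results can be fractional. counter_increase is a hypothetical helper.

```shell
#!/bin/sh
# counter_increase SAMPLE...
# Print the total increase across the samples. A sample lower than the one
# before it is treated as a counter reset, that is, a restart from 0.
counter_increase() {
  printf '%s\n' "$@" | awk '
    NR > 1 { cur = $1 + 0; total += (cur >= prev) ? cur - prev : cur }
    { prev = $1 + 0 }
    END { print total + 0 }'
}

# Usage: restart counts sampled over 15 minutes.
#   counter_increase 0 1 2 4    # prints 4, which is above the threshold of 3
```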

Find the number of containers without CPU limits in each namespace


count by (namespace)(sum by (namespace,pod,container)(kube_pod_container_info{container!=""}) unless sum by (namespace,pod,container)(kube_pod_container_resource_limits{resource="cpu"}))

Find PersistentVolumeClaim in the pending state


kube_persistentvolumeclaim_status_phase{phase="Pending"}

Find unstable nodes


sum(changes(kube_node_status_condition{status="true",condition="Ready"}[15m])) by (node) > 2

Find idle CPU cores


sum((rate(container_cpu_usage_seconds_total{container!="POD",container!=""}[30m]) - on (namespace,pod,container) group_left avg by (namespace,pod,container)(kube_pod_container_resource_requests{resource="cpu"})) * -1 >0)

Find idle memory


sum((container_memory_usage_bytes{container!="POD",container!=""} - on (namespace,pod,container) avg by (namespace,pod,container)(kube_pod_container_resource_requests{resource="memory"})) * -1 >0 ) / (1024*1024*1024)

Find node status


sum(kube_node_status_condition{condition="Ready",status="true"})
sum(kube_node_status_condition{condition="Ready",status="false"})
sum(kube_node_spec_unschedulable) by (node)

Prometheus Demo