DevOps

Topics

  • Overview of DevOps
  • Overview of SDLC model
    • Waterfall
    • Agile
  • VM management
    • VirtualBox
    • Vagrant
    • Google Cloud
  • Linux Basics Overview
    • Linux File Structure
    • Linux file management
    • Linux Network Management
    • Linux Process Management
    • Linux User Management
    • Linux Service Management
  • Web Server
    • HTTP with SSL
    • Reverse Proxy
    • Let's Encrypt for SSL certificates
    • SSL related commands
  • Application Server
    • Tomcat
    • Securing Tomcat with SSL
  • DNS Overview and testing utilities
    • nslookup
    • dig
  • Tools Covered
    • Code Management
      • Git
    • CI/CD Tools
      • Jenkins
      • GitLab
      • GitHub
    • Infrastructure as Code
      • Terraform
    • Configuration Management
      • Ansible
    • Containerization
      • Docker
      • Kubernetes
    • Static Code Testing
      • SonarQube
    • Monitoring & Logging
      • Prometheus
      • Grafana
      • Grafana Loki

DevOps Overview

DevOps is a set of practices, tools, and a cultural philosophy that automate and integrate the processes between software development and IT teams. It emphasizes team empowerment, cross-team communication and collaboration, and technology automation.

DevOps combines development and operations to increase the efficiency, speed, and security of software development and delivery compared to traditional processes. A more nimble software development lifecycle results in a competitive advantage for businesses and their customers.

The DevOps lifecycle and how DevOps works

The DevOps lifecycle stretches from the beginning of software development through to delivery, maintenance, and security. The stages of the DevOps lifecycle are:

Plan

Organize the work that needs to be done, prioritize it, and track its completion.

Create

Write, design, develop and securely manage code and project data with your team.

Verify

Ensure that your code works correctly and adheres to your quality standards, ideally with automated testing.

Package

Package your applications and dependencies, manage containers, and build artifacts.

Secure

Check for vulnerabilities through static and dynamic tests, fuzz testing, and dependency scanning.

Release

Deploy the software to end users.

Configure

Manage and configure the infrastructure required to support your applications.

Monitor

Track performance metrics and errors to help reduce the severity and frequency of incidents.

Govern

Manage security vulnerabilities, policies, and compliance across your organization.

What is CI/CD?

Monitoring

  • Prometheus
  • Node Exporter
  • WMI Exporter
  • Blackbox Exporter
  • Alertmanager

Prometheus

Prometheus is a free and open-source toolkit for time-series monitoring and alerting. It was originally developed at SoundCloud in 2012 and has since been adopted by many companies and an active developer community. In 2016, it joined the Cloud Native Computing Foundation as its second hosted project, after Kubernetes.

Features:

  • It uses a multi-dimensional data model where time series data is identified by metric name and key/value pairs
  • It uses PromQL, a flexible query language that leverages this dimensionality
  • It doesn't rely on distributed storage; single server nodes are autonomous
  • Time-series metrics collection happens via a pull model over HTTP
  • Targets are discovered via service discovery or static configuration
  • It supports multiple modes of graphing and dashboarding
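The multi-dimensional data model can be illustrated with a small sketch. This is a toy in-memory store, not how Prometheus stores data; the class name TinyTSDB, the metric http_requests_total, and its labels are hypothetical names chosen for illustration. It only shows the idea that a series is identified by its metric name plus its label pairs.

```python
from typing import Dict, FrozenSet, List, Tuple

# A series is identified by its metric name plus its full set of label pairs.
SeriesKey = Tuple[str, FrozenSet[Tuple[str, str]]]


def series_key(name: str, labels: Dict[str, str]) -> SeriesKey:
    """Build the identity of a time series from its name and labels."""
    return (name, frozenset(labels.items()))


class TinyTSDB:
    """Toy in-memory store (hypothetical); not Prometheus's real storage."""

    def __init__(self) -> None:
        self._series: Dict[SeriesKey, List[Tuple[float, float]]] = {}

    def append(self, name: str, labels: Dict[str, str],
               timestamp: float, value: float) -> None:
        """Add one (timestamp, value) sample to the matching series."""
        key = series_key(name, labels)
        self._series.setdefault(key, []).append((timestamp, value))

    def select(self, name: str, **wanted: str) -> Dict[SeriesKey, List[Tuple[float, float]]]:
        """Return series with this name whose labels include all of `wanted`."""
        result = {}
        for (metric, labelset), samples in self._series.items():
            if metric == name and set(wanted.items()) <= labelset:
                result[(metric, labelset)] = samples
        return result


db = TinyTSDB()
db.append("http_requests_total", {"job": "api", "method": "GET"}, 1.0, 10.0)
db.append("http_requests_total", {"job": "api", "method": "POST"}, 1.0, 3.0)
db.append("http_requests_total", {"job": "api", "method": "GET"}, 16.0, 12.0)

# Two distinct series exist: one per unique label combination.
get_series = db.select("http_requests_total", method="GET")
```

Samples with the same name and labels land in the same series, while changing any label value creates a new one, which is why high-cardinality labels are costly.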

The Prometheus ecosystem encompasses several components. They include:

  • Prometheus server – scrapes and stores time series data
  • Client libraries – used to instrument application code
  • Pushgateway – to support short-lived jobs
  • Special-purpose exporters – for services such as HAProxy, StatsD, Graphite, etc.
  • Alertmanager – for alert handling

Prerequisites for Prometheus Installation

  • VirtualBox
  • Prometheus
  • Linux server with 1 CPU and 2 GB of memory
  • Create 2 Linux VMs
  • Update the Linux OS
sudo dnf update -y
Add Prometheus Repositories
  • Enable the EPEL repo
sudo dnf -y install epel-release vim
  • Add the Prometheus repo
curl -s https://packagecloud.io/install/repositories/prometheus-rpm/release/script.rpm.sh | sudo bash
  • Install Prometheus
sudo dnf install prometheus -y
  • Once installed, Prometheus stores its data in /var/lib/prometheus and its config files in /etc/prometheus/.

  • The default configuration file is at /etc/prometheus/prometheus.yml

  • Start and enable the service using the commands below:

sudo systemctl start prometheus
sudo systemctl enable prometheus
  • Check the status of the Prometheus service
systemctl status prometheus
  • Allow the port through the firewall if a firewall is in use
sudo firewall-cmd --add-port=9090/tcp --permanent
sudo firewall-cmd --reload
You can then access Prometheus at the URL http://IP_Address:9090
  • Check all targets on the Status > Targets page (http://IP_Address:9090/targets)
  • Review the loaded configuration file and other details under the Status menu

Install and Configure Node Exporter

  • Add the Prometheus repo (if not already added on this host)
curl -s https://packagecloud.io/install/repositories/prometheus-rpm/release/script.rpm.sh | sudo bash
  • Install node_exporter
sudo dnf -y install node_exporter
  • Start and enable the service after installation:
sudo systemctl enable --now node_exporter
  • Check node_exporter service
systemctl status node_exporter
  • Check the port used
sudo ss -aplnt | grep node
  • Allow the port through the firewall if it is running
sudo firewall-cmd --add-port=9100/tcp --permanent
sudo firewall-cmd --reload
  • On the Prometheus server, add the new node as a scrape job under scrape_configs
sudo vim /etc/prometheus/prometheus.yml
  - job_name: 'node_exporter_metrics'
    scrape_interval: 5s
    static_configs:
      - targets: ['SERVER-IP:9100']
  • Restart the Prometheus service
sudo systemctl restart prometheus
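Once the job is configured, Prometheus periodically fetches plain-text metrics from the exporter on port 9100. The sketch below parses a few lines in that text format to show what a scrape yields. It is a simplified parser written for illustration (it ignores trailing timestamps and escaped quotes), and the sample values are made up.

```python
import re
from typing import Dict, List, Tuple

# Matches: name{label="value",...} value   (the label block is optional)
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text: str) -> List[Tuple[str, Dict[str, str], float]]:
    """Parse exposition-format text into (name, labels, value) tuples.

    Simplification: comment lines (# HELP, # TYPE) and lines that do not
    match the basic pattern are skipped.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Made-up sample resembling node_exporter output.
SAMPLE = """\
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
node_cpu_seconds_total{cpu="0",mode="user"} 56.25
# TYPE node_load1 gauge
node_load1 0.42
"""

parsed = parse_metrics(SAMPLE)
for name, labels, value in parsed:
    print(name, labels, value)
```

Each parsed line becomes one sample of one time series; Prometheus additionally attaches the job and instance labels from the scrape configuration.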

Install and Configure Blackbox exporter

The Blackbox exporter enables probing of endpoints over HTTP, HTTPS, DNS, TCP, and ICMP. You can install it from the YUM repository or manually.

  • Install Blackbox exporter from Prometheus repository
sudo dnf -y install blackbox_exporter

The settings of the Blackbox exporter are in the file /etc/prometheus/blackbox.yml. You can modify them, or just start and enable the service:

  • Start and enable the blackbox_exporter service
sudo systemctl enable --now blackbox_exporter
  • Allow the port through the firewall if it is running
sudo firewall-cmd --add-port=9115/tcp --permanent
sudo firewall-cmd --reload
  • Make an entry in the prometheus.yml file
  - job_name: 'Blackbox_ssh'
    metrics_path: /probe
    params:
      module: [ssh_banner]
    static_configs:
      - targets:
        - 172.16.16.120:26
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 172.16.16.120:9115
  • Check the Prometheus configuration file
promtool check config /etc/prometheus/prometheus.yml
  • Restart the Prometheus service
sudo systemctl restart prometheus
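The three relabel_configs steps in the Blackbox job are easier to follow when traced by hand. The sketch below mimics them with plain dictionary operations; the function names are hypothetical and this is not Prometheus's relabeling engine, only an illustration of the resulting probe URL.

```python
from typing import Dict
from urllib.parse import urlencode


def apply_blackbox_relabeling(target: str, exporter_address: str) -> Dict[str, str]:
    """Trace the three relabel steps from the Blackbox job by hand."""
    labels = {"__address__": target}
    # Step 1: copy the original target into the `target` URL parameter.
    labels["__param_target"] = labels["__address__"]
    # Step 2: keep the probed endpoint visible as the `instance` label.
    labels["instance"] = labels["__param_target"]
    # Step 3: send the actual scrape to the Blackbox exporter instead.
    labels["__address__"] = exporter_address
    return labels


def probe_url(labels: Dict[str, str], module: str, metrics_path: str = "/probe") -> str:
    """Build the URL that ends up being scraped after relabeling."""
    query = urlencode({"module": module, "target": labels["__param_target"]})
    return f"http://{labels['__address__']}{metrics_path}?{query}"


labels = apply_blackbox_relabeling("172.16.16.120:26", "172.16.16.120:9115")
url = probe_url(labels, module="ssh_banner")
print(url)
```

The scrape goes to the exporter on port 9115, which in turn probes the endpoint passed in the target parameter, while the instance label still names the probed endpoint.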

Configuring Alert Manager

Alertmanager handles the alerts sent by client applications such as the Prometheus server and notifies users via email, Slack, or other tools. Prometheus evaluates the alert rules defined in its configuration at every evaluation_interval. When a rule's condition holds, Prometheus pushes the resulting alert to Alertmanager.

The Alertmanager can handle grouping, deduplication, and routing of alerts to the correct receiver. It manages alerts through its pipelines, which are:

  • Silencing: mutes alerts for a given period
  • Grouping: groups alerts of similar nature into a single notification to avoid sending multiple notifications.
  • Inhibition: suppresses specific alerts if other alerts are already firing.
  • Install Alertmanager
sudo dnf install alertmanager -y
  • Alertmanager config file (the location depends on the package; check which of these exists)
/etc/alertmanager/alertmanager.yml
/etc/prometheus/alertmanager.yml
  • repeat_interval tells Alertmanager how long to wait before re-sending a notification for alerts that are still firing. The configuration below sets it to 1 hour (Alertmanager's built-in default is 4 hours); adjust it as desired.

  • receiver: 'email' sets the default receiver to be used. For this tutorial, the default receiver is email.

  • receivers: lists the available receivers with their configurations, for example web.hook or email.

  • Modify the Alertmanager config

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'email'
receivers:
  - name: 'email'
    email_configs:
    - to: 'test@gmail.com'
      from: 'test@gmail.com'
      smarthost: smtp.gmail.com:587
      auth_username: 'test@gmail.com'
      auth_password: 'YOUR_APP_PASSWORD'  # placeholder; never store a real password in shared notes
      send_resolved: true
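The effect of group_by and repeat_interval in the route can be sketched as follows. This is a simplified model written for illustration, not Alertmanager's implementation; the function names and the HighLoad alert are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Alert = Dict[str, str]


def group_alerts(alerts: List[Alert], group_by: List[str]) -> Dict[Tuple[str, ...], List[Alert]]:
    """Bucket alerts that share the same values for the group_by labels."""
    groups: Dict[Tuple[str, ...], List[Alert]] = defaultdict(list)
    for alert in alerts:
        key = tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


def repeat_due(last_sent: float, now: float, repeat_interval: float) -> bool:
    """True once repeat_interval seconds have passed since the last notification."""
    return now - last_sent >= repeat_interval


alerts = [
    {"alertname": "InstanceDown", "instance": "node1:9100"},
    {"alertname": "InstanceDown", "instance": "node2:9100"},
    {"alertname": "HighLoad", "instance": "node1:9100"},  # hypothetical alert
]

# group_by: ['alertname'] yields one notification per alert name.
groups = group_alerts(alerts, group_by=["alertname"])
for key, members in groups.items():
    print(key, [a["instance"] for a in members])
```

With group_by set to alertname, the two InstanceDown alerts arrive in a single email, and that email is re-sent only after repeat_interval has elapsed while the alerts keep firing.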
  • Check the Alertmanager configuration (use whichever path exists on your system)
amtool check-config /etc/alertmanager/alertmanager.yml
amtool check-config /etc/prometheus/alertmanager.yml
  • Modify the Prometheus configuration settings
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
           - localhost:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "alert_rules.yml"
  # - "second_rules.yml"
  • Create the alert rules file
vi /etc/prometheus/alert_rules.yml
groups:
- name: alert_rules
  rules:
    - alert: InstanceDown
      expr: up == 0  # up is 0 when a scrape of the target fails
      for: 1m
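The for clause means the condition must hold continuously for that long before the alert fires; until then the alert is pending. The sketch below models this for an instance treated as down when up is 0. It is a simplified illustration with a hypothetical function name and made-up evaluation data, not Prometheus's rule engine.

```python
from typing import List, Optional, Tuple


def alert_states(evaluations: List[Tuple[float, float]], hold_seconds: float) -> List[str]:
    """Return the alert state after each rule evaluation.

    `evaluations` holds (timestamp, value of `up`) pairs, one per evaluation.
    """
    states = []
    active_since: Optional[float] = None
    for timestamp, up in evaluations:
        if up == 0:
            if active_since is None:
                active_since = timestamp  # condition just became true
            if timestamp - active_since >= hold_seconds:
                states.append("firing")
            else:
                states.append("pending")
        else:
            active_since = None  # condition cleared, so the timer resets
            states.append("inactive")
    return states


# Made-up evaluations every 15 seconds; the target recovers briefly at t=45.
evaluations = [(0, 1), (15, 0), (30, 0), (45, 1), (60, 0),
               (75, 0), (90, 0), (105, 0), (120, 0)]
states = alert_states(evaluations, hold_seconds=60)
print(states)
```

The brief recovery at t=45 resets the timer, so the alert only fires once the target has been down for a full minute starting at t=60. This is how the for clause filters out short blips.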

PromQL (WIP)

PromQL: Basics

Prometheus Metric and Data Types

Prometheus has four metric types:

  • Counters
  • Gauges
  • Histograms
  • Summaries
Reference: https://logz.io/blog/promql-examples-introduction/
  • Matchers and selectors: label matchers inside curly braces filter the selected series, for example {job="prometheus"} or {job="prometheus",node="node1"}
  • Equality matcher (=): {job="prometheus"}
  • Negative equality matcher (!=): {job!="node2"}
  • Regular expression matcher (=~): prometheus_http_requests_total{handler=~"/api.*"}
  • Negative regular expression matcher (!~): prometheus_http_requests_total{handler!~"/api.*"}
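The four matcher types can be sketched as a small function. This is an illustration only: Prometheus uses RE2 regular expressions while this sketch uses Python's re module, and the function name is hypothetical. It does reflect two behaviours of PromQL matchers: regular expressions are fully anchored, and a missing label behaves like an empty string.

```python
import re
from typing import Dict


def matches(labels: Dict[str, str], name: str, op: str, value: str) -> bool:
    """Evaluate one label matcher against a series' labels."""
    actual = labels.get(name, "")  # a missing label behaves like ""
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    if op == "=~":
        # fullmatch mirrors PromQL's fully anchored regular expressions
        return re.fullmatch(value, actual) is not None
    if op == "!~":
        return re.fullmatch(value, actual) is None
    raise ValueError(f"unknown matcher operator: {op}")


series = {"job": "prometheus", "handler": "/api/v1/query"}

print(matches(series, "job", "=", "prometheus"))
print(matches(series, "job", "!=", "node2"))
print(matches(series, "handler", "=~", "/api.*"))
print(matches(series, "handler", "=~", "/api"))
```

The last call returns False because the pattern must match the whole label value, which is why the trailing .* is needed in handler=~"/api.*".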

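Range vector: a selector followed by a duration in square brackets, such as [1m] or [1d], returns the samples recorded over that time window.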
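A range vector is usually fed into a function such as rate() or increase(). The sketch below shows the idea on made-up counter samples, including a counter reset. It is deliberately simplified: the real functions also extrapolate to the window boundaries, which this sketch does not, and the function names are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def range_vector(samples: List[Sample], now: float, window: float) -> List[Sample]:
    """Select the samples that fall within the last `window` seconds."""
    return [(t, v) for t, v in samples if now - window < t <= now]


def simple_increase(window_samples: List[Sample]) -> float:
    """Sum the counter's growth, treating any drop as a reset to zero."""
    total = 0.0
    for (_, prev), (_, cur) in zip(window_samples, window_samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            total += cur  # counter restarted; count growth since the reset
    return total


def simple_rate(window_samples: List[Sample]) -> float:
    """Per-second average increase between the first and last sample."""
    if len(window_samples) < 2:
        return 0.0
    elapsed = window_samples[-1][0] - window_samples[0][0]
    return simple_increase(window_samples) / elapsed


# Made-up counter scraped every 15s; it resets between t=30 and t=45.
samples = [(0.0, 100.0), (15.0, 105.0), (30.0, 112.0), (45.0, 3.0), (60.0, 9.0)]
window = range_vector(samples, now=60.0, window=60.0)
print(simple_increase(window), simple_rate(window))
```

Handling the reset is the reason rate() and increase() should only be applied to counters; on a gauge, a legitimate decrease would be misread as a restart.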

Binary operators: arithmetic (+, -, *, /, %, ^)

Aggregator:

  • Find the number of pods per namespace
sum by (namespace) (kube_pod_info)
  • Find CPU overcommit

sum(kube_pod_container_resource_limits{resource="cpu"}) - sum(kube_node_status_capacity{resource="cpu"})
  • Memory overcommit

sum(kube_pod_container_resource_limits{resource="memory"}) - sum(kube_node_status_capacity{resource="memory"})
  • Find unhealthy Kubernetes pods

min_over_time(sum by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"})[15m:1m]) > 0
  • Find Kubernetes pods CrashLooping

increase(kube_pod_container_status_restarts_total[15m]) > 3
  • Find the number of containers without CPU limits in each namespace

count by (namespace)(sum by (namespace,pod,container)(kube_pod_container_info{container!=""}) unless sum by (namespace,pod,container)(kube_pod_container_resource_limits{resource="cpu"}))
  • Find PersistentVolumeClaim in the pending state

kube_persistentvolumeclaim_status_phase{phase="Pending"}
  • Find unstable nodes

sum(changes(kube_node_status_condition{status="true",condition="Ready"}[15m])) by (node) > 2
  • Find idle CPU cores

sum((rate(container_cpu_usage_seconds_total{container!="POD",container!=""}[30m]) - on (namespace,pod,container) group_left avg by (namespace,pod,container)(kube_pod_container_resource_requests{resource="cpu"})) * -1 >0)
  • Find idle memory

sum((container_memory_usage_bytes{container!="POD",container!=""} - on (namespace,pod,container) avg by (namespace,pod,container)(kube_pod_container_resource_requests{resource="memory"})) * -1 >0 ) / (1024*1024*1024)
  • Find node status

sum(kube_node_status_condition{condition="Ready",status="true"})
sum(kube_node_status_condition{condition="Ready",status="false"})
sum(kube_node_spec_unschedulable) by (node)
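Aggregations such as sum by (namespace) (kube_pod_info) collapse many series into one value per distinct value of the listed labels. A minimal sketch of that behaviour follows, assuming made-up pod data and a hypothetical sum_by helper; it is not how the PromQL engine is implemented.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Series = Tuple[Dict[str, str], float]  # (labels, current value)


def sum_by(series: List[Series], by: List[str]) -> Dict[Tuple[str, ...], float]:
    """Sum values per distinct combination of the `by` labels."""
    totals: Dict[Tuple[str, ...], float] = defaultdict(float)
    for labels, value in series:
        key = tuple(labels.get(label, "") for label in by)
        totals[key] += value
    return dict(totals)


# kube_pod_info exposes one series with value 1 per pod (made-up pods here).
kube_pod_info = [
    ({"namespace": "default", "pod": "web-1"}, 1.0),
    ({"namespace": "default", "pod": "web-2"}, 1.0),
    ({"namespace": "monitoring", "pod": "prometheus-0"}, 1.0),
]

pods_per_namespace = sum_by(kube_pod_info, by=["namespace"])
print(pods_per_namespace)
```

Because each pod contributes a value of 1, summing by namespace gives the pod count per namespace; labels not listed in the by clause, such as pod, are dropped from the result.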