OpenShift Installation on VMware

Comprehensive OpenShift guides and documentation.

Description

This document describes the installation of OCP4 on the VMware infrastructure hosted on premises.

RBAC required to manage the installation

Be sure you are able to access:

Information about DNS

The OpenShift cluster will be reachable from the Internet and from the LAN, using a public DNS record.
A subzone of local.com., named innershift.local.com., has been delegated to our team and is managed in an Azure DNS zone, to give us flexibility and responsiveness when we need to change it.
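As a quick sanity check that the delegation and the cluster records resolve correctly, you can run a few dig queries from the bastion (the hostnames assume the cluster domain ocp4.local.com and the VIPs listed in the table below; adjust them to your environment):

dig +short api.ocp4.local.com          # should return the API VIP (192.168.1.100)
dig +short test.apps.ocp4.local.com    # any name under *.apps should return the ingress VIP (192.168.1.101)
dig +short NS innershift.local.com     # should return the Azure DNS name servers of the delegated subzone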

Known issues

During the installation of the production environment, we were hit by the following bug: https://bugzilla.redhat.com/show_bug.cgi?id=1882022 .
To work around it, we applied the following solution: https://access.redhat.com/solutions/5507161

Information required by the installer for PRODUCTION

key                                               value
baseDomain                                        local.com
metadata.name                                     ocp4
compute.hyperthreading                            Enabled
compute.name                                      worker
compute.replicas                                  3
compute.platform.vsphere.cpus                     8
compute.platform.vsphere.coresPerSocket           1
compute.platform.vsphere.memoryMB                 27648
compute.platform.vsphere.osDisk.diskSizeGB        120
controlPlane.hyperthreading                       Enabled
controlPlane.name                                 master
controlPlane.replicas                             3
controlPlane.platform.vsphere.cpus                16
controlPlane.platform.vsphere.coresPerSocket      1
controlPlane.platform.vsphere.memoryMB            98304
controlPlane.platform.vsphere.osDisk.diskSizeGB   120
platform.vsphere.vcenter                          vcenter01.local.com
platform.vsphere.username                         local.com\administrator
platform.vsphere.password                         find it in the Azure key vault used for OpenShift
platform.vsphere.datacenter
platform.vsphere.folder
platform.vsphere.defaultDatastore                 ocp
platform.vsphere.network                          ocp
platform.vsphere.cluster                          ocp
platform.vsphere.apiVIP                           192.168.1.100
platform.vsphere.ingressVIP                       192.168.1.101
fips                                              false
pullSecret                                        find it by going to https://cloud.redhat.com/openshift/create, navigating to the "Datacenter" tab, selecting "vSphere", and copying your pull secret
sshKey                                            ''
networking.clusterNetwork.cidr                    172.20.0.0/14
networking.clusterNetwork.hostPrefix              23
networking.serviceNetwork                         172.19.0.0/16

Example of install-config.yaml

apiVersion: v1
baseDomain: local.com
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 3
  platform:
    vsphere:
      cpus: 8
      coresPerSocket: 1
      memoryMB: 27648
      osDisk:
        diskSizeGB: 120
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
  platform:
    vsphere:
      cpus: 16
      coresPerSocket: 1
      memoryMB: 98304
      osDisk:
        diskSizeGB: 120
metadata:
  name: ocp4
platform:
  vsphere:
    vcenter: vcenter01.local.com
    username: 
    password: XXXXXXXXXXXXXXXXXXXXXXX
    datacenter:
    folder: 
    defaultDatastore: 
    network: 
    cluster: 
    apiVIP: 
    ingressVIP: 
fips: false
pullSecret: '{"auths":XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX}}}'
sshKey: 
networking:
  clusterNetwork:
    - cidr: 172.20.0.0/14
      hostPrefix: 23
  serviceNetwork:
    - 172.19.0.0/16
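If you prefer, once the installer binary has been downloaded (see the network configuration section below), it can also generate a starting install-config.yaml interactively, which you can then adjust to match the values above:

./openshift-install create install-config --dir=$(date -I)/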

The vCenter certificate must be trusted (once only):

curl -kSL https://vcenter01.local.com/certs/download.zip -o download.zip
unzip download.zip
sudo cp certs/lin/* /etc/pki/ca-trust/source/anchors
sudo update-ca-trust extract
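To verify the certificate is now trusted, a connection without -k should succeed:

curl -sS https://vcenter01.local.com/ -o /dev/null && echo "vCenter certificate trusted"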

Prerequisites

  • have trusted the certificates on the bastion (see above)
  • create an install-config.yaml file respecting the values described above
  • in vCenter, manually create the folder that will be used by the installer.
    For example, to create the production environment folder, go to the VMs and Templates view
  • have reserved in IPAM the 2 static IPs that will be used by the Ingress and the API
  • create the matching resource records in the Azure DNS zone (a sketch follows this list)
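A sketch of the record creation with the Azure CLI, assuming the records live in an Azure DNS zone holding the cluster domain; the resource group and zone names are placeholders to adjust to your environment:

# Hypothetical resource group and zone names; adjust to your environment.
az network dns record-set a add-record -g <resource-group> -z local.com \
  -n api.ocp4 -a 192.168.1.100           # API VIP
az network dns record-set a add-record -g <resource-group> -z local.com \
  -n '*.apps.ocp4' -a 192.168.1.101      # ingress VIP (wildcard for the apps routes)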

Create advanced network configuration

  • Connect to the bastion
 ssh root@<bastion>
  • Create a working directory
mkdir -p vsphere/$(date -I)
cd vsphere
  • copy the install-config.yaml file you created into the working directory
cp install-config.yaml $(date -I)/
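Note that openshift-install consumes (deletes) the install-config.yaml inside the dated directory when it generates the manifests; the copy kept in the parent directory therefore serves as a backup.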
  • download and extract the installer
wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.6.18/openshift-install-linux-4.6.18.tar.gz
tar xzvf openshift-install-linux-4.6.18.tar.gz
  • create the manifests:
./openshift-install create manifests --dir=$(date -I)/
  • create a file named cluster-network-03-config.yml
touch $(date -I)/manifests/cluster-network-03-config.yml
  • add the following content to this file:
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  clusterNetwork:
  - cidr: 172.20.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.19.0.0/16
  defaultNetwork:
    type: OVNKubernetes
    ovnKubernetesConfig:
      mtu: 1400
      genevePort: 6081
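For reference on these values: hostPrefix 23 gives each node a /23 pod subnet (510 usable pod IPs), and the /14 clusterNetwork can be split into 2^(23-14) = 512 such subnets, so this configuration supports up to 512 nodes.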
  

Run the installer

  • from the same working directory you created during the advanced network configuration step, run the OpenShift installer
./openshift-install create cluster --dir=./$(date -I) --log-level=info

In the production environment the installation took around 40 minutes:

INFO Creating infrastructure resources...
INFO Waiting up to 20m0s for the Kubernetes API at https://api.ocp4.local.com:6443...
INFO API v1.19.0+f173eb4 up
INFO Waiting up to 30m0s for bootstrapping to complete...
INFO Destroying the bootstrap resources...
INFO Waiting up to 40m0s for the cluster at https://api.ocp4.local.com:6443 to initialize...
INFO Waiting up to 10m0s for the openshift-console route to be created...
INFO Install complete!
INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/home/ocpadmin/vsphere/2021-03-25/auth/kubeconfig'
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.ocp4.local.com
INFO Login to the console with user: "kubeadmin", and password: "XXXXXXXXXXXX"
INFO Time elapsed: 38m24s
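Once the installer finishes, a quick sanity check from the bastion (the kubeconfig path matches the installer output above; adjust the dated directory if you run this later):

export KUBECONFIG=$PWD/$(date -I)/auth/kubeconfig
oc get nodes              # all masters and workers should be Ready
oc get clusteroperators   # all operators should be Available and not Degraded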

Install the oc client on the bastion (once)

  • download and install oc
curl https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.6.18/openshift-client-linux-4.6.18.tar.gz -o openshift-client-linux-4.6.18.tar.gz
tar xzvf openshift-client-linux-4.6.18.tar.gz
sudo cp oc /usr/local/bin/
sudo cp kubectl /usr/local/bin/
  • set up auto-completion
sudo yum install -y bash-completion
mkdir -p ~/.kube
oc completion bash > ~/.kube/completion.bash.inc
printf "
# Kubectl shell completion
source '$HOME/.kube/completion.bash.inc'
" >> $HOME/.bash_profile
source $HOME/.bash_profile

Registry installation

Known Issues

While configuring the purchased certificates on the NetApp SVM S3, we noticed that even when they are fully set up on ONTAP, the full certificate chain is never returned to the client.
As a result, the certificate is considered misconfigured (note that this is not the same error as a "self-signed" one).
While waiting for a NetApp fix, we decided to deploy a simple block storage backend for the registry.

Installation steps

  • temporarily decrease the registry replicas to 1 and switch the rollout strategy to Recreate (the block-storage PVC is ReadWriteOnce, so only one registry pod can mount it at a time)
oc patch config.imageregistry.operator.openshift.io/cluster --type=merge -p '{"spec":{"rolloutStrategy":"Recreate","replicas":1}}'
  • define a PVC of 500Gi in a ‘pvc.yaml’ file
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: image-registry-storage 
spec:
  accessModes:
  - ReadWriteOnce 
  resources:
    requests:
      storage: 500Gi 
  • create it in the right namespace
oc create -f pvc.yaml -n openshift-image-registry  
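You can check that the PVC has been created (depending on the storage class binding mode, it may stay Pending until the registry pod consumes it):

oc get pvc image-registry-storage -n openshift-image-registry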
  • edit the registry configuration
 oc edit config.imageregistry.operator.openshift.io -o yaml
  • and modify the spec.storage configuration to match the following settings:
  storage:
   pvc:
     claim: image-registry-storage 
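Equivalently, the same change can be applied non-interactively with a merge patch:

oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"storage":{"pvc":{"claim":"image-registry-storage"}}}}'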
  • wait until the PV has been successfully created, then set the registry management state to Managed
oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"managementState":"Managed"}}'
  • wait until the registry becomes ready
watch oc get po -n openshift-image-registry -l docker-registry=default
  • check that the image-registry cluster operator becomes available and is not degraded
 oc get co image-registry

Post day 1 tests

  • test the registry

Now that the registry is deployed, you can create a namespace and deploy a workload.
For example, in the Developer view, add a Jenkins instance with persistent storage and check that everything is created properly; a CLI equivalent is sketched below.
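A minimal CLI sketch, assuming the default jenkins-persistent template is available in the cluster's openshift namespace (the project name is a placeholder):

oc new-project registry-smoke-test
oc new-app jenkins-persistent -p VOLUME_CAPACITY=5Gi
oc get pods -w    # wait for the jenkins pod to become Running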

  • ensure NTP is working fine on the worker nodes, for example with the check below
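A minimal sketch of the check, using oc debug to run chronyc on a node (replace <worker-node> with a name from oc get nodes; chronyc is available on RHCOS nodes):

oc debug node/<worker-node> -- chroot /host chronyc tracking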

External references