Air Gapped Kubernetes Cluster Provisioning with Cluster API for vSphere

There are many different ways to deploy a Kubernetes cluster. This post does not introduce a new methodology for deploying clusters; instead, it addresses a common problem that we see in the field. Whether you work in defense, finance, telecommunications, or some other industry, they all have one thing in common: a need to protect their data. One common way to do that is to keep systems isolated from the public internet, either on a special network behind an enterprise firewall or through total isolation without any internet connectivity whatsoever. This makes managing those environments challenging, as we are in a mindset of total connectivity these days. Specifically, as it relates to Kubernetes, a typical deployment expects to pull images and manifests from the internet — exactly what an air-gapped environment prevents.

Additionally, as stated above, there are many different tools and approaches to deploying Kubernetes clusters. A newer approach that has caught my eye is managing the lifecycle of those clusters with a common deployment tool called Cluster API. This tool does present challenges when deploying a cluster with limited or no ability to reach the public internet. For the purposes of this post, since we are completely isolated, I am assuming this environment is a self-managed vSphere environment. Although public cloud providers do exist for air-gapped scenarios, I feel the most common use case will involve VMware, and VMware environments are supported via the Cluster API Provider vSphere (CAPV) project.


This post assumes the following:

  • Basic Linux skillset
  • Basic VMware skillset
  • Basic Kubernetes skillset
  • Pre-Existing image registry in the disconnected/air-gapped environment
  • Pre-Existing “bootstrap” system with Docker running
  • Pre-Existing vSphere environment (6.7u3 or later)
  • A single internet-connected system to pre-stage artifacts
  • A file system with 5 GB of free space where the pre-staged artifacts will be placed on both connected/disconnected systems
  • Approved process to move artifacts from publicly connected systems to the target air-gapped environment

Getting Started

Connected System Requirements

Before we do anything, we have to assume that there is some system, somewhere, that we can use to procure our artifacts prior to their move into our disconnected environment. Obviously, we can’t use osmosis to accomplish this. :) This system doesn’t have to be able to access our installation environment; rather, it should simply be thought of as a staging area to download artifacts and collect information about the installation process. The following prerequisites must be met on the internet-connected system:

  1. Google Storage Utility: this utility allows us to access the Google Storage API so that we can see and download the appropriate Kubernetes OVA templates for VMware. Please see the Google Cloud gsutil documentation for installation instructions.
  2. Docker: installing Docker onto our connected system gives us an easy way to move images from our connected system into our air-gapped environment. Please see the Docker documentation for installation instructions.

Gather Information

Additionally, we must note several items (e.g. software versions) before beginning the download process. The following prerequisite steps must be completed, and the results noted, before downloading artifacts:

  1. Which Version of Kubernetes: we first need to know which version of Kubernetes to deploy. Organizational constraints may restrict us to specific version(s). The available versions can be listed with the gsutil command shown below.

NOTE The examples moving forward will utilize v1.17.4.

  2. Physical Location for Artifacts: the Cluster API installer makes use of Docker images, OVAs, and resource definition files in order to install the cluster. If we are to install a cluster in a disconnected manner, we must manually move these artifacts from our connected system into our disconnected environment. We’ll need to note the physical location where we download these artifacts in our connected environment, and where we will upload them to in our disconnected environment.

NOTE Please see Download Artifacts for the expected directory structure/location in this example.

  3. HTTP Location of HAProxy/Node OVAs: the image-builder project produces several OVA images which meet all of the necessary requirements for Cluster API to produce a functional Kubernetes cluster. It is highly recommended to use these images. You will need two images:
    1. HAProxy OVA: the load balancer used in front of kube-apiserver
    2. Node OVA: the actual image used to produce Kubernetes nodes

To get a list of images, you can use the gsutil command which should have been installed in the Connected System Requirements section.

# set our default kubernetes version
export KUBERNETES_VERSION='v1.17.4'

# list the approved images for kubernetes nodes
gsutil ls gs://capv-images/release/*/*.{ova,sha256} | sed 's~^gs://~https://storage.googleapis.com/~' | grep ${KUBERNETES_VERSION}

# list the approved images for haproxy nodes
gsutil ls gs://capv-images/extra/haproxy/release/*/*.{ova,sha256} | sed 's~^gs://~https://storage.googleapis.com/~'

Be aware that the Kubernetes version is a part of the URL and file name. You should have determined which Kubernetes version to use in the previous step and be using it as the KUBERNETES_VERSION environment variable when running the gsutil command.
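The gs:// object paths returned by gsutil map directly onto public storage.googleapis.com URLs, which is what the sed substitution above does. A minimal sketch of the same mapping using shell parameter expansion (the photon-3 file name below is illustrative only — use whatever your gsutil listing actually returned):

```shell
# GS_PATH would be one entry from the gsutil listing above
GS_PATH='gs://capv-images/release/v1.17.4/photon-3-kube-v1.17.4.ova'

# strip the gs:// scheme and prepend the public HTTP endpoint
HTTP_URL="https://storage.googleapis.com/${GS_PATH#gs://}"
echo "${HTTP_URL}"
```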

Download Artifacts

Once we have determined the location of the artifacts, both the internet location and the physical location on our connected system, we can download them. But first, we should create a directory structure to download our artifacts into, so that they are easily transportable to our air-gapped installation environment.

Each artifact being downloaded below assumes the following directory structure:

├── resources      # folder for storing resource definition files
├── docker-images  # folder for storing docker images
├── binaries       # folder for storing installation binaries
└── ovas           # folder for storing vmware ova templates

If you want to follow along, this directory structure can be created easily with the following command (run from the directory in which you wish to produce the skeleton structure):

mkdir ovas docker-images resources binaries

OVA Template for HAProxy

Download Directory: ./ovas

Download the appropriate HA Proxy OVA image as determined from the previous step:

# download the ova as haproxy.ova into the ovas directory
# (substitute the exact object path returned by the gsutil listing above)
export HAPROXY_OVA_PATH='capv-images/extra/haproxy/release/v0.6.3/capv-haproxy-v0.6.3.ova'
curl -L "https://storage.googleapis.com/${HAPROXY_OVA_PATH}" -o ./ovas/haproxy.ova

OVA Template for Kubernetes Nodes

Download Directory: ./ovas

Download the appropriate Kubernetes node OVA image as determined from the previous step:

# set our default kubernetes version
export KUBERNETES_VERSION='v1.17.4'

# download the ova as node.ova into the ovas directory
# (the photon-3 image is used here; substitute the image you chose from the gsutil listing)
curl -L "https://storage.googleapis.com/capv-images/release/${KUBERNETES_VERSION}/photon-3-kube-${KUBERNETES_VERSION}.ova" -o ./ovas/node.ova

Latest Version of Cluster API

Download Directory: ./binaries

Cluster API ships with the clusterctl utility. You will need this in order to interact with your cluster deployment. Download instructions can be found in the Cluster API documentation. Please also be aware of the release versions of Cluster API, which are listed on the project’s GitHub releases page. For quick reference, to download the utility (for macOS):

# set the clusterapi version
export CLUSTERAPI_VERSION='v0.3.3'

# download clusterctl
curl -L "https://github.com/kubernetes-sigs/cluster-api/releases/download/${CLUSTERAPI_VERSION}/clusterctl-darwin-amd64" -o ./binaries/clusterctl

Cluster API Resource Definitions

Download Directory: ./resources/clusterapi

Cluster API makes use of several custom resource definitions to both deploy and manage the lifecycle of Kubernetes clusters. We will need to download these and potentially leverage them during our deployment.

NOTE: currently, the below command points to invalid locations, as noted in an upstream issue. You will need to manually download the CRD files directly from the provider release pages.

You can download via the following commands:

# display the custom resource definitions (valid as of clusterapi v0.3.3)
clusterctl config repositories
NAME          TYPE                     URL
cluster-api   CoreProvider             https://github.com/kubernetes-sigs/cluster-api/releases/latest/core-components.yaml
kubeadm       BootstrapProvider        https://github.com/kubernetes-sigs/cluster-api/releases/latest/bootstrap-components.yaml
kubeadm       ControlPlaneProvider     https://github.com/kubernetes-sigs/cluster-api/releases/latest/control-plane-components.yaml
aws           InfrastructureProvider   https://github.com/kubernetes-sigs/cluster-api-provider-aws/releases/latest/infrastructure-components.yaml
azure         InfrastructureProvider   https://github.com/kubernetes-sigs/cluster-api-provider-azure/releases/latest/infrastructure-components.yaml
metal3        InfrastructureProvider   https://github.com/metal3-io/cluster-api-provider-metal3/releases/latest/infrastructure-components.yaml
openstack     InfrastructureProvider   https://github.com/kubernetes-sigs/cluster-api-provider-openstack/releases/latest/infrastructure-components.yaml
vsphere       InfrastructureProvider   https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/releases/latest/infrastructure-components.yaml

# download the crds
# NOTE: this must be done manually for now as per the above note.  for reference:
mkdir -p ./resources/clusterapi
cd ./resources/clusterapi
curl -L -O https://github.com/kubernetes-sigs/cluster-api/releases/download/v0.3.3/core-components.yaml
curl -L -O https://github.com/kubernetes-sigs/cluster-api/releases/download/v0.3.3/cluster-api-components.yaml
curl -L -O https://github.com/kubernetes-sigs/cluster-api/releases/download/v0.3.3/bootstrap-components.yaml
curl -L -O https://github.com/kubernetes-sigs/cluster-api/releases/download/v0.3.3/control-plane-components.yaml
curl -L -O https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/releases/download/v0.6.3/infrastructure-components.yaml
cd ../..


KIND Binary and Image

Download Directory: ./binaries (for installation binary)
Download Directory: ./docker-images (for Docker image)

Cluster API uses the concept of a bootstrap cluster: we first install a small management cluster, which is then used to bootstrap other clusters. This concept is documented in The Cluster API Book. To do this, an often-used tactic is a local KIND (Kubernetes in Docker) cluster, which allows a quick spinup of a local management cluster to bootstrap other clusters.

To download the kind binary, run the following commands:

# set the kind version (check the kind releases page for current versions)
export KIND_VERSION='v0.7.0'

# download the kind binary
curl -L "https://github.com/kubernetes-sigs/kind/releases/download/${KIND_VERSION}/kind-$(uname)-amd64" -o ./binaries/kind

To download the kind Docker image, run the following commands:

NOTE Not all Kubernetes versions are published as KIND images. This version may differ from the Kubernetes version you want to install, as the KIND cluster is only used for bootstrapping the actual workload cluster. Available KIND images are listed on Docker Hub under kindest/node.

# use docker to download the kind image
docker pull kindest/node:v1.17.2

# save the docker kind image as a tar archive
mkdir -p ./docker-images/kind
docker save kindest/node:v1.17.2 > ./docker-images/kind/kind-image.tar

Networking Plugin Definitions

Download Directory: ./resources/cni

For any Kubernetes cluster to be active, it must use an appropriate CNI network plugin. This is not to debate which one is appropriate for your environment, as there are plenty of opinions already out there. This is just to note that you will need to have the one you decide upon available.

For this walkthrough, we will use Calico:

mkdir -p ./resources/cni
curl -L https://docs.projectcalico.org/manifests/calico.yaml -o ./resources/cni/calico.yaml

vSphere Storage Resource Definitions

Download Directory: ./resources/cpi (for vSphere Cloud Provider Interface components)
Download Directory: ./resources/csi (for vSphere Container Storage Interface components)

We will need to ensure that we download the vSphere storage driver if we want to make use of the native storage integrations with vSphere. Although it’s not a prerequisite for a functional Kubernetes cluster, it’s nice to grab while we are downloading and moving artifacts from connected to disconnected environments, as this is oftentimes a slow and painful process.

A full list of instructions can be found in the vSphere Cloud Provider documentation.

To download the appropriate vSphere Cloud Provider definitions:

mkdir -p ./resources/cpi
cd ./resources/cpi
# paths as of this writing; adjust to the versions referenced in the vSphere CPI docs
curl -L -O https://raw.githubusercontent.com/kubernetes/cloud-provider-vsphere/master/manifests/controller-manager/cloud-controller-manager-roles.yaml
curl -L -O https://raw.githubusercontent.com/kubernetes/cloud-provider-vsphere/master/manifests/controller-manager/cloud-controller-manager-role-bindings.yaml
curl -L -O https://raw.githubusercontent.com/kubernetes/cloud-provider-vsphere/master/manifests/controller-manager/vsphere-cloud-controller-manager-ds.yaml
cd ../..

To download the appropriate vSphere Storage Provider definitions:

mkdir -p ./resources/csi
cd ./resources/csi
# paths as of this writing; adjust to the versions referenced in the vSphere CSI docs
curl -L -O https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/master/manifests/1.14/rbac/vsphere-csi-controller-rbac.yaml
curl -L -O https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/master/manifests/1.14/deploy/vsphere-csi-controller-ss.yaml
curl -L -O https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/master/manifests/1.14/deploy/vsphere-csi-node-ds.yaml
cd ../..

Download Cert Manager Resource Definitions

Download Directory: ./resources/cert-manager

Cert Manager is an upstream project that lets Kubernetes users manage certificates within their Kubernetes cluster. It is leveraged by Cluster API. You can get more information on the project’s website.

NOTE It appears that newer versions do not work with Cluster API. I have pinned version v0.11.0 for now.

To download the resource definition:

mkdir -p ./resources/cert-manager
cd ./resources/cert-manager
curl -L -O https://github.com/jetstack/cert-manager/releases/download/v0.11.0/cert-manager.yaml
cd ../..

Download the Docker Images for Kubernetes Components

NOTE Several Kubernetes-specific images are already cached locally on the OVA images, so the nodes will still reference the upstream image names. We pull and push them into the registry anyway to be thorough.

Download Directory: ./docker-images

There isn’t a super clean way to do this. However, the kubeadm tool does allow us to list the images that are needed for a Kubernetes base installation. We will need those components in our disconnected environment.

First, start a KIND cluster on our connected system. We have already downloaded the kind binary and image in a previous step, and we will leverage the fact that Docker is installed on our connected system to spin up a quick cluster and list the images we need:

chmod +x ./binaries/kind
./binaries/kind create cluster --name image-list-cluster

Next, we can list the images for the specific version of Kubernetes that we wish to install in the disconnected environment:

# display the running kind cluster and note the container id
docker ps
# CONTAINER ID   IMAGE                  COMMAND                  CREATED          STATUS          PORTS                       NAMES
# 5b3b07b81124   kindest/node:v1.17.2   "/usr/local/bin/entr…"   16 minutes ago   Up 15 minutes   127.0.0.1:32768->6443/tcp   image-list-cluster-control-plane

# list the images and store them as a variable
IMAGES=$(docker exec 5b3b07b81124 kubeadm config images list --kubernetes-version ${KUBERNETES_VERSION} 2> /dev/null)

Finally, we can download the images that we need using the IMAGES variable from the previous step:

mkdir -p ./docker-images/kubernetes
for IMAGE in ${IMAGES}; do
  # create some useful variables
  COMPONENT=$(echo ${IMAGE} | awk -F'/' '{print $NF}')
  COMPONENT_NAME=$(echo ${COMPONENT} | awk -F':' '{print $1}')

  # pull the image
  docker pull ${IMAGE}

  # save the image as a tar file
  docker save ${IMAGE} > ./docker-images/kubernetes/${COMPONENT_NAME}.tar
done

Download the Docker Images for Other Components

Download Directory: ./docker-images

This is the part that gets a bit tricky. We have already downloaded the kind image, and we had to do that as a one-off since kind is not referenced by the Cluster API resource definition files (Cluster API expects a pre-existing cluster, which is what kind provides). Now we must download all of the images referenced in our downloaded definition files. Cluster API uses the image: key to define where images are located, so we must find every reference to the image: key and download the value of that key.

DISCLAIMER This works in a vacuum, but more testing should be taken when downloading these images in an actual production environment.

for TYPE in $(ls resources/); do
  for IMAGE in $(grep image: resources/${TYPE}/* -r | awk '{print $NF}' | tr -d '"' | sort -u); do
    # create some useful variables
    COMPONENT=$(echo ${IMAGE} | awk -F'/' '{print $NF}')
    COMPONENT_NAME=$(echo ${COMPONENT} | awk -F':' '{print $1}')

    # pull the image from the url
    docker pull ${IMAGE}

    # save the image as a tar file
    IMAGE_DIRECTORY="./docker-images/${TYPE}"
    if [[ ! -d ${IMAGE_DIRECTORY} ]]; then mkdir -p ${IMAGE_DIRECTORY}; fi
    docker save ${IMAGE} > ${IMAGE_DIRECTORY}/${COMPONENT_NAME}.tar
  done
done
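As a quick sanity check that the image: extraction pipeline behaves as expected, you can run it against a throwaway manifest fragment (the image name below is made up):

```shell
# write a small manifest fragment with a duplicated, quoted image reference
printf 'spec:\n  containers:\n  - image: "quay.io/example/foo:v1"\n  - image: quay.io/example/foo:v1\n' > /tmp/sample.yaml

# same pipeline as the loop above: find image: lines, take the value, strip quotes, de-duplicate
grep 'image:' /tmp/sample.yaml | awk '{print $NF}' | tr -d '"' | sort -u
```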

Creating the Tarball

Once everything has been downloaded, you should have a directory structure which looks similar to the following:

├── binaries
│   ├── clusterctl
│   └── kind
├── docker-images
│   ├── cert-manager
│   │   ├── cert-manager-cainjector.tar
│   │   ├── cert-manager-controller.tar
│   │   └── cert-manager-webhook.tar
│   ├── clusterapi
│   │   ├── cluster-api-controller.tar
│   │   ├── kube-rbac-proxy.tar
│   │   ├── kubeadm-bootstrap-controller.tar
│   │   ├── kubeadm-control-plane-controller.tar
│   │   └── manager.tar
│   ├── cni
│   │   ├── cni.tar
│   │   ├── kube-controllers.tar
│   │   ├── node.tar
│   │   └── pod2daemon-flexvol.tar
│   ├── cpi
│   │   └── manager.tar
│   ├── csi
│   │   ├── csi-attacher.tar
│   │   ├── csi-node-driver-registrar.tar
│   │   ├── csi-provisioner.tar
│   │   ├── driver.tar
│   │   ├── livenessprobe.tar
│   │   └── syncer.tar
│   ├── kind
│   │   └── kind.tar
│   └── kubernetes
│       ├── coredns.tar
│       ├── etcd.tar
│       ├── kube-apiserver.tar
│       ├── kube-controller-manager.tar
│       ├── kube-proxy.tar
│       ├── kube-scheduler.tar
│       └── pause.tar
├── kube-cluster-api.tar.gz
├── ovas
│   ├── haproxy.ova
│   └── node.ova
└── resources
    ├── cert-manager
    │   └── cert-manager.yaml
    ├── clusterapi
    │   ├── bootstrap-components.yaml
    │   ├── cluster-api-components.yaml
    │   ├── control-plane-components.yaml
    │   ├── core-components.yaml
    │   └── infrastructure-components.yaml
    ├── cni
    │   └── calico.yaml
    ├── cpi
    │   ├── cloud-controller-manager-role-bindings.yaml
    │   ├── cloud-controller-manager-roles.yaml
    │   └── vsphere-cloud-controller-manager-ds.yaml
    └── csi
        ├── vsphere-csi-controller-rbac.yaml
        ├── vsphere-csi-controller-ss.yaml
        └── vsphere-csi-node-ds.yaml

16 directories, 45 files

Once you are happy with what you have, you can tar the directory up and prepare it to be moved onto your airgapped network:

tar czvf kube-cluster-api.tar.gz binaries docker-images ovas resources
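Since the tarball will travel through a manual transfer process, it is also worth generating a checksum manifest alongside it so the air-gapped side can verify nothing was corrupted in transit. A hedged sketch using sha256sum from coreutils (the throwaway staging directory here is for demonstration — in real use, run it from the directory holding binaries/, docker-images/, ovas/ and resources/):

```shell
# set up a throwaway staging area for demonstration
STAGE=$(mktemp -d)
mkdir -p "${STAGE}/binaries"
echo 'demo' > "${STAGE}/binaries/clusterctl"
cd "${STAGE}"

# generate a checksum manifest for every staged file (exclude the manifest itself)
find . -type f ! -name SHA256SUMS -exec sha256sum {} \; > SHA256SUMS

# after the transfer, the air-gapped side runs the same check to detect corruption
sha256sum -c SHA256SUMS
```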

Migrate Artifacts to Air Gapped Environment

Most organizations which utilize disconnected environments have processes in place to move artifacts from an internet-facing environment to environments with little to no internet connectivity. It is important to follow the procedures for your organization, as they vary between organizations.

Once the process has been followed and you have the artifact on the air-gapped network, we can begin installing the Kubernetes cluster using Cluster API.

Unpack the Tarball on our Bootstrap System

In a previous step we created a file, kube-cluster-api.tar.gz, which is a self-contained artifact with all of the items needed to install a Kubernetes cluster with Cluster API. We first need to identify our bootstrap system and move the kube-cluster-api.tar.gz file onto it to prepare for installation. It is assumed that the bootstrap system has access to the image registry and the vCenter server, and that the user has credentials to authenticate with both. Additionally, it is assumed that Docker is running on the bootstrap system.

Once the kube-cluster-api.tar.gz file is on the bootstrap system, we can unpack it:

tar zxvf kube-cluster-api.tar.gz

Upload OVA Files to vCenter

The OVA images for both the Kubernetes nodes and HAProxy will need to be uploaded into vCenter. Please follow the VMware documentation for guidance on how to do so. That process can use the govc utility, but it’s just as easy to upload manually through the vSphere UI. Additionally, I have created an Ansible project to automate this setup if you are an Ansible user and wish to use that tool.

Once the OVAs are uploaded, take note of the names as you will need them to configure Cluster API later.

Load the Docker Images on our Bootstrap System and Private Registry

First, we will need to import the docker images into our bootstrap system:

NOTE Prior to this step, we need to make note of our private image registry, and we will likely need to log in to authenticate with it in order to push images to it.

DISCLAIMER This works in a vacuum, but more testing should be taken when downloading these images in an actual production environment.


# set this to your private registry hostname (registry.example.com is a placeholder)
export REGISTRY='registry.example.com'

# login with the disconnected registry
docker login ${REGISTRY}

# upload the images to the registry
for TYPE in $(ls ./docker-images/); do
  for IMAGE in $(find ./docker-images/${TYPE} -type f); do
    # load the image into the docker cache and store the output as a variable
    LOADED_IMAGE=$(docker load < ${IMAGE} | awk '{print $NF}')

    # create some useful variables
    COMPONENT=$(echo ${LOADED_IMAGE} | awk -F'/' '{print $NF}')
    COMPONENT_NAME=$(echo ${COMPONENT} | awk -F':' '{print $1}')
    COMPONENT_VERS=$(echo ${COMPONENT} | awk -F':' '{print $NF}')

    # tag the image with our private registry information
    PRIVATE_IMAGE="${REGISTRY}/${COMPONENT_NAME}:${COMPONENT_VERS}"
    PRIVATE_IMAGE_LATEST="${REGISTRY}/${COMPONENT_NAME}:latest"
    docker tag ${LOADED_IMAGE} ${PRIVATE_IMAGE}
    docker tag ${LOADED_IMAGE} ${PRIVATE_IMAGE_LATEST}

    # push the image to the private registry
    docker push ${PRIVATE_IMAGE}
    docker push ${PRIVATE_IMAGE_LATEST}
  done
done

Configure the Bootstrap Cluster

Start the Bootstrap KIND Cluster

We will next need to start our bootstrap KIND cluster. Remember, this cluster is simply a local Kubernetes cluster which is responsible for bootstrapping our environment in vSphere.

First, we will need to create a configuration so that containerd may pull images from our private image registry. There are multiple ways to do this, as documented in the kind and containerd documentation. However, I landed on what seemed to be the easiest way for me to grasp what was happening. I created a kind-config.yaml file with the registry information:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
containerdConfigPatches:
# registry.example.com is a placeholder for your private registry
- |-
  [plugins."io.containerd.grpc.v1.cri".registry.configs."registry.example.com".tls]
    insecure_skip_verify = true
- |-
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."registry.example.com"]
    endpoint = ["https://registry.example.com"]

Next, we need to copy our kind executable from our tarball to our $PATH.

chmod +x ./binaries/kind
cp ./binaries/kind /usr/local/bin

Then we can start the kind cluster, pulling its node image from our private registry, using our kind-config.yaml from the previous step:

kind create cluster --name vsphere-bootstrap-cluster --image ${REGISTRY}/node:v1.17.2 --config kind-config.yaml

Update Image References to Use Our Private Registry

For each image: line occurrence in our resource files, we will need to update it to reference the images in our private registry. This will depend heavily on the structure we’ve created thus far. If you’ve followed along, you should be good; if you’ve come up with your own methodology, substitute image names/locations as appropriate.

NOTE This can be accomplished with some sed magic, however I am both awful and dangerous with sed, so I will do you all a favor and not share the awfulness that I did to accomplish this.

NOTE Rather than performing this step, we MAY be able to use the new image overrides feature in clusterctl, which would be much cleaner/easier; however, I have not explored this avenue yet. I am happy to update this post when I figure it out.

The following is simply the motions we need to go through to update the image repository. This needs to happen for every occurrence of image::

# find all image: occurrences in our resources files
find ./resources -type f | xargs -I {} grep 'image:' {}

# change the image: line to point to our private repository, e.g.:
# OLD: image: us.gcr.io/k8s-artifacts-prod/cluster-api/cluster-api-controller:v0.3.3
# NEW: image: registry.example.com/cluster-api-controller:v0.3.3
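For those who want a starting point anyway, here is one hedged sed sketch (GNU sed; registry.example.com and the sample image are placeholders — it assumes every image reference carries an explicit tag):

```shell
REGISTRY='registry.example.com'   # placeholder for your private registry

# a throwaway manifest line to demonstrate the rewrite (image name is made up)
echo '        image: us.gcr.io/k8s-artifacts-prod/cluster-api/cluster-api-controller:v0.3.3' > /tmp/demo.yaml

# rewrite the registry/org prefix, keeping the final image name and tag intact
sed -i "s~image: .*/\([^/]*:[^/]*\)~image: ${REGISTRY}/\1~" /tmp/demo.yaml
cat /tmp/demo.yaml
```

In real use you would apply the same expression across the resource files, e.g. `find ./resources -type f -name '*.yaml' -exec sed -i "s~…~…~" {} +`, and review the diff before trusting it.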

Configure Cluster API

One of our final steps in the process before we can provision our cluster is to configure Cluster API. First we must move our executable from our tarball into our $PATH:

chmod +x ./binaries/clusterctl
cp ./binaries/clusterctl /usr/local/bin

Then we must create a Cluster API configuration file at ~/.cluster-api/clusterctl.yaml:


# settings for offline installations
# (registry.example.com is a placeholder for your private registry)
images:
  cert-manager:
    repository: registry.example.com
    tag: v0.11.0

We will also need to create an overrides folder to override the default Cluster API definitions. Read more about overrides in the clusterctl documentation. The structure should be as follows:

tree ~/.cluster-api/overrides/
/Users/sdustion/.cluster-api/overrides/
├── bootstrap-kubeadm
│   └── v0.3.3
│       └── bootstrap-components.yaml
├── cluster-api
│   └── v0.3.3
│       └── core-components.yaml
├── control-plane-kubeadm
│   └── v0.3.3
│       └── control-plane-components.yaml
└── infrastructure-vsphere
    └── v0.6.3
        └── infrastructure-components.yaml

Then we can copy our resource definitions from the ./resources folder of our tarball to the provider locations listed above. The paths are mandated by the clusterctl binary, so you’ll have to place them exactly as defined above. Absolute paths are also mandatory. For example:

mkdir -p /Users/sdustion/.cluster-api/overrides/cluster-api/v0.3.3
cp ./resources/clusterapi/core-components.yaml /Users/sdustion/.cluster-api/overrides/cluster-api/v0.3.3
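If you’d rather script the whole mapping, here is a small sketch (copy_override is a helper defined here, not part of clusterctl; the provider names and versions match the tree above, and the throwaway staging area stands in for the unpacked tarball root):

```shell
# demo staging area — in real use you would run this from the unpacked tarball root
cd "$(mktemp -d)"
mkdir -p resources/clusterapi
touch resources/clusterapi/core-components.yaml \
      resources/clusterapi/bootstrap-components.yaml \
      resources/clusterapi/control-plane-components.yaml \
      resources/clusterapi/infrastructure-components.yaml

OVERRIDES="${PWD}/.cluster-api/overrides"   # real path: ~/.cluster-api/overrides

copy_override() {
  # $1 = provider directory, $2 = provider version, $3 = component file name
  mkdir -p "${OVERRIDES}/$1/$2"
  cp "./resources/clusterapi/$3" "${OVERRIDES}/$1/$2/$3"
}

copy_override cluster-api            v0.3.3 core-components.yaml
copy_override bootstrap-kubeadm      v0.3.3 bootstrap-components.yaml
copy_override control-plane-kubeadm  v0.3.3 control-plane-components.yaml
copy_override infrastructure-vsphere v0.6.3 infrastructure-components.yaml

# show what landed where
find "${OVERRIDES}" -type f
```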

We can verify that our images are pointed to our expected repository via the following command:

clusterctl init --list-images
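To automate that check, a small helper can flag any image that does not point at the private registry (check_images is a helper defined here; the canned list and registry.example.com are placeholders for demonstration):

```shell
# check_images: read an image list on stdin and print entries NOT using the given registry
check_images() {
  grep -v "^$1/" || true   # grep exits 1 when nothing matches, which is success here
}

# demo with a canned list; in real use: clusterctl init --list-images | check_images ${REGISTRY}
printf '%s\n' \
  'registry.example.com/cluster-api-controller:v0.3.3' \
  'us.gcr.io/k8s-artifacts-prod/cluster-api/kube-rbac-proxy:v0.4.1' \
  | check_images registry.example.com
```

Any line it prints is an image still referencing an upstream registry that would fail to pull in the air gap.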

Finally, we will need to ensure that the Kubernetes cluster we intend to deploy in our vSphere environment will accept our private registry. To do so, we must edit ~/.cluster-api/overrides/infrastructure-vsphere/v0.6.3/infrastructure-components.yaml and update its image references (and any registry-related configuration) to point at the private registry, just as we did for the other resource files.
Initialize the Workload Cluster

We have one final step remaining. Most of the instructions in the CAPV getting started documentation are now applicable. However, we need to let the workload cluster know that we will be using a private registry as well. To do so, we can create a manifest with the following command:

clusterctl config cluster vsphere-quickstart \
    --infrastructure vsphere \
    --kubernetes-version v1.17.4 \
    --control-plane-machine-count 1 \
    --worker-machine-count 3 > cluster.yaml

Note the cluster.yaml file output by the previous command. We will need to add the following block to that file to tell cloud-init to create a containerd configuration file which allows us to use the private container registry:

    - path: /etc/containerd/config.toml
      content: |
        # registry.example.com is a placeholder for your private registry
        version = 2

        [plugins]
          [plugins."io.containerd.grpc.v1.cri"]
            [plugins."io.containerd.grpc.v1.cri".containerd]
              default_runtime_name = "runc"
              [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
                [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
                  runtime_type = "io.containerd.runc.v2"
            [plugins."io.containerd.grpc.v1.cri".registry]
              [plugins."io.containerd.grpc.v1.cri".registry.configs]
                [plugins."io.containerd.grpc.v1.cri".registry.configs."registry.example.com".tls]
                  insecure_skip_verify = true
              [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
                [plugins."io.containerd.grpc.v1.cri".registry.mirrors."registry.example.com"]
                  endpoint = ["https://registry.example.com"]

The above block will need to be created under the KubeadmControlPlane.spec.kubeadmConfigSpec.files key and the KubeadmConfigTemplate.spec.template.spec.files key. An example configuration looks as follows:

apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
kind: KubeadmConfigTemplate
metadata:
  name: '${ CLUSTER_NAME }-md-0'
  namespace: '${ NAMESPACE }'
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          criSocket: /var/run/containerd/containerd.sock
          kubeletExtraArgs:
            cloud-provider: external
          name: '{{ ds.meta_data.hostname }}'
      preKubeadmCommands:
      - hostname "{{ ds.meta_data.hostname }}"
      - echo "::1         ipv6-localhost ipv6-loopback" >/etc/hosts
      - echo "127.0.0.1   localhost" >>/etc/hosts
      - echo "127.0.0.1   {{ ds.meta_data.hostname }}" >>/etc/hosts
      - echo "{{ ds.meta_data.hostname }}" >/etc/hostname
      files:
      - path: /etc/containerd/config.toml
        content: |
          # registry.example.com is a placeholder for your private registry
          version = 2

          [plugins]
            [plugins."io.containerd.grpc.v1.cri"]
              [plugins."io.containerd.grpc.v1.cri".containerd]
                default_runtime_name = "runc"
                [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
                  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
                    runtime_type = "io.containerd.runc.v2"
              [plugins."io.containerd.grpc.v1.cri".registry]
                [plugins."io.containerd.grpc.v1.cri".registry.configs]
                  [plugins."io.containerd.grpc.v1.cri".registry.configs."registry.example.com".tls]
                    insecure_skip_verify = true
                [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
                  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."registry.example.com"]
                    endpoint = ["https://registry.example.com"]
      users:
      - name: capv
        sudo: ALL=(ALL) NOPASSWD:ALL

As one last hoorah, update any remaining image: references in your cluster.yaml file to point at the images in your private repository.
Finally, we can now initialize the workload cluster with the following command:

kubectl apply -f cluster.yaml

Once the workload cluster is up, apply the CNI manifest using our local resource file (note that this must be applied against the workload cluster, whose kubeconfig is stored as a secret named <cluster-name>-kubeconfig on the bootstrap cluster):

kubectl apply -f ./resources/cni/calico.yaml



Summary

There are many considerations to take into account when deploying an offline cluster. Air-gapped deployments are a necessary evil that will continue to be a valid use case for Kubernetes in the field. I hope you found this helpful. As always, there are an infinite number of ways to do things — happy to update if you find a better one!