by jhasensio

Category: vSphere & K8s

Preparing vSphere for Kubernetes Persistent Storage (3/3): MinIO S3 Object Storage using vSAN SNA

Modern Kubernetes-based applications are built with the idea of being able to offer features such as availability, scalability and replication natively and agnostically to the infrastructure they run on. This philosophy questions the need to duplicate those same functions at the infrastructure level, especially in the storage layer. As an example, we could set up a modern storage application like MinIO to provide an S3 object storage platform with its own protection mechanisms (e.g. erasure coding) using vSAN as the underlying storage infrastructure, which in turn has its own embedded data redundancy and availability mechanisms.

The good news is that when using vSAN, we can selectively choose which features we want to enable in our storage layer through Storage Policy-Based Management (SPBM). A special case is the so-called vSAN SNA (Shared-Nothing Architecture), which basically consists of disabling any redundancy in the vSAN-backed volumes and relying on the application itself (e.g. MinIO) to meet any data redundancy and fault tolerance requirements.

In this post we will go through the setup of MinIO on top of vSAN using a Shared-Nothing Architecture.

Defining vSAN SNA (Shared-Nothing Architecture)

As mentioned, VMware vSAN provides a lot of granularity to tailor the storage service through SPBM policies. These policies can control where the data is physically placed, how the data is protected in case a failure occurs, what the performance is in terms of IOPS, and so on, in order to guarantee the level of service your use case requires.

The setup of the vSAN cluster is out of scope of this walkthrough, so, assuming the vSAN cluster is configured and working properly, open the vCenter GUI, go to Policies and Profiles > VM Storage Policies and click on CREATE to start the definition of a new Storage Policy.

vSAN Storage Policy Creation (1)

Pick a meaningful name. I am using vSAN-SNA MINIO here, as you can see below.

vSAN Storage Policy Creation (2)

Now select Enable rules for “vSAN” storage, which is exactly what we are trying to configure.

vSAN Storage Policy Creation (3)

Next, configure the availability options. Ensure you select “No data redundancy with host affinity” here; this is the key setting if you want to create an SNA architecture in which all data protection and availability relies on the storage application running on top (MinIO in this case).

vSAN Storage Policy Creation (4)

Select the Storage Rules as per your preferences. Remember we are trying to avoid overlapping features to gain performance and space efficiency, so ensure you are not duplicating any task that MinIO is able to provide, including data encryption.

vSAN Storage Policy Creation (5)

Next, select the Object space reservation and other settings. I am using Thin provisioning here and default values for the rest of the settings.

vSAN Storage Policy Creation (6)

Select any compatible vsanDatastore that you have in your vSphere infrastructure.

vSAN Storage Policy Creation (7)

After finishing the creation of the Storage Policy, it is time to define a StorageClass attached to the newly created vSAN policy. This configuration assumes that your Kubernetes cluster is already integrated with the vSphere CNS services using the CSI driver. If this is not the case, you can follow this previous post before proceeding. Create a YAML file with the following content.

vi vsphere-sc-sna.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: vsphere-sc-sna
provisioner: csi.vsphere.vmware.com
parameters:
  datastoreurl: "ds:///vmfs/volumes/vsan:529c9fd4d68b174b-1af2d7a4b1b22457/"
  storagepolicyname: "vSAN-SNA MINIO"
# csi.storage.k8s.io/fstype: "ext4" #Optional Parameter

Once the yaml manifest is created simply apply it using kubectl.

kubectl apply -f vsphere-sc-sna.yaml 

Now create a new PVC that uses the defined StorageClass. Remember that, in general, when using a StorageClass there is no need to pre-create the PVs because the CSI driver will provision them dynamically, without any storage administrator pre-provisioning.

vi pvc_sna.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vsphere-pvc-sna
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi
  storageClassName: vsphere-sc-sna

Capture the allocated name pvc-9946244c-8d99-467e-b444-966c392d3bfa of the newly created PVC.

kubectl get pvc vsphere-pvc-sna
NAME              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS     AGE
vsphere-pvc-sna   Bound    pvc-9946244c-8d99-467e-b444-966c392d3bfa   500Mi      RWO            vsphere-sc-sna   14s
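
If you need to capture the bound volume name programmatically rather than copying it from the table above, a standard kubectl JSONPath query does the trick:

kubectl get pvc vsphere-pvc-sna -o jsonpath='{.spec.volumeName}' ; echo
pvc-9946244c-8d99-467e-b444-966c392d3bfa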

Go to Cluster > Monitor > vSAN Virtual Objects and filter using the previously captured PVC name in order to view the physical placement in use.

PVC virtual object

The redundancy scheme here is RAID 0 with a single vmdk, which in practice means that vSAN is not providing any level of protection or performance gain, which is exactly what we want in this particular Shared-Nothing Architecture.

PVC Physical Placement

Once the SNA architecture is defined we can proceed with MinIO Operator installation.

Installing MinIO Operator using Krew

MinIO is an object storage solution with S3 API support and a popular alternative for providing an Amazon S3-like service in a private cloud environment. In this example we will install the MinIO Operator, which allows us to deploy and manage different MinIO tenants in the same cluster. According to the official Kubernetes documentation, an operator is just a software extension to Kubernetes that makes use of custom resources to manage applications and their components following Kubernetes principles.

Firstly, we will install the MinIO plugin using the krew utility. Krew is a plugin manager that complements the Kubernetes CLI tooling and is very helpful for installing and updating plugins that act as add-ons over the kubectl command.

To install krew just copy and paste the following command line. Basically, this set of commands checks the OS to pick the corresponding krew prebuilt binary and then downloads, extracts and finally installs the krew utility.

(
  set -x; cd "$(mktemp -d)" &&
  OS="$(uname | tr '[:upper:]' '[:lower:]')" &&
  ARCH="$(uname -m | sed -e 's/x86_64/amd64/' -e 's/\(arm\)\(64\)\?.*/\1\2/' -e 's/aarch64$/arm64/')" &&
  KREW="krew-${OS}_${ARCH}" &&
  curl -fsSLO "https://github.com/kubernetes-sigs/krew/releases/latest/download/${KREW}.tar.gz" &&
  tar zxvf "${KREW}.tar.gz" &&
  ./"${KREW}" install krew
)

The binary will be installed in the $HOME/.krew/bin directory. Update your PATH environment variable to include this new directory. In my case I am using bash.

echo 'export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"' >> $HOME/.bashrc

Now make the changes effective by restarting your shell. The easiest way is using exec to reinitialize the existing session and force it to read the new values.

exec $SHELL

If everything is ok, we should be able to list the installed krew plugins. The only plugin available by default is the krew plugin itself.

kubectl krew list
PLUGIN    VERSION
krew      v0.4.3

Now we should be able to install the desired MinIO plugin just using kubectl krew install as shown below.

kubectl krew install minio
Updated the local copy of plugin index.
Installing plugin: minio
Installed plugin: minio
\
 | Use this plugin:
 |      kubectl minio
 | Documentation:
 |      https://github.com/minio/operator/tree/master/kubectl-minio
 | Caveats:
 | \
 |  | * For resources that are not in default namespace, currently you must
 |  |   specify -n/--namespace explicitly (the current namespace setting is not
 |  |   yet used).
 | /
/
WARNING: You installed plugin "minio" from the krew-index plugin repository.
   These plugins are not audited for security by the Krew maintainers.
   Run them at your own risk.

Once the plugin is installed we can initialize the MinIO operator just by using the kubectl minio init command as shown below.

kubectl minio init
namespace/minio-operator created
serviceaccount/minio-operator created
clusterrole.rbac.authorization.k8s.io/minio-operator-role created
clusterrolebinding.rbac.authorization.k8s.io/minio-operator-binding created
customresourcedefinition.apiextensions.k8s.io/tenants.minio.min.io created
service/operator created
deployment.apps/minio-operator created
serviceaccount/console-sa created
secret/console-sa-secret created
clusterrole.rbac.authorization.k8s.io/console-sa-role created
clusterrolebinding.rbac.authorization.k8s.io/console-sa-binding created
configmap/console-env created
service/console created
deployment.apps/console created
-----------------

To open Operator UI, start a port forward using this command:

kubectl minio proxy -n minio-operator

-----------------

The MinIO Operator installation has created several Kubernetes resources within the minio-operator namespace; let's explore the created services.

kubectl get svc -n minio-operator
NAME       TYPE           CLUSTER-IP       EXTERNAL-IP    PORT(S)             AGE
console    ClusterIP      10.110.251.111   <none>         9090/TCP,9443/TCP   34s
operator   ClusterIP      10.111.75.168    <none>         4222/TCP,4221/TCP   34s

Instead of using port-forwarding as suggested in the output of the MinIO Operator installation, a good idea to ease console access is to expose the Operator Console externally using any of the well-known Kubernetes methods. As an example, the following manifest creates a LoadBalancer Service that will provide external reachability to the MinIO Operator Console.

vi minio-console-lb.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: minio-operator
  name: minio
  namespace: minio-operator
spec:
  ports:
  - name: http
    port: 9090
    protocol: TCP
    targetPort: 9090
  selector:
    app: console
  type: LoadBalancer

Apply the above YAML file using kubectl apply and check that your load balancer is reporting the allocated external IP address. I am using AVI with its operator (AKO) to capture the LoadBalancer objects and program the external load balancer, but feel free to use any other ingress solution of your choice.
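
The apply step, for reference (file name as created above):

kubectl apply -f minio-console-lb.yaml
service/minio created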

kubectl get svc minio -n minio-operator
NAME    TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)          AGE
minio   LoadBalancer   10.97.25.202   10.113.3.101   9090:32282/TCP   3m

Before trying to access the console, you need to get the token that will be used to authenticate the access to the MinIO Operator GUI.

kubectl get secret console-sa-secret -n minio-operator -o jsonpath="{.data.token}" | base64 -d ; echo
eyJhbGciOi <redacted...> 

Now you can access the external IP address, which listens on port 9090, and you should reach the MinIO Operator web console that looks like the image below. Use the token as the credential to access the console.

MinIO Console Access

The first step is to create a tenant. A tenant, as the wizard states, is just the logical structure to represent a MinIO deployment. Each tenant can have a different size and configuration from other tenants, even a different StorageClass. Click on the CREATE TENANT button to start the process.

MinIO Operator Tenant Creation

We will use the name archive and will place the tenant in a namespace of the same name, which we can create from the console itself if it does not already exist in our target cluster. Make sure you select the StorageClass vsphere-sc-sna that was previously created for this specific purpose.

In the Capacity section we can select the number of servers (namely the Kubernetes nodes that will run a pod) and the number of drives per server (namely the persistent volumes mounted in each pod) that will make up this software-defined storage layer. According to the settings shown below, it is easy to do the math: to achieve a size of 200 GB using 6 servers with 4 drives each you need 200/(6×4) = 8.33 GB per drive.

The Erasure Code Parity sets the level of redundancy and availability we need and is directly related to the overall usable capacity. As you can guess, the more protection, the more storage overhead. In this case, the selected EC:2 will tolerate a single server failure and the usable capacity would be 166.7 GB. Feel free to change the setting and see how the calculation is refreshed in the table at the right of the window.
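
If you want to double-check those numbers outside the wizard, the math is simple enough to reproduce with bc (the two erasure sets of 12 drives each are confirmed later in this post by the mc admin info output):

# Per-drive size: total capacity divided by (servers x drives per server)
echo "scale=2; 200 / (6 * 4)" | bc
8.33
# Usable capacity with EC:2 (2 parity drives out of every 12-drive erasure set)
echo "scale=2; 200 * (12 - 2) / 12" | bc
166.66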

MinIO Tenant Creation Dialogue (Settings)

The last block of this configuration section allows you to adjust the CPU and memory resources that each pod will request to run.

MinIO Tenant Creation Dialogue (Settings 2)

Click on the Configure button to define whether we want to expose the MinIO services (S3 endpoint and Tenant Console) for outside access.

MinIO Tenant Creation Dialogue (Services)

Another important configuration decision is whether we want to use TLS to access the S3 API endpoint. By default the tenant and associated buckets use TLS with autogenerated certificates.

MinIO Tenant Creation Dialogue (Security)

Another interesting setting you can optionally enable is the Audit Log, which stores your log transactions in a database. This can be useful for troubleshooting and security purposes. By default the Audit Log is enabled.

The Monitoring section allows you to get metrics and dashboards for your tenant by deploying a Prometheus server that scrapes some relevant metrics associated with your service. By default this service is also enabled.

Feel free to explore the rest of the settings to change other advanced parameters such as encryption (disabled by default) or authentication (local credentials by default, but integrable with external authentication systems). As soon as you click on the Create button at the bottom right corner the new tenant will be launched and a new window will appear with the set of credentials associated with it. Save them in a JSON file or write them down for later use because there is no way to display them afterwards.

New Tenant Credentials

The content of the JSON file, should you decide to download it, is shown below.

{
  "url":"https://minio.archive.svc.cluster.local:443",
  "accessKey":"Q78mz0vRlnk6melG",
  "secretKey":"6nWtL28N1eyVdZdt16CIWyivUh3PB5Fp",
  "api":"s3v4",
  "path":"auto"
}

We are done with our first MinIO tenant creation process. Let’s move into the next part to inspect the created objects.

Inspecting MinIO Operator Kubernetes resources

Once the tenant is created, the MinIO Operator Console will show it with real-time information. There are some tasks to be done under the hood to complete the process, such as deploying pods, pulling container images, creating PVCs and so on, so it will take some time to be ready. The red circle and the absence of statistics indicate that the tenant creation is still in progress.

Tenant Administration

After a couple of minutes, if you click on the tenant tile you will land on the tenant administration page, from which you will get some information about the current state, the Console and Endpoint URLs, the number of drives and so on.

If you click on the YAML button at the top you will see the tenant's YAML definition. Although the GUI can be useful to take the first steps in the creation of a tenant, when you plan to do it in an automated fashion the best approach is to use YAML files to declare the Tenant object, which is basically a custom resource that the MinIO Operator watches in order to reconcile the desired state of the tenants in the cluster.
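
The same definition can also be retrieved from the CLI, since the tenant is just an instance of the custom resource registered during kubectl minio init (tenants.minio.min.io):

kubectl get tenants.minio.min.io archive -n archive -o yaml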

In the Configuration section you can get the user (MINIO_ROOT_USER) and password (MINIO_ROOT_PASSWORD). Those credentials can be used to access the tenant console using the corresponding endpoint.
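
If you prefer to stay in the CLI, a set of access credentials for the tenant is also stored in a secret named <tenant>-user-1; this is the same secret the automation script at the end of this post reads:

kubectl get secret archive-user-1 -n archive -o jsonpath='{.data.CONSOLE_ACCESS_KEY}' | base64 -d ; echo
kubectl get secret archive-user-1 -n archive -o jsonpath='{.data.CONSOLE_SECRET_KEY}' | base64 -d ; echo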

The external URL https://archive-console.archive.avi.sdefinitive.net:9443 can be used to reach our archive tenant console in a separate GUI like the one shown below. Another option available from the MinIO Operator GUI is the Console button at the top; this last method bypasses the authentication.

If you click on Metrics you can see some interesting stats related to the archive tenant, as shown below.

Move into Kubernetes to check the created pods. As expected, a set of 6 pods are running and acting as the storage servers of our tenant. Additionally, there are other complementary pods running in the namespace for monitoring and logging.

kubectl get pod -n archive
NAME                                      READY   STATUS    RESTARTS        AGE
archive-log-0                             1/1     Running   0               5m34s
archive-log-search-api-5568bc5dcb-hpqrw   1/1     Running   3 (5m17s ago)   5m32s
archive-pool-0-0                          1/1     Running   0               5m34s
archive-pool-0-1                          1/1     Running   0               5m34s
archive-pool-0-2                          1/1     Running   0               5m34s
archive-pool-0-3                          1/1     Running   0               5m33s
archive-pool-0-4                          1/1     Running   0               5m33s
archive-pool-0-5                          1/1     Running   0               5m33s
archive-prometheus-0                      2/2     Running   0               2m33s

If you check the volumes of one of the pods you can tell how each server (pod) mounts four data volumes, as specified upon tenant creation.

kubectl get pod -n archive archive-pool-0-0 -o=jsonpath="{.spec.containers[*].volumeMounts}" | jq
[
  {
    "mountPath": "/export0",
    "name": "data0"
  },
  {
    "mountPath": "/export1",
    "name": "data1"
  },
  {
    "mountPath": "/export2",
    "name": "data2"
  },
  {
    "mountPath": "/export3",
    "name": "data3"
  },
  {
    "mountPath": "/tmp/certs",
    "name": "archive-tls"
  },
  {
    "mountPath": "/tmp/minio-config",
    "name": "configuration"
  },
  {
    "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
    "name": "kube-api-access-xdftb",
    "readOnly": true
  }
]

Correspondingly, each of the volumes is associated with a PVC bound to a PV that has been dynamically created at the infrastructure storage layer through the StorageClass. If you remember the calculation we did above, the size of each PVC should be 200 / (6 servers × 4 drives) = 8.33 Gi, which is approximately the capacity (8534Mi) of the 24 PVCs displayed below.

kubectl get pvc -n archive
NAME                                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS     AGE
archive-log-archive-log-0                 Bound    pvc-3b3b0738-d4f1-4611-a059-975dc44823ef   5Gi        RWO            vsphere-sc       33m
archive-prometheus-archive-prometheus-0   Bound    pvc-c321dc0e-1789-4139-be52-ec0dbc25211a   5Gi        RWO            vsphere-sc       30m
data0-archive-pool-0-0                    Bound    pvc-5ba0d4b5-e119-49b2-9288-c5556e86cdc1   8534Mi     RWO            vsphere-sc-sna   33m
data0-archive-pool-0-1                    Bound    pvc-ce648e61-d370-4725-abe3-6b374346f6bb   8534Mi     RWO            vsphere-sc-sna   33m
data0-archive-pool-0-2                    Bound    pvc-4a7198ce-1efd-4f31-98ed-fc5d36ebb06b   8534Mi     RWO            vsphere-sc-sna   33m
data0-archive-pool-0-3                    Bound    pvc-26567625-982f-4604-9035-5840547071ea   8534Mi     RWO            vsphere-sc-sna   33m
data0-archive-pool-0-4                    Bound    pvc-c58e8344-4462-449f-a6ec-7ece987e0b67   8534Mi     RWO            vsphere-sc-sna   33m
data0-archive-pool-0-5                    Bound    pvc-4e22d186-0618-417b-91b8-86520e37b3d2   8534Mi     RWO            vsphere-sc-sna   33m
data1-archive-pool-0-0                    Bound    pvc-bf497569-ee1a-4ece-bb2d-50bf77b27a71   8534Mi     RWO            vsphere-sc-sna   33m
data1-archive-pool-0-1                    Bound    pvc-0e2b1057-eda7-4b80-be89-7c256fdc3adc   8534Mi     RWO            vsphere-sc-sna   33m
data1-archive-pool-0-2                    Bound    pvc-f4d1d0ff-e8ed-4c6b-a053-086b7a1a049b   8534Mi     RWO            vsphere-sc-sna   33m
data1-archive-pool-0-3                    Bound    pvc-f8332466-bd49-4fc8-9e2e-3168c307d8db   8534Mi     RWO            vsphere-sc-sna   33m
data1-archive-pool-0-4                    Bound    pvc-0eeae90d-46e7-4dba-9645-b2313cd382c9   8534Mi     RWO            vsphere-sc-sna   33m
data1-archive-pool-0-5                    Bound    pvc-b7d7d1c1-ec4c-42ba-b925-ce02d56dffb0   8534Mi     RWO            vsphere-sc-sna   33m
data2-archive-pool-0-0                    Bound    pvc-08642549-d911-4384-9c20-f0e0ab4be058   8534Mi     RWO            vsphere-sc-sna   33m
data2-archive-pool-0-1                    Bound    pvc-638f9310-ebf9-4784-be87-186ea1837710   8534Mi     RWO            vsphere-sc-sna   33m
data2-archive-pool-0-2                    Bound    pvc-aee4d1c0-13f7-4d98-8f83-c047771c4576   8534Mi     RWO            vsphere-sc-sna   33m
data2-archive-pool-0-3                    Bound    pvc-06cd11d9-96ed-4ae7-a9bc-3b0c52dfa884   8534Mi     RWO            vsphere-sc-sna   33m
data2-archive-pool-0-4                    Bound    pvc-e55cc7aa-8a3c-463e-916a-8c1bfc886b99   8534Mi     RWO            vsphere-sc-sna   33m
data2-archive-pool-0-5                    Bound    pvc-64948a13-bdd3-4d8c-93ad-155f9049eb36   8534Mi     RWO            vsphere-sc-sna   33m
data3-archive-pool-0-0                    Bound    pvc-565ecc61-2b69-45ce-9d8b-9abbaf24b829   8534Mi     RWO            vsphere-sc-sna   33m
data3-archive-pool-0-1                    Bound    pvc-c61d45da-d7da-4675-aafc-c165f1d70612   8534Mi     RWO            vsphere-sc-sna   33m
data3-archive-pool-0-2                    Bound    pvc-c941295e-3e3d-425c-a2f0-70ee1948b2f0   8534Mi     RWO            vsphere-sc-sna   33m
data3-archive-pool-0-3                    Bound    pvc-7d7ce3b1-cfeb-41c8-9996-ff2c3a7578cf   8534Mi     RWO            vsphere-sc-sna   33m
data3-archive-pool-0-4                    Bound    pvc-c36150b1-e404-4ec1-ae61-6ecf58f055e1   8534Mi     RWO            vsphere-sc-sna   33m
data3-archive-pool-0-5                    Bound    pvc-cd847c84-b5e1-4fa4-a7e9-a538f9424dbf   8534Mi     RWO            vsphere-sc-sna   33m

Everything looks good, so let's move into the tenant console to create a new bucket.

Creating an S3 Bucket

Once the tenant is ready, you can use the S3 API to create buckets and push data into them. When using the MinIO Operator setup you can also use the tenant GUI as shown below. Access the tenant console using the external URL, or simply jump from the Operator Console GUI, and you will reach the following page.

A bucket in S3 object storage is similar to a folder in a traditional filesystem and is used to organize the pieces of data (objects). Create a new bucket with the name my-bucket.

Accessing S3 using MinIO Client (mc)

Now let's move to the CLI to check how we can interact with the tenant via the API using the MinIO client (mc) tool. To install it just issue the commands below.

curl https://dl.min.io/client/mc/release/linux-amd64/mc \
  --create-dirs \
  -o $HOME/minio-binaries/mc

chmod +x $HOME/minio-binaries/mc
export PATH=$PATH:$HOME/minio-binaries/

As a first step, declare a new alias with the definition of our tenant. Use the tenant endpoint and provide the accessKey and the secretKey you captured at tenant creation time. Remember we are using TLS with self-signed certificates, so the --insecure flag is required.

mc alias set minio-archive https://minio.archive.avi.sdefinitive.net Q78mz0vRlnk6melG 6nWtL28N1eyVdZdt16CIWyivUh3PB5Fp --insecure
Added `minio-archive` successfully.

The info keyword can be used to show relevant information about our tenant, such as the servers (pods) and the status of the drives (PVCs) and the network. Each of the servers listens on port 9000 to expose the storage service externally.

mc admin info minio-archive --insecure
●  archive-pool-0-0.archive-hl.archive.svc.cluster.local:9000
   Uptime: 52 minutes 
   Version: 2023-01-02T09:40:09Z
   Network: 6/6 OK 
   Drives: 4/4 OK 
   Pool: 1

●  archive-pool-0-1.archive-hl.archive.svc.cluster.local:9000
   Uptime: 52 minutes 
   Version: 2023-01-02T09:40:09Z
   Network: 6/6 OK 
   Drives: 4/4 OK 
   Pool: 1

●  archive-pool-0-2.archive-hl.archive.svc.cluster.local:9000
   Uptime: 52 minutes 
   Version: 2023-01-02T09:40:09Z
   Network: 6/6 OK 
   Drives: 4/4 OK 
   Pool: 1

●  archive-pool-0-3.archive-hl.archive.svc.cluster.local:9000
   Uptime: 52 minutes 
   Version: 2023-01-02T09:40:09Z
   Network: 6/6 OK 
   Drives: 4/4 OK 
   Pool: 1

●  archive-pool-0-4.archive-hl.archive.svc.cluster.local:9000
   Uptime: 52 minutes 
   Version: 2023-01-02T09:40:09Z
   Network: 6/6 OK 
   Drives: 4/4 OK 
   Pool: 1

●  archive-pool-0-5.archive-hl.archive.svc.cluster.local:9000
   Uptime: 52 minutes 
   Version: 2023-01-02T09:40:09Z
   Network: 6/6 OK 
   Drives: 4/4 OK 
   Pool: 1

Pools:
   1st, Erasure sets: 2, Drives per erasure set: 12

0 B Used, 1 Bucket, 0 Objects
24 drives online, 0 drives offline

You can list the existing buckets using the mc ls command as shown below.

mc ls minio-archive --insecure
[2023-01-04 13:56:39 CET]     0B my-bucket/

As a quick test to check whether we are able to write a file into the S3 bucket, the following command creates a dummy 10GB file using fallocate.

fallocate -l 10G /tmp/10Gb.file

Push the 10GB file into S3 using the following mc cp command.

mc cp /tmp/10Gb.file minio-archive/my-bucket --insecure
/tmp/10Gb.file:                        10.00 GiB / 10.00 GiB ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 51.52 MiB/s 3m18s

While the transfer is still running, you can verify in the tenant console that some stats are being populated in the Metrics section.

Additionally you can also display some metrics dashboards in the Traffic tab.

Once the transfer is completed, if you list the contents of the bucket again you should see something like this.

mc ls minio-archive/my-bucket --insecure
[2023-01-04 17:20:14 CET]  10GiB STANDARD 10Gb.file

Consuming the MinIO S3 backend from Kubernetes Pods

Now that we know how to install the MinIO Operator, how to create a tenant and how to create a bucket, it is time to understand how to consume the MinIO S3 backend from a regular pod. Access to the data in an S3 bucket is generally done through API calls, so we need a method to create an abstraction layer that allows the data to be mounted in the OS filesystem as a regular folder or drive, in a similar way to an NFS share.

This abstraction layer in Linux is implemented by FUSE (Filesystem in Userspace) which, in a nutshell, is a user-space program able to mount a file system that appears to the operating system as if it were a regular folder. There is a special project called s3fs-fuse that allows a Linux OS to do precisely that: mount an S3 bucket via FUSE while preserving the native object format.

As a container is essentially a Linux OS, we just need to figure out how to take advantage of this and use it in a Kubernetes environment.
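
Before moving to containers, it may help to see what s3fs-fuse does on a plain Linux host. The following is just a sketch using the tenant endpoint, bucket and credentials captured earlier, with the same mount options we will use later in the Dockerfile; the paths and mount point are arbitrary.

# Credentials file in the ACCESS_KEY:SECRET_KEY format expected by s3fs
echo "Q78mz0vRlnk6melG:6nWtL28N1eyVdZdt16CIWyivUh3PB5Fp" > ${HOME}/.passwd-s3fs
chmod 600 ${HOME}/.passwd-s3fs

# Mount the bucket; the SSL relaxations are needed because the tenant uses self-signed certificates
mkdir -p ${HOME}/s3
s3fs my-bucket ${HOME}/s3 -o passwd_file=${HOME}/.passwd-s3fs \
  -o url=https://minio.archive.avi.sdefinitive.net \
  -o use_path_request_style -o no_check_certificate -o ssl_verify_hostname=0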

Option 1. Using a custom image

The first approach consists of creating a container image able to access the remote MinIO bucket and present it to the container as a regular filesystem. The following diagram depicts a high-level overview of the intended usage.

S3 Mounter Container

As you can guess, s3fs-fuse is not part of any core Linux distribution, so the first step is to create a custom image containing the required software that will allow our container to mount the S3 bucket as a regular file system. Let's start by creating a Dockerfile that will be used as a template to build the custom image. We also need to specify the command we want to run when the container is spun up. Find below the Dockerfile to create the custom image with some explanatory comments.

vi Dockerfile
# S3 MinIO mounter sample file 
# 
# Use ubuntu 22.04 as base image
FROM ubuntu:22.04

# MinIO S3 bucket will be mounted on /var/s3 as the mount point
ENV MNT_POINT=/var/s3

# Install required utilities (basically s3fs is needed here). Mailcap installs MIME types to avoid some warnings
RUN apt-get update && apt-get install -y s3fs mailcap

# Create the mount point in the container Filesystem
RUN mkdir -p "$MNT_POINT"

# Some optional housekeeping to get rid of unneeded packages
RUN apt-get autoremove && apt-get clean

# 1.- Create a file containing the credentials used to access the Minio Endpoint. 
#     They are defined as variables that must be passed to the container ideally using a secret
# 2.- Mount the S3 bucket in /var/s3. To be able to mount the filesystem s3fs needs
#      - credentials injected as env variables through a Secret and written in s3-passwd file
#      - S3 MinIO Endpoint (passed as env variable through a Secret)
#      - S3 Bucket Name (passed as env variable through a Secret)
#      - other specific options for MinIO such as use_path_request_style and insecure SSL
# 3.- Last tail command to run container indefinitely and to avoid completion

CMD echo "$ACCESS_KEY:$SECRET_KEY" > /etc/s3-passwd && chmod 600 /etc/s3-passwd && \
    /usr/bin/s3fs "$MINIO_S3_BUCKET" "$MNT_POINT" -o passwd_file=/etc/s3-passwd \
    -o url="$MINIO_S3_ENDPOINT" \
    -o use_path_request_style -o no_check_certificate -o ssl_verify_hostname=0 && \
    tail -f /dev/null

Once the Dockerfile is defined we can build the image and push it to a registry. Feel free to use mine here if you wish. If you are using Docker Hub instead of a private registry you will need to have an account and log in before proceeding.

docker login
Login with your Docker ID to push and pull images from Docker Hub. If you don't have a Docker ID, head over to https://hub.docker.com to create one.
Username: jhasensio
Password: <yourpasswordhere>
WARNING! Your password will be stored unencrypted in /home/jhasensio/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded

Now build your image using Dockerfile settings.

docker build --rm -t jhasensio/s3-mounter .
Sending build context to Docker daemon  18.94kB
Step 1/6 : FROM ubuntu:22.04
22.04: Pulling from library/ubuntu
677076032cca: Pull complete 
Digest: sha256:9a0bdde4188b896a372804be2384015e90e3f84906b750c1a53539b585fbbe7f
Status: Downloaded newer image for ubuntu:22.04
 ---> 58db3edaf2be
Step 2/6 : ENV MNT_POINT=/var/s3
 ---> Running in d8f0c218519d
Removing intermediate container d8f0c218519d
 ---> 6a4998c5b028
Step 3/6 : RUN apt-get update && apt-get install -y s3fs mailcap
 ---> Running in a20f2fc03315
Get:1 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy InRelease [270 kB]
... <skipped>

Step 4/6 : RUN mkdir -p "$MNT_POINT"
 ---> Running in 78c5f3328988
Removing intermediate container 78c5f3328988
 ---> f9cead3b402f
Step 5/6 : RUN apt-get autoremove && apt-get clean
 ---> Running in fbd556a42ea2
Reading package lists...
Building dependency tree...
Reading state information...
0 upgraded, 0 newly installed, 0 to remove and 5 not upgraded.
Removing intermediate container fbd556a42ea2
 ---> 6ad6f752fecc
Step 6/6 : CMD echo "$ACCESS_KEY:$SECRET_KEY" > /etc/s3-passwd && chmod 600 /etc/s3-passwd &&     /usr/bin/s3fs $MINIO_S3_BUCKET $MNT_POINT -o passwd_file=/etc/s3-passwd -o use_path_request_style -o no_check_certificate -o ssl_verify_hostname=0 -o url=$MINIO_S3_ENDPOINT  &&     tail -f /dev/null
 ---> Running in 2c64ed1a3c5e
Removing intermediate container 2c64ed1a3c5e
 ---> 9cd8319a789d
Successfully built 9cd8319a789d
Successfully tagged jhasensio/s3-mounter:latest

Now that you have built your image, it's time to push it to the Docker Hub registry using the following command.

docker push jhasensio/s3-mounter
Using default tag: latest
The push refers to repository [docker.io/jhasensio/s3-mounter]
2c82254eb995: Pushed 
9d49035aff15: Pushed 
3ed41791b66a: Pushed 
c5ff2d88f679: Layer already exists 
latest: digest: sha256:ddfb2351763f77114bed3fd622a1357c8f3aa75e35cc66047e54c9ca4949f197 size: 1155

Now you can use your custom image from your pods. Remember this image has been created with the very specific purpose of mounting an S3 bucket in its filesystem; the next step is to inject the configuration and credentials needed into the running pod in order to succeed in the mounting process. There are different ways to do this, but the recommended method is passing variables through a Kubernetes Secret object. Let's create a file with the required environment variables. The variables shown here were captured at tenant creation time and depend on your specific setup and the selected tenant and bucket names.

vi s3bucket.env
MINIO_S3_ENDPOINT=https://minio.archive.svc.cluster.local:443
MINIO_S3_BUCKET=my-bucket
ACCESS_KEY=Q78mz0vRlnk6melG
SECRET_KEY=6nWtL28N1eyVdZdt16CIWyivUh3PB5Fp

Now create the Secret object using the previous environment file as the source, as shown below.

kubectl create secret generic s3-credentials --from-env-file=s3bucket.env
secret/s3-credentials created

Verify the content. Note that the variables appear Base64-encoded once the secret is created.

kubectl get secrets s3-credentials -o yaml
apiVersion: v1
data:
  ACCESS_KEY: UTc4bXowdlJsbms2bWVsRw==
  MINIO_S3_BUCKET: bXktYnVja2V0
  MINIO_S3_ENDPOINT: aHR0cHM6Ly9taW5pby5hcmNoaXZlLnN2Yy5jbHVzdGVyLmxvY2FsOjQ0Mw==
  SECRET_KEY: Nm5XdEwyOE4xZXlWZFpkdDE2Q0lXeWl2VWgzUEI1RnA=
kind: Secret
metadata:
  creationTimestamp: "2023-02-10T09:33:53Z"
  name: s3-credentials
  namespace: default
  resourceVersion: "32343427"
  uid: d0628fff-0438-41e8-a3a4-d01ee20f82d0
type: Opaque

If you need to be sure about your encoded variables, just revert the process to double check that the injected data is what you expect. Taking MINIO_S3_ENDPOINT as an example, use the following command to get the plain text of any given secret data entry.

kubectl get secrets s3-credentials -o jsonpath="{.data.MINIO_S3_ENDPOINT}" | base64 -d ; echo
https://minio.archive.svc.cluster.local:443

Now that the configuration secret is ready we can proceed with the pod creation. Note that you need privileged access to be able to use the FUSE kernel module that is needed to mount the S3 bucket.

vi s3mounterpod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: s3-mounter
spec:
  containers:
    - image: jhasensio/s3-mounter
      name: s3-mounter
      imagePullPolicy: Always
      # Privileged mode needed to access the FUSE kernel module
      securityContext:
        privileged: true
      # Injecting credentials from secret (must be in same ns)
      envFrom:
      - secretRef:
          name: s3-credentials
      # Mounting host devfuse is required
      volumeMounts:
      - name: devfuse
        mountPath: /dev/fuse
  volumes:
    - name: devfuse
      hostPath:
        path: /dev/fuse
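
Apply the manifest to create the pod (file name as in the vi command above):

kubectl apply -f s3mounterpod.yaml
pod/s3-mounter created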

Open an interactive session to the pod for further exploration.

kubectl exec s3-mounter -ti -- bash
root@s3-mounter:/# 

Verify that the secret-injected variables are shown as expected.

root@s3-mounter:/# env | grep -E "MINIO|KEY"
ACCESS_KEY=Q78mz0vRlnk6melG
SECRET_KEY=6nWtL28N1eyVdZdt16CIWyivUh3PB5Fp
MINIO_S3_ENDPOINT=https://minio.archive.svc.cluster.local:443
MINIO_S3_BUCKET=my-bucket

Also verify the mount in the pod filesystem.

root@s3-mounter:/# mount | grep s3
s3fs on /var/s3 type fuse.s3fs (rw,nosuid,nodev,relatime,user_id=0,group_id=0)

You should be able to read any existing data in the S3 bucket. Note how you can list the 10Gb.file we created earlier using the mc client.

root@s3-mounter:~# ls -alg /var/s3
total 10485765
drwx------ 1 root           0 Jan  1  1970 .
drwxr-xr-x 1 root        4096 Jan 04 17:00 ..
-rw-r----- 1 root 10737418240 Jan 04 17:53 10Gb.file

Create a new file to verify that you can write as well.

root@s3-mounter:~# touch /var/s3/file_from_s3_mounter_pod.txt

The new file should appear in the bucket.

root@s3-mounter:/var/s3# ls -alg
total 10485766
drwx------ 1 root           0 Jan  1  1970 .
drwxr-xr-x 1 root        4096 Jan 04 17:00 ..
-rw-r----- 1 root 10737418240 Jan 04 17:53 10Gb.file
-rw-r--r-- 1 root           0 Jan 04 17:59 file_from_s3_mounter_pod.txt

And it should be accessible using the mc client, as seen below.

mc ls minio-archive/my-bucket --insecure
[2023-01-04 17:53:34 CET]  10GiB STANDARD 10Gb.file
[2023-01-04 17:59:22 CET]     0B STANDARD file_from_s3_mounter_pod.txt

This is great news, but is this approach really useful? It means that to be able to reach an S3 bucket we need to prepare a custom image with our intended application plus s3fs so it can mount the MinIO S3 filesystem. That sounds pretty rigid, so let's explore other, more flexible options to achieve similar results.

Option 2. Using a sidecar pattern

Once we have prepared the custom image and verified everything is working as expected, we can give it another twist. As you probably know, there is a common pattern in Kubernetes that involves running an additional container (a.k.a. sidecar) alongside the main container in a pod. The sidecar container provides additional functionality, such as logging, monitoring or networking, to the main container. In our case the additional container will be in charge of mounting the MinIO S3 filesystem, allowing the main container to focus on its core responsibility and offload these storage-related tasks to the sidecar. The following picture depicts the intended architecture.

Pod using Sidecar Pattern with S3-Mounter

Let's try to create a simple pod running Apache to provide access to an S3-backed filesystem via web. The first step is to create a custom httpd.conf file to serve HTML documents from a custom path at "/var/www/html". Create a file as shown below:

vi httpd.conf
ServerRoot "/usr/local/apache2"
ServerName "my-apache"
Listen 80
DocumentRoot "/var/www/html"
LoadModule mpm_event_module modules/mod_mpm_event.so
LoadModule authz_core_module modules/mod_authz_core.so
LoadModule unixd_module modules/mod_unixd.so
LoadModule autoindex_module modules/mod_autoindex.so

<Directory "/var/www/html">
  Options Indexes FollowSymLinks
  AllowOverride None
  Require all granted
</Directory>

Now create a ConfigMap object from the created file to easily inject the custom configuration into the pod.

kubectl create cm apache-config --from-file=httpd.conf
configmap/apache-config created

Find below a sample sidecar pod. The mountPropagation spec does the trick here; if you are interested, there is a deep dive around mount propagation in this blog. The Bidirectional mountPropagation allows any volume mounts created by the container to be propagated back to the host and to all containers of all pods that use the same volume, which is exactly what we are trying to achieve here. The s3-mounter sidecar container will mount the S3 bucket and propagate it to the host as a local mount, which in turn will be available to the main apache2 container as a regular hostPath volume.

vi web-server-sidecar-s3.yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-server-sidecar-s3
spec:
  containers:
      ######
      # Main apache container
      ####
    - name: apache2
      image: httpd:2.4.55
      securityContext:
        privileged: true
      ports:
        - containerPort: 80
          name: http-web
      volumeMounts:
      # Mount custom apache config file from volume
        - name: apache-config-volume
          mountPath: /tmp/conf
      # Mount S3 bucket into web server root (Bidirectional mountPropagation)
        - name: my-bucket
          mountPath: /var/www
          mountPropagation: Bidirectional
      # Copy custom httpd.conf extracted from configmap in right path and run httpd
      command: ["/bin/sh"]
      args: ["-c", "cp /tmp/conf/httpd.conf /usr/local/apache2/conf/httpd.conf && /usr/local/bin/httpd-foreground"]
      ######
      # Sidecar Container to mount MinIO S3 Bucket
      ####
    - name: s3mounter
      image: jhasensio/s3-mounter
      imagePullPolicy: Always
      securityContext:
        privileged: true
      envFrom:
        - secretRef:
            name: s3-credentials
      volumeMounts:
        - name: devfuse
          mountPath: /dev/fuse
      # Mount S3 bucket (Bidirectional allows sharing between containers)
        - name: my-bucket
          mountPath: /var/s3
          mountPropagation: Bidirectional
      # Safely unmount the filesystem before stopping
      lifecycle:
          preStop:
            exec:
              command: ["/bin/sh","-c","fusermount -u /var/s3"]
  volumes:
    - name: devfuse
      hostPath:
        path: /dev/fuse
    # Host filesystem
    - name: my-bucket
      hostPath:
        path: /mnt/my-bucket
    # Apache custom config extracted from the apache-config ConfigMap
    - name: apache-config-volume
      configMap:
        name: apache-config

Once the pod is ready, try to list the contents of the /var/s3 folder in the s3mounter sidecar container using the following command (note we need to specify the container name with the -c flag to reach the intended one). The contents of the folder should be listed.

kubectl exec web-server-sidecar-s3 -c s3mounter -- ls /var/s3/
10Gb.file
file_from_s3_mounter_pod.txt
html

Repeat the same for the apache2 container. The contents of the folder should be listed as well.

kubectl exec web-server-sidecar-s3 -c apache2 -- ls /var/www/
10Gb.file
file_from_s3_mounter_pod.txt
html

The mounted S3 bucket should also be available in the worker filesystem. Extract the IP of the worker on which the pod is running using .status.hostIP and try to list the contents of the local hostPath /mnt/my-bucket. You can jump into the worker over SSH or use a single command via remote execution, as seen below:

ssh $(kubectl get pod web-server-sidecar-s3 -o jsonpath="{.status.hostIP}") sudo ls /mnt/my-bucket
10Gb.file
file_from_s3_mounter_pod.txt
html

Now create some dummy files in the worker filesystem, inside the html folder. This html folder is the root directory configured in apache2 to serve HTML documents.

ssh $(kubectl get pod web-server-sidecar-s3 -o jsonpath="{.status.hostIP}") sudo touch /mnt/my-bucket/html/file{01..10}.txt

And now both containers should be able to list the new files, as shown below from apache2.

kubectl exec web-server-sidecar-s3 -c apache2 -- ls /var/www/html
file01.txt
file02.txt
file03.txt
file04.txt
file05.txt
file06.txt
file07.txt
file08.txt
file09.txt
file10.txt

Apache will serve the content of the html folder, so let's verify that we can actually access the files through HTTP. First, forward the apache2 pod port 80 to local port 8080 via the kubectl port-forward command.

kubectl port-forward web-server-sidecar-s3 8080:80 &
Forwarding from 127.0.0.1:8080 -> 80
Forwarding from [::1]:8080 -> 80

And now try to reach the apache2 web server. According to the httpd.conf, it should list any file in the folder. Using curl you can verify this is working, as displayed below.

curl localhost:8080
Handling connection for 8080
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
 <head>
  <title>Index of /</title>
 </head>
 <body>
<h1>Index of /</h1>
<ul><li><a href="file01.txt"> file01.txt</a></li>
<li><a href="file02.txt"> file02.txt</a></li>
<li><a href="file03.txt"> file03.txt</a></li>
<li><a href="file04.txt"> file04.txt</a></li>
<li><a href="file05.txt"> file05.txt</a></li>
<li><a href="file06.txt"> file06.txt</a></li>
<li><a href="file07.txt"> file07.txt</a></li>
<li><a href="file08.txt"> file08.txt</a></li>
<li><a href="file09.txt"> file09.txt</a></li>
<li><a href="file10.txt"> file10.txt</a></li>
</ul>
</body></html>

If you try from a browser, you should also be able to reach the S3 bucket contents.

Now we have managed to decouple the S3 mounting from the application container by using a sidecar pattern to deploy our application. Although this configuration may fit some requirements, it may not scale very well in other environments. For example, if you create another sidecar pod on the same worker node trying to access the very same bucket, you will end up with an error generated by the new s3-mounter when trying to bidirectionally mount a non-empty volume (basically because it is already mounted by the other pod), as seen below.

kubectl logs web-server-sidecar-s3-2 -c s3mounter
s3fs: MOUNTPOINT directory /var/s3 is not empty. if you are sure this is safe, can use the 'nonempty' mount option.

So basically you must use a different hostPath, such as /mnt/my-bucket2, to avoid the above error, which implies awareness and management of existing hostPaths and does not sound very scalable. This is where the third approach comes into play.

Option 3. Using a DaemonSet to create a shared volume

This option relies on a DaemonSet object. According to the official documentation, a DaemonSet ensures that all (or some) nodes run a copy of a pod. As nodes are added to the cluster, pods are added to them. As nodes are removed from the cluster, those pods are garbage collected. Deleting a DaemonSet will clean up the pods it created.

This way we can think of the bucket as an S3-backed storage provider with a corresponding volume mounted as a hostPath on all the workers. This approach avoids any error when multiple pods try to mount the same bucket using the same host filesystem path. The following picture depicts the architecture with a 3-member cluster.

S3 Backend access using shared hostpath volumes with DaemonSet mounter

The DaemonSet object will look like the one shown below. We have added a lifecycle hook to ensure the S3 filesystem is properly unmounted when the pod is stopped.

vi s3-mounter-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app: minio-s3-my-bucket
  name: minio-s3-my-bucket
spec:
  selector:
    matchLabels:
       app: minio-s3-my-bucket
  template:
    metadata:
      labels:
        app: minio-s3-my-bucket
    spec:
      containers:
      - name: s3fuse
        image: jhasensio/s3-mounter
        imagePullPolicy: Always
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh","-c","fusermount -u /var/s3"]
        securityContext:
          privileged: true
        envFrom:
        - secretRef:
            name: s3-credentials
        volumeMounts:
        - name: devfuse
          mountPath: /dev/fuse
        - name: my-bucket
          mountPath: /var/s3
          mountPropagation: Bidirectional
      volumes:
      - name: devfuse
        hostPath:
          path: /dev/fuse
      - name: my-bucket
        hostPath:
          path: /mnt/my-bucket

Apply the above manifest and verify the DaemonSet is properly deployed, ready and running on all the workers in the cluster.
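
The apply step looks like this (file name as created above):

kubectl apply -f s3-mounter-daemonset.yaml
daemonset.apps/minio-s3-my-bucket created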

kubectl get daemonsets.apps
NAME                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
minio-s3-my-bucket   6         6         6       6            6           <none>          153m

Verify that you can list the contents of the S3 bucket.

kubectl exec minio-s3-my-bucket-fzcb2 -- ls /var/s3
10Gb.file
file_from_s3_mounter_pod.txt
html

Now that the DaemonSet is properly running, we should be able to consume the /mnt/my-bucket path in the worker filesystem as a regular hostPath volume. Let's create the same pod we used previously, this time as a single-container pod. Remember to use Bidirectional mountPropagation again.

vi apache-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: apachepod
spec:
  containers:
    - name: web-server
      image: httpd:2.4.55
      securityContext:
        privileged: true
      ports:
        - containerPort: 80
          name: http-web
      volumeMounts:
        - name: apache-config-volume
          mountPath: /tmp/conf
        - name: my-bucket
          mountPath: /var/www
          mountPropagation: Bidirectional
      command: ["/bin/sh"]
      args: ["-c", "cp /tmp/conf/httpd.conf /usr/local/apache2/conf/httpd.conf && /usr/local/bin/httpd-foreground"]
  volumes:
    - name: apache-config-volume
      configMap:
        name: apache-config
    - name: my-bucket
      hostPath: 
        path: /mnt/my-bucket

Try to list the contents of the volume pointing to the hostPath /mnt/my-bucket, which in turn points to the /var/s3 folder used by the DaemonSet-controlled pod to mount the S3 bucket.

kubectl exec apachepod -- ls /var/www
10Gb.file
file_from_s3_mounter_pod.txt
html

Repeat the port-forward to try to reach the apache2 web server.

kubectl port-forward apachepod 8080:80 &
Forwarding from 127.0.0.1:8080 -> 80
Forwarding from [::1]:8080 -> 80

And you should see again the contents of the /var/www/html folder via HTTP, as shown below.

curl localhost:8080
Handling connection for 8080
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
 <head>
  <title>Index of /</title>
 </head>
 <body>
<h1>Index of /</h1>
<ul><li><a href="file01.txt"> file01.txt</a></li>
<li><a href="file02.txt"> file02.txt</a></li>
<li><a href="file03.txt"> file03.txt</a></li>
<li><a href="file04.txt"> file04.txt</a></li>
<li><a href="file05.txt"> file05.txt</a></li>
<li><a href="file06.txt"> file06.txt</a></li>
<li><a href="file07.txt"> file07.txt</a></li>
<li><a href="file08.txt"> file08.txt</a></li>
<li><a href="file09.txt"> file09.txt</a></li>
<li><a href="file10.txt"> file10.txt</a></li>
</ul>
</body></html>

It works!

Bonus: All together now. It’s time for automation

So far we have explored different methods to install and consume an S3 backend over a vSAN-enabled infrastructure. Some of these methods are intended to be performed by a human; in fact, access to the console is very useful for learning and for performing maintenance and observability tasks on day 2, where a graphical interface might be crucial.

However, in the real world, and especially when using Kubernetes, it is very desirable to be able to industrialize all these provisioning tasks without a single mouse click. Next you will see a practical case of provisioning via the CLI that is therefore 100% automatable.

To recap, the steps needed to provision an S3 bucket from scratch, given a Kubernetes cluster with the MinIO Operator installed, are summarized below:

  • Create a namespace to place all MinIO tenant-related objects
  • Create a MinIO tenant using the MinIO plugin (or apply an existing YAML manifest containing the Tenant CRD and the rest of the required objects)
  • Extract the credentials for the new tenant and inject them into a Kubernetes Secret object
  • Create the MinIO S3 bucket via a Kubernetes Job
  • Deploy a DaemonSet to expose the S3 backend as a shared volume

The following script can be used to automate everything. It only requires two parameters, namely the tenant and bucket names. In the script they are statically set, but it would be very easy to pass both parameters as input.

Sample Script to Create MinIO Tenant and share S3 bucket through a Bidirectional PV
#!/bin/bash

# INPUT PARAMETERS
# ----------------
TENANT=test
BUCKET=mybucket

# STEP 1 CREATE NAMESPACE AND TENANT
# --------------------------------
kubectl create ns $TENANT
kubectl minio tenant create $TENANT --servers 2 --volumes 4 --capacity 10G --namespace $TENANT --storage-class vsphere-sc-sna

# STEP 2 DEFINE VARIABLES, EXTRACT CREDENTIALS FROM CURRENT TENANT AND PUT TOGETHER IN A SECRET 
# ----------------------------------------------------------------------------------------------
echo "MINIO_S3_ENDPOINT=https://minio.${TENANT}.svc.cluster.local" > s3bucket.env
echo "MINIO_S3_BUCKET=${BUCKET}" >> s3bucket.env
echo "SECRET_KEY=$(kubectl get secrets -n ${TENANT} ${TENANT}-user-1 -o jsonpath="{.data.CONSOLE_SECRET_KEY}" | base64 -d)" >> s3bucket.env
echo "ACCESS_KEY=$(kubectl get secrets -n ${TENANT} ${TENANT}-user-1 -o jsonpath="{.data.CONSOLE_ACCESS_KEY}" | base64 -d)" >> s3bucket.env

kubectl create secret generic s3-credentials --from-env-file=s3bucket.env

# STEP 3 CREATE BUCKET USING A JOB
# ----------------------------------
kubectl apply -f create-s3-bucket-job.yaml


# STEP 4 WAIT FOR S3 BUCKET CREATION JOB TO SUCCESS
# ---------------------------------------------------
kubectl wait pods --for=jsonpath='{.status.phase}'=Succeeded -l job-name=create-s3-bucket-job --timeout=10m


# STEP 5 DEPLOY DAEMONSET TO SHARE S3 BUCKET AS A PV
# ----------------------------------------------------
kubectl apply -f s3-mounter-daemonset.yaml

Note that Step 2 requires the MinIO Operator deployed in the cluster and also the krew MinIO plugin installed. Alternatively, you can use the --output flag to dry-run the tenant creation command and generate a manifest YAML file that can be exported and used later without the krew plugin installed. The way to generate the tenant file is shown below.

kubectl minio tenant create $TENANT --servers 2 --volumes 4 --capacity 10G --namespace $TENANT --storage-class vsphere-sc --output > tenant.yaml

It is also worth mentioning that we have decided to create the MinIO bucket through a Job, which is a good choice for performing one-off tasks in a Kubernetes cluster. As you can tell by looking at the manifest below, the command used in the pod in charge of the Job includes some simple logic to ensure the task is retried in case of error while the tenant is still spinning up, and it will run until the bucket creation is completed successfully. The manifest that defines the Job is shown below.

vi create-s3-bucket-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: create-s3-bucket-job
spec:
  template:
    spec:
      containers:
      - name: mc
        # s3-credentials secret must contain $ACCESS_KEY, $SECRET_KEY, $MINIO_S3_ENDPOINT, $MINIO_S3_BUCKET
        envFrom:
        - secretRef:
            name: s3-credentials
        image: minio/mc
        command: 
          - sh
          - -c
          - ls /tmp/error > /dev/null 2>&1 ; until [[ "$?" == "0" ]]; do sleep 5; echo "Attempt to connect with MinIO failed. Attempt to reconnect in 5 secs"; mc alias set s3 $(MINIO_S3_ENDPOINT) $(ACCESS_KEY) $(SECRET_KEY) --insecure; done && mc mb s3/$(MINIO_S3_BUCKET) --insecure
      restartPolicy: Never
  backoffLimit: 4

Once the Job is created you can observe up to three different errors from the MinIO client while it tries to complete the bucket creation. The logic will retry until completion, as shown below:

kubectl logs -f create-s3-bucket-job-qtvx2
mc: <ERROR> Unable to initialize new alias from the provided credentials. Get "https://minio.my-minio-tenant.svc.cluster.local/probe-bucket-sign-ubatm9xin2a
Attempt to connect with MinIO failed. Attempt to reconnect in 5 secs                                                                             
mc: <ERROR> Unable to initialize new alias from the provided credentials. Get "https://minio.my-minio-tenant.svc.cluster.local/probe-bucket-sign-n09ttv9pnfw
Attempt to connect with MinIO failed. Attempt to reconnect in 5 secs                                                                             
mc: <ERROR> Unable to initialize new alias from the provided credentials. Server not initialized, please try again.                              
Attempt to connect with MinIO failed. Attempt to reconnect in 5 secs                                                                             
mc: <ERROR> Unable to initialize new alias from the provided credentials. Server not initialized, please try again.                              
...skipped                                                                          
mc: <ERROR> Unable to initialize new alias from the provided credentials. The Access Key Id you provided does not exist in our records.          
Attempt to connect with MinIO failed. Attempt to reconnect in 5 secs                                                                                                                                                        
mc: <ERROR> Unable to initialize new alias from the provided credentials. The Access Key Id you provided does not exist in our records.          
... skipped         
Attempt to connect with MinIO failed. Attempt to reconnect in 5 secs                                                                             
mc: <ERROR> Unable to initialize new alias from the provided credentials. The Access Key Id you provided does not exist in our records.          
Attempt to connect with MinIO failed. Attempt to reconnect in 5 secs                                                                             
Added `s3` successfully.                                                                                                                         
Bucket created successfully `s3/mybucket`.

Once the Job is completed, the wait condition (status.phase=Succeeded) will be met and the script will continue to the next step, which consists of the deployment of the DaemonSet. Once the DaemonSet is ready you should be able to consume the hostPath volume that points to the S3 bucket (/mnt/my-bucket in this case) from any regular pod. Create a test file in the pod mount folder (/var/s3).

kubectl exec pods/minio-s3-my-bucket-nkgw2 -ti -- touch /var/s3/test.txt

Now you can spin up a simple sleeper pod, shown below for completeness. Don’t forget to add the Bidirectional mountPropagation spec.

vi sleeper_pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: s3-sleeper-test
spec:
  containers:
    - name: sleeper
      image: ubuntu
      securityContext:
        privileged: true
      volumeMounts:
        - name: my-bucket
          mountPath: /data
          mountPropagation: Bidirectional
      command: ["sleep", "infinity"]
  volumes:
    - name: my-bucket
      hostPath: 
        path: /mnt/my-bucket

We should be able to list the S3 bucket objects under the /data folder of the pod.

kubectl exec pods/s3-sleeper-test -ti -- ls -alg /data
total 5
drwx------ 1 root    0 Jan  1  1970 .
drwxr-xr-x 1 root 4096 Feb 13 10:16 ..
-rw-r--r-- 1 root    0 Feb 13 10:13 test.txt

We are done!!!

This has been a quite lengthy series of articles. We have learned how to install the vSphere CSI Driver, how to enable vSAN File Services and, in this post, how to set up MinIO on top of a vSAN-enabled infrastructure and how to consume it from Kubernetes pods using different approaches. Hope you find it useful.

Preparing vSphere for Kubernetes Persistent Storage (2/3): Enabling vSAN File Services

In the previous article we walked through the preparation of an upstream k8s cluster to take advantage of the converged storage infrastructure that vSphere provides, by using a CSI driver that allows pods to consume vSAN storage in the form of Persistent Volumes created automatically through a special-purpose StorageClass.

With the CSI driver we would have most of the persistent storage needs of our pods covered; however, in some cases multiple pods need to mount and read/write the same volume simultaneously. This is basically defined by the Access Mode specification that is part of the PV/PVC definition. The typical Access Modes available in Kubernetes are:

  • ReadWriteOnce – Mount a volume as read-write by a single node
  • ReadOnlyMany – Mount the volume as read-only by many nodes
  • ReadWriteMany – Mount the volume as read-write by many nodes

In this article we will focus on the ReadWriteMany (RWX) Access Mode, which allows a volume to be mounted simultaneously in read-write mode by multiple pods running on different Kubernetes nodes. This access mode is typically backed by a network file sharing technology such as NFS. The good news is that this is not a big deal if we have vSAN because, again, we can take advantage of this wonderful technology to enable the built-in file services and create shared network shares in a very easy and integrated way.

Enabling vSAN File Services

The procedure for doing this is described below. Let’s move into the vSphere GUI for a while. Access your cluster and go to vSAN > Services. Now click on the blue ENABLE button at the bottom of the File Service tile.

Enabling vSAN file Services

The first step will be to select the network on which the service will be deployed. In my case I will select a specific PortGroup in the subnet 10.113.4.0/24 and VLAN 1304, named VLAN-1304-NFS.

Enable File Services. Select Network

This action will trigger the creation of the necessary agents on each of the hosts, which will be in charge of serving the shared network resources via NFS. After a while we should be able to see a new resource group named ESX Agents with four new vSAN File Service Node VMs.

vSAN File Service Agents

Once the agents have been deployed we can access the service configuration; we will see that the configuration is incomplete because we haven’t defined some important service parameters yet. Click on the Configure Domain button at the bottom of the File Service tile.

vSAN File Service Configuration Tile

The first step is to define a domain that will host the names of the shared services. In my particular case I will use vsanfs.sdefinitive.net as the domain name. Any shared resource will be reached using this domain.

vSAN File Service Domain Configuration

Before continuing, it is important that our DNS is configured with the names that we will give to the four file service agents. In my case I am using fs01, fs02, fs03 and fs04 as names in the domain vsanfs.sdefinitive.net, with the IP addresses 10.113.4.100-103. Additionally, we will have to indicate the IP of the DNS server used for resolution, the netmask and the default gateway as shown below.

vSAN File Services Network Configuration
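
Before moving on, it may save time to confirm that those records actually resolve. A quick convenience loop from any machine pointing to the same DNS server (using host here, but nslookup or getent work just as well):

for h in fs01 fs02 fs03 fs04; do host $h.vsanfs.sdefinitive.net; done
# each name should return its corresponding address in the 10.113.4.100-103 range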

The next screen shows the option to integrate with AD; for the moment we can skip it because we will consume the network services from pods.

vSAN File Service integration with Active Directory

Next you will see a summary of all the settings made, for a final review before proceeding.

vSAN File Service Domain configuration summary

Once we press the green FINISH button, the network file services configuration will be completed and the service will be ready to use.

Creating File Shares from vSphere

Once the vSAN File Services have been configured we should be able to create network shares that will eventually be consumed in the form of NFS-type volumes by our applications. To do this we must first provision the file shares according to our preferences. Go to File Services and click on ADD to create a new file share.

Adding a new NFS File Share

The file share creation wizard allows us to specify some important parameters such as the name of our shared service, the protocol (NFS) used to export the file share, the NFS version (4.1 or 3), the Storage Policy that the volume will use and, finally, other quota-related settings such as the size and warning threshold for our file share.

Settings for vSAN file share

Additionally, we can add security by means of a network access control policy. In our case we will allow any IP to access the shared service, so we select the option “Allow access from any IP”, but feel free to restrict access to certain IP ranges if you need it.

Network Access Control for vSAN file share

Once all the parameters have been set we can complete the task by pressing the green FINISH button at the bottom right side of the window.

vSAN File Share Configuration Summary

Let’s inspect the created file share, which is seen as just another vSAN Virtual Object from the vSphere administrator’s perspective.

Inspect vSAN File Share

If we click on VIEW FILE SHARE we can see the specific configuration of our file share. Write down the export path (fs01.vsanfs.sdefinitive.net:/vsanfs/my-nfs-share) since it will be used later in the yaml manifest that declares the corresponding persistent volume Kubernetes object.

Displaying vSAN File Share parameters

From a storage administrator perspective we are done. Now we will see how to consume it from the developer perspective through native Kubernetes resources using yaml manifests.

Consuming vSAN Network File Shares from Pods

An important requirement to be able to mount NFS shares is to have the necessary software installed in the worker OS, otherwise the mounting process will fail. If you are using a Debian-family Linux distro such as Ubuntu, the package that contains the necessary binaries to allow NFS mounts is nfs-common. Ensure this package is installed before proceeding. Issue the command below to meet the requirement.

sudo apt-get install nfs-common
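
A quick way to confirm the package is actually present before going any further:

dpkg -s nfs-common | grep Status
# expected when installed: Status: install ok installed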

Before proceeding with the creation of PV/PVCs, it is recommended to test connectivity from the workers as a sanity check. The first basic test is pinging the FQDN of the host in charge of the file share, as defined in the export path captured earlier. Ensure you can also ping the rest of the NFS agents defined (fs01-fs04).

ping fs01.vsanfs.sdefinitive.net
PING fs01.vsanfs.sdefinitive.net (10.113.4.100) 56(84) bytes of data.
64 bytes from 10.113.4.100 (10.113.4.100): icmp_seq=1 ttl=63 time=0.688 ms

If DNS resolution and connectivity are working as expected, we are safe to mount the file share on any folder of the filesystem. The following command shows how to mount the file share using NFS 4.1 and the export path associated with our file share. Ensure the mount point (/mnt/my-nfs-share in this example) exists before proceeding; if not, create it in advance using mkdir as usual.

mount -t nfs4 -o minorversion=1,sec=sys fs01.vsanfs.sdefinitive.net:/vsanfs/my-nfs-share /mnt/my-nfs-share -v
mount.nfs4: timeout set for Fri Dec 23 21:30:09 2022
mount.nfs4: trying text-based options 'minorversion=1,sec=sys,vers=4,addr=10.113.4.100,clientaddr=10.113.2.15'

If the mount is successful you should be able to access the share at the mount point folder and even create a file, as shown below.

/ # cd /mnt/my-nfs-share
/ # touch hola.txt
/ # ls

hola.txt

Now we are safe to jump into the manifest world to define our persistent volumes and attach them to the desired pod. First declare the PV object using ReadWriteMany as accessMode and specify the server and export path of our network file share.

Note that we will use a storageClassName specification with an arbitrary name, vsan-nfs. Using a “fake” or undefined storageClass is supported by Kubernetes and is typically used for binding purposes between the PV and the PVC, which is exactly our case. This is required to prevent our PV resource from ending up using the default storage class, which in this particular scenario would not be compatible with the ReadWriteMany access mode required for NFS volumes.

vi nfs-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  storageClassName: vsan-nfs
  capacity:
    storage: 500Mi
  accessModes:
    - ReadWriteMany
  nfs:
    server: fs01.vsanfs.sdefinitive.net
    path: "/vsanfs/my-nfs-share"

Apply the yaml and verify the creation of the PV. Note we are using RWX mode, which allows access to the same volume from different pods running on different nodes simultaneously.

kubectl get pv nfs-pv
NAME     CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM             STORAGECLASS   REASON   AGE
nfs-pv   500Mi      RWX            Retain           Bound    default/nfs-pvc   vsan-nfs                 60s

Now do the same for the PVC pointing to the PV created. Note we are specifying the same storageClassName to bind the PVC with the PV. The accessMode must also be consistent with the PV definition and, finally, for this example we are claiming 500 MBytes of storage.

vi nfs-pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nfs-pvc
spec:
  storageClassName: vsan-nfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 500Mi

As usual, verify the status of the PVC resource. As you can see, the PVC is in Bound state as expected.

kubectl get pvc nfs-pvc
NAME      STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
nfs-pvc   Bound    nfs-pv   500Mi      RWX            vsan-nfs       119s

Then attach the volume to a regular pod using the following yaml manifest. This will create a basic pod running an alpine image that mounts the NFS PVC at the /my-nfs-share local path inside the container. Ensure the claimName specification of the volume matches the PVC name defined earlier.

vi nfs-pod1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nfs-pod1
spec:
  containers:
  - name: alpine
    image: "alpine"
    volumeMounts:
    - name: nfs-vol
      mountPath: "/my-nfs-share"
    command: [ "sleep", "1000000" ]
  volumes:
    - name: nfs-vol
      persistentVolumeClaim:
        claimName: nfs-pvc

Apply the yaml using kubectl apply and try to open a shell session to the container using kubectl exec as shown below.

kubectl exec nfs-pod1 -it -- sh

We should be able to access the network share, list any existing files and check that we can write new files, as shown below.

/ # touch /my-nfs-share/hola.pod1
/ # ls /my-nfs-share
hola.pod1  hola.txt

The last test to check whether multiple pods running on different nodes can actually read and write the same volume simultaneously is to create a new pod2 that mounts the same volume. Ensure that both pods are scheduled on different nodes for a full verification of the RWX access mode.

vi nfs-pod2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nfs-pod2
spec:
  containers:
  - name: alpine
    image: "alpine"
    volumeMounts:
    - name: nfs-vol
      mountPath: "/my-nfs-share"
    command: [ "sleep", "1000000" ]
  volumes:
    - name: nfs-vol
      persistentVolumeClaim:
        claimName: nfs-pvc

In the same manner, apply the manifest file above to spin up the new pod2 and try to open a shell.

kubectl exec nfs-pod2 -it -- sh

Again, we should be able to access the network share, list existing files and also to create new files.

/ # touch /my-nfs-share/hola.pod2
/ # ls /my-nfs-share
hola.pod1  hola.pod2  hola.txt
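
To confirm that the two pods really landed on different nodes, check the wide output of kubectl get pods; if the scheduler happened to co-locate them, you could add a nodeName or a podAntiAffinity rule to the second manifest to force the spread.

kubectl get pods nfs-pod1 nfs-pod2 -o wide
# the NODE column should show two different worker nodes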

In this article we have learnt how to enable vSAN File Services and how to consume PVs in RWX mode. In the next post I will explain how to leverage MinIO technology to provide S3-like object storage on top of vSphere for our workloads. Stay tuned!

Preparing vSphere for Kubernetes Persistent Storage (1/3): Installing vSphere CSI Driver

It is very common to relate Kubernetes with terms like stateless and ephemeral. This is because, when a container is terminated, any data stored in it during its lifetime is lost. This behaviour might be acceptable in some scenarios; however, very often there is a need to persistently store data and some static configurations. As you can imagine, data persistence is an essential feature for running stateful applications such as databases and file servers.

Persistence enables data to be stored outside of the container, and here is where persistent volumes come into play. This post will explain later what a PV is but, in a nutshell, a PV is a Kubernetes resource that allows data to be retained even if the container is terminated, and also allows the data to be accessed by multiple containers or pods if needed.

On the other hand, it is important to note that, generally speaking, a Kubernetes cluster does not exist on its own but depends on some sort of underlying infrastructure. That means it would be really nice to have some kind of connector between the Kubernetes control plane and the infrastructure control plane to get the most out of it. For example, this dialogue may help the Kubernetes scheduler to place pods taking into account the failure domain of the workers to achieve better availability of an application or, even better, when it comes to storage, the Kubernetes cluster can ask the infrastructure to use the existing datastores to meet any persistent storage requirement. This concept of communication between a Kubernetes cluster and the underlying infrastructure is referred to as a Cloud Provider or Cloud Provider Interface.

In this particular scenario we are running a Kubernetes cluster on top of vSphere and we will walk through the process of setting up the vSphere Cloud Provider. Once the vSphere Cloud Provider Interface is set up you can take advantage of the Container Native Storage (CNS) vSphere built-in feature. CNS allows the developer to consume storage from vSphere on demand in a fully automated fashion, while providing the storage administrator visibility and management of volumes from the vCenter UI. The following picture depicts a high-level diagram of the integration.

Kubernetes and vSphere Cloud Provider

It is important to note that this article is not based on any specific Kubernetes distribution. In the case of using Tanzu on vSphere, some of the installation procedures are not necessary as they come out of the box when enabling vSphere with Tanzu as part of an integrated solution.

Installing vSphere Container Storage Plugin

The Container Native Storage feature is realized by means of a Container Storage Plugin, also called a CSI driver. This CSI driver runs in a Kubernetes cluster deployed on vSphere and is responsible for provisioning persistent volumes on vSphere datastores by interacting with the vSphere control plane (i.e. vCenter). The plugin supports both file and block volumes. Block volumes are typically used in more specialized cases where low-level access to the storage is required, such as for databases or other storage-intensive workloads, whereas file volumes are more commonly used in Kubernetes because they are more flexible and easier to manage. This guide will focus on file volumes, but feel free to explore extra volume types and supported functionality as documented here.

Before proceeding, if you want to interact with vCenter via CLI instead of the GUI, a good helper is govc, a tool designed to be a user-friendly CLI alternative to the vCenter GUI and well suited for any related automation tasks. The easiest way to install it is using the govc prebuilt binaries on the releases page. The following command will install it automatically and place the binary in the /usr/local/bin path.

curl -L -o - "https://github.com/vmware/govmomi/releases/latest/download/govc_$(uname -s)_$(uname -m).tar.gz" | sudo tar -C /usr/local/bin -xvzf - govc

To facilitate the use of govc, we can create a file that sets some environment variables to avoid having to enter the URL and credentials each time. A good practice is to obfuscate the credentials using basic Base64 encoding. The following command shows how to encode any string using this mechanism.

echo "passw0rd" | base64
cGFzc3cwcmQK

Get the Base64-encoded string of your username and password as shown above, then edit a file named govc.env and set the following environment variables, replacing the values with your particular data.

vi govc.env
export GOVC_URL=vcsa.cpod-vcn.az-mad.cloud-garage.net
export GOVC_USERNAME=$(echo <yourbase64encodedusername> | base64 -d)
export GOVC_PASSWORD=$(echo <yourbase64encodedpasssord> | base64 -d)
export GOVC_INSECURE=1

Once the file is created you can actually set the variables using source command.

source govc.env

If everything is ok you should be able to use the govc command without any further parameters. As an example, try a simple task such as browsing your inventory to check that you can access your vCenter and authentication has succeeded.

govc ls
/cPod-VCN/vm
/cPod-VCN/network
/cPod-VCN/host
/cPod-VCN/datastore

Step 1: Prepare vSphere Environment

According to the deployment document mentioned earlier, one of the requirements is enabling the disk UUID advanced property in all Virtual Machines that form the cluster that is going to consume vSphere storage through the CSI driver.

Since we already have the govc tool installed and operational, we can take advantage of it to do this programmatically instead of using the vSphere graphical interface, which is always more laborious and time consuming, especially if the number of nodes in our cluster is very high. The syntax to enable the mentioned advanced property is shown below.

govc vm.change -vm 'vm_inventory_path' -e="disk.enableUUID=1"

Using the ls command and pointing to the right folder, we can see the names of the VMs placed in the folder of interest. In my setup the VMs are placed under the cPod-VCN/vm/k8s folder, as you can see in the following output.

govc ls /cPod-VCN/vm/k8s
/cPod-VCN/vm/k8s/k8s-worker-06
/cPod-VCN/vm/k8s/k8s-worker-05
/cPod-VCN/vm/k8s/k8s-worker-03
/cPod-VCN/vm/k8s/k8s-control-plane-01
/cPod-VCN/vm/k8s/k8s-worker-04
/cPod-VCN/vm/k8s/k8s-worker-02
/cPod-VCN/vm/k8s/k8s-worker-01

Now that we know the VMs that make up our k8s cluster, you could apply the syntax shown above to set the disk.enableUUID VM property one VM at a time. A smarter approach (especially if the number of worker nodes is high or if you need to automate this task) is to take advantage of some Linux helpers to create “single-line commands”. See below how you can do it by chaining the govc output with the powerful xargs command to issue the same command recursively for all occurrences.

govc ls /cPod-VCN/vm/k8s | xargs -n1 -I{arg} govc vm.change -vm {arg} -e="disk.enableUUID=1"

This should enable the UUID advanced parameter in all the listed VMs and we should be ready to take the next step.
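
As an optional sanity check, govc can also dump the ExtraConfig of each VM so you can confirm the property is now set (the -e flag of vm.info prints the ExtraConfig entries; exact output formatting may vary between govc versions):

govc ls /cPod-VCN/vm/k8s | xargs -n1 -I{arg} govc vm.info -e {arg} | grep -i disk.enableUUID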

Step 2: Install Cloud Provider Interface

Once these vSphere-related tasks have been completed, we can move to Kubernetes to install the Cloud Provider Interface. First of all, it is worth mentioning that the vSphere cloud-controller-manager (the element in charge of installing the required components that make up the Cloud Provider) relies on the well-known Kubernetes taint node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule to mark the kubelet as not initialized before proceeding with the cloud provider installation. Generally speaking, a taint is just a node property, similar in spirit to a label, that is typically used to ensure that nodes are properly configured before they are added to the cluster, and to prevent issues caused by nodes that are not yet ready to operate normally. Once the node is fully initialized, the taint can be removed to restore normal operation. The procedure to taint all the nodes of your cluster in a row, using a single command, is shown below.

kubectl get nodes | grep Ready | awk '{print $1}' | xargs -n1 -I{arg} kubectl taint node {arg} node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
node/k8s-contol-plane-01 tainted
node/k8s-worker-01 tainted
node/k8s-worker-02 tainted
node/k8s-worker-03 tainted
node/k8s-worker-04 tainted
node/k8s-worker-05 tainted
node/k8s-worker-06 tainted

Once the cloud-controller-manager initializes a node, this taint is removed. Verify the taints are configured by using regular kubectl commands together with some of the parsing and filtering capabilities that jq provides, as shown below.

kubectl get nodes -o json | jq '[.items[] | {name: .metadata.name, taints: .spec.taints}]'
{
    "name": "k8s-worker-01",
    "taints": [
      {
        "effect": "NoSchedule",
        "key": "node.cloudprovider.kubernetes.io/uninitialized",
        "value": "true"
      }
    ]
  },
<skipped...>

Once the nodes are properly tainted we can install the vSphere cloud-controller-manager. Note the CPI version is tied to the specific Kubernetes version we are running; in this particular case I am running k8s version 1.24. Get the corresponding manifest from the official cloud-provider-vsphere GitHub repository using the commands below.

VERSION=1.24
wget https://raw.githubusercontent.com/kubernetes/cloud-provider-vsphere/master/releases/v$VERSION/vsphere-cloud-controller-manager.yaml

Now edit the downloaded yaml file and locate the section where a Secret object named vsphere-cloud-secret is declared. Change the highlighted lines to match your environment settings. Given that this is intended to be a lab environment and for the sake of simplicity, I am using a full-rights administrator account for this purpose. Make sure you follow best practices and create minimally privileged service accounts if you plan to use this in a production environment. Find here the full procedure to set up specific roles and permissions.

vi vsphere-cloud-controller-manager.yaml (Secret)
apiVersion: v1
kind: Secret
metadata:
  name: vsphere-cloud-secret
  labels:
    vsphere-cpi-infra: secret
    component: cloud-controller-manager
  namespace: kube-system
  # NOTE: this is just an example configuration, update with real values based on your environment
stringData:
  172.25.3.3.username: "[email protected]"
  172.25.3.3.password: "<useyourpasswordhere>"

In the same way, locate a ConfigMap object called vsphere-cloud-config and change the relevant settings to match your environment as shown below:

vi vsphere-cloud-controller-manager.yaml (ConfigMap)
apiVersion: v1
kind: ConfigMap
metadata:
  name: vsphere-cloud-config
  labels:
    vsphere-cpi-infra: config
    component: cloud-controller-manager
  namespace: kube-system
data:
  # NOTE: this is just an example configuration, update with real values based on your environment
  vsphere.conf: |
    # Global properties in this section will be used for all specified vCenters unless overriden in VirtualCenter section.
    global:
      port: 443
      # set insecureFlag to true if the vCenter uses a self-signed cert
      insecureFlag: true
      # settings for using k8s secret
      secretName: vsphere-cloud-secret
      secretNamespace: kube-system

    # vcenter section
    vcenter:
      vcsa:
        server: 172.25.3.3
        user: "[email protected]"
        password: "<useyourpasswordhere>"
        datacenters:
          - cPod-VCN

Now that our configuration is complete, we are ready to install the controller that will be in charge of establishing the communication between our vSphere-based infrastructure and our Kubernetes cluster.

kubectl apply -f vsphere-cloud-controller-manager.yaml
serviceaccount/cloud-controller-manager created 
secret/vsphere-cloud-secret created 
configmap/vsphere-cloud-config created 
rolebinding.rbac.authorization.k8s.io/servicecatalog.k8s.io:apiserver-authentication-reader created 
clusterrolebinding.rbac.authorization.k8s.io/system:cloud-controller-manager created
clusterrole.rbac.authorization.k8s.io/system:cloud-controller-manager created 
daemonset.apps/vsphere-cloud-controller-manager created

If everything goes as expected, we should now see a new pod running in the kube-system namespace. Verify the Running state by displaying the created pod with kubectl, as shown below.

kubectl get pod -n kube-system vsphere-cloud-controller-manager-wtrjn -o wide
NAME                                     READY   STATUS    RESTARTS        AGE     IP            NODE                  NOMINATED NODE   READINESS GATES
vsphere-cloud-controller-manager-wtrjn   1/1     Running   1 (5s ago)   5s   10.113.2.10   k8s-contol-plane-01   <none>           <none>
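
At this point you can re-run the jq query used earlier to confirm that the node.cloudprovider.kubernetes.io/uninitialized taint has been removed from the nodes (control plane nodes may still show their own taints):

kubectl get nodes -o json | jq '[.items[] | {name: .metadata.name, taints: .spec.taints}]'
# the uninitialized taint should no longer be listed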

Step 3: Installing Container Storage Interface (CSI Driver)

Before moving further, it is important to establish the basic kubernetes terms related to storage. The following list summarizes the main resources kubernetes uses for this specific purpose.

  • Persistent Volume: a PV is a Kubernetes object used to provision persistent storage for a pod in the form of volumes. The PV can be provisioned manually by an administrator and backed by physical storage in a variety of formats, such as local storage on the host running the pod or external storage such as NFS, or it can be dynamically provisioned by interacting with a storage provider through the use of a CSI (Container Storage Interface) driver.
  • Persistent Volume Claim: a PVC is the developer’s way of defining a storage request. Just as the definition of a pod involves a computational request in terms of CPU and memory, a PVC is related to storage parameters such as the size of the volume, the type of data access or the storage technology used.
  • Storage Class: a StorageClass is another Kubernetes resource related to storage that allows you to point to a storage resource whose configuration has been defined previously in a class created by the storage administrator. Each class can be related to a particular CSI driver and have a configuration profile associated with it, such as class of service, deletion policy or retention.

To sum up: in general, to have persistence in Kubernetes you need to create a PVC, which is later consumed by a pod. The PVC is just a request for storage bound to a particular PV; however, if you define a Storage Class you don’t have to worry about PV provisioning, since the StorageClass will create the PV on the fly on your behalf, interacting via API with the storage infrastructure.

In the particular case of the vSphere CSI Driver, when a PVC requests storage, the driver will translate the instructions declared in the Kubernetes object into an API request that vCenter is able to understand. vCenter will then instruct the creation of vSphere Cloud Native Storage (i.e. a PV in the form of a native vSphere VMDK) that will be attached to the VM running the Kubernetes node and then to the pod itself. One extra benefit is that vCenter reports information about the container volumes in the vSphere Client, allowing the administrator to have an integrated storage management view.

Let’s deploy the CSI driver then. The first step is to create a new namespace that we will use to place the CSI-related objects. To do this we use kubectl as shown below:

kubectl create ns vmware-system-csi

Now create a config file that will be used to authenticate the cluster against vCenter. As mentioned, we are using here a full-rights administrator account, but it is recommended to use a service account with specific associated roles and permissions. Also, for the sake of simplicity, I am not verifying the SSL certificate presented by vCenter, but it is strongly recommended to import the vCenter certificates to enhance communications security. Replace the highlighted lines to match your own environment as shown below.

vi csi-vsphere.conf
[Global]
cluster-id = "cluster01"
cluster-distribution = "Vanilla"
# ca-file = <ca file path> # optional, use with insecure-flag set to false
# thumbprint = "<cert thumbprint>" # optional, use with insecure-flag set to false without providing ca-file
[VirtualCenter "vcsa.cpod-vcn.az-mad.cloud-garage.net"]
insecure-flag = "true"
user = "[email protected]"
password = "<useyourpasswordhere>"
port = "443"
datacenters = "<useyourvsphereDChere>"

In order to inject the configuration and credential information into Kubernetes we will use a Secret object that takes the config file as its source. Use the following kubectl command to proceed.

kubectl create secret generic vsphere-config-secret --from-file=csi-vsphere.conf -n vmware-system-csi
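
As with any other object, you can optionally verify that the secret has been created in the right namespace before moving on:

kubectl get secret vsphere-config-secret -n vmware-system-csi
# it should appear as an Opaque secret with DATA equal to 1 (the csi-vsphere.conf file)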

And now it is time to install the driver itself. As usual we will use a manifest that installs the latest version available, which at the time of writing this post is 2.7.

kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v2.7.0/manifests/vanilla/vsphere-csi-driver.yaml

If you inspect the state of the installed driver you will see that two replicas of the vsphere-csi-controller deployment remain in Pending state. This is because the deployment is set to spin up three replicas by default, but it also has a policy to be scheduled only on control plane nodes along with an anti-affinity rule to avoid two pods running on the same node. That means that with a single control plane node the maximum number of replicas in Running state is one. On the other side, a DaemonSet will also spin up a vsphere-csi-node pod on every node.

kubectl get pod -n vmware-system-csi
NAME                                      READY   STATUS    RESTARTS        AGE
vsphere-csi-controller-7589ccbcf8-4k55c   0/7     Pending   0               2m25s
vsphere-csi-controller-7589ccbcf8-kbc27   0/7     Pending   0               2m27s
vsphere-csi-controller-7589ccbcf8-vc5d5   7/7     Running   0               4m13s
vsphere-csi-node-42d8j                    3/3     Running   2 (3m25s ago)   4m13s
vsphere-csi-node-9npz4                    3/3     Running   2 (3m28s ago)   4m13s
vsphere-csi-node-kwnzs                    3/3     Running   2 (3m24s ago)   4m13s
vsphere-csi-node-mb4ss                    3/3     Running   2 (3m26s ago)   4m13s
vsphere-csi-node-qptpc                    3/3     Running   2 (3m24s ago)   4m13s
vsphere-csi-node-sclts                    3/3     Running   2 (3m22s ago)   4m13s
vsphere-csi-node-xzglp                    3/3     Running   2 (3m27s ago)   4m13s

You can easily adjust the number of replicas of the vsphere-csi-controller deployment by editing the Kubernetes resource and setting the number of replicas to one. The easiest way to do it is shown below.

kubectl scale deployment -n vmware-system-csi vsphere-csi-controller --replicas=1
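
You can then check that the deployment settles with a single ready replica:

kubectl get deployment vsphere-csi-controller -n vmware-system-csi
# READY should report 1/1 once the scale down takes effect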

Step 4: Creating StorageClass and testing persistent storage

Now that our CSI driver is up and running, let’s create a StorageClass that will point to the infrastructure provisioner to create the PVs for us on demand. Before proceeding with the StorageClass definition, let’s take a look at the current datastore-related information in our particular scenario. We could use the vSphere GUI for this but, again, a smarter way is using govc to obtain some relevant information about our datastores that we will use afterwards.

govc datastore.info
Name:        vsanDatastore
  Path:      /cPod-VCN/datastore/vsanDatastore
  Type:      vsan
  URL:       ds:///vmfs/volumes/vsan:529c9fd4d68b174b-1af2d7a4b1b22457
  Capacity:  4095.9 GB
  Free:      3328.0 GB
Name:        nfsDatastore
  Path:      /cPod-VCN/datastore/nfsDatastore
  Type:      NFS
  URL:       ds:///vmfs/volumes/f153c0aa-c96d23c2/
  Capacity:  1505.0 GB
  Free:      1494.6 GB
  Remote:    172.25.3.1:/data/Datastore

We want our volumes to use vSAN as persistent storage. To do so, use the URL associated with the vsanDatastore to instruct the CSI driver to create the persistent volumes in the desired datastore. You can create as many StorageClasses as required, each of them with particular parameters such as the datastore backend, the storage policy or the filesystem type. Additionally, as part of the definition of our StorageClass, we are adding an annotation to declare this class as the default; that means any PVC without an explicit storageClass specification will use this one by default.

vi vsphere-sc.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: vsphere-sc
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.vsphere.vmware.com
parameters:
  storagepolicyname: "vSAN Default Storage Policy"  
  datastoreurl: "ds:///vmfs/volumes/vsan:529c9fd4d68b174b-1af2d7a4b1b22457/"
# csi.storage.k8s.io/fstype: "ext4" #Optional Parameter

Once the yaml manifest is created simply apply it using kubectl.

kubectl apply -f vsphere-sc.yaml 

As a best practice, always verify the status of any created object to see if everything is correct. Ensure the StorageClass is followed by “(default)”, which means the annotation has been correctly applied and this StorageClass will be used by default.

kubectl get storageclass
NAME                        PROVISIONER              RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
vsphere-sc (default)        csi.vsphere.vmware.com   Delete          Immediate              false                  10d

As mentioned above, the StorageClass allows us to abstract from the storage provider so that the developer can dynamically request a volume without the intervention of a storage administrator. The following manifest would allow you to create a volume using the newly created vsphere-sc class.

vi vsphere-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vsphere-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: vsphere-sc

Apply the created yaml file using the kubectl command as shown below.

kubectl apply -f vsphere-pvc.yaml 

Now verify the creation of the PVC and its current status using the kubectl get pvc command.

kubectl get pvc
NAME          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
vsphere-pvc   Bound    pvc-8da817f0-8231-4d27-9452-188a8ef4f144   5Gi        RWO            vsphere-sc     13s

Note how the new PVC is bound to a volume that has been automatically created by the CSI driver without any user intervention. If you explore the PV using kubectl you will see the following.

kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                 STORAGECLASS   REASON   AGE
pvc-8da817f0-8231-4d27-9452-188a8ef4f144   5Gi        RWO            Delete           Bound    default/vsphere-pvc   vsphere-sc              37s

Another cool benefit of the vSphere CSI Driver is the integrated management: you can access the vSphere console to verify the creation of the volume with the configured capacity and access policy, as shown in the figure below.

Inspecting PVC from vSphere GUI

You can drill down further by going to Cluster > Monitor > vSAN Virtual Objects. Filter by the volume name assigned to the PVC to get a cleaner view of the objects of interest.

vSAN Virtual Objects

Now click on VIEW PLACEMENT DETAILS to see the Physical Placement for the persistent volume we have just created.

Persistent Volume Physical Placement

The Physical Placement window shows how the data is actually placed in VMDKs that reside on different hosts (esx03 and esx04) of the vSAN-enabled cluster, creating a RAID 1 layout for data redundancy that uses a third element as a witness, placed on esx01. This layout is dictated by the “vSAN Default Storage Policy” that we have attached to our newly created StorageClass. Feel free to try different StorageClasses bound to different vSAN storage policies to fit your requirements in terms of availability, space optimization, encryption and so on.
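
As a sketch of what that would look like, the manifest below defines an additional (non-default) StorageClass bound to a different SPBM policy. The policy name used here is just a placeholder; replace it with any policy that actually exists in your vCenter, such as the SNA policy created for MinIO.

vi vsphere-sc-custom.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: vsphere-sc-custom
provisioner: csi.vsphere.vmware.com
parameters:
  # placeholder SPBM policy name, replace with an existing policy in your environment
  storagepolicyname: "vSAN Custom Storage Policy"
  datastoreurl: "ds:///vmfs/volumes/vsan:529c9fd4d68b174b-1af2d7a4b1b22457/"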

This concludes the article. In the next post I will explain how to enable vSAN File Services in kubernetes to cover a particular case in which many different pods running in different nodes need to access the same volume simultaneously. Stay tuned!
