by jhasensio

Month: December 2022

Preparing vSphere for Kubernetes Persistent Storage (3/3): MinIO S3 Object Storage using vSAN SNA

Modern Kubernetes-based applications are built with the idea of being able to offer features such as availability, scalability and replication natively and agnostically to the infrastructure they run on. This philosophy questions the need to duplicate those same functions at the infrastructure level, especially at the storage layer. As an example, we could set up a modern storage application like MinIO to provide an S3 object storage platform with its own protection mechanisms (e.g., erasure coding) using vSAN as the underlying storage infrastructure, which in turn has its own embedded data redundancy and availability mechanisms.

The good news is that when using vSAN, we can selectively choose which features we want to enable in our storage layer through Storage Policy-Based Management (SPBM). A special case is the so-called vSAN SNA (Shared-Nothing Architecture), which basically consists of disabling any redundancy in the vSAN-backed volumes and relying on the application itself (e.g. MinIO) to meet any data redundancy and fault tolerance requirements.

In this post we will walk through the setup of MinIO on top of vSAN using a Shared-Nothing Architecture.

Defining vSAN SNA (Shared-Nothing Architecture)

As mentioned, VMware vSAN provides a lot of granularity to tailor the storage service through SPBM policies. These policies can control where the data is going to be physically placed, how the data is going to be protected if a failure occurs, what the performance is in terms of IOPS and so on, in order to guarantee the level of service your use case requires.

The setup of the vSAN cluster is out of the scope of this walkthrough, so, assuming the vSAN cluster is configured and working properly, open the vCenter GUI, go to Policies and Profiles > VM Storage Policies and click on CREATE to start the definition of a new Storage Policy.

vSAN Storage Policy Creation (1)

Pick a meaningful name. I am using vSAN-SNA MINIO here, as you can see below.

vSAN Storage Policy Creation (2)

Now select Enable rules for “vSAN” storage, which is exactly what we are trying to configure.

vSAN Storage Policy Creation (3)

Next, configure the availability options. Ensure you select “No data redundancy with host affinity” here; this is the key setting if you want to create an SNA architecture in which all the data protection and availability relies on the upper storage application (MinIO in this case).

vSAN Storage Policy Creation (4)

Select the Storage Rules as per your preferences. Remember we are trying to avoid overlapping features to gain performance and space efficiency, so make sure you are not duplicating any task that MinIO is able to perform itself, including data encryption.

vSAN Storage Policy Creation (5)

Next, select the Object space reservation and other settings. I am using Thin provisioning here and default values for the rest of the settings.

vSAN Storage Policy Creation (6)

Select any compatible vsanDatastore that you have in your vSphere infrastructure.

vSAN Storage Policy Creation (7)

After finishing the creation of the Storage Policy, it is time to define a StorageClass attached to the new vSAN policy. This configuration assumes that you have already integrated your kubernetes cluster with the vSphere CNS services using the CSI Driver. If this is not the case, you can follow this previous post before proceeding. Create a yaml file with the following content.

vi vsphere-sc-sna.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: vsphere-sc-sna
provisioner: csi.vsphere.vmware.com
parameters:
  datastoreurl: "ds:///vmfs/volumes/vsan:529c9fd4d68b174b-1af2d7a4b1b22457/"
  storagepolicyname: "vSAN-SNA MINIO"
# csi.storage.k8s.io/fstype: "ext4" #Optional Parameter

Once the yaml manifest is created simply apply it using kubectl.

kubectl apply -f vsphere-sc-sna.yaml 
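
If you want to double-check the CSI integration before consuming the new class, a quick optional sanity check is to confirm that the vSphere CSI driver and the new StorageClass are registered in the cluster; this is only a verification step and is not required for the rest of the walkthrough.

kubectl get csidriver csi.vsphere.vmware.com
kubectl get sc vsphere-sc-sna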

Now create a new PVC that uses the defined StorageClass. Remember that, in general, when using a StorageClass there is no need to pre-create the PVs because the CSI Driver will provision them dynamically for you without any storage administrator pre-provisioning.

vi pvc_sna.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vsphere-pvc-sna
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi
  storageClassName: vsphere-sc-sna

Capture the allocated volume name pvc-9946244c-8d99-467e-b444-966c392d3bfa of the newly created PVC.

kubectl get pvc vsphere-pvc-sna
NAME              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS     AGE
vsphere-pvc-sna   Bound    pvc-9946244c-8d99-467e-b444-966c392d3bfa   500Mi      RWO            vsphere-sc-sna   14s

Go to Cluster > Monitor > vSAN Virtual Objects and filter by the previously captured volume name in order to view the physical placement being used.
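
If you prefer to cross-check from the kubernetes side, you can also retrieve the CNS volume handle that backs the claim; this is just an optional helper to correlate the PVC with the vSAN object shown in the GUI.

PV=$(kubectl get pvc vsphere-pvc-sna -o jsonpath='{.spec.volumeName}')
kubectl get pv $PV -o jsonpath='{.spec.csi.volumeHandle}' ; echo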

PVC virtual object

The redundancy scheme here is RAID 0 with a single vmdk disk, which means in practice that vSAN is not providing any level of protection or performance gain, which is exactly what we want in this particular Shared-Nothing Architecture.

PVC Physical Placement

Once the SNA architecture is defined we can proceed with MinIO Operator installation.

Installing MinIO Operator using Krew

MinIO is an object storage solution with S3 API support and is a popular choice for providing an Amazon S3-like service in a private cloud environment. In this example we will install the MinIO Operator, which allows us to deploy and manage different MinIO Tenants in the same cluster. According to the official kubernetes documentation, an operator is just a software extension to Kubernetes that makes use of custom resources to manage applications and their components following kubernetes principles.

First, we will install the MinIO plugin using the krew utility. Krew is a plugin manager that complements the kubernetes CLI tooling and is very helpful for installing and updating plugins that act as add-ons over the kubectl command.

To install krew just copy and paste the following command line. Basically, that set of commands checks the OS to pick the corresponding krew prebuilt binary and then downloads, extracts and finally installs the krew utility.

(
  set -x; cd "$(mktemp -d)" &&
  OS="$(uname | tr '[:upper:]' '[:lower:]')" &&
  ARCH="$(uname -m | sed -e 's/x86_64/amd64/' -e 's/\(arm\)\(64\)\?.*/\1\2/' -e 's/aarch64$/arm64/')" &&
  KREW="krew-${OS}_${ARCH}" &&
  curl -fsSLO "https://github.com/kubernetes-sigs/krew/releases/latest/download/${KREW}.tar.gz" &&
  tar zxvf "${KREW}.tar.gz" &&
  ./"${KREW}" install krew
)

The binary will be installed in the $HOME/.krew/bin directory. Update your PATH environment variable to include this new directory. In my case I am using bash.

echo 'export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"' >> $HOME/.bashrc

Now make the changes effective by restarting your shell. The easiest way is to use exec to reinitialize the existing session and force it to read the new values.

exec $SHELL

If everything is ok, we should be able to list the installed krew plugins. The only available plugin by default is the krew plugin itself.

kubectl krew list
PLUGIN    VERSION
krew      v0.4.3

Now we should be able to install the desired MinIO plugin just using kubectl krew install as shown below.

kubectl krew install minio
Updated the local copy of plugin index.
Installing plugin: minio
Installed plugin: minio
\
 | Use this plugin:
 |      kubectl minio
 | Documentation:
 |      https://github.com/minio/operator/tree/master/kubectl-minio
 | Caveats:
 | \
 |  | * For resources that are not in default namespace, currently you must
 |  |   specify -n/--namespace explicitly (the current namespace setting is not
 |  |   yet used).
 | /
/
WARNING: You installed plugin "minio" from the krew-index plugin repository.
   These plugins are not audited for security by the Krew maintainers.
   Run them at your own risk.

Once the plugin is installed we can initialize the MinIO operator just by using the kubectl minio init command as shown below.

kubectl minio init
namespace/minio-operator created
serviceaccount/minio-operator created
clusterrole.rbac.authorization.k8s.io/minio-operator-role created
clusterrolebinding.rbac.authorization.k8s.io/minio-operator-binding created
customresourcedefinition.apiextensions.k8s.io/tenants.minio.min.io created
service/operator created
deployment.apps/minio-operator created
serviceaccount/console-sa created
secret/console-sa-secret created
clusterrole.rbac.authorization.k8s.io/console-sa-role created
clusterrolebinding.rbac.authorization.k8s.io/console-sa-binding created
configmap/console-env created
service/console created
deployment.apps/console created
-----------------

To open Operator UI, start a port forward using this command:

kubectl minio proxy -n minio-operator

-----------------

The MinIO Operator installation has created several kubernetes resources within the minio-operator namespace; we can explore the created services.

kubectl get svc -n minio-operator
NAME       TYPE           CLUSTER-IP       EXTERNAL-IP    PORT(S)             AGE
console    ClusterIP      10.110.251.111   <none>         9090/TCP,9443/TCP   34s
operator   ClusterIP      10.111.75.168    <none>         4222/TCP,4221/TCP   34s

Instead of using port-forwarding as suggested in the output of the MinIO Operator installation, a good way to ease console access is to expose the operator console externally using any of the well-known kubernetes methods. As an example, the following manifest creates a LoadBalancer Service that will provide external reachability to the MinIO Operator console.

vi minio-console-lb.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: minio-operator
  name: minio
  namespace: minio-operator
spec:
  ports:
  - name: http
    port: 9090
    protocol: TCP
    targetPort: 9090
  selector:
    app: console
  type: LoadBalancer

Apply the above yaml file using kubectl apply and check whether your load balancer component is reporting the allocated external IP address. I am using AVI with its operator (AKO) to capture the LoadBalancer objects and program the external load balancer, but feel free to use any other ingress solution of your choice.
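
For completeness, applying the manifest is just:

kubectl apply -f minio-console-lb.yaml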

kubectl get svc minio -n minio-operator
NAME    TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)          AGE
minio   LoadBalancer   10.97.25.202   10.113.3.101   9090:32282/TCP   3m

Before trying to access the console, you need to get the token that will be used to authenticate the access to the MinIO Operator GUI.

kubectl get secret console-sa-secret -n minio-operator -o jsonpath="{.data.token}" | base64 -d ; echo
eyJhbGciOi <redacted...> 

Now you can access the external IP address, which listens on port 9090, and you should reach the MinIO Operator web console that looks like the image below. Use the token as the credential to access the console.

MinIO Console Access

The first step is to create a tenant. A tenant, as the wizard states, is just the logical structure used to represent a MinIO deployment. Each tenant can have a different size and configuration from other tenants, even a different storage class. Click on the CREATE TENANT button to start the process.

MinIO Operator Tenant Creation

We will use the name archive and will place the tenant in the namespace of the same name, which we can create from the console itself if it does not already exist in our target cluster. Make sure you select the StorageClass vsphere-sc-sna that was previously created for this specific purpose.

In the Capacity section we can select the number of servers, namely the kubernetes nodes that will run a pod, and the number of drives per server, namely the persistent volumes mounted in each pod, that will make up this software-defined storage layer. According to the settings shown below, it is easy to do the math: to achieve a size of 200 GB using 6 servers with 4 drives each, you need 200/(6×4) = 8.33 GB per drive.

The Erasure Code Parity sets the level of redundancy and availability we need and is directly related to the overall usable capacity. As you can guess, the more protection, the more storage overhead. In this case, the selected EC:2 will tolerate a single server failure and the usable capacity will be 166.7 GB. Feel free to change the setting and see how the calculation is refreshed in the table at the right of the window.
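
If you want to double check the wizard numbers, the math can be reproduced quickly from a shell; the usable-capacity line assumes, as reported later by mc admin info, that the 24 drives are split into 2 erasure sets of 12 drives with EC:2 parity each. A rough CLI equivalent of these wizard settings using the kubectl minio plugin installed earlier (assuming the archive namespace already exists) is also sketched below; the automation section at the end of this post shows the same command for a smaller tenant.

# Per-drive size: 200 G raw across 6 servers x 4 drives
echo "scale=2; 200/(6*4)" | bc      # 8.33 G per drive
# Usable capacity with EC:2 per erasure set of 12 drives
echo "scale=1; 200*(12-2)/12" | bc  # ~166.6 G usable (the GUI rounds to 166.7)

kubectl minio tenant create archive --servers 6 --volumes 24 --capacity 200G \
  --namespace archive --storage-class vsphere-sc-sna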

MinIO Tenant Creation Dialogue (Settings)

The last block of this configuration section allows you to adjust the resources in terms of CPU and memory that each pod will request to run.

MinIO Tenant Creation Dialogue (Settings 2)

Click on the Configure button to define whether we want to expose the MinIO services (S3 endpoint and tenant console) for outside access.

MinIO Tenant Creation Dialogue (Services)

Another important configuration decision is whether we want to use TLS to access the S3 API endpoint. By default, the tenant and associated buckets use TLS with autogenerated certificates.

MinIO Tenant Creation Dialogue (Security)

Another interesting setting you can optionally enable is the Audit Log to store your log transactions in a database. This can be useful for troubleshooting and security purposes. By default the Audit Log is enabled.

The Monitoring section will allow you to get metrics and dashboards of your tenant by deploying a Prometheus server to scrape some relevant metrics associated with your service. By default this service is also enabled.

Feel free to explore the rest of the settings to change other advanced parameters such as encryption (disabled by default) or authentication (local credentials by default, but it can be integrated with external authentication systems). As soon as you click on the Create button at the bottom right corner you will launch the new tenant and a new window will appear with the set of credentials associated with it. Save them in a JSON file or write them down for later usage because there is no way to display them afterwards.

New Tenant Credentials

The contents of the JSON file, if you decide to download it, are shown below.

{
  "url":"https://minio.archive.svc.cluster.local:443",
  "accessKey":"Q78mz0vRlnk6melG",
  "secretKey":"6nWtL28N1eyVdZdt16CIWyivUh3PB5Fp",
  "api":"s3v4",
  "path":"auto"
}

We are done with our first MinIO tenant creation process. Let’s move into the next part to inspect the created objects.

Inspecting MinIO Operator kubernetes resources

Once the tenant is created, the MinIO Operator console will show it with real-time information. There are some tasks to be done under the hood to complete the process, such as deploying pods, pulling container images, creating PVCs and so on, so it will take some time to have it ready. The red circle and the absence of statistics indicate the tenant creation is still in progress.

Tenant Administration

After a couple of minutes, if you click on the tenant tile you will land on the tenant administration page, from which you will get some information about the current state, the Console and Endpoint URLs, the number of drives and so on.

If you click on the YAML button at the top you will see the YAML definition. Although the GUI can be useful to take the first steps in the creation of a tenant, when you plan to do it in an automated fashion the best approach is to leverage yaml manifests to declare the Tenant object, which is basically a CRD that the MinIO operator watches in order to reconcile the desired state of the tenants in the cluster.
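
As a reference, a heavily trimmed sketch of such a Tenant manifest could look like the snippet below; the exact fields depend on the operator version, so the safest way to obtain a complete one is to export it with the --output flag shown in the automation section at the end of this post.

apiVersion: minio.min.io/v2
kind: Tenant
metadata:
  name: archive
  namespace: archive
spec:
  # One pool of 6 servers with 4 volumes each, as configured in the wizard
  pools:
    - servers: 6
      volumesPerServer: 4
      volumeClaimTemplate:
        metadata:
          name: data
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 8534Mi
          storageClassName: vsphere-sc-sna
  # Autogenerated TLS certificates, as selected in the Security step
  requestAutoCert: true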

In the Configuration section you can get the user (MINIO_ROOT_USER) and password (MINIO_ROOT_PASSWORD). Those credentials can be used to access the tenant console using the corresponding endpoint.
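
If you prefer the CLI, the operator also stores the tenant user keys in a kubernetes secret; a hedged example, based on the secret name used later in the automation section of this post, would be:

kubectl get secret archive-user-1 -n archive -o jsonpath="{.data.CONSOLE_ACCESS_KEY}" | base64 -d ; echo
kubectl get secret archive-user-1 -n archive -o jsonpath="{.data.CONSOLE_SECRET_KEY}" | base64 -d ; echo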

The external URL https://archive-console.archive.avi.sdefinitive.net:9443 can be used to reach our archive tenant console in a separate GUI like the one shown below. Another option available from the Minio Operator GUI is using the Console button at the top. Using this last method will bypass the authentication.

If you click on the Metrics you can see some interesting stats related to the archive tenant as shown below.

Move into kubernetes to check the created pods. As expected, a set of 6 pods are running and acting as the storage servers of our tenant. Additionally, there are other complementary pods running in the namespace for monitoring and logging.

kubectl get pod -n archive
NAME                                      READY   STATUS    RESTARTS        AGE
archive-log-0                             1/1     Running   0               5m34s
archive-log-search-api-5568bc5dcb-hpqrw   1/1     Running   3 (5m17s ago)   5m32s
archive-pool-0-0                          1/1     Running   0               5m34s
archive-pool-0-1                          1/1     Running   0               5m34s
archive-pool-0-2                          1/1     Running   0               5m34s
archive-pool-0-3                          1/1     Running   0               5m33s
archive-pool-0-4                          1/1     Running   0               5m33s
archive-pool-0-5                          1/1     Running   0               5m33s
archive-prometheus-0                      2/2     Running   0               2m33s

If you check the volumes of one of the pods you can tell that each server (pod) mounts four data volumes, as specified upon tenant creation.

kubectl get pod -n archive archive-pool-0-0 -o=jsonpath="{.spec.containers[*].volumeMounts}" | jq
[
  {
    "mountPath": "/export0",
    "name": "data0"
  },
  {
    "mountPath": "/export1",
    "name": "data1"
  },
  {
    "mountPath": "/export2",
    "name": "data2"
  },
  {
    "mountPath": "/export3",
    "name": "data3"
  },
  {
    "mountPath": "/tmp/certs",
    "name": "archive-tls"
  },
  {
    "mountPath": "/tmp/minio-config",
    "name": "configuration"
  },
  {
    "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
    "name": "kube-api-access-xdftb",
    "readOnly": true
  }
]

Correspondingly, each of the volumes should be associated with a PVC bound to a PV that has been dynamically created at the infrastructure storage layer through the StorageClass. If you remember the calculation we did above, the size of each PVC should be 200/(6 servers × 4 drives) = 8.33 Gi, which is approximately the capacity (8534Mi) of the 24 PVCs displayed below.

kubectl get pvc -n archive
NAME                                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS     AGE
archive-log-archive-log-0                 Bound    pvc-3b3b0738-d4f1-4611-a059-975dc44823ef   5Gi        RWO            vsphere-sc       33m
archive-prometheus-archive-prometheus-0   Bound    pvc-c321dc0e-1789-4139-be52-ec0dbc25211a   5Gi        RWO            vsphere-sc       30m
data0-archive-pool-0-0                    Bound    pvc-5ba0d4b5-e119-49b2-9288-c5556e86cdc1   8534Mi     RWO            vsphere-sc-sna   33m
data0-archive-pool-0-1                    Bound    pvc-ce648e61-d370-4725-abe3-6b374346f6bb   8534Mi     RWO            vsphere-sc-sna   33m
data0-archive-pool-0-2                    Bound    pvc-4a7198ce-1efd-4f31-98ed-fc5d36ebb06b   8534Mi     RWO            vsphere-sc-sna   33m
data0-archive-pool-0-3                    Bound    pvc-26567625-982f-4604-9035-5840547071ea   8534Mi     RWO            vsphere-sc-sna   33m
data0-archive-pool-0-4                    Bound    pvc-c58e8344-4462-449f-a6ec-7ece987e0b67   8534Mi     RWO            vsphere-sc-sna   33m
data0-archive-pool-0-5                    Bound    pvc-4e22d186-0618-417b-91b8-86520e37b3d2   8534Mi     RWO            vsphere-sc-sna   33m
data1-archive-pool-0-0                    Bound    pvc-bf497569-ee1a-4ece-bb2d-50bf77b27a71   8534Mi     RWO            vsphere-sc-sna   33m
data1-archive-pool-0-1                    Bound    pvc-0e2b1057-eda7-4b80-be89-7c256fdc3adc   8534Mi     RWO            vsphere-sc-sna   33m
data1-archive-pool-0-2                    Bound    pvc-f4d1d0ff-e8ed-4c6b-a053-086b7a1a049b   8534Mi     RWO            vsphere-sc-sna   33m
data1-archive-pool-0-3                    Bound    pvc-f8332466-bd49-4fc8-9e2e-3168c307d8db   8534Mi     RWO            vsphere-sc-sna   33m
data1-archive-pool-0-4                    Bound    pvc-0eeae90d-46e7-4dba-9645-b2313cd382c9   8534Mi     RWO            vsphere-sc-sna   33m
data1-archive-pool-0-5                    Bound    pvc-b7d7d1c1-ec4c-42ba-b925-ce02d56dffb0   8534Mi     RWO            vsphere-sc-sna   33m
data2-archive-pool-0-0                    Bound    pvc-08642549-d911-4384-9c20-f0e0ab4be058   8534Mi     RWO            vsphere-sc-sna   33m
data2-archive-pool-0-1                    Bound    pvc-638f9310-ebf9-4784-be87-186ea1837710   8534Mi     RWO            vsphere-sc-sna   33m
data2-archive-pool-0-2                    Bound    pvc-aee4d1c0-13f7-4d98-8f83-c047771c4576   8534Mi     RWO            vsphere-sc-sna   33m
data2-archive-pool-0-3                    Bound    pvc-06cd11d9-96ed-4ae7-a9bc-3b0c52dfa884   8534Mi     RWO            vsphere-sc-sna   33m
data2-archive-pool-0-4                    Bound    pvc-e55cc7aa-8a3c-463e-916a-8c1bfc886b99   8534Mi     RWO            vsphere-sc-sna   33m
data2-archive-pool-0-5                    Bound    pvc-64948a13-bdd3-4d8c-93ad-155f9049eb36   8534Mi     RWO            vsphere-sc-sna   33m
data3-archive-pool-0-0                    Bound    pvc-565ecc61-2b69-45ce-9d8b-9abbaf24b829   8534Mi     RWO            vsphere-sc-sna   33m
data3-archive-pool-0-1                    Bound    pvc-c61d45da-d7da-4675-aafc-c165f1d70612   8534Mi     RWO            vsphere-sc-sna   33m
data3-archive-pool-0-2                    Bound    pvc-c941295e-3e3d-425c-a2f0-70ee1948b2f0   8534Mi     RWO            vsphere-sc-sna   33m
data3-archive-pool-0-3                    Bound    pvc-7d7ce3b1-cfeb-41c8-9996-ff2c3a7578cf   8534Mi     RWO            vsphere-sc-sna   33m
data3-archive-pool-0-4                    Bound    pvc-c36150b1-e404-4ec1-ae61-6ecf58f055e1   8534Mi     RWO            vsphere-sc-sna   33m
data3-archive-pool-0-5                    Bound    pvc-cd847c84-b5e1-4fa4-a7e9-a538f9424dbf   8534Mi     RWO            vsphere-sc-sna   33m

Everything looks good so let’s move into the tenant console to create a new bucket.

Creating an S3 Bucket

Once the tenant is ready, you can use the S3 API to create buckets and push data into them. When using the MinIO Operator setup you can also use the tenant GUI as shown below. Access the tenant console using the external URL or simply jump from the Operator console GUI and you will reach the following page.

A bucket in S3 object storage is similar to a folder in a traditional filesystem and is used to organize the pieces of data (objects). Create a new bucket with the name my-bucket.
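
If you prefer the command line, the same bucket can also be created with the MinIO client once the alias from the next section is configured, for example:

mc mb minio-archive/my-bucket --insecure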

Accessing S3 using MinIO Client (mc)

Now let's move to the CLI to check how we can interact with the tenant via its API using the MinIO client (mc) tool. To install it, just issue the commands below.

curl https://dl.min.io/client/mc/release/linux-amd64/mc \
  --create-dirs \
  -o $HOME/minio-binaries/mc

chmod +x $HOME/minio-binaries/mc
export PATH=$PATH:$HOME/minio-binaries/

As a first step, declare a new alias with the definition of our tenant. Use the tenant endpoint and provide the accessKey and secretKey you captured at tenant creation time. Remember we are using TLS with self-signed certificates, so the insecure flag is required.

mc alias set minio-archive https://minio.archive.avi.sdefinitive.net Q78mz0vRlnk6melG 6nWtL28N1eyVdZdt16CIWyivUh3PB5Fp --insecure
Added `minio-archive` successfully.

The info keyword can be used to show relevant information about our tenant, such as the servers (pods) and the status of the drives (PVCs) and the network. Each of the servers listens on port 9000 to expose the storage access externally.

mc admin info minio-archive --insecure
●  archive-pool-0-0.archive-hl.archive.svc.cluster.local:9000
   Uptime: 52 minutes 
   Version: 2023-01-02T09:40:09Z
   Network: 6/6 OK 
   Drives: 4/4 OK 
   Pool: 1

●  archive-pool-0-1.archive-hl.archive.svc.cluster.local:9000
   Uptime: 52 minutes 
   Version: 2023-01-02T09:40:09Z
   Network: 6/6 OK 
   Drives: 4/4 OK 
   Pool: 1

●  archive-pool-0-2.archive-hl.archive.svc.cluster.local:9000
   Uptime: 52 minutes 
   Version: 2023-01-02T09:40:09Z
   Network: 6/6 OK 
   Drives: 4/4 OK 
   Pool: 1

●  archive-pool-0-3.archive-hl.archive.svc.cluster.local:9000
   Uptime: 52 minutes 
   Version: 2023-01-02T09:40:09Z
   Network: 6/6 OK 
   Drives: 4/4 OK 
   Pool: 1

●  archive-pool-0-4.archive-hl.archive.svc.cluster.local:9000
   Uptime: 52 minutes 
   Version: 2023-01-02T09:40:09Z
   Network: 6/6 OK 
   Drives: 4/4 OK 
   Pool: 1

●  archive-pool-0-5.archive-hl.archive.svc.cluster.local:9000
   Uptime: 52 minutes 
   Version: 2023-01-02T09:40:09Z
   Network: 6/6 OK 
   Drives: 4/4 OK 
   Pool: 1

Pools:
   1st, Erasure sets: 2, Drives per erasure set: 12

0 B Used, 1 Bucket, 0 Objects
24 drives online, 0 drives offline

You can list the existing buckets using the mc ls command as shown below.

mc ls minio-archive --insecure
[2023-01-04 13:56:39 CET]     0B my-bucket/

As a quick test to check whether we are able to write a file into the S3 bucket, the following command creates a dummy 10 GB file using fallocate.

fallocate -l 10G /tmp/10Gb.file

Push the 10 GB file into S3 using the following mc cp command.

mc cp /tmp/10Gb.file minio-archive/my-bucket --insecure
/tmp/10Gb.file:                        10.00 GiB / 10.00 GiB ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 51.52 MiB/s 3m18s

While the transfer is still running you can verify at the tenant console that some stats are being populated in the metrics section.

Additionally you can also display some metrics dashboards in the Traffic tab.

Once the transfer is completed, if you list the contents of the bucket again you should see something like this.

mc ls minio-archive/my-bucket --insecure
[2023-01-04 17:20:14 CET]  10GiB STANDARD 10Gb.file

Consuming the MinIO S3 backend from Kubernetes Pods

Now that we know how to install the MinIO Operator, how to create a tenant and how to create a bucket, it is time to understand how to consume the MinIO S3 backend from a regular pod. Access to the data in an S3 bucket is generally done through API calls, so we need a method to create an abstraction layer that allows the data to be mounted in the OS filesystem as a regular folder or drive, in a similar way to an NFS share.

This abstraction layer in Linux is implemented by FUSE (Filesystem in Userspace), which, in a nutshell, is a user-space program able to mount a file system that appears to the operating system as a regular folder. There is a special project called s3fs-fuse that allows a Linux OS to mount an S3 bucket via FUSE while preserving the native object format.

As a container is essentially a Linux OS, we just need to figure out how to take advantage of this in a kubernetes environment.
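
Before containerizing it, a quick host-level sanity check can help validate the credentials and the endpoint. This is just an optional sketch, assuming s3fs is installed on a Linux machine that can reach the tenant endpoint; it uses the same flags as the container image built below.

echo "Q78mz0vRlnk6melG:6nWtL28N1eyVdZdt16CIWyivUh3PB5Fp" > /etc/s3-passwd && chmod 600 /etc/s3-passwd
mkdir -p /var/s3
s3fs my-bucket /var/s3 -o passwd_file=/etc/s3-passwd \
  -o url=https://minio.archive.avi.sdefinitive.net \
  -o use_path_request_style -o no_check_certificate -o ssl_verify_hostname=0
ls /var/s3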

Option 1. Using a custom image

The first approach consists of creating a container that accesses the remote MinIO bucket and presents it to the container as a regular filesystem. The following diagram depicts a high-level overview of the intended usage.

S3 Mounter Container

As you can guess, s3fs-fuse is not part of any core Linux distribution, so the first step is to create a custom image containing the required software that will allow our container to mount the S3 bucket as a regular file system. Let's start by creating a Dockerfile that will be used as a template to create the custom image. We also need to specify the command we want to run when the container is spun up. Find below the Dockerfile to create the custom image with some explanatory comments.

vi Dockerfile
# S3 MinIO mounter sample file 
# 
# Use ubuntu 22.04 as base image
FROM ubuntu:22.04

# MinIO S3 bucket will be mounted on /var/s3 as the mounting point
ENV MNT_POINT=/var/s3

# Install required utilities (basically s3fs is needed here). mailcap installs MIME types to avoid some warnings
RUN apt-get update && apt-get install -y s3fs mailcap

# Create the mount point in the container Filesystem
RUN mkdir -p "$MNT_POINT"

# Some optional housekeeping to get rid of unneeded packages
RUN apt-get autoremove && apt-get clean

# 1.- Create a file containing the credentials used to access the Minio Endpoint. 
#     They are defined as variables that must be passed to the container ideally using a secret
# 2.- Mount the S3 bucket in /var/s3. To be able to mount the filesystem s3fs needs
#      - credentials injected as env variables through a Secret and written in s3-passwd file
#      - S3 MinIO Endpoint (passed as env variable through a Secret)
#      - S3 Bucket Name (passed as env variable through a Secret)
#      - other specific keywords for MinIO such use_path_request_style and insecure SSL
# 3.- Last tail command to run container indefinitely and to avoid completion

CMD echo "$ACCESS_KEY:$SECRET_KEY" > /etc/s3-passwd && chmod 600 /etc/s3-passwd && \
    /usr/bin/s3fs "$MINIO_S3_BUCKET" "$MNT_POINT" -o passwd_file=/etc/s3-passwd \
    -o url="$MINIO_S3_ENDPOINT" \
    -o use_path_request_style -o no_check_certificate -o ssl_verify_hostname=0 && \
    tail -f /dev/null

Once the Dockerfile is defined we can build the image and push it to your registry. Feel free to use mine if you wish. If you are using Docker Hub instead of a private registry, you will need to have an account and log in before proceeding.

docker login
Login with your Docker ID to push and pull images from Docker Hub. If you don't have a Docker ID, head over to https://hub.docker.com to create one.
Username: jhasensio
Password: <yourpasswordhere>
WARNING! Your password will be stored unencrypted in /home/jhasensio/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded

Now build your image using Dockerfile settings.

docker build --rm -t jhasensio/s3-mounter .
Sending build context to Docker daemon  18.94kB
Step 1/6 : FROM ubuntu:22.04
22.04: Pulling from library/ubuntu
677076032cca: Pull complete 
Digest: sha256:9a0bdde4188b896a372804be2384015e90e3f84906b750c1a53539b585fbbe7f
Status: Downloaded newer image for ubuntu:22.04
 ---> 58db3edaf2be
Step 2/6 : ENV MNT_POINT=/var/s3
 ---> Running in d8f0c218519d
Removing intermediate container d8f0c218519d
 ---> 6a4998c5b028
Step 3/6 : RUN apt-get update && apt-get install -y s3fs mailcap
 ---> Running in a20f2fc03315
Get:1 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy InRelease [270 kB]
... <skipped>

Step 4/6 : RUN mkdir -p "$MNT_POINT"
 ---> Running in 78c5f3328988
Removing intermediate container 78c5f3328988
 ---> f9cead3b402f
Step 5/6 : RUN apt-get autoremove && apt-get clean
 ---> Running in fbd556a42ea2
Reading package lists...
Building dependency tree...
Reading state information...
0 upgraded, 0 newly installed, 0 to remove and 5 not upgraded.
Removing intermediate container fbd556a42ea2
 ---> 6ad6f752fecc
Step 6/6 : CMD echo "$ACCESS_KEY:$SECRET_KEY" > /etc/s3-passwd && chmod 600 /etc/s3-passwd &&     /usr/bin/s3fs $MINIO_S3_BUCKET $MNT_POINT -o passwd_file=/etc/s3-passwd -o use_path_request_style -o no_check_certificate -o ssl_verify_hostname=0 -o url=$MINIO_S3_ENDPOINT  &&     tail -f /dev/null
 ---> Running in 2c64ed1a3c5e
Removing intermediate container 2c64ed1a3c5e
 ---> 9cd8319a789d
Successfully built 9cd8319a789d
Successfully tagged jhasensio/s3-mounter:latest

Now that you have built your image, it's time to push it to the Docker Hub registry using the following command.

docker push jhasensio/s3-mounter
Using default tag: latest
The push refers to repository [docker.io/jhasensio/s3-mounter]
2c82254eb995: Pushed 
9d49035aff15: Pushed 
3ed41791b66a: Pushed 
c5ff2d88f679: Layer already exists 
latest: digest: sha256:ddfb2351763f77114bed3fd622a1357c8f3aa75e35cc66047e54c9ca4949f197 size: 1155

Now you can use your custom image from your pods. Remember this image has been created with the very specific purpose of mounting an S3 bucket in its filesystem; the next step is to inject the required configuration and credentials into the running pod in order for the mounting process to succeed. There are different ways to do that, but the recommended method is passing variables through a kubernetes Secret object. Let's create a file with the required environment variables. The variables shown here were captured at tenant creation time and depend on your specific setup and the selected tenant and bucket names.

vi s3bucket.env
MINIO_S3_ENDPOINT=https://minio.archive.svc.cluster.local:443
MINIO_S3_BUCKET=my-bucket
ACCESS_KEY=Q78mz0vRlnk6melG
SECRET_KEY=6nWtL28N1eyVdZdt16CIWyivUh3PB5Fp

Now create the secret object using the previous environment file as source as shown below.

kubectl create secret generic s3-credentials --from-env-file=s3bucket.env
secret/s3-credentials created

Verify the content. Note the variables appear Base64-encoded once the secret is created.

kubectl get secrets s3-credentials -o yaml
apiVersion: v1
data:
  ACCESS_KEY: UTc4bXowdlJsbms2bWVsRw==
  MINIO_S3_BUCKET: bXktYnVja2V0
  MINIO_S3_ENDPOINT: aHR0cHM6Ly9taW5pby5hcmNoaXZlLnN2Yy5jbHVzdGVyLmxvY2FsOjQ0Mw==
  SECRET_KEY: Nm5XdEwyOE4xZXlWZFpkdDE2Q0lXeWl2VWgzUEI1RnA=
kind: Secret
metadata:
  creationTimestamp: "2023-02-10T09:33:53Z"
  name: s3-credentials
  namespace: default
  resourceVersion: "32343427"
  uid: d0628fff-0438-41e8-a3a4-d01ee20f82d0
type: Opaque

If you need to be sure about your encoded variables, just reverse the process to double check that the injected data is what you expect. Taking MINIO_S3_ENDPOINT as an example, use the following command to get the plain text of any given secret data entry.

kubectl get secrets s3-credentials -o jsonpath="{.data.MINIO_S3_ENDPOINT}" | base64 -d ; echo
https://minio.archive.svc.cluster.local:443

Now that the configuration secret is ready we can proceed with the pod creation. Note that you need privileged access in order to use the FUSE kernel module, which is required to mount the S3 bucket.

vi s3mounterpod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: s3-mounter
spec:
  containers:
    - image: jhasensio/s3-mounter
      name: s3-mounter
      imagePullPolicy: Always
      # Privilege mode needed to access FUSE kernel module
      securityContext:
        privileged: true
      # Injecting credentials from secret (must be in same ns)
      envFrom:
      - secretRef:
          name: s3-credentials
      # Mounting host devfuse is required
      volumeMounts:
      - name: devfuse
        mountPath: /dev/fuse
  volumes:
    - name: devfuse
      hostPath:
        path: /dev/fuse

Open an interactive session to the pod for further exploration.

kubectl exec s3-mounter -ti -- bash
root@s3-mounter:/# 

Verify the secret-injected variables are shown as expected.

root@s3-mounter:/# env | grep -E "MINIO|KEY"
ACCESS_KEY=Q78mz0vRlnk6melG
SECRET_KEY=6nWtL28N1eyVdZdt16CIWyivUh3PB5Fp
MINIO_S3_ENDPOINT=https://minio.archive.svc.cluster.local:443
MINIO_S3_BUCKET=my-bucket

Also verify the mount in the pod filesystem

root@s3-mounter:/# mount | grep s3
s3fs on /var/s3 type fuse.s3fs (rw,nosuid,nodev,relatime,user_id=0,group_id=0)

You should be able to read any existing data in the S3 bucket. Note how you can list the 10Gb.file we created earlier using the mc client.

root@s3-mounter:~# ls -alg /var/s3
total 10485765
drwx------ 1 root           0 Jan  1  1970 .
drwxr-xr-x 1 root        4096 Jan 04 17:00 ..
-rw-r----- 1 root 10737418240 Jan 04 17:53 10Gb.file

Create a new file to verify that you can write as well.

root@s3-mounter:~# touch /var/s3/file_from_s3_mounter_pod.txt

The new file should appear in the bucket

root@s3-mounter:/var/s3# ls -alg
total 10485766
drwx------ 1 root           0 Jan  1  1970 .
drwxr-xr-x 1 root        4096 Jan 04 17:00 ..
-rw-r----- 1 root 10737418240 Jan 04 17:53 10Gb.file
-rw-r--r-- 1 root           0 Jan 04 17:59 file_from_s3_mounter_pod.txt

And it should be accessible using the mc client as seen below.

mc ls minio-archive/my-bucket --insecure
[2023-01-04 17:53:34 CET]  10GiB STANDARD 10Gb.file
[2023-01-04 17:59:22 CET]     0B STANDARD file_from_s3_mounter_pod.txt

This is great news but, is this approach really useful? It means that, to be able to reach an S3 bucket, we need to prepare a custom image with our intended application plus s3fs so that it can mount the MinIO S3 filesystem. That sounds pretty rigid, so let's explore other, more flexible options to achieve similar results.

Option 2. Using a sidecar pattern

Once we have prepared the custom image and verified everything works as expected, we can give it another twist. As you probably know, there is a common pattern in kubernetes that involves running an additional container (a.k.a. sidecar) alongside the main container in a pod. The sidecar container provides additional functionality, such as logging, monitoring or networking, to the main container. In our case the additional container will be in charge of mounting the MinIO S3 filesystem, allowing the main container to focus on its core responsibility and offload these storage-related tasks to the sidecar. The following picture depicts the intended architecture.

Pod using Sidecar Pattern with S3-Mounter

Let's create a simple pod running apache to provide access to an S3-backed filesystem via web. The first step is to create a custom httpd.conf file to serve html documents from a custom path at “/var/www/html”. Create a file like the one shown below:

vi httpd.conf
ServerRoot "/usr/local/apache2"
ServerName "my-apache"
Listen 80
DocumentRoot "/var/www/html"
LoadModule mpm_event_module modules/mod_mpm_event.so
LoadModule authz_core_module modules/mod_authz_core.so
LoadModule unixd_module modules/mod_unixd.so
LoadModule autoindex_module modules/mod_autoindex.so

<Directory "/var/www/html">
  Options Indexes FollowSymLinks
  AllowOverride None
  Require all granted
</Directory>

Now create a configmap object from the created file to easily inject the custom configuration into the pod.

kubectl create cm apache-config --from-file=httpd.conf
configmap/apache-config created

Find below a sample sidecar pod. The mountPropagation spec does the trick here. If you are interested, there is a deep dive around Mount Propagation in this blog. The Bidirectional mountPropagation allows any volume mounts created by the container to be propagated back to the host and to all containers of all pods that use the same volume, which is exactly what we are trying to achieve here. The s3-mounter sidecar container will mount the S3 bucket and propagate it to the host as a local mount, which in turn will be available to the main apache container as a regular hostPath volume.

vi web-server-sidecar-s3.yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-server-sidecar-s3
spec:
  containers:
      ######
      # Main apache container
      ####
    - name: apache2
      image: httpd:2.4.55
      securityContext:
        privileged: true
      ports:
        - containerPort: 80
          name: http-web
      volumeMounts:
      # Mount custom apache config file from volume
        - name: apache-config-volume
          mountPath: /tmp/conf
      # Mount S3 bucket into web server root (bidirectional propagation)
        - name: my-bucket
          mountPath: /var/www
          mountPropagation: Bidirectional
      # Copy custom httpd.conf extracted from configmap in right path and run httpd
      command: ["/bin/sh"]
      args: ["-c", "cp /tmp/conf/httpd.conf /usr/local/apache2/conf/httpd.conf && /usr/local/bin/httpd-foreground"]
      ######
      # Sidecar Container to mount MinIO S3 Bucket
      ####
    - name: s3mounter
      image: jhasensio/s3-mounter
      imagePullPolicy: Always
      securityContext:
        privileged: true
      envFrom:
        - secretRef:
            name: s3-credentials
      volumeMounts:
        - name: devfuse
          mountPath: /dev/fuse
      # Mount S3 bucket  (bidirectional allow sharing between containers)
        - name: my-bucket
          mountPath: /var/s3
          mountPropagation: Bidirectional
      # Safely unmount the filesystem before stopping
      lifecycle:
          preStop:
            exec:
              command: ["/bin/sh","-c","fusermount -u /var/s3"]
  volumes:
    - name: devfuse
      hostPath:
        path: /dev/fuse
    # Host filesystem
    - name: my-bucket
      hostPath:
        path: /mnt/my-bucket
    # Custom apache config file extracted from the configmap
    - name: apache-config-volume
      configMap:
        name: apache-config

Once the pod is ready, try to list the contents of the /var/s3 folder in the s3mounter sidecar container using the following command (note we need to specify the name of the container with the -c flag to reach the intended one). The contents of the folder should be listed.

kubectl exec web-server-sidecar-s3 -c s3mounter -- ls /var/s3/
10Gb.file
file_from_s3_mounter_pod.txt
html

Repeat the same for the apache2 container. The contents of the folder should be listed as well.

kubectl exec web-server-sidecar-s3 -c apache2 -- ls /var/www/
10Gb.file
file_from_s3_mounter_pod.txt
html

The mounted S3 bucket should also be available in the worker node filesystem. Extract the IP of the worker on which the pod is running using .status.hostIP and try to list the contents of the local hostPath /mnt/my-bucket. You can jump into the worker or use a single command via ssh remote execution as seen below:

ssh $(kubectl get pod web-server-sidecar-s3 -o jsonpath="{.status.hostIP}") sudo ls /mnt/my-bucket
10Gb.file
file_from_s3_mounter_pod.txt
html

Now, from the worker filesystem, create some dummy files in the html folder. This html folder is the root directory configured in apache2 to serve html documents.

ssh $(kubectl get pod web-server-sidecar-s3 -o jsonpath="{.status.hostIP}") sudo touch /mnt/my-bucket/html/file{01..10}.txt

And now both containers should be able to list the new files, as shown below from apache2.

kubectl exec web-server-sidecar-s3 -c apache2 -- ls /var/www/html
file01.txt
file02.txt
file03.txt
file04.txt
file05.txt
file06.txt
file07.txt
file08.txt
file09.txt
file10.txt

Apache2 will serve the content of the html folder, so let's verify we can actually access the files over HTTP. First forward the apache2 pod port 80 to the local port 8080 via the kubectl port-forward command.

kubectl port-forward web-server-sidecar-s3 8080:80 &
Forwarding from 127.0.0.1:8080 -> 80
Forwarding from [::1]:8080 -> 80

And now try to reach the apache2 web server. According to the httpd.conf, it should display any file in the folder. Using curl you can verify this is working, as displayed below.

curl localhost:8080
Handling connection for 8080
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
 <head>
  <title>Index of /</title>
 </head>
 <body>
<h1>Index of /</h1>
<ul><li><a href="file01.txt"> file01.txt</a></li>
<li><a href="file02.txt"> file02.txt</a></li>
<li><a href="file03.txt"> file03.txt</a></li>
<li><a href="file04.txt"> file04.txt</a></li>
<li><a href="file05.txt"> file05.txt</a></li>
<li><a href="file06.txt"> file06.txt</a></li>
<li><a href="file07.txt"> file07.txt</a></li>
<li><a href="file08.txt"> file08.txt</a></li>
<li><a href="file09.txt"> file09.txt</a></li>
<li><a href="file10.txt"> file10.txt</a></li>
</ul>
</body></html>

If you try from a browser you should also be able to browse the contents of the S3 bucket.

Now we have managed to make the S3 mounting logic independent of the main application container by using a sidecar pattern to deploy our application. Although this configuration may fit some requirements, it may not scale very well in other environments. For example, if you create another sidecar pod on the same worker node trying to access the very same bucket, you will end up with an error generated by the new s3-mounter when it tries to bidirectionally mount a non-empty volume (basically because it is already mounted by the other pod), as seen below.

kubectl logs web-server-sidecar-s3-2 -c s3mounter
s3fs: MOUNTPOINT directory /var/s3 is not empty. if you are sure this is safe, can use the 'nonempty' mount option.

So basically you must use a different hostPath, such as /mnt/my-bucket2, to avoid the above error, which implies awareness and management of the existing hostPaths; that sounds awkward and not very scalable. This is where the third approach comes into play.

Option 3. Using a DaemonSet to create a shared volume

This option relies on a DaemonSet object. According to the official documentation, a DaemonSet ensures that all (or some) nodes run a copy of a pod. As nodes are added to the cluster, pods are added to them. As nodes are removed from the cluster, those pods are garbage collected. Deleting a DaemonSet will clean up the pods it created.

This way we can think of the bucket as an S3-backed storage provider with a corresponding volume mounted as a hostPath in all the workers. This approach is ideal to avoid errors when multiple pods try to mount the same bucket using the same host filesystem path. The following picture depicts the architecture with a 3-member cluster.

S3 Backend access using shared hostpath volumes with DaemonSet mounter

The DaemonSet object will look like the one shown below. We have added a lifecycle hook to ensure the S3 filesystem is properly unmounted when the pod is stopped.

vi s3-mounter-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app: minio-s3-my-bucket
  name: minio-s3-my-bucket
spec:
  selector:
    matchLabels:
       app: minio-s3-my-bucket
  template:
    metadata:
      labels:
        app: minio-s3-my-bucket
    spec:
      containers:
      - name: s3fuse
        image: jhasensio/s3-mounter
        imagePullPolicy: Always
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh","-c","fusermount -u /var/s3"]
        securityContext:
          privileged: true
        envFrom:
        - secretRef:
            name: s3-credentials
        volumeMounts:
        - name: devfuse
          mountPath: /dev/fuse
        - name: my-bucket
          mountPath: /var/s3
          mountPropagation: Bidirectional
      volumes:
      - name: devfuse
        hostPath:
          path: /dev/fuse
      - name: my-bucket
        hostPath:
          path: /mnt/my-bucket

Apply the above manifest and verify the daemonset is properly deployed and is ready and running in all the workers in the cluster.

kubectl get daemonsets.apps
NAME                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
minio-s3-my-bucket   6         6         6       6            6           <none>          153m

Verify you can list the contents of the S3 bucket

kubectl exec minio-s3-my-bucket-fzcb2 -- ls /var/s3
10Gb.file
file_from_s3_mounter_pod.txt
html

Now that the DaemonSet is running properly, we should be able to consume the /mnt/my-bucket path in the worker filesystem as a regular hostPath volume. Let's create the same pod we used previously, this time as a single-container pod. Remember to use Bidirectional mountPropagation again.

vi apache-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: apachepod
spec:
  containers:
    - name: web-server
      image: httpd:2.4.55
      securityContext:
        privileged: true
      ports:
        - containerPort: 80
          name: http-web
      volumeMounts:
        - name: apache-config-volume
          mountPath: /tmp/conf
        - name: my-bucket
          mountPath: /var/www
          mountPropagation: Bidirectional
      command: ["/bin/sh"]
      args: ["-c", "cp /tmp/conf/httpd.conf /usr/local/apache2/conf/httpd.conf && /usr/local/bin/httpd-foreground"]
  volumes:
    - name: apache-config-volume
      configMap:
        name: apache-config
    - name: my-bucket
      hostPath: 
        path: /mnt/my-bucket

Try to list the contents of the volume pointing to the hostPath /mnt/my-bucket, which in turn points to the /var/s3 folder used by the DaemonSet-controlled pod to mount the S3 bucket.

kubectl exec apachepod -- ls /var/www
10Gb.file
file_from_s3_mounter_pod.txt
html

Repeat the port-forward to try to reach the apache2 web server.

kubectl port-forward apachepod 8080:80 &
Forwarding from 127.0.0.1:8080 -> 80
Forwarding from [::1]:8080 -> 80

And you should see again the contents of /var/www/html folder via http as shown below.

curl localhost:8080
Handling connection for 8080
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
 <head>
  <title>Index of /</title>
 </head>
 <body>
<h1>Index of /</h1>
<ul><li><a href="file01.txt"> file01.txt</a></li>
<li><a href="file02.txt"> file02.txt</a></li>
<li><a href="file03.txt"> file03.txt</a></li>
<li><a href="file04.txt"> file04.txt</a></li>
<li><a href="file05.txt"> file05.txt</a></li>
<li><a href="file06.txt"> file06.txt</a></li>
<li><a href="file07.txt"> file07.txt</a></li>
<li><a href="file08.txt"> file08.txt</a></li>
<li><a href="file09.txt"> file09.txt</a></li>
<li><a href="file10.txt"> file10.txt</a></li>
</ul>
</body></html>

It works!

Bonus: All together now. It’s time for automation

So far we have explored different methods to install and consume an S3 backend over a vSAN-enabled infrastructure. Some of these methods are intended to be performed by a human; in fact, console access is very useful for learning and for performing day-2 maintenance and observability tasks, where a graphical interface might be crucial.

However, in the real world, and especially when using kubernetes, it is very desirable to be able to industrialize all these provisioning tasks without a single mouse click. Next you will see a practical case for provisioning via the CLI, which is therefore 100% automatable.

To recap, the steps needed to provision an S3 bucket from scratch, given a kubernetes cluster with the MinIO operator installed, are summarized below.

  • Create a namespace to place all MinIO tenant related objects
  • Create a MinIO tenant using the MinIO plugin (or apply an existing yaml manifest containing the Tenant CRD and the rest of the required objects)
  • Extract credentials for the new tenant and inject them into a kubernetes secret object
  • Create the MinIO S3 bucket via a kubernetes job
  • Deploy Daemonset to expose the S3 backend as a shared volume

The following script can be used to automate everything. It only requires two parameters, namely the tenant and bucket names. In the script they are statically set, but it would be very easy to pass both parameters as input.
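
For instance, a minimal (hypothetical) tweak to take both values as positional arguments with defaults would be:

TENANT=${1:-test}
BUCKET=${2:-mybucket}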

Sample Script to Create MinIO Tenant and share S3 bucket through a Bidirectional PV
#!/bin/bash

# INPUT PARAMETERS
# ----------------
TENANT=test
BUCKET=mybucket

# STEP 1 CREATE NAMESPACE AND TENANT
# --------------------------------
kubectl create ns $TENANT
kubectl minio tenant create $TENANT --servers 2 --volumes 4 --capacity 10G --namespace $TENANT --storage-class vsphere-sc-sna

# STEP 2 DEFINE VARIABLES, EXTRACT CREDENTIALS FROM CURRENT TENANT AND PUT TOGETHER IN A SECRET 
# ----------------------------------------------------------------------------------------------
echo "MINIO_S3_ENDPOINT=https://minio.${TENANT}.svc.cluster.local" > s3bucket.env
echo "MINIO_S3_BUCKET=${BUCKET}" >> s3bucket.env
echo "SECRET_KEY=$(kubectl get secrets -n ${TENANT} ${TENANT}-user-1 -o jsonpath="{.data.CONSOLE_SECRET_KEY}" | base64 -d)" >> s3bucket.env
echo "ACCESS_KEY=$(kubectl get secrets -n ${TENANT} ${TENANT}-user-1 -o jsonpath="{.data.CONSOLE_ACCESS_KEY}" | base64 -d)" >> s3bucket.env

kubectl create secret generic s3-credentials --from-env-file=s3bucket.env

# STEP 3 CREATE BUCKET USING A JOB
# ----------------------------------
kubectl apply -f create-s3-bucket-job.yaml


# STEP 4 WAIT FOR S3 BUCKET CREATION JOB TO SUCCESS
# ---------------------------------------------------
kubectl wait pods --for=jsonpath='{.status.phase}'=Succeeded -l job-name=create-s3-bucket-job --timeout=10m


# STEP 5 DEPLOY DAEMONSET TO SHARE S3 BUCKET AS A PV
# ----------------------------------------------------
kubectl apply -f s3-mounter-daemonset.yaml

Note that Step 2 requires the MinIO operator deployed in the cluster and also the krew MinIO plugin installed. Alternatively, you can use the --output flag to dry-run the tenant creation command and generate a manifest yaml file that can be exported and used later without the krew plugin installed. The way to generate the tenant file is shown below.

kubectl minio tenant create $TENANT --servers 2 --volumes 4 --capacity 10G --namespace $TENANT --storage-class vsphere-sc --output > tenant.yaml

It is also worth mentioning that we have decided to create the MinIO bucket through a Job, which is a good way to perform one-off tasks in a kubernetes cluster. As you can tell by looking at the manifest content below, the command used in the pod in charge of the job includes some simple logic to ensure the task is retried in case of error while the tenant is still spinning up, and it will run until the bucket creation completes successfully. The manifest that defines the job is shown below.

vi create-s3-bucket-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: create-s3-bucket-job
spec:
  template:
    spec:
      containers:
      - name: mc
        # s3-credentials secret must contain $ACCESS_KEY, $SECRET_KEY, $MINIO_S3_ENDPOINT, $MINIO_S3_BUCKET
        envFrom:
        - secretRef:
            name: s3-credentials
        image: minio/mc
        command: 
          - sh
          - -c
          - ls /tmp/error > /dev/null 2>&1 ; until [[ "$?" == "0" ]]; do sleep 5; echo "Attempt to connect with MinIO failed. Attempt to reconnect in 5 secs"; mc alias set s3 $(MINIO_S3_ENDPOINT) $(ACCESS_KEY) $(SECRET_KEY) --insecure; done && mc mb s3/$(MINIO_S3_BUCKET) --insecure
      restartPolicy: Never
  backoffLimit: 4

Once the job is created you can observe up to three different errors from the MinIO client while it tries to complete the bucket creation. The logic will keep retrying until completion, as shown below:

kubectl logs -f create-s3-bucket-job-qtvx2
mc: <ERROR> Unable to initialize new alias from the provided credentials. Get "https://minio.my-minio-tenant.svc.cluster.local/probe-bucket-sign-ubatm9xin2a
Attempt to connect with MinIO failed. Attempt to reconnect in 5 secs                                                                             
mc: <ERROR> Unable to initialize new alias from the provided credentials. Get "https://minio.my-minio-tenant.svc.cluster.local/probe-bucket-sign-n09ttv9pnfw
Attempt to connect with MinIO failed. Attempt to reconnect in 5 secs                                                                             
mc: <ERROR> Unable to initialize new alias from the provided credentials. Server not initialized, please try again.                              
Attempt to connect with MinIO failed. Attempt to reconnect in 5 secs                                                                             
mc: <ERROR> Unable to initialize new alias from the provided credentials. Server not initialized, please try again.                              
...skipped                                                                          
mc: <ERROR> Unable to initialize new alias from the provided credentials. The Access Key Id you provided does not exist in our records.          
Attempt to connect with MinIO failed. Attempt to reconnect in 5 secs                                                                                                                                                        
mc: <ERROR> Unable to initialize new alias from the provided credentials. The Access Key Id you provided does not exist in our records.          
... skipped         
Attempt to connect with MinIO failed. Attempt to reconnect in 5 secs                                                                             
mc: <ERROR> Unable to initialize new alias from the provided credentials. The Access Key Id you provided does not exist in our records.          
Attempt to connect with MinIO failed. Attempt to reconnect in 5 secs                                                                             
Added `s3` successfully.                                                                                                                         
Bucket created successfully `s3/mybucket`.

Once the job is completed, the wait condition (status.phase=Succeeded) will be met and the script will continue to the next step, which consists of deploying the DaemonSet. Once the DaemonSet is ready you should be able to consume the hostPath volume that points to the S3 bucket (/mnt/my-bucket in this case) from any regular pod. Create a test file in the pod mount folder (/var/s3).

kubectl exec pods/minio-s3-my-bucket-nkgw2 -ti -- touch /var/s3/test.txt
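
As a side note, the wait that the script performs can be reproduced manually with kubectl; a minimal sketch, assuming the Job name from the manifest above and an arbitrary ten-minute timeout:

# Block until the bucket-creation Job reports completion (or the timeout expires)
kubectl wait --for=condition=complete job/create-s3-bucket-job --timeout=600s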

Now you can spin up the simple sleeper pod shown below for completeness. Don’t forget to add the Bidirectional mountPropagation spec.

vi sleeper_pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: s3-sleeper-test
spec:
  containers:
    - name: sleeper
      image: ubuntu
      securityContext:
        privileged: true
      volumeMounts:
        - name: my-bucket
          mountPath: /data
          mountPropagation: Bidirectional
      command: ["sleep", "infinity"]
  volumes:
    - name: my-bucket
      hostPath: 
        path: /mnt/my-bucket
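
Apply the manifest and, optionally, wait until the pod is Ready before using it. This is a minimal sketch using the file and pod names from the manifest above:

# Create the sleeper pod and wait for it to become Ready
kubectl apply -f sleeper_pod.yaml
kubectl wait --for=condition=Ready pod/s3-sleeper-test --timeout=120s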

We should be able to list the S3 bucket objects under the /data folder of the pod.

kubectl exec pods/s3-sleeper-test -ti -- ls -alg /data
total 5
drwx------ 1 root    0 Jan  1  1970 .
drwxr-xr-x 1 root 4096 Feb 13 10:16 ..
-rw-r--r-- 1 root    0 Feb 13 10:13 test.txt

We are done!!!

This has been quite a lengthy series of articles. We have learned how to install the vSphere CSI Driver, how to enable vSAN File Services and, in this post, how to set up MinIO on top of a vSAN-backed infrastructure and how to consume it from kubernetes pods using different approaches. I hope you find it useful.

Preparing vSphere for Kubernetes Persistent Storage (2/3): Enabling vSAN File Services

In the previous article we walked through the preparation of an upstream k8s cluster to take advantage of the converged storage infrastructure that vSphere provides, using a CSI driver that allows pods to consume vSAN storage in the form of Persistent Volumes created automatically through a special-purpose StorageClass.

With the CSI driver we have most of the persistent storage needs of our pods covered; however, in some particular cases multiple pods need to mount and read/write the same volume simultaneously. This is defined by the Access Mode specification that is part of the PV/PVC definition. The typical Access Modes available in kubernetes are:

  • ReadWriteOnce – Mount a volume as read-write by a single node
  • ReadOnlyMany – Mount the volume as read-only by many nodes
  • ReadWriteMany – Mount the volume as read-write by many nodes

In this article we will focus on the ReadWriteMany (RWX) Access Mode, which allows a volume to be mounted simultaneously in read-write mode by multiple pods running on different kubernetes nodes. This access mode is typically backed by a network file sharing technology such as NFS. The good news is that this is not a big deal if we have vSAN because, again, we can take advantage of this technology to enable the built-in file services and create network shares in a very easy and integrated way.

Enabling vSAN File Services

The procedure is described below. Let’s move into the vSphere GUI for a while. Access your cluster and go to vSAN > Services, then click on the blue ENABLE button at the bottom of the File Service tile.

Enabling vSAN file Services

The first step will be to select the network on which the service will be deployed. In my case I will select a specific PortGroup named VLAN-1304-NFS, in the subnet 10.113.4.0/24 and VLAN 1304.

Enable File Services. Select Network

This action will trigger the creation of the necessary agents on each of the hosts, which will be in charge of serving the shared network resources via NFS. After a while we should be able to see a new resource pool named ESX Agents with four new vSAN File Service Node VMs.

vSAN File Service Agents

Once the agents have been deployed we can go back to the service configuration, where we will see that it is still incomplete because we haven’t defined some important service parameters yet. Click on the Configure Domain button at the bottom of the File Service tile.

vSAN File Service Configuration Tile

The first step is to define a domain that will host the names of the shared services. In my particular case I will use vsanfs.sdefinitive.net as the domain name. Any shared resource will be reached using this domain.

vSAN File Service Domain Configuration

Before continuing, it is important that our DNS is configured with the names that we will give to the four file service agents. In my case I am using fs01, fs02, fs03 and fs04 as names in the domain vsanfs.sdefinitive.net, with the IP addresses 10.113.4.100-103. Additionally we will have to indicate the IP of the DNS server used for resolution, the netmask and the default gateway, as shown below.

vSAN File Services Network Configuration
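
As a quick sanity check (a minimal sketch, assuming the A records described above are already created in your DNS server), you can verify that each agent name resolves from any host on that network:

# Verify forward resolution of the four file service agent names
for h in fs01 fs02 fs03 fs04; do nslookup $h.vsanfs.sdefinitive.net; done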

The next screen offers the option to integrate with Active Directory; for the moment we can skip it because we will consume the network services from pods.

vSAN File Service integration with Active Directory

Next you will see a summary of all the settings for a last review before proceeding.

vSAN File Service Domain configuration summary

Once we press the green FINISH button, the configuration of the network file services will be completed and the service will be ready to use.

Creating File Shares from vSphere

Once vSAN File Services have been configured we should be able to create network shares that will eventually be consumed in the form of NFS volumes by our applications. To do this we must first provision the file shares according to our preferences. Go to File Services and click on ADD to create a new file share.

Adding a new NFS File Share

The file share creation wizard allows us to specify some important parameters such as the name of our shared service, the protocol (NFS) used to export the file share, the NFS version (4.1 and 3), the Storage Policy that the volume will use and, finally, other quota-related settings such as the size and the warning threshold for our file share.

Settings for vSAN file share

Additionally we can add security by means of a network access control policy. In our case we will allow any IP to access the shared service, so we select the option “Allow access from any IP”, but feel free to restrict access to certain IP ranges if you need it.

Network Access Control for vSAN file share

Once all the parameters have been set we can complete the task by pressing the green FINISH button at the bottom right side of the window.

vSAN File Share Configuration Summary

Let’s inspect the created file share, which is seen as just another vSAN Virtual Object from the vSphere administrator’s perspective.

Inspect vSAN File Share

If we click on VIEW FILE SHARE we can see the specific configuration of our file share. Write down the export path (fs01.vsanfs.sdefinitive.net:/vsanfs/my-nfs-share) since it will be used later in the yaml manifest that declares the corresponding persistent volume kubernetes object.

Displaying vSAN File Share parameters

From a storage administrator’s perspective we are done. Now we will see how to consume the share from the developer’s perspective through native kubernetes resources using yaml manifests.

Consuming vSAN Network File Shares from Pods

An important requirement to be able to mount NFS shares is to have the necessary software installed in the worker node OS; otherwise the mount will fail. If you are using a Debian-family Linux distro such as Ubuntu, the package that contains the binaries needed for NFS mounts is nfs-common. Ensure this package is installed before proceeding by issuing the command below.

sudo apt-get install nfs-common
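
If you just want to confirm whether the package is already present (a quick check, assuming a Debian/Ubuntu worker), you can query dpkg directly:

# Prints "Status: install ok installed" when nfs-common is present
dpkg -s nfs-common | grep Status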

Before proceeding with the creation of PV/PVCs, it is recommended to test connectivity from the workers as a sanity check. The first basic test is pinging the FQDN of the host in charge of the file share, as defined in the export path captured earlier. Ensure you can also ping the rest of the NFS agents defined (fs01-fs04).

ping fs01.vsanfs.sdefinitive.net
PING fs01.vsanfs.sdefinitive.net (10.113.4.100) 56(84) bytes of data.
64 bytes from 10.113.4.100 (10.113.4.100): icmp_seq=1 ttl=63 time=0.688 ms

If DNS resolution and connectivity are working as expected, we are safe to mount the file share on any folder of the worker filesystem. The following command shows how to mount the file share using NFS 4.1 and the export path associated with our file share. Ensure the mount point (/mnt/my-nfs-share in this example) exists before proceeding; if it does not, create it first using mkdir as usual.

mount -t nfs4 -o minorversion=1,sec=sys fs01.vsanfs.sdefinitive.net:/vsanfs/my-nfs-share /mnt/my-nfs-share -v
mount.nfs4: timeout set for Fri Dec 23 21:30:09 2022
mount.nfs4: trying text-based options 'minorversion=1,sec=sys,vers=4,addr=10.113.4.100,clientaddr=10.113.2.15'

If the mount is successful you should be able to access the share at the mount point and even create a file, as shown below.

/ # cd /mnt/my-nfs-share
/ # touch hola.txt
/ # ls

hola.txt
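
Optionally, before moving on to the kubernetes manifests, you can confirm the mount options and then remove this manual test mount (a minimal sketch using the same mount point as above):

# Show the active NFS mount and its options, then unmount the manual test
mount | grep my-nfs-share
umount /mnt/my-nfs-share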

Now we are safe to jump into the manifest world to define our persistent volumes and attach them to the desired pods. First, declare the PV object using ReadWriteMany as the accessMode and specify the server and export path of our network file share.

Note that we are using a storageClassName specification with an arbitrary name, vsan-nfs. Using a “fake” or undefined storageClass is supported by kubernetes and is typically used to bind a specific PV to a specific PVC, which is exactly our case. It also prevents the claim from falling back to the default storage class, which in this particular scenario would not be compatible with the ReadWriteMany access mode required for NFS volumes.

vi nfs-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  storageClassName: vsan-nfs
  capacity:
    storage: 500Mi
  accessModes:
    - ReadWriteMany
  nfs:
    server: fs01.vsanfs.sdefinitive.net
    path: "/vsanfs/my-nfs-share"

Apply the yaml and verify the creation of the PV. Note that we are using RWX mode, which allows access to the same volume from different pods running on different nodes simultaneously.
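
For completeness, the apply step is simply:

kubectl apply -f nfs-pv.yaml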

kubectl get pv nfs-pv
NAME     CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM             STORAGECLASS   REASON   AGE
nfs-pv   500Mi      RWX            Retain           Bound    default/nfs-pvc   vsan-nfs                 60s

Now do the same for the PVC pointing to the PV just created. Note that we are specifying the same storageClassName to bind the PVC to the PV. The accessMode must also be consistent with the PV definition and, finally, for this example we are claiming 500 MiB of storage.

vi nfs-pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nfs-pvc
spec:
  storageClassName: vsan-nfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 500Mi

As usual, verify the status of the PVC resource. As you can see, the PVC is in Bound state as expected.

kubectl get pvc nfs-pvc
NAME      STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
nfs-pvc   Bound    nfs-pv   500Mi      RWX            vsan-nfs       119s

Then attach the volume to a regular pod using the yaml manifest shown below. This will create a basic pod running an alpine image that mounts the NFS PVC at the container’s /my-nfs-share local path. Ensure the claimName specification of the volume matches the PVC name defined earlier.

vi nfs-pod1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nfs-pod1
spec:
  containers:
  - name: alpine
    image: "alpine"
    volumeMounts:
    - name: nfs-vol
      mountPath: "/my-nfs-share"
    command: [ "sleep", "1000000" ]
  volumes:
    - name: nfs-vol
      persistentVolumeClaim:
        claimName: nfs-pvc

Apply the yaml using kubectl apply and open a shell session into the container using kubectl exec, as shown below.
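
A minimal sketch of those two steps, assuming the manifest file name used above (the wait is optional but avoids exec-ing into a pod that is still starting):

kubectl apply -f nfs-pod1.yaml
# Wait until the pod is Ready before opening a shell into it
kubectl wait --for=condition=Ready pod/nfs-pod1 --timeout=120s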

kubectl exec nfs-pod1 -it -- sh

We should be able to access the network share, list any existing files and check that we are able to write new files, as shown below.

/ # touch /my-nfs-share/hola.pod1
/ # ls /my-nfs-share
hola.pod1  hola.txt

The last test to check whether multiple pods running on different nodes can actually read and write the same volume simultaneously is to create a new pod2 that mounts the same volume. Ensure that both pods are scheduled on different nodes for a full verification of the RWX access mode; a quick way to check the node placement is shown after the test below.

vi nfs-pod2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nfs-pod2
spec:
  containers:
  - name: alpine
    image: "alpine"
    volumeMounts:
    - name: nfs-vol
      mountPath: "/my-nfs-share"
    command: [ "sleep", "1000000" ]
  volumes:
    - name: nfs-vol
      persistentVolumeClaim:
        claimName: nfs-pvc

In the same manner, apply the manifest file above to spin up the new pod2 and open a shell.

kubectl exec nfs-pod2 -it -- sh

Again, we should be able to access the network share, list the existing files and also create new files.

/ # touch /my-nfs-share/hola.pod2
/ # ls /my-nfs-share
hola.pod1  hola.pod2  hola.txt
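
To confirm that the RWX semantics were really exercised across nodes, check which worker each pod landed on (if both ended up on the same node, you can delete and recreate one of them or pin it with a nodeSelector):

# The NODE column should show two different worker nodes
kubectl get pods nfs-pod1 nfs-pod2 -o wide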

In this article we have learned how to enable vSAN File Services and how to consume PVs in RWX mode. In the next post I will explain how to leverage MinIO to provide S3-like object storage on top of vSphere for our workloads. Stay tuned!
