Antrea is an open-source Kubernetes networking and security project maintained by VMware that implements a Container Network Interface (CNI) plugin to provide network connectivity and security for pod workloads. It has been designed with flexibility and high performance in mind and is built on Open vSwitch (OVS), a very mature project that stands out precisely for those characteristics.

Antrea follows an SDN-like architecture that separates the data plane (based on OVS) from the control plane. To do so, Antrea installs an agent component on every node to program the OVS datapath, whereas a central controller running on the control plane node is in charge of centralized tasks such as computing network policies. The following picture, available at the main Antrea site, depicts how it integrates with a Kubernetes cluster.

Antrea high-level architecture

This series of posts will focus on how to add observability capabilities to this CNI using different tools. This first article explains how to install Antrea along with some related management tooling, and how to create a testbed environment based on a microservices application and a load-generator tool that will be used as the reference example throughout the next articles. Now that we have a general overview of what Antrea is, let's start by installing it through Helm.

Installing Antrea using Helm

To start with Antrea, the first step is to create a Kubernetes cluster. The process of setting one up is out of the scope of this post. This series of articles is based on vanilla Kubernetes, but feel free to try another distribution such as VMware Tanzu or OpenShift. If you are using kubeadm to build your cluster, you need to initialize it with the --pod-network-cidr=<CIDR> parameter to enable the NodeIPAMController in Kubernetes; alternatively, you can leverage some of Antrea's built-in IPAM capabilities.
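
As an illustration, a minimal kubeadm initialization enabling the NodeIPAMController could look like the following (the CIDR is just an example matching my lab; pick a range that does not overlap with your node or Service networks):

sudo kubeadm init --pod-network-cidr=10.34.0.0/16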

The easiest way to verify that the NodeIPAMController is operating in your existing cluster is by checking if the flag --allocate-node-cidrs is set to true and if there is a cluster-cidr configured. Use the following command for verification.

kubectl cluster-info dump | grep cidr
                            "--allocate-node-cidrs=true",
                            "--cluster-cidr=10.34.0.0/16",

It is important to mention that in Kubernetes versions prior to 1.24 the CNI plugin could be managed by the kubelet, and there was a requirement to ensure the kubelet was started with the --network-plugin=cni flag; in newer versions it is the container runtime, instead of the kubelet, that is in charge of managing the CNI plugin. In my case I am using containerd 1.6.9 as the container runtime and Kubernetes version 1.24.8.
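
A quick way to confirm both the container runtime and the Kubernetes version reported by each node is the wide output of kubectl get nodes; in a setup similar to mine, the CONTAINER-RUNTIME column should show something like containerd://1.6.9.

kubectl get nodes -o wide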

To install the Antrea CNI you just need to apply the Kubernetes manifest that specifies all the required resources. The latest manifest is generally available at the official Antrea GitHub site here.

Alternatively, VMware also maintains a Helm chart for this purpose. I will install Antrea using Helm here because it has some advantages. For example, in general, when you enable a new feature gate you must manually restart the pods to actually apply the new settings; when using Helm, the changes are applied and the pods restarted as part of the new release deployment. There is, however, an important consideration about updating CRDs when using Helm, so make sure to check this note before updating Antrea.

Let's start with the installation process. Assuming you already have Helm installed on the server you use to manage your Kubernetes cluster (otherwise complete the installation as per the official doc here), the first step is to add the Antrea repository as shown below.

helm repo add antrea https://charts.antrea.io
"antrea" has been added to your repositories

As a best practice, update the repo before installing to ensure you work with the latest available version of the chart.

helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "antrea" chart repository

You can also explore the contents of the added repository. Note there is not only a chart for Antrea itself but also other interesting charts that will be explored in upcoming posts. At the time of writing, the latest Antrea version is 1.10.0.

helm search repo antrea
NAME                    CHART VERSION   APP VERSION     DESCRIPTION                                
antrea/antrea           1.10.0          1.10.0          Kubernetes networking based on Open vSwitch
antrea/flow-aggregator  1.10.0          1.10.0          Antrea Flow Aggregator                     
antrea/theia            0.4.0           0.4.0           Antrea Network Flow Visibility    

When using Helm, you can customize the installation of your chart by means of values in the form of YAML files that contain all configurable settings. To figure out what settings are available for a particular chart, a good idea is to dump all the accepted values into a default_values.yaml file using the helm show values command, as you can see here.

helm show values antrea/antrea >> default_values.yaml

Using the default_values.yaml file as a reference, you can change any of the default values to meet your requirements. Note that, as expected, any setting not explicitly referenced in the values file will use its default value. We will use a very simplified version of the values file with just a single setting to specify the desired tag or version for our deployment. We will add extra configuration later when enabling some features. Create a simple values.yaml file with this content.

vi values.yaml
# -- Container image to use for Antrea components.
image:
  tag: "v1.10.0"

And now install the new chart using helm install and the created values.yaml file as input.

helm install antrea -n kube-system antrea/antrea -f values.yaml
NAME: antrea
LAST DEPLOYED: Fri Jan 13 18:52:48 2023
NAMESPACE: kube-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Antrea CNI has been successfully installed

You are using version 1.10.0

For the Antrea documentation, please visit https://antrea.io

After a couple of minutes you should be able to see the pods in Running state. The Antrea agents are deployed using a DaemonSet and thus run an independent pod on every single node in the cluster. On the other hand, the Antrea controller component is installed as a single-replica Deployment and may run on any node by default.

kubectl get pods -n kube-system -o wide
NAME                                          READY   STATUS    RESTARTS        AGE   IP            NODE                  NOMINATED NODE   READINESS GATES
antrea-agent-52cjk                            2/2     Running   0               86s   10.113.2.17   k8s-worker-03         <none>           <none>
antrea-agent-549ps                            2/2     Running   0               86s   10.113.2.10   k8s-contol-plane-01   <none>           <none>
antrea-agent-5kvhb                            2/2     Running   0               86s   10.113.2.15   k8s-worker-01         <none>           <none>
antrea-agent-6p856                            2/2     Running   0               86s   10.113.2.19   k8s-worker-05         <none>           <none>
antrea-agent-f75b7                            2/2     Running   0               86s   10.113.2.16   k8s-worker-02         <none>           <none>
antrea-agent-m6qtc                            2/2     Running   0               86s   10.113.2.18   k8s-worker-04         <none>           <none>
antrea-agent-zcnd7                            2/2     Running   0               86s   10.113.2.20   k8s-worker-06         <none>           <none>
antrea-controller-746dcd98d4-c6wcd            1/1     Running   0               86s   10.113.2.10   k8s-contol-plane-01   <none>           <none>
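
If you want to confirm the deployment model described above, you can also list the Antrea workload objects directly; you should see one DaemonSet for the agents and one single-replica Deployment for the controller (object names may vary slightly depending on the chart version):

kubectl get daemonset,deployment -n kube-system | grep antrea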

Your nodes should now transition to Ready status as a result of the successful CNI installation.

kubectl get nodes
NAME                  STATUS   ROLES           AGE   VERSION
k8s-contol-plane-01   Ready    control-plane   46d   v1.24.8
k8s-worker-01         Ready    <none>          46d   v1.24.8
k8s-worker-02         Ready    <none>          46d   v1.24.8
k8s-worker-03         Ready    <none>          46d   v1.24.8
k8s-worker-04         Ready    <none>          46d   v1.24.8
k8s-worker-05         Ready    <none>          46d   v1.24.8
k8s-worker-06         Ready    <none>          46d   v1.24.8

Jump into any of the nodes using SSH. If the Antrea CNI plugin is successfully installed, you should see a new configuration file in the /etc/cni/net.d directory with content similar to this.

cat /etc/cni/net.d/10-antrea.conflist
{
    "cniVersion":"0.3.0",
    "name": "antrea",
    "plugins": [
        {
            "type": "antrea",
            "ipam": {
                "type": "host-local"
            }
        }
        ,
        {
            "type": "portmap",
            "capabilities": {"portMappings": true}
        }
        ,
        {
            "type": "bandwidth",
            "capabilities": {"bandwidth": true}
        }
    ]
}

As a final verification, spin up a new pod to check that the CNI, which is supposed to provide connectivity for it, is actually working as expected. Let's run a simple test app called kuard.

kubectl run kuard --restart=Never --image=gcr.io/kuar-demo/kuard-amd64:blue
pod/kuard created

Check the status of the newly created pod and verify that it is running and that an IP has been assigned.

kubectl get pod kuard -o wide
NAME    READY   STATUS    RESTARTS   AGE    IP           NODE            NOMINATED NODE   READINESS GATES
kuard   1/1     Running   0          172m   10.34.4.26   k8s-worker-05   <none>           <none>

Now forward the port the container is listening on to a local port.

kubectl port-forward kuard 8080:8080
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080

Next, open your browser at http://localhost:8080 and you should reach the kuard application, which shows some information about the pod that can be used for testing and troubleshooting purposes.
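
If you prefer the command line, a quick curl from a second terminal (while the port-forward is still running) should return an HTTP 200 response from kuard:

curl -I http://localhost:8080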

So far so good. It seems our CNI plugin is working as expected, so let's install some extra tooling to interact with it.

Installing antctl tool to interact with Antrea

Antrea includes a nice method to interact with the CNI through CLI commands using a tool called antctl. antctl can be used in controller mode or in agent mode. For controller mode you can run the command externally or from within the controller pod, using a regular kubectl exec to issue the commands you need (an example of the in-pod approach is shown below). To use antctl in agent mode you must run the commands from within an antrea-agent pod.
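
As an illustration of the in-pod approach for controller mode, the controller image ships with antctl, so you can exec into the controller pod and run it directly (substitute the pod name from your own cluster; the one below comes from the earlier listing):

kubectl exec -ti -n kube-system antrea-controller-746dcd98d4-c6wcd -- antctl version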

To install antctl simply copy and paste the following commands to download a copy of the antctl prebuilt binary for your OS.

TAG=v1.10.0
curl -Lo ./antctl "https://github.com/antrea-io/antrea/releases/download/$TAG/antctl-$(uname)-x86_64" 
chmod +x ./antctl 
sudo mv ./antctl /usr/local/bin/antctl 

Now check the installed antctl version.

antctl version
antctlVersion: v1.10.0
controllerVersion: v1.10.0

When antctl runs out-of-cluster (controller mode only), it will look for your kubeconfig file to gain access to the Antrea components. As an example, you can issue the following command to check the overall status.

antctl get controllerinfo
POD                                            NODE          STATUS  NETWORK-POLICIES ADDRESS-GROUPS APPLIED-TO-GROUPS CONNECTED-AGENTS
kube-system/antrea-controller-746dcd98d4-c6wcd k8s-worker-01 Healthy 0                0              0                 4

We can also display the activated features (aka feature gates) by using antctl get featuregates. To enable certain functionality, such as exporting flows for analytics or the Antrea Proxy, we need to change the values.yaml file and deploy a new Antrea release through Helm as we did before; a sketch of this workflow is shown after the feature gate listing below.

antctl get featuregates
Antrea Agent Feature Gates
FEATUREGATE              STATUS         VERSION   
FlowExporter             Disabled       ALPHA     
NodePortLocal            Enabled        BETA      
AntreaPolicy             Enabled        BETA      
AntreaProxy              Enabled        BETA      
Traceflow                Enabled        BETA      
NetworkPolicyStats       Enabled        BETA      
EndpointSlice            Disabled       ALPHA     
ServiceExternalIP        Disabled       ALPHA     
Egress                   Enabled        BETA      
AntreaIPAM               Disabled       ALPHA     
Multicast                Disabled       ALPHA     
Multicluster             Disabled       ALPHA     

Antrea Controller Feature Gates
FEATUREGATE              STATUS         VERSION   
AntreaPolicy             Enabled        BETA      
Traceflow                Enabled        BETA      
NetworkPolicyStats       Enabled        BETA      
NodeIPAM                 Disabled       ALPHA     
ServiceExternalIP        Disabled       ALPHA     
Egress                   Enabled        BETA      
Multicluster             Disabled       ALPHA    
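
As a sketch of what such a change could look like with the Helm workflow used earlier, the chart exposes a featureGates map in its values (check the default_values.yaml generated before to confirm the exact keys for your chart version). The snippet below, which would enable the FlowExporter feature gate, is illustrative rather than a tested configuration:

vi values.yaml
# -- Container image to use for Antrea components.
image:
  tag: "v1.10.0"
# -- Feature gates to turn on or off (illustrative; verify against default_values.yaml).
featureGates:
  FlowExporter: true

And then roll out a new release with helm upgrade using the updated file as input.

helm upgrade antrea -n kube-system antrea/antrea -f values.yaml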

On the other hand, we can use the built-in antctl utility inside each of the antrea-agent pods. For example, using kubectl get pods, obtain the name of the antrea-agent pod running on the node where the kuard application we created before has been scheduled. Now open a shell using the following command (use your own antrea-agent pod name).

kubectl exec -ti -n kube-system antrea-agent-6p856 -- bash
root@k8s-worker-05:/# 

The prompt indicates you are inside the k8s-worker-05 node. Now you can interact with the Antrea agent through several commands; as an example, get the information of the kuard pod interface using this command.

root@k8s-worker-05:/# antctl get podinterface kuard
NAMESPACE NAME  INTERFACE-NAME IP         MAC               PORT-UUID                            OF-PORT CONTAINER-ID
default   kuard kuard-652adb   10.34.4.26 46:b5:b9:c5:c6:6c 99c959f1-938a-4ee3-bcda-da05c1dc968a 17      4baee3d2974 

We will explore more advanced antctl options later in this series of posts. Now that the CNI is ready, let's install a full-featured microservices application.

Installing Acme-Fitness App

Throughout this series of posts we will use a reference Kubernetes application to help with some of the examples. We have chosen the popular acme-fitness application because it is a good representation of a polyglot application based on microservices, including typical services of an e-commerce app such as front-end, catalog, cart, and payment. We will use a version maintained by VMware that is available here. The following picture depicts how the microservices that make up the acme-fitness application communicate with each other.

The first step is to clone the repo to get a local copy of the git repository, using the following command:

git clone https://github.com/vmwarecloudadvocacy/acme_fitness_demo.git
Cloning into 'acme_fitness_demo'...
remote: Enumerating objects: 765, done.
remote: Total 765 (delta 0), reused 0 (delta 0), pack-reused 765
Receiving objects: 100% (765/765), 1.01 MiB | 1.96 MiB/s, done.
Resolving deltas: 100% (464/464), done.

In order to get some of the database microservices running, we need to set up in advance some configuration (basically credentials) that will be injected as Secrets into the Kubernetes cluster. Move to the cloned kubernetes-manifest folder and create a new file that will contain the credentials required by some of the database microservices. Remember the password must be Base64 encoded. In my case I am using passw0rd, so the Base64-encoded form results in cGFzc3cwcmQK as shown below.
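
If you want to generate the encoded value yourself, a plain echo piped into base64 produces the same string (note that echo appends a trailing newline, which gets encoded too; that is fine for this example):

echo 'passw0rd' | base64
cGFzc3cwcmQK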

vi acme-fitness-secrets.yaml
# SECRETS FOR ACME-FITNESS (Plain text password is "passw0rd" in this example)
apiVersion: v1
data:
  password: cGFzc3cwcmQK
kind: Secret
metadata:
  name: cart-redis-pass
type: Opaque
---
apiVersion: v1
data:
  password: cGFzc3cwcmQK
kind: Secret
metadata:
  name: catalog-mongo-pass
type: Opaque
---
apiVersion: v1
data:
  password: cGFzc3cwcmQK
kind: Secret
metadata:
  name: order-postgres-pass
type: Opaque
---
apiVersion: v1
data:
  password: cGFzc3cwcmQK
kind: Secret
metadata:
  name: users-mongo-pass
type: Opaque
---
apiVersion: v1
data:
  password: cGFzc3cwcmQK
kind: Secret
metadata:
  name: users-redis-pass
type: Opaque

Now open the point-of-sales manifest and set the value of the FRONTEND_HOST variable as per your particular setup. In this deployment we will install the acme-fitness application in a namespace called acme-fitness, so the FQDN would be frontend.acme-fitness.svc.cluster.local. Adjust accordingly if you are using a domain other than cluster.local.

vi point-of-sales-total.yaml
      labels:
        app: acmefit
        service: pos
    spec:
      containers:
      - image: gcr.io/vmwarecloudadvocacy/acmeshop-pos:v0.1.0-beta
        name: pos
        env:
        - name: HTTP_PORT
          value: '7777'
        - name: DATASTORE
          value: 'REMOTE'
        - name: FRONTEND_HOST
          value: 'frontend.acme-fitness.svc.cluster.local'
        ports:
        - containerPort: 7777
          name: pos

Now create the acme-fitness namespace, which is where we will place all the microservices and related objects.

kubectl create ns acme-fitness

Now apply all the manifests in the current kubernetes-manifest directory, passing the namespace as shown here to ensure all objects are placed in this particular namespace. Make sure the acme-fitness-secrets.yaml manifest we created in the previous step is also placed in the same directory, and check that the output confirms the new Secrets being created.

kubectl apply -f . -n acme-fitness
service/cart-redis created
deployment.apps/cart-redis created
service/cart created
deployment.apps/cart created
configmap/catalog-initdb-config created
service/catalog-mongo created
deployment.apps/catalog-mongo created
service/catalog created
deployment.apps/catalog created
service/frontend created
deployment.apps/frontend created
service/order-postgres created
deployment.apps/order-postgres created
service/order created
deployment.apps/order created
service/payment created
deployment.apps/payment created
service/pos created
deployment.apps/pos created
secret/cart-redis-pass created
secret/catalog-mongo-pass created
secret/order-postgres-pass created
secret/users-mongo-pass created
secret/users-redis-pass created
configmap/users-initdb-config created
service/users-mongo created
deployment.apps/users-mongo created
service/users-redis created
deployment.apps/users-redis created
service/users created
deployment.apps/users created

Wait a couple of minutes to allow the workers to pull the container images and you should see all the pods running. If you find any pod stuck in a non-running status, simply delete it and wait for Kubernetes to reconcile the state until you see everything up and running (an example is shown after the pod listing below).

kubectl get pods -n acme-fitness
NAME                              READY   STATUS      RESTARTS   AGE
cart-76bb4c586-blc2m              1/1     Running     0          2m45s
cart-redis-5cc665f5bd-zrjwh       1/1     Running     0          3m57s
catalog-958b9dc7c-qdzbt           1/1     Running     0          2m27s
catalog-mongo-b5d4bfd54-c2rh5     1/1     Running     0          2m13s
frontend-6cd56445-8wwws           1/1     Running     0          3m57s
order-584d9c6b44-fwcqv            1/1     Running     0          3m56s
order-postgres-8499dcf8d6-9dq8w   1/1     Running     0          3m57s
payment-5ffb9c8d65-g8qb2          1/1     Running     0          3m56s
pos-5995956dcf-xxmqr              1/1     Running     0          108s
users-67f9f4fb85-bntgv            1/1     Running     0          3m55s
users-mongo-6f57dbb4f-kmcl7       1/1     Running     0          45s
users-mongo-6f57dbb4f-vwz64       0/1     Completed   0          3m56s
users-redis-77968854f4-zxqgc      1/1     Running     0          3m56s
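
As an illustration, if a pod gets stuck in a non-running state (such as the Completed users-mongo replica above, which was already replaced by a newer one), you can delete it and let the owning Deployment reconcile the replica count; substitute the pod name from your own listing:

kubectl delete pod users-mongo-6f57dbb4f-vwz64 -n acme-fitness
pod "users-mongo-6f57dbb4f-vwz64" deleted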

Now check the created services. The entry point of the acme-fitness application is the service named frontend, which points to the frontend microservice. This service is of type LoadBalancer, so if you have a solution in your cluster to expose LoadBalancer services you should receive an external IP that can be used to reach the application externally. If that is not your case, you can always access it through the corresponding NodePort of the service. In this particular case the service is exposed on the dynamically allocated port 31967 within the NodePort range, as shown below, and a quick command-line check is included after the service listing.

kubectl get svc -n acme-fitness
NAME             TYPE           CLUSTER-IP       EXTERNAL-IP    PORT(S)          AGE
cart             ClusterIP      10.96.58.243     <none>         5000/TCP         4m38s
cart-redis       ClusterIP      10.108.33.107    <none>         6379/TCP         4m39s
catalog          ClusterIP      10.106.250.32    <none>         8082/TCP         4m38s
catalog-mongo    ClusterIP      10.104.233.60    <none>         27017/TCP        4m38s
frontend         LoadBalancer   10.111.94.177    10.113.3.110   80:31967/TCP     4m38s
order            ClusterIP      10.108.161.14    <none>         6000/TCP         4m37s
order-postgres   ClusterIP      10.99.98.123     <none>         5432/TCP         4m38s
payment          ClusterIP      10.99.108.248    <none>         9000/TCP         4m37s
pos              NodePort       10.106.130.147   <none>         7777:30431/TCP   4m37s
users            ClusterIP      10.106.170.197   <none>         8083/TCP         4m36s
users-mongo      ClusterIP      10.105.18.58     <none>         27017/TCP        4m37s
users-redis      ClusterIP      10.103.94.131    <none>         6379/TCP         4m37s
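
As a quick reachability check from the command line, a curl against the external IP, or against any node IP on the NodePort, should answer with the frontend page. The addresses below are the ones allocated in my environment, so adjust them to yours:

curl -I http://10.113.3.110
curl -I http://<node-ip>:31967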

I am using the AVI Kubernetes Operator in my setup to watch for LoadBalancer type services, so we should be able to access the site using the IP address or the allocated name, which according to my particular settings will be http://frontend.acme-fitness.avi.sdefinitive.net. If you are not familiar with the AVI Ingress solution and how to deploy it, you can check a series of related posts here that explain step by step how to integrate this powerful enterprise solution with your cluster.

Acme-Fitness application Home Page

Now that the application is ready let’s deploy a traffic generator to inject some traffic.

Generating traffic using Locust testing tool

A simple but powerful way to generate synthetic traffic is to use the Locust tool, written in Python. An interesting advantage of Locust is that it has a distributed architecture that allows running multiple tests on multiple workers. It also includes a web interface that lets you start and parameterize the load test. Locust allows you to define complex test scenarios described through a locustfile.py. Thankfully, the acme-fitness repository includes here a load-generator folder that contains instructions and a ready-to-use locustfile.py to fully simulate traffic and users according to the architecture of the acme-fitness application.

We can install Locust in Kubernetes easily. The first step is to create the namespace in which the load-generator application will be installed.

kubectl create ns load-gen

As usual, if we want to inject configuration into Kubernetes we can use a ConfigMap object containing the required settings (in this case in the form of a locustfile.py file). Be careful, because creating the .py file as a ConfigMap using kubectl create --from-file might cause some formatting errors. I have created a functional ConfigMap in YAML format that you can apply directly with the following command.

kubectl apply -n load-gen -f https://raw.githubusercontent.com/jhasensio/antrea/main/LOCUST/acme-locustfile-cm.yaml
configmap/locust-configmap created

Once the ConfigMap is created, you can apply the following manifest, which includes the Services and Deployments needed to install the Locust load-testing tool in a distributed architecture, taking as input the locustfile.py file embedded in the ConfigMap named locust-configmap. Use the following command to deploy Locust.

kubectl apply -n load-gen -f https://raw.githubusercontent.com/jhasensio/antrea/main/LOCUST/locust.yaml
deployment.apps/locust-master created
deployment.apps/locust-worker created
service/locust-master created
service/locust-ui created

You can inspect the created Kubernetes objects in the load-gen namespace. Note there is a LoadBalancer service that will be used to reach the Locust application.

kubectl get all -n load-gen
NAME                                 READY   STATUS    RESTARTS   AGE
pod/locust-master-67bdb5dbd4-ngtw7   1/1     Running   0          81s
pod/locust-worker-6c5f87b5c8-pgzng   1/1     Running   0          81s
pod/locust-worker-6c5f87b5c8-w5m6h   1/1     Running   0          81s

NAME                    TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)                      AGE
service/locust-master   ClusterIP      10.110.189.42   <none>         5557/TCP,5558/TCP,8089/TCP   81s
service/locust-ui       LoadBalancer   10.106.18.26    10.113.3.111   80:32276/TCP                 81s

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/locust-master   1/1     1            1           81s
deployment.apps/locust-worker   2/2     2            2           81s

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/locust-master-67bdb5dbd4   1         1         1       81s
replicaset.apps/locust-worker-6c5f87b5c8   2         2         2       81s

Open a browser to access the Locust user interface using the allocated IP at http://10.113.3.111, or the FQDN that in my case corresponds to http://locust-ui.load-gen.avi.sdefinitive.net, and you will reach the following web site. If you don't have any load balancer solution installed, just use the allocated NodePort (32276 in this case) on any of the IP addresses of your nodes.

Locust UI

Now populate the Host text box to start the test against the acme-fitness application. You can use the internal URL, reachable at http://frontend.acme-fitness.svc.cluster.local, or the external name http://frontend.acme-fitness.avi.sdefinitive.net. Launch the test and observe how the Requests Per Second counter increases progressively.

If you need extra load, simply scale the locust-worker deployment and you will get more pods acting as workers available to generate traffic, as shown below.
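
For example, scaling from the two default workers to four could be done as follows (adjust the replica count to whatever your nodes can handle):

kubectl scale deployment locust-worker -n load-gen --replicas=4
deployment.apps/locust-worker scaled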

We have the cluster up and running, and the application and the load generator ready. It's time to start the journey of adding observability using extra tools. Be sure to check the next part of this series. Stay tuned!