Antrea is an open-source Kubernetes networking and security project maintained by VMware that implements a Container Network Interface (CNI) to provide network connectivity and security for pod workloads. It has been designed with flexibility and high performance in mind and is based on Open vSwitch (OVS), a very mature project that stands out precisely because of these characteristics.
Antrea creates an SDN-like architecture that separates the data plane (based on OVS) from the control plane. To do so, Antrea installs an agent component on every node to program the OVS datapath, whereas a central controller running on the control plane node is in charge of centralized tasks such as computing network policies. The following picture, available on the main Antrea site, depicts how it integrates with a Kubernetes cluster.
This series of posts will focus on how to add observability capabilities to this CNI using different tools. This first article explains how to install Antrea along with some related management tooling, and how to create a testbed environment based on a microservices application and a load-generator tool that will be used as the reference example throughout the next articles. Now that we have a general overview of what Antrea is, let's start by installing it through Helm.
Installing Antrea using Helm
To start with Antrea, the first step is to create a Kubernetes cluster. The process of setting up a Kubernetes cluster is out of the scope of this post. This series of articles is based on vanilla Kubernetes, but feel free to try another distribution such as VMware Tanzu or OpenShift. If you are using kubeadm to build your cluster, you need to pass the --pod-network-cidr=<CIDR> parameter to enable the NodeIpamController in Kubernetes; alternatively, you can leverage Antrea's built-in IPAM capabilities.
The easiest way to verify that the NodeIPAMController is operating in your existing cluster is by checking whether the --allocate-node-cidrs flag is set to true and a cluster-cidr is configured in the kube-controller-manager.
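One way to check it (assuming a kubeadm-based cluster in which the kube-controller-manager runs as a static pod) is to grep the relevant flags from a cluster dump, along these lines:

kubectl cluster-info dump | grep -E "allocate-node-cidrs|cluster-cidr"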
"--allocate-node-cidrs=true",
"--cluster-cidr=10.34.0.0/16",
It is important to mention that in Kubernetes versions prior to 1.24 the CNI plugin was managed by the kubelet, and there was a requirement to ensure the kubelet was started with the network-plugin=cni flag; in newer versions it is the container runtime, not the kubelet, that is in charge of managing the CNI plugin. In my case I am using containerd 1.6.9 as the container runtime and Kubernetes version 1.24.8.
To install the Antrea CNI you just need to apply the Kubernetes manifest that specifies all the required resources. The latest manifest is available at the official Antrea GitHub site here.
Alternatively, VMware also maintains a Helm chart for this purpose. I will install Antrea using Helm here because it has some advantages: in general, when you enable a new feature gate you must manually restart the pods to actually apply the new settings, but when using Helm the changes are applied and the pods restarted as part of the new release deployment. There is, however, an important consideration about updating CRDs when using Helm, so make sure to check this note before upgrading Antrea.
Let's start with the installation process. Assuming you already have Helm installed on the machine you use to manage your Kubernetes cluster (otherwise complete the installation as per the official documentation here), the first step is to add the Antrea repository as shown below.
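The chart repository URL below is the one published in the Antrea documentation:

helm repo add antrea https://charts.antrea.io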
"antrea" has been added to your repositories
As a best practice, update the repo before installing to ensure you work with the latest available version of the chart.
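The standard repo update command takes care of that:

helm repo update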
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "antrea" chart repository
You can also explore the contents of the added repository. Note that there is not only a chart for Antrea itself but also other interesting charts that will be explored in upcoming posts. At the time of writing, the latest Antrea version is 1.10.0.
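A search against the added repository lists the available charts:

helm search repo antrea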
NAME CHART VERSION APP VERSION DESCRIPTION
antrea/antrea 1.10.0 1.10.0 Kubernetes networking based on Open vSwitch
antrea/flow-aggregator 1.10.0 1.10.0 Antrea Flow Aggregator
antrea/theia 0.4.0 0.4.0 Antrea Network Flow Visibility
When using Helm, you can customize the installation of a chart by means of values files, YAML files that contain the configurable settings. To figure out which settings are available for a particular chart, a good idea is to dump all the accepted values into a default_values.yaml file using the helm show values command, as you can see here.
helm show values antrea/antrea >> default_values.yaml
Using the default_values.yaml file as a reference, you can change any of the default values to meet your requirements. Note that, as expected for a default configuration file, any setting not explicitly referenced in the values file will use its default. We will use a very simplified version of the values file with just a single setting to specify the desired tag or version for our deployment. We will add extra configuration later when enabling some features. Create a simple values.yaml file with this content.
# -- Container image to use for Antrea components.
image:
tag: "v1.10.0"
And now install the new chart using helm install and the created values.yaml file as input.
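The release name and namespace below match the output shown next; Antrea is deployed into kube-system:

helm install antrea antrea/antrea --namespace kube-system -f values.yaml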
NAME: antrea
LAST DEPLOYED: Fri Jan 13 18:52:48 2023
NAMESPACE: kube-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Antrea CNI has been successfully installed
You are using version 1.10.0
For the Antrea documentation, please visit https://antrea.io
After a couple of minutes you should see the pods in Running state. The Antrea agents are deployed as a DaemonSet, so an agent pod runs on every single node in the cluster. The Antrea controller component, on the other hand, is installed as a single-replica Deployment and may run on any node by default.
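You can check them by filtering the pods in the kube-system namespace, for example:

kubectl get pods -n kube-system -o wide | grep antrea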
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
antrea-agent-52cjk 2/2 Running 0 86s 10.113.2.17 k8s-worker-03 <none> <none>
antrea-agent-549ps 2/2 Running 0 86s 10.113.2.10 k8s-contol-plane-01 <none> <none>
antrea-agent-5kvhb 2/2 Running 0 86s 10.113.2.15 k8s-worker-01 <none> <none>
antrea-agent-6p856 2/2 Running 0 86s 10.113.2.19 k8s-worker-05 <none> <none>
antrea-agent-f75b7 2/2 Running 0 86s 10.113.2.16 k8s-worker-02 <none> <none>
antrea-agent-m6qtc 2/2 Running 0 86s 10.113.2.18 k8s-worker-04 <none> <none>
antrea-agent-zcnd7 2/2 Running 0 86s 10.113.2.20 k8s-worker-06 <none> <none>
antrea-controller-746dcd98d4-c6wcd 1/1 Running 0 86s 10.113.2.10 k8s-contol-plane-01 <none> <none>
Your nodes should now transition to Ready status as a result of the successful CNI installation.
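A quick check confirms it:

kubectl get nodes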
NAME STATUS ROLES AGE VERSION
k8s-contol-plane-01 Ready control-plane 46d v1.24.8
k8s-worker-01 Ready <none> 46d v1.24.8
k8s-worker-02 Ready <none> 46d v1.24.8
k8s-worker-03 Ready <none> 46d v1.24.8
k8s-worker-04 Ready <none> 46d v1.24.8
k8s-worker-05 Ready <none> 46d v1.24.8
k8s-worker-06 Ready <none> 46d v1.24.8
Jump into any of the nodes using SSH. If the Antrea CNI plugin is successfully installed, you should see a new configuration file in the /etc/cni/net.d directory with content similar to this.
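In my setup the file is named 10-antrea.conflist (the exact name may vary between Antrea versions), so its contents can be dumped with:

cat /etc/cni/net.d/10-antrea.conflist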
{
  "cniVersion": "0.3.0",
  "name": "antrea",
  "plugins": [
    {
      "type": "antrea",
      "ipam": {
        "type": "host-local"
      }
    },
    {
      "type": "portmap",
      "capabilities": {"portMappings": true}
    },
    {
      "type": "bandwidth",
      "capabilities": {"bandwidth": true}
    }
  ]
}
As a final verification, spin up a new pod to check that the CNI, which is supposed to provide connectivity for the pod, is actually working as expected. Let's run a simple test app called kuard.
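A one-liner with kubectl run, using the publicly available kuard image (the exact image tag here is an assumption), is enough:

kubectl run kuard --image=gcr.io/kuar-demo/kuard-amd64:blue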
pod/kuard created
Check the status of the newly created pod and verify that it is running and that an IP has been assigned.
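The wide output also shows the assigned IP and the node where the pod was scheduled:

kubectl get pod kuard -o wide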
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kuard 1/1 Running 0 172m 10.34.4.26 k8s-worker-05 <none> <none>
Now forward the port the container is listening on to a local port.
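kubectl port-forward does the trick:

kubectl port-forward kuard 8080:8080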
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080
Next, open your browser at http://localhost:8080 and you should reach the kuard application, which shows some information about the pod that can be used for testing and troubleshooting purposes.
So far so good. It seems our CNI plugin is working as expected, so let's install extra tooling to interact with the CNI component.
Installing antctl tool to interact with Antrea
Antrea includes a nice way to interact with the CNI through CLI commands using a tool called antctl. Antctl can be used in controller mode or in agent mode. For controller mode you can run the command externally or from within the controller pod, using a regular kubectl exec to issue the commands you need. For using antctl in agent mode you must run the commands from within an antrea-agent pod.
To install antctl simply copy and paste the following commands to download a copy of the antctl prebuilt binary for your OS.
TAG=v1.10.0
curl -Lo ./antctl "https://github.com/antrea-io/antrea/releases/download/$TAG/antctl-$(uname)-x86_64"
chmod +x ./antctl
sudo mv ./antctl /usr/local/bin/antctl
Now check the installed antctl version.
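The version subcommand reports both the local client and the controller versions:

antctl version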
antctlVersion: v1.10.0
controllerVersion: v1.10.0
When antctl runs out-of-cluster (controller mode only), it will look for your kubeconfig file to gain access to the Antrea components. As an example, you can issue the following command to check the overall controller status.
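The controllerinfo command summarizes the controller health and the number of connected agents:

antctl get controllerinfo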
POD NODE STATUS NETWORK-POLICIES ADDRESS-GROUPS APPLIED-TO-GROUPS CONNECTED-AGENTS
kube-system/antrea-controller-746dcd98d4-c6wcd k8s-worker-01 Healthy 0 0 0 4
We can also display the activated features (aka feature gates) by using antctl get featuregates. To enable certain functionalities, such as exporting flows for analytics, we need to change the values.yaml file and deploy a new Antrea release through Helm as we did before.
Antrea Agent Feature Gates
FEATUREGATE STATUS VERSION
FlowExporter Disabled ALPHA
NodePortLocal Enabled BETA
AntreaPolicy Enabled BETA
AntreaProxy Enabled BETA
Traceflow Enabled BETA
NetworkPolicyStats Enabled BETA
EndpointSlice Disabled ALPHA
ServiceExternalIP Disabled ALPHA
Egress Enabled BETA
AntreaIPAM Disabled ALPHA
Multicast Disabled ALPHA
Multicluster Disabled ALPHA
Antrea Controller Feature Gates
FEATUREGATE STATUS VERSION
AntreaPolicy Enabled BETA
Traceflow Enabled BETA
NetworkPolicyStats Enabled BETA
NodeIPAM Disabled ALPHA
ServiceExternalIP Disabled ALPHA
Egress Enabled BETA
Multicluster Disabled ALPHA
On the other hand, we can use the built-in antctl utility inside each of the antrea-agent pods. For example, using kubectl get pods, obtain the name of the antrea-agent pod running on the node on which the kuard pod we created before has been scheduled. Now open a shell using the following command (use your own antrea-agent pod name here).
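According to the earlier outputs, kuard was scheduled on k8s-worker-05, which is served by the antrea-agent-6p856 pod; since the agent pod runs two containers, the target container is selected with -c:

kubectl exec -it -n kube-system antrea-agent-6p856 -c antrea-agent -- bash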
root@k8s-worker-05:/#
The prompt indicates you are inside the agent pod on the k8s-worker-05 node. Now you can interact with the Antrea agent using several commands; as an example, get the information of the kuard pod interface using this command.
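Something along these lines should return the interface details (the kuard pod lives in the default namespace):

antctl get podinterfaces kuard -n default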
NAMESPACE NAME INTERFACE-NAME IP MAC PORT-UUID OF-PORT CONTAINER-ID
default kuard kuard-652adb 10.34.4.26 46:b5:b9:c5:c6:6c 99c959f1-938a-4ee3-bcda-da05c1dc968a 17 4baee3d2974
We will explore more advanced antctl options later in this series of posts. Now that the CNI is ready, let's install a full-featured microservices application.
Installing Acme-Fitness App
Throughout this series of posts we will use a reference Kubernetes application to help us with some of the examples. We have chosen the popular acme-fitness demo application because it is a good representation of a polyglot application based on microservices that includes typical services of an e-commerce app such as front-end, catalog, cart and payment. We will use a version maintained by VMware that is available here. The following picture depicts how the microservices that make up the acme-fitness application communicate with each other.
The first step is to clone the repository to get a local copy, using the following command:
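The repository URL assumed here is the one from the VMware Cloud Advocacy organization on GitHub:

git clone https://github.com/vmwarecloudadvocacy/acme_fitness_demo.git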
Cloning into 'acme_fitness_demo'...
remote: Enumerating objects: 765, done.
remote: Total 765 (delta 0), reused 0 (delta 0), pack-reused 765
Receiving objects: 100% (765/765), 1.01 MiB | 1.96 MiB/s, done.
Resolving deltas: 100% (464/464), done.
In order to get some of the database microservices running, we need to set up some configuration in advance (basically credentials) that will be injected as Secrets into the Kubernetes cluster. Move to the cloned kubernetes-manifest folder and create a new file containing the credentials required by some of the database microservices. Remember that the password must be Base64 encoded; in my case I am using passw0rd, so the Base64-encoded form is cGFzc3cwcmQK.
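You can generate the encoded value yourself; note that echo appends a trailing newline, which is included in the encoding and therefore in the value used here.

echo "passw0rd" | base64
cGFzc3cwcmQK

Save the following content as acme-fitness-secrets.yaml, the file we will apply later together with the rest of the manifests.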
# SECRETS FOR ACME-FITNESS (Plain text password is "passw0rd" in this example)
apiVersion: v1
data:
  password: cGFzc3cwcmQK
kind: Secret
metadata:
  name: cart-redis-pass
type: Opaque
---
apiVersion: v1
data:
  password: cGFzc3cwcmQK
kind: Secret
metadata:
  name: catalog-mongo-pass
type: Opaque
---
apiVersion: v1
data:
  password: cGFzc3cwcmQK
kind: Secret
metadata:
  name: order-postgres-pass
type: Opaque
---
apiVersion: v1
data:
  password: cGFzc3cwcmQK
kind: Secret
metadata:
  name: users-mongo-pass
type: Opaque
---
apiVersion: v1
data:
  password: cGFzc3cwcmQK
kind: Secret
metadata:
  name: users-redis-pass
type: Opaque
Now open the manifest named point-of-sales.yaml and set the value of the FRONTEND_HOST variable as per your particular setup. In this deployment we will install the acme-fitness application in a namespace called acme-fitness so the FQDN would be frontend.acme-fitness.svc.cluster.local. Adjust accordingly if you are using a domain different than cluster.local.
      labels:
        app: acmefit
        service: pos
    spec:
      containers:
      - image: gcr.io/vmwarecloudadvocacy/acmeshop-pos:v0.1.0-beta
        name: pos
        env:
        - name: HTTP_PORT
          value: '7777'
        - name: DATASTORE
          value: 'REMOTE'
        - name: FRONTEND_HOST
          value: 'frontend.acme-fitness.svc.cluster.local'
        ports:
        - containerPort: 7777
          name: pos
Now create the acme-fitness namespace, which is where we will place all the microservices and related objects.
kubectl create ns acme-fitness
Now apply all the manifests in the current kubernetes-manifest directory, using the namespace flag as shown here to ensure all objects are placed in this particular namespace. Make sure the acme-fitness-secrets.yaml manifest we created in the previous step is also placed in the same directory, and check that the output confirms the new secrets being created.
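With the namespace flag, a single apply of the whole directory takes care of everything:

kubectl apply -f . -n acme-fitness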
service/cart-redis created
deployment.apps/cart-redis created
service/cart created
deployment.apps/cart created
configmap/catalog-initdb-config created
service/catalog-mongo created
deployment.apps/catalog-mongo created
service/catalog created
deployment.apps/catalog created
service/frontend created
deployment.apps/frontend created
service/order-postgres created
deployment.apps/order-postgres created
service/order created
deployment.apps/order created
service/payment created
deployment.apps/payment created
service/pos created
deployment.apps/pos created
secret/cart-redis-pass created
secret/catalog-mongo-pass created
secret/order-postgres-pass created
secret/users-mongo-pass created
secret/users-redis-pass created
configmap/users-initdb-config created
service/users-mongo created
deployment.apps/users-mongo created
service/users-redis created
deployment.apps/users-redis created
service/users created
deployment.apps/users created
Wait a couple of minutes to allow the workers to pull the container images and you should see all the pods running. In case you find any pod showing a non-running status, simply delete it and wait for Kubernetes to reconcile the state until you see everything up and running.
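Check the pods in the acme-fitness namespace:

kubectl get pods -n acme-fitness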
NAME READY STATUS RESTARTS AGE
cart-76bb4c586-blc2m 1/1 Running 0 2m45s
cart-redis-5cc665f5bd-zrjwh 1/1 Running 0 3m57s
catalog-958b9dc7c-qdzbt 1/1 Running 0 2m27s
catalog-mongo-b5d4bfd54-c2rh5 1/1 Running 0 2m13s
frontend-6cd56445-8wwws 1/1 Running 0 3m57s
order-584d9c6b44-fwcqv 1/1 Running 0 3m56s
order-postgres-8499dcf8d6-9dq8w 1/1 Running 0 3m57s
payment-5ffb9c8d65-g8qb2 1/1 Running 0 3m56s
pos-5995956dcf-xxmqr 1/1 Running 0 108s
users-67f9f4fb85-bntgv 1/1 Running 0 3m55s
users-mongo-6f57dbb4f-kmcl7 1/1 Running 0 45s
users-mongo-6f57dbb4f-vwz64 0/1 Completed 0 3m56s
users-redis-77968854f4-zxqgc 1/1 Running 0 3m56s
Now check the created services. The entry point of the acme-fitness application is the service named frontend, which points to the frontend microservice. This service is of type LoadBalancer, so if you have a solution in your cluster to expose LoadBalancer services, you should receive an external IP that will be used to reach the application externally. If that is not your case, you can always access it using the corresponding NodePort of the service. In this particular case the service is exposed on the dynamically assigned port 31967 within the NodePort range, as shown below.
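The services can be listed with:

kubectl get svc -n acme-fitness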
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
cart ClusterIP 10.96.58.243 <none> 5000/TCP 4m38s
cart-redis ClusterIP 10.108.33.107 <none> 6379/TCP 4m39s
catalog ClusterIP 10.106.250.32 <none> 8082/TCP 4m38s
catalog-mongo ClusterIP 10.104.233.60 <none> 27017/TCP 4m38s
frontend LoadBalancer 10.111.94.177 10.113.3.110 80:31967/TCP 4m38s
order ClusterIP 10.108.161.14 <none> 6000/TCP 4m37s
order-postgres ClusterIP 10.99.98.123 <none> 5432/TCP 4m38s
payment ClusterIP 10.99.108.248 <none> 9000/TCP 4m37s
pos NodePort 10.106.130.147 <none> 7777:30431/TCP 4m37s
users ClusterIP 10.106.170.197 <none> 8083/TCP 4m36s
users-mongo ClusterIP 10.105.18.58 <none> 27017/TCP 4m37s
users-redis ClusterIP 10.103.94.131 <none> 6379/TCP 4m37s
I am using the AVI Kubernetes Operator in my setup to watch for LoadBalancer type services, so we should be able to access the site using the IP address or the allocated name, which according to my particular settings here will be http://frontend.acme-fitness.avi.sdefinitive.net. If you are not familiar with the AVI Ingress solution and how to deploy it, you can check a series of related posts here that explain step by step how to integrate this powerful enterprise solution with your cluster.
Now that the application is ready let’s deploy a traffic generator to inject some traffic.
Generating traffic using Locust testing tool
A simple but powerful way to generate synthetic traffic is using the Locust tool, written in Python. An interesting advantage of Locust is that it has a distributed architecture that allows running multiple tests on multiple workers. It also includes a web interface that allows you to start and parameterize the load test. Locust allows you to define complex test scenarios described through a locustfile.py. Thankfully, the acme-fitness repository includes here a load-generator folder that contains instructions and a ready-to-use locustfile.py to fully simulate traffic and users according to the architecture of the acme-fitness application.
We can install the Locust application in Kubernetes easily. The first step is to create the namespace in which the load-gen application will be installed.
kubectl create ns load-gen
As usual, if we want to inject configuration into Kubernetes we can use a ConfigMap object containing the required settings (in this case in the form of a locustfile.py file). Be careful, because creating the .py file into a ConfigMap using kubectl create --from-file might cause some formatting errors. I have created a functional ConfigMap in YAML format that you can apply directly with the following command.
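Assuming the ConfigMap manifest has been saved locally as locust-configmap.yaml (the file name is an assumption on my part), it can be applied into the load-gen namespace like this:

kubectl apply -f locust-configmap.yaml -n load-gen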
configmap/locust-configmap created
Once the ConfigMap is created, you can apply the following manifest, which includes the services and deployments needed to install the Locust load-testing tool in a distributed architecture, taking as input the locustfile.py file stored in the ConfigMap named locust-configmap. Use the following command to deploy Locust.
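Again assuming the manifest is saved locally, this time as locust.yaml (file name assumed), the deployment is a single apply:

kubectl apply -f locust.yaml -n load-gen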
deployment.apps/locust-master created
deployment.apps/locust-worker created
service/locust-master created
service/locust-ui created
You can inspect the created kubernetes objects in the load-gen namespace. Note how there is a LoadBalancer service that will be used to reach the Locust application.
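Listing everything in the namespace shows the master, the workers and the two services:

kubectl get all -n load-gen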
NAME READY STATUS RESTARTS AGE
pod/locust-master-67bdb5dbd4-ngtw7 1/1 Running 0 81s
pod/locust-worker-6c5f87b5c8-pgzng 1/1 Running 0 81s
pod/locust-worker-6c5f87b5c8-w5m6h 1/1 Running 0 81s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/locust-master ClusterIP 10.110.189.42 <none> 5557/TCP,5558/TCP,8089/TCP 81s
service/locust-ui LoadBalancer 10.106.18.26 10.113.3.111 80:32276/TCP 81s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/locust-master 1/1 1 1 81s
deployment.apps/locust-worker 2/2 2 2 81s
NAME DESIRED CURRENT READY AGE
replicaset.apps/locust-master-67bdb5dbd4 1 1 1 81s
replicaset.apps/locust-worker-6c5f87b5c8 2 2 2 81s
Open a browser to access the Locust user interface using the allocated IP at http://10.113.3.111 or the FQDN, which in my case corresponds to http://locust-ui.load-gen.avi.sdefinitive.net, and you will reach the following web site. If you don't have any load-balancer solution installed, just use the allocated NodePort (32276 in this case) on any of the IP addresses of your nodes.
Now populate the Host text box to start the test against the acme-fitness application. You can use the internal URL, reachable at http://frontend.acme-fitness.svc.cluster.local, or the external name http://frontend.acme-fitness.avi.sdefinitive.net. Launch the test and observe how the Requests Per Second counter increases progressively.
If you need extra load, simply scale the locust-worker deployment and you will get more pods acting as workers available to generate traffic, as shown below.
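For example, something like this would double the number of workers (the replica count is just an example):

kubectl scale deployment locust-worker -n load-gen --replicas=4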
We have the cluster up and running, and the application and the load generator ready. It's time to start the journey of adding observability using extra tools. Be sure to check the next part of this series. Stay tuned!