In this series of posts I have tried to create a comprehensive guide with a good level of detail around the installation of Antrea as CNI along with complimentary mainstream tools necessary for the visualization of metrics and logs. The guide is composed of the following modules:

If you are one of those who like the hard-way and like to understand how different pieces work together or if you are thinking in deploying in a more demanding enviroment that requires scalability and performance, then I highly recommend taking the long path and going through the post series in which you will find some cool stuff to gain understanding of this architecture.

Alternatively, if you want to take the short path, just continue reading the following section to deploy Antrea and related observability stack using a very simple architecture that is well suited for demo and non-production environments.

Quick Setup Guide

A basic requirement before moving forwared is to have a kubernetes cluster up and running. If you are reading this guide you probably are already familiar with how to setup a kubernetes cluster so I won’t spent time describing the procedure. Once the cluster is ready and you can interact with it via kubectl command you are ready to go. The first step is to get the Antrea CNI installed with some particular configuration enabled. The installation of Antrea is very straightforward. As docummented in Antrea website you just need to apply the yaml manifest as shown below and you will end up with your CNI deployed and a fully functional cluster prepared to run pod workloads.

kubectl apply -f https://raw.githubusercontent.com/antrea-io/antrea/main/build/yamls/antrea.yml
customresourcedefinition.apiextensions.k8s.io/antreaagentinfos.crd.antrea.io created
customresourcedefinition.apiextensions.k8s.io/antreacontrollerinfos.crd.antrea.io created
customresourcedefinition.apiextensions.k8s.io/clustergroups.crd.antrea.io created
<skipped>

mutatingwebhookconfiguration.admissionregistration.k8s.io/crdmutator.antrea.io created
validatingwebhookconfiguration.admissionregistration.k8s.io/crdvalidator.antrea.io created

Once Antrea is deployed, edit the configmap that contains the configuration settings in order to enable some required features related to observability.

kubectl edit configmaps -n kube-system antrea-config

Scroll down and locate FlowExporter setting and ensure is set to true and the line is uncomment to take effect in the configuration.

# Enable flowexporter which exports polled conntrack connections as IPFIX flow records from each
# agent to a configured collector.
      FlowExporter: true

Locate also the enablePrometheusMetrics keyword and ensure is enabled and uncommented as well. This setting should be enabled by default but double check just in case.

# Enable metrics exposure via Prometheus. Initializes Prometheus metrics listener.
    enablePrometheusMetrics: true

Last step would be to restart the antrea agent daemonSet to ensure the new settings are applied.

kubectl rollout restart ds/antrea-agent -n kube-system

Now we are ready to go. Clone my git repo here that contains all required stuff to proceed with the observability stack installation.

git clone https://github.com/jhasensio/antrea.git
Cloning into 'antrea'...
remote: Enumerating objects: 155, done.
remote: Counting objects: 100% (155/155), done.
remote: Compressing objects: 100% (86/86), done.
remote: Total 155 (delta 75), reused 145 (delta 65), pack-reused 0
Receiving objects: 100% (155/155), 107.33 KiB | 1.18 MiB/s, done.
Resolving deltas: 100% (75/75), done

Create the observability namespace that will be used to place all the needed kubernetes objects.

kubectl create ns observability
namespace/observability created

Navigate to the antrea/antrea-observability-quick-start folder and now install all the required objects by applying recursively the yaml files in the folder tree using the mentioned namespace. As you can tell from the output below, the manifest will deploy a bunch of tools including fluent-bit, loki and grafana with precreated dashboards and datasources.

kubectl apply -f tools/ –recursive -n observability
clusterrole.rbac.authorization.k8s.io/fluent-bit created
clusterrolebinding.rbac.authorization.k8s.io/fluent-bit created
configmap/fluent-bit created
daemonset.apps/fluent-bit created
service/fluent-bit created
serviceaccount/fluent-bit created
clusterrole.rbac.authorization.k8s.io/grafana-clusterrole created
clusterrolebinding.rbac.authorization.k8s.io/grafana-clusterrolebinding created
configmap/grafana-config-dashboards created
configmap/grafana created
configmap/1--agent-process-metrics-dashboard-cm created
configmap/2--agent-ovs-metrics-and-logs-dashboard-cm created
configmap/3--agent-conntrack-metrics-dashboard-cm created
configmap/4--agent-network-policy-metrics-and-logs-dashboard-cm created
configmap/5--antrea-agent-logs-dashboard-cm created
configmap/6--controller-metrics-and-logs-dashboard-cm created
configmap/loki-datasource-cm created
configmap/prometheus-datasource-cm created
deployment.apps/grafana created
role.rbac.authorization.k8s.io/grafana created
rolebinding.rbac.authorization.k8s.io/grafana created
secret/grafana created
service/grafana created
serviceaccount/grafana created
configmap/loki created
configmap/loki-runtime created
service/loki-memberlist created
serviceaccount/loki created
service/loki-headless created
service/loki created
statefulset.apps/loki created
clusterrole.rbac.authorization.k8s.io/prometheus-antrea created
serviceaccount/prometheus created
secret/prometheus-service-account-token created
clusterrole.rbac.authorization.k8s.io/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
configmap/prometheus-server-conf created
deployment.apps/prometheus-deployment created
service/prometheus-service created

The grafana UI is exposed through a LoadBalancer service-type object. If you have a load balancer solution deployed in your cluster, use the allocated external IP Address, otherwise use the dynamically allocated NodePort if you are able to reach the kubernetes nodes directly or ultimately use the port-forward from your administrative console as shown below.

kubectl port-forward -n observability services/grafana 8888:80
Forwarding from 127.0.0.1:8888 -> 3000

Open your browser at localhost 8888 and you will access the Grafana login page as depicted in following picture.

Grafana Login Page

Enter the default username admin. The password is stored in a secret object with a superstrong password only for demo purposes that you can easly decode.

kubectl get secrets -n observability grafana -o jsonpath=”{.data.admin-password}” | base64 -d ; echo
passw0rd

Now you should reach the home page of Grafana application with a similar aspect of what you see below. By default Grafana uses dark theme but you can easily change. You just have to click on the settings icon at the left column > Preferences and select Light as UI Thene. Find at the end of the post some dashboard samples with both themes.

Grafana home page

Now, click on the General link at the top left corner and you should be able to reach the six precreated dashboards in the folder.

Grafana Dashboard browsing

Unless your have a productive cluster running hundreds of workloads, you will find some of the panels empty because there is no metrics nor logs to display yet. Just to add some fun to the dashboards I have created a script to simulate a good level of activity in the cluster. The script is also included in the git repo so just execute it and wait for some minutes.

bash simulate_activity.sh
 This simulator will create some activity in your cluster using current context
 It will spin up some client and a deployment-backed ClusterIP service based on apache application
 After that it will create random AntreaNetworkPolicies and AntreaClusterNetworkPolicies and it will generate some traffic with random patterns
 It will also randomly scale in and out the deployment during execution. This is useful for demo to see all metrics and logs showing up in the visualization tool

For more information go to https://sdefinitive.net

   *** Starting activity simulation. Press CTRL+C to stop job ***   
....^CCTRL+C Pressed...

Cleaning temporary objects, ACNPs, ANPs, Deployments, Services and Pods
......
Cleaning done! Thanks for using it!

After some time, you can enjoy the colorful dashboards and logs specially designed to get some key indicators from Antrea deployment. Refer to Part 3 of this post series for extra information and use cases. Find below some supercharged dashboards and panels as a sample of what you will get.

Dashboard 1: Agent Process Metrics
Dashboard 2: Agent OVS Metrics and Logs
Dashboard 3: Agent Conntrack Metrics
Dashboard 4: Agent Network Policy Metric and Logs (1/2) (Light theme)
Dashboard 4: Agent Network Policy Metric and Logs (2/2) (Light theme)
Dashboard 5: Antrea Agent Logs
Dashboard 6: Controller Metrics and Logs (1/2)

Keep reading next post in this series to get more information around dashboards usage and how to setup all the tools with data persistence and with a high-available and distributed architecture.