by jhasensio

AVI for K8s Part 10: Customizing Infrastructure Settings

Without a doubt, the integration provided by AKO offers fantastic automation capabilities that accelerate the roll-out of kubernetes-based services through an enterprise-grade ADC. Until now, the applications created in kubernetes through API resources such as Ingress or LoadBalancer were all realized on the same common infrastructure implemented by the data layer of the NSX Advanced Load Balancer, that is, the service engines. However, it can be interesting to gain some additional control over the infrastructure resources that will ultimately implement our services. For example, we may want certain critical business applications to use premium high-performance resources, a particular high availability scheme, or even a specific placement in the network for regulatory or security reasons. On the other hand, less important or non-production services could use less powerful and/or highly oversubscribed resources.

The response of the kubernetes community to this need for specific control of the infrastructure for services has materialized in a project called Gateway API. Gateway API (formerly known as Service API) is an open source project that brings a collection of resources such as GatewayClass, Gateway, HTTPRoute, TCPRoute, etc. that is being adopted by many vendors and has broad industry support. If you want to know more about Gateway API you can explore the official project page here.

Before the arrival of Gateway API, AVI used annotations to express extra configuration. Since Gateway API is a more standard and widely adopted method, AVI has included support for this new API since version 1.4.1, and it will probably become the preferred method to express this configuration.

AKO also supports the networking/v1 Ingress API, which became generally available in Kubernetes version 1.19. Specifically, AKO supports the IngressClass and DefaultIngressClass networking/v1 ingress features.

The combination of the “standard” IngressClass along with the Gateway API resources is the foundation for adding custom infrastructure control. When using Ingress resources we can take advantage of the existing IngressClass objects, whereas when using LoadBalancer resources we need to resort to the Gateway API.

Exploring AviInfraSettings CRD for infrastructure customization

On startup, AKO automatically detects whether the ingress-class API is enabled and available in the cluster it is operating in. If it is, AKO switches to using IngressClass objects instead of the long list of custom annotations previously needed to express custom configuration.

If your kubernetes cluster supports IngressClass resources you should be able to see the AVI IngressClass object created by AKO, as shown below. It is a cluster-scoped resource named avi-lb that points to the AKO ingress controller. Note also that the object automatically receives the annotation ingressclass.kubernetes.io/is-default-class set to true. This annotation ensures that new Ingresses without an ingressClassName specified are assigned this default IngressClass.

kubectl get ingressclass -o yaml
apiVersion: v1
items:
- apiVersion: networking.k8s.io/v1
  kind: IngressClass
  metadata:
    annotations:
      ingressclass.kubernetes.io/is-default-class: "true"
      meta.helm.sh/release-name: ako-1622125803
      meta.helm.sh/release-namespace: avi-system
    creationTimestamp: "2021-05-27T14:30:05Z"
    generation: 1
    labels:
      app.kubernetes.io/managed-by: Helm
    name: avi-lb
    resourceVersion: "11588616"
    uid: c053a284-6dba-4c39-a8d0-a2ea1549e216
  spec:
    controller: ako.vmware.com/avi-lb
    parameters:
      apiGroup: ako.vmware.com
      kind: IngressParameters
      name: external-lb

A new AKO CRD called AviInfraSetting helps us express the configuration needed to segregate virtual services according to the underlying infrastructure components they should use, such as the Service Engine Group or the network names, among others. The general AviInfraSetting definition is shown below.

apiVersion: ako.vmware.com/v1alpha1
kind: AviInfraSetting
metadata:
  name: my-infra-setting
spec:
  seGroup:
    name: compact-se-group
  network:
    names:
      - vip-network-10-10-10-0-24
    enableRhi: true
  l7Settings:
    shardSize: MEDIUM

As shown in the diagram below, the Ingress object defines an ingressClassName that points to the IngressClass object. In the same way, the IngressClass object defines a set of parameters under its spec section that refer to the AviInfraSetting CRD.

AVI Ingress Infrastructure Customization using IngressClasses
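The reference chain described above can be sketched in a few lines of Python. This is only an illustration of the lookup order, not AKO's actual code: the dictionaries stand in for cluster objects, and the names my-ingress-class, INGRESS_CLASSES, INFRA_SETTINGS and resolve_infra_setting are hypothetical.

```python
# Hypothetical in-memory copies of the cluster objects (names are illustrative).
INGRESS_CLASSES = {
    "avi-lb": {"default": True, "parameters": None},
    "my-ingress-class": {
        "default": False,
        "parameters": {
            "apiGroup": "ako.vmware.com",
            "kind": "AviInfraSetting",
            "name": "my-infra-setting",
        },
    },
}
INFRA_SETTINGS = {
    "my-infra-setting": {"seGroup": {"name": "compact-se-group"}},
}


def resolve_infra_setting(ingress_spec, ingress_classes=INGRESS_CLASSES,
                          infra_settings=INFRA_SETTINGS):
    """Follow Ingress -> IngressClass -> AviInfraSetting; return the spec or None."""
    class_name = ingress_spec.get("ingressClassName")
    if class_name is None:
        # No class named on the Ingress: use the class flagged as default, if any.
        class_name = next(
            (name for name, ic in ingress_classes.items() if ic["default"]), None
        )
    ingress_class = ingress_classes.get(class_name)
    if ingress_class is None:
        return None
    params = ingress_class["parameters"]
    if not params or params["kind"] != "AviInfraSetting":
        return None  # this class carries no infrastructure customization
    return infra_settings.get(params["name"])


print(resolve_infra_setting({"ingressClassName": "my-ingress-class"}))
print(resolve_infra_setting({}))  # falls back to the default class
```

An Ingress that names my-ingress-class ends up with the custom settings, while an Ingress without ingressClassName falls back to the default class and gets no customization in this simplified model.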

For testing purposes we will use the hello kubernetes service. First create the deployment, service and ingress resources using the yaml file below. It is assumed that a secret named hello-secret is already in place to create the secure ingress service.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello
spec:
  replicas: 3
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
      - name: hello-kubernetes
        image: paulbouwer/hello-kubernetes:1.7
        ports:
        - containerPort: 8080
        env:
        - name: MESSAGE
          value: "MESSAGE: Critical App Running here!!"
---
apiVersion: v1
kind: Service
metadata:
  name: hello
spec:
  type: ClusterIP
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: hello
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello
  labels:
    app: gslb
spec:
  tls:
  - hosts:
    - hello.avi.iberia.local
    secretName: hello-secret
  rules:
    - host: hello.avi.iberia.local
      http:
        paths:
        - path: /
          pathType: Prefix
          backend:
            service:
              name: hello
              port:
                number: 80

After pushing the declarative file above to the Kubernetes API with the kubectl apply command, you should be able to access the application just by browsing to the host, in my case https://hello.avi.iberia.local. I have created a custom message by passing an environment variable in the deployment definition with the text you can read below.

Now that we have our test service ready, let’s start testing each of the customization options for the infrastructure.

seGroup

The first parameter we are testing selects the Service Engine Group. Remember that Service Engines (i.e. our dataplane) are created with a set of attributes inherited from a Service Engine Group, which defines how the SEs should be sized, placed, and made highly available. The seGroup parameter defines the Service Engine Group that will be used by services that point to this particular AviInfraSetting object. By default, any Ingress or LoadBalancer object created by AKO uses the SE Group specified in the values.yaml file that defines the general AKO configuration.
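The fallback rule in the previous paragraph can be expressed as a tiny helper. This is a minimal sketch of the selection logic only; effective_se_group is a hypothetical name, and Default-Group is an assumed value for the SE Group configured in values.yaml.

```python
def effective_se_group(infra_setting, default_se_group):
    """Return the SE Group named in the AviInfraSetting spec, else the AKO-wide default."""
    name = ((infra_setting or {}).get("seGroup") or {}).get("name")
    return name or default_se_group


DEFAULT_SE_GROUP = "Default-Group"  # assumed value taken from values.yaml

# A service whose IngressClass points to an AviInfraSetting with a seGroup.
print(effective_se_group({"seGroup": {"name": "compact-se-group"}}, DEFAULT_SE_GROUP))
# A service without any AviInfraSetting keeps the default SE Group.
print(effective_se_group(None, DEFAULT_SE_GROUP))
```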

In order for AKO to make use of this configuration, the first step is to create a new Service Engine Group definition in the controller via GUI or API. In this case, let’s imagine that this group of service engines will be used by applications that demand an active-active high availability mode, in which each service runs on at least two service engines to minimize the impact of a failure in one of them. From the AVI GUI go to Infrastructure > Service Engine Group > Create. Assign a new name such as SE-Group-AKO-CRITICAL, which will be used by the AviInfraSetting object later, and configure the Active/Active Elastic HA scheme with a minimum of 2 Service Engines per Virtual Service in the Scale per Virtual Service setting as shown below:

Active-Active SE Group definition for critical Virtual Service

Now we will create the AviInfraSetting object with the following specification. Write the content below to a file and apply it using the kubectl apply command.

apiVersion: ako.vmware.com/v1alpha1
kind: AviInfraSetting
metadata:
  name: critical.infra
spec:
  seGroup:
    name: SE-Group-AKO-CRITICAL

Once created, you can verify your new AviInfraSetting object by exploring the resource with kubectl. In this case our newly created object is named critical.infra. To show the complete object definition use the kubectl get command below as usual:

kubectl get AviInfraSetting critical.infra -o yaml
apiVersion: ako.vmware.com/v1alpha1
kind: AviInfraSetting
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"ako.vmware.com/v1alpha1","kind":"AviInfraSetting","metadata":{"annotations":{},"name":"critical.infra"},"spec":{"seGroup":{"name":"SE-Group-AKO-CRITICAL"}}}
  creationTimestamp: "2021-05-27T15:12:50Z"
  generation: 1
  name: critical.infra
  resourceVersion: "11592607"
  uid: 27ef1502-5a91-4244-a23b-96bb8ffd9a6e
spec:
  seGroup:
    name: SE-Group-AKO-CRITICAL
status:
  status: Accepted

Now we want to attach this infra setup to our ingress service. To do so, we need to create our IngressClass object first. This time, instead of writing a new yaml file and applying it, we will use the stdin method as shown below. After the EOF string you can press enter and the pipe will send the typed yaml definition to the kubectl apply -f command. An output message should confirm that the new IngressClass object has been successfully created.

cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: critical-ic
spec:
  controller: ako.vmware.com/avi-lb
  parameters:
    apiGroup: ako.vmware.com
    kind: AviInfraSetting
    name: critical.infra
EOF


Output:
ingressclass.networking.k8s.io/critical-ic created

Once we have created an IngressClass that maps to the critical.infra AviInfraSetting object, it’s time to instruct the Ingress object that defines the external access to our application to use that particular ingress class. Simply edit the existing Ingress object previously created and add the corresponding ingressClassName definition under the spec section.

kubectl edit ingress hello 
apiVersion: networking.k8s.io/v1
kind: Ingress
<< Skipped >>
spec:
  ingressClassName: critical-ic
  rules:
  - host: hello.avi.iberia.local
    http:
<< Skipped >>

Once applied, AKO sends an API call to the AVI Controller to reconcile the newly expressed desired state. That might also include the creation of new Service Engines in the infrastructure if there was no active Service Engine in that group, as in my case. In this particular case a new pair of Service Engines must be created to fulfill the Active/Active high-availability scheme expressed in the Service Engine Group definition. You can see that a new shared object named S1-AZ1–Shared-L7-critical-infra-4 has been created; it will remain unavailable until the cloud infrastructure (vCenter in my case) completes the automated service engine creation.

New Parent L7 VS shared object and service Engine Creation

After some minutes you should see the new object in a yellow state that will eventually turn green. The yellow color can indicate that the VS depends on recently created Service Engines, as in our case. Note how our VS relies on two service engines, as stated in the Service Engine Group definition for HA.

New Parent L7 VS shared object with two Service Engine in Active/Active architecture

The hello Ingress object shows a mapping with the critical-ic ingressClass object we defined for this particular service.

kubectl get ingress
NAME    CLASS         HOSTS                    ADDRESS        PORTS     AGE
ghost   avi-lb        ghost.avi.iberia.local   10.10.15.162   80, 443   9d
hello   critical-ic   hello.avi.iberia.local   10.10.15.161   80, 443   119m
httpd   avi-lb        httpd.avi.iberia.local   10.10.15.166   80, 443   6d
kuard   avi-lb        kuard.avi.iberia.local   10.10.15.165   80, 443   6d5h

network

The next configuration element we can customize as part of the AviInfraSetting definition is the network. This determines which network pool will be used for a particular group of services in our kubernetes cluster. As in previous examples, to allow the DevOps operator to consume certain AVI-related settings, the AVI infrastructure operator needs to define them first.

To create a new FrontEnd pool to expose our new services simply define a new network and allocate some IPs for Service Engine and Virtual Service Placement. In my case the network has been automatically discovered as part of the cloud integration with vSphere. We just need to define the corresponding static pools for both VIPs and Service Engines to allow the internal IPAM to assign IP addresses when needed.

New network definition for custom VIP placement

Once the new network is defined, we can use the AviInfraSetting CRD to point to the new network name. In my case the assigned name is REGA_EXT_UPLINKB_3016. Since the CRD object is already created, the easiest way to change this setting is to edit it and add the new parameter under the spec section as shown below:

kubectl edit aviinfrasetting critical.infra 
apiVersion: ako.vmware.com/v1alpha1
kind: AviInfraSetting
metadata:
  name: critical.infra
spec:
  seGroup:
    name: SE-Group-AKO-CRITICAL
  network:
    names:
    - REGA_EXT_UPLINKB_3016

After saving and exiting the editor (vim by default) the changes are applied. In the AVI Controller GUI the config change is reflected by the engine icon in the Analytics page, indicating that the VS has received a new configuration. If you expand the CONFIG_UPDATE event you can see that a change in the network has occurred and the VS will now use the 10.10.16.224 IP address to be reachable from the outside.

Change of VIP Network through AKO as seen in AVI GUI

NOTE.- In my case, after making the change I noticed that the Ingress object still showed the IP address assigned at creation time and the new real value wasn’t updated.

kubectl get ingress
NAME    CLASS         HOSTS                    ADDRESS        PORTS     AGE
ghost   avi-lb        ghost.avi.iberia.local   10.10.15.162   80, 443   9d
hello   critical-ic   hello.avi.iberia.local   10.10.15.161   80, 443   119m
httpd   avi-lb        httpd.avi.iberia.local   10.10.15.166   80, 443   6d
kuard   avi-lb        kuard.avi.iberia.local   10.10.15.165   80, 443   6d5h

If this is your case, simply delete and recreate the ingress object with the corresponding ingress class and you should see the new IP populated.

kubectl get ingress
NAME    CLASS         HOSTS                    ADDRESS        PORTS     AGE
ghost   avi-lb        ghost.avi.iberia.local   10.10.15.162   80, 443   9d
hello   critical-ic   hello.avi.iberia.local   10.10.16.224   80, 443   119m
httpd   avi-lb        httpd.avi.iberia.local   10.10.15.166   80, 443   6d
kuard   avi-lb        kuard.avi.iberia.local   10.10.15.165   80, 443   6d5h

enableRhi

As mentioned before, AVI is able to place the same Virtual Service on different Service Engines. This is very helpful for improving the high availability of the application by avoiding a single point of failure, and also for scaling out our application to gain extra capacity. AVI has a native AutoScale capability that selects a primary service engine within a group, which is in charge of coordinating the distribution of the virtual service traffic among the other SEs where that Virtual Service is also active.

Whilst the native AutoScaling method is based on L2 redirection, an alternative, more scalable and efficient method for scaling a virtual service is to rely on Border Gateway Protocol (BGP), and specifically on a feature commonly known as route health injection (RHI), to provide equal cost multi-path (ECMP) reachability from the upstream router towards the application. Using RHI with ECMP for virtual service scaling avoids the extra burden on the primary SE of coordinating the scaled-out traffic among the SEs.

To leverage this feature, as in previous examples, it is a pre-task of the LoadBalancer and/or network administrator to define the network peering with the underlay BGP network. You need to select a local Autonomous System Number (5000 in my case) and declare the IPs of the peers that will be used to establish BGP sessions and exchange the routing information needed to reach the corresponding Virtual Service IP addresses. The upstream router in this case is 10.10.16.250 and belongs to ASN 6000, so an eBGP peering will be in place.

The following diagram represents the topology I am using here to implement the required BGP setup.

AVI network topology to enable BGP RHI for L3 Scaling Out using ECMP

You need to define a BGP configuration at AVI Controller with some needed attributes as shown in the following table

Setting            Value                  Comment
BGP AS             5000                   Local ASN number used for eBGP
Placement Network  REGA_EXT_UPLINKB_3016  Network used to reach external BGP peers
IPv4 Prefix        10.10.16.0/24          Subnet that will be used for external announces
IPv4 Peer          10.10.16.250           IP address of the external BGP Peer
Remote AS          6000                   Autonomous System Number the BGP peer belongs to
Multihop           0                      TTL Setting for BGP control traffic. Adjust if the peer is located some L3 hops beyond
BFD                Enabled                Bidirectional Forwarding Detection mechanism
Advertise VIP      Enabled                Announce allocated VS VIP as Route Health Injection
Advertise SNAT     Enabled                Announce allocated Service Engine IP address used as source NAT. Useful in one arm deployments to ensure returning traffic from backends.
BGP Peering configuration to enable RHI

The above configuration settings are shown in the following configuration screen at AVI Controller:

AVI Controller BGP Peering configuration

As a reference, in my example I am using a Cisco CSR 1000V as the external upstream router that acts as BGP neighbor. The upstream router needs to know in advance the IP addresses of the neighbors in order to configure the BGP peering statements. Some BGP implementations can define dynamic BGP peering using a range of IP addresses, which fits very well with an autoscalable fabric in which neighbors may appear and disappear automatically as the traffic changes. You also need to enable the ECMP feature, adjusting the maximum number of ECMP paths to the maximum number of SEs configured in your Service Engine Group. Below you can find a sample configuration leveraging the BGP Dynamic Neighbor feature and BFD for fast convergence.

!!! enable BGP using dynamic neighbors

router bgp 6000
 bgp log-neighbor-changes
 bgp listen range 10.10.16.192/27 peer-group AVI-PEERS
 bgp listen limit 32
 neighbor AVI-PEERS peer-group
 neighbor AVI-PEERS remote-as 5000
 neighbor AVI-PEERS fall-over bfd
 !
 address-family ipv4
  neighbor AVI-PEERS activate
  maximum-paths eibgp 10
 exit-address-family
!
!! Enable BFD for fast convergence
interface gigabitEthernet3
   ip address 10.10.16.250 255.255.255.0
   no shutdown
   bfd interval 50 min_rx 50 multiplier 5

Once the AVI controller configuration is completed you should see the neighbor status by issuing the show ip bgp summary command. The output is shown below. Notice how two dynamic neighborships have been created with 10.10.16.192 and 10.10.16.193, which correspond to the IP addresses allocated to the two new service engines used to serve our hello Virtual Service. Note also in the State/PfxRcd column that no prefixes have been received yet.

csr1000v-ecmp-upstream#sh ip bgp summary
BGP router identifier 22.22.22.22, local AS number 6000
BGP table version is 2, main routing table version 2
1 network entries using 248 bytes of memory
1 path entries using 136 bytes of memory
1/1 BGP path/bestpath attribute entries using 280 bytes of memory
1 BGP AS-PATH entries using 40 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 704 total bytes of memory
BGP activity 1/0 prefixes, 1/0 paths, scan interval 60 secs

Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
*10.10.16.192   4         5000       2       4        2    0    0 00:00:11        0
*10.10.16.193   4         5000       2       4        2    0    0 00:00:11        0
* Dynamically created based on a listen range command
Dynamically created neighbors: 2, Subnet ranges: 2

BGP peergroup AVI-PEERS listen range group members:
  10.10.16.192/27

Total dynamically created neighbors: 2/(32 max), Subnet ranges: 1
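Since dynamic peering only accepts sessions sourced from the configured listen range, it is worth checking that the static pool reserved for Service Engines falls inside it. The following is a minimal sketch using Python's standard ipaddress module; the helper name in_listen_range is hypothetical and the addresses are the ones used in this lab.

```python
import ipaddress

# Listen range configured on the upstream router (bgp listen range ...).
LISTEN_RANGE = ipaddress.ip_network("10.10.16.192/27")


def in_listen_range(ip, network=LISTEN_RANGE):
    """Return True if the address would be accepted as a dynamic BGP neighbor."""
    return ipaddress.ip_address(ip) in network


print("addresses in range:", LISTEN_RANGE.num_addresses)
for candidate in ("10.10.16.192", "10.10.16.193", "10.10.16.224", "10.10.16.250"):
    print(candidate, in_listen_range(candidate))
```

The two Service Engine addresses fall inside the /27, whereas the VIP (10.10.16.224) and the router itself (10.10.16.250) do not, so only the Service Engines can become dynamic neighbors.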

If you want to check or troubleshoot BGP from the Service Engine, you can always use the CLI to see the runtime status of the BGP peers. Since this is a distributed architecture, the BGP daemon runs locally on each of the service engines that make up the Service Engine Group. To access a service engine, log in to the AVI Controller via SSH and open a shell session.

admin@10-10-10-33:~$ shell
Login: admin
Password: <password>

Now attach to the desired service engine. I am picking one of the recently created service engines.

attach serviceengine s1_ako_critical-se-idiah
Warning: Permanently added '[127.1.0.7]:5097' (ECDSA) to the list of known hosts.

Avi Service Engine

Avi Networks software, Copyright (C) 2013-2017 by Avi Networks, Inc.
All rights reserved.

Version:      20.1.5
Date:         2021-04-15 07:08:29 UTC
Build:        9148
Management:   10.10.10.46/24                 UP
Gateway:      10.10.10.1                     UP
Controller:   10.10.10.33                    UP


The copyrights to certain works contained in this software are
owned by other third parties and used and distributed under
license. Certain components of this software are licensed under
the GNU General Public License (GPL) version 2.0 or the GNU
Lesser General Public License (LGPL) Version 2.1. A copy of each
such license is available at
http://www.opensource.org/licenses/gpl-2.0.php and
http://www.opensource.org/licenses/lgpl-2.1.php

Use the ip netns command to show the network namespace within the Service Engine

admin@s1-ako-critical-se-idiah:~$ ip netns
avi_ns1 (id: 0)

And now open a bash shell session in the corresponding network namespace. In this case we are using the default avi_ns1 network namespace at the Service Engine. The prompt should change after entering the proper credentials.

admin@s1-ako-critical-se-idiah:~$ sudo ip netns exec avi_ns1 bash
[sudo] password for admin: <password> 
root@s1-ako-critical-se-idiah:/home/admin#

Open a session to the internal FRR-based BGP router daemon by issuing the netcat localhost bgpd command as shown below.

root@s1-ako-critical-se-idiah:/home/admin# netcat localhost bgpd

Hello, this is FRRouting (version 7.0).
Copyright 1996-2005 Kunihiro Ishiguro, et al.


User Access Verification

Password: avi123

s1-ako-critical-se-idiah>

Use the enable command to gain privileged access and show the running configuration. The AVI Controller has automatically created a configuration to peer with our external router at 10.10.16.250. Some route maps to filter inbound and outbound announcements have also been populated, as seen below.

s1-ako-critical-se-idiah# show run
show run

Current configuration:
!
frr version 7.0
frr defaults traditional
!
hostname s1-ako-critical-se-idiah
password avi123
log file /var/lib/avi/log/bgp/avi_ns1_bgpd.log
!
!
!
router bgp 5000
 bgp router-id 2.61.174.252
 no bgp default ipv4-unicast
 neighbor 10.10.16.250 remote-as 6000
 neighbor 10.10.16.250 advertisement-interval 5
 neighbor 10.10.16.250 timers connect 10
 !
 address-family ipv4 unicast
  neighbor 10.10.16.250 activate
  neighbor 10.10.16.250 route-map PEER_RM_IN_10.10.16.250 in
  neighbor 10.10.16.250 route-map PEER_RM_OUT_10.10.16.250 out
 exit-address-family
!
!
ip prefix-list def-route seq 5 permit 0.0.0.0/0
!
route-map PEER_RM_OUT_10.10.16.250 permit 10
 match ip address 1
 call bgp_properties_ebgp_rmap
!
route-map bgp_community_rmap permit 65401
!
route-map bgp_properties_ibgp_rmap permit 65400
 match ip address prefix-list snat_vip_v4-list
 call bgp_community_rmap
!
route-map bgp_properties_ibgp_rmap permit 65401
 call bgp_community_rmap
!
route-map bgp_properties_ebgp_rmap permit 65400
 match ip address prefix-list snat_vip_v4-list
 call bgp_community_rmap
!
route-map bgp_properties_ebgp_rmap permit 65401
 call bgp_community_rmap
!
line vty
!
end

To verify the neighbor status use the show bgp summary command.

s1-ako-critical-se-idiah# sh bgp summary
sh bgp summary

IPv4 Unicast Summary:
BGP router identifier 2.61.174.252, local AS number 5000 vrf-id 0
BGP table version 6
RIB entries 0, using 0 bytes of memory
Peers 1, using 22 KiB of memory

Neighbor        V         AS MsgRcvd MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd
10.10.16.250    4       6000      29      25        0    0    0 00:23:29            0

Total number of neighbors 1

Note that by default the AVI BGP implementation filters any prefix coming from the external upstream router; BGP is therefore mainly used to inject RHI so that the outside world can reach the VS. Once the network is ready we can use the enableRhi setting in our custom AviInfraSetting object to enable this capability. Again, the easiest way is to edit our existing critical.infra AviInfraSetting object using kubectl edit as shown below.

kubectl edit AviInfraSetting critical.infra
apiVersion: ako.vmware.com/v1alpha1
kind: AviInfraSetting
metadata:
  name: critical.infra
spec:
  network:
    enableRhi: true
    names:
    - REGA_EXT_UPLINKB_3016
  seGroup:
    name: SE-Group-AKO-CRITICAL

Before applying the new configuration, enable console messages (term mon) if you are accessing the external router by SSH, and activate debugging of IP routing table changes with debug ip routing so that you can see the route announcements received by the upstream router. Now apply the above setting by editing the critical.infra AviInfraSetting object.

csr1000v-ecmp-upstream#debug ip routing
IP routing debugging is on
*May 27 16:42:50.115: RT: updating bgp 10.10.16.224/32 (0x0) [local lbl/ctx:1048577/0x0]  :
    via 10.10.16.192   0 1048577 0x100001

*May 27 16:42:50.115: RT: add 10.10.16.224/32 via 10.10.16.192, bgp metric [20/0]
*May 27 16:42:50.128: RT: updating bgp 10.10.16.224/32 (0x0) [local lbl/ctx:1048577/0x0]  :
    via 10.10.16.193   0 1048577 0x100001
    via 10.10.16.192   0 1048577 0x100001

*May 27 16:42:50.129: RT: closer admin distance for 10.10.16.224, flushing 1 routes
*May 27 16:42:50.129: RT: add 10.10.16.224/32 via 10.10.16.193, bgp metric [20/0]
*May 27 16:42:50.129: RT: add 10.10.16.224/32 via 10.10.16.192, bgp metric [20/0]

As you can see above, new messages appear indicating that an announcement of the VIP at 10.10.16.224/32 has been received from both the 10.10.16.193 and 10.10.16.192 neighbors, followed by the events showing that the new equal cost routes have been installed in the routing table. You can confirm it by checking the routing table for this particular prefix.

csr1000v-ecmp-upstream#sh ip route 10.10.16.224
Routing entry for 10.10.16.224/32
  Known via "bgp 6000", distance 20, metric 0
  Tag 5000, type external
  Last update from 10.10.16.192 00:00:46 ago
  Routing Descriptor Blocks:
  * 10.10.16.193, from 10.10.16.193, 00:00:46 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 5000
      MPLS label: none
    10.10.16.192, from 10.10.16.192, 00:00:46 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 5000
      MPLS label: none

You can even see the complete IP routing table with a more familiar command as shown below:

csr1000v-ecmp-upstream#show ip route
Codes: L - local, C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route, H - NHRP, l - LISP
       a - application route
       + - replicated route, % - next hop override, p - overrides from PfR

Gateway of last resort is 10.10.15.1 to network 0.0.0.0

B*    0.0.0.0/0 [20/0] via 10.10.15.1, 00:48:21
      10.0.0.0/8 is variably subnetted, 5 subnets, 2 masks
C        10.10.16.0/24 is directly connected, GigabitEthernet3
B        10.10.16.224/32 [20/0] via 10.10.16.194, 00:02:10
                         [20/0] via 10.10.16.193, 00:02:10
                         [20/0] via 10.10.16.192, 00:02:10
L        10.10.16.250/32 is directly connected, GigabitEthernet3

Remember we also enabled Bidirectional Forwarding Detection (BFD) in our BGP peering configuration. The BFD protocol is a simple hello mechanism that detects failures in a network. Hello packets are sent at a specified, regular interval. A neighbor failure is detected when the routing device stops receiving a reply after a specified interval. BFD works with a wide variety of network environments and topologies and is used in combination with BGP to provide faster failure detection. The current status of the BFD neighbors can also be seen in the upstream router console. A Local and a Remote Discriminator (LD/RD column) are assigned to uniquely identify the BFD peering across the network.

csr1000v-ecmp-upstream#show bfd neighbors

IPv4 Sessions
NeighAddr            LD/RD         RH/RS     State     Int
10.10.16.192         4104/835186103  Up        Up        Gi3
10.10.16.193         4097/930421219  Up        Up        Gi3
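To get a feeling for how fast a failure would be noticed, the detection time can be estimated from the timers configured on the router interface (bfd interval 50 min_rx 50 multiplier 5). The sketch below follows the asynchronous mode rule from RFC 5880, and it assumes that the Service Engine negotiates the same 50 ms interval and multiplier of 5 as the router, which is not verified here; the function name is hypothetical.

```python
def bfd_detection_time_ms(local_min_rx_ms, remote_tx_ms, remote_multiplier):
    """Detection time in asynchronous mode (RFC 5880, section 6.8.4).

    The negotiated receive interval is the larger of what we are willing to
    receive and what the peer wants to send; a session is declared down after
    remote_multiplier consecutive intervals without a packet.
    """
    return max(local_min_rx_ms, remote_tx_ms) * remote_multiplier


# Router side: min_rx 50 ms. Assumed peer side: tx 50 ms, multiplier 5.
print(bfd_detection_time_ms(50, 50, 5), "ms")
```

Under those assumptions a dead Service Engine would be detected in roughly a quarter of a second, far quicker than waiting for the BGP hold timer to expire.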

To verify that Route Health Injection works as expected we can now manually scale out our service to create an additional Service Engine, which means the hello application should then be active on three Service Engines and therefore reachable through three different equal cost paths. Hover the mouse over the Parent Virtual Service of the hello application and press the Scale Out button.

Manual Scale out for the parent VS

A new window pops up indicating a new service engine is being created to complete the manual Scale Out operation.

Scaling Out progress

After a couple of minutes you should see that the service is now running on three independent Service Engines, which means we have increased the overall capacity for our service.

Scaling out a VS to three Service Engines

At the same time, in the router console we can see a set of events indicating the new BGP and BFD peering creation with new Service Engine at 10.10.16.194. After just one second a new route is announced through this new peering and also installed in the routing table.

csr1000v-ecmp-upstream#
*May 27 17:07:02.515: %BFD-6-BFD_SESS_CREATED: BFD-SYSLOG: bfd_session_created, neigh 10.10.16.194 proc:BGP, idb:GigabitEthernet3 handle:3 act
*May 27 17:07:02.515: %BGP-5-ADJCHANGE: neighbor *10.10.16.194 Up
*May 27 17:07:02.531: %BFDFSM-6-BFD_SESS_UP: BFD-SYSLOG: BFD session ld:4098 handle:3 is going UP

*May 27 17:07:03.478: RT: updating bgp 10.10.16.224/32 (0x0) [local lbl/ctx:1048577/0x0]  :
    via 10.10.16.194   0 1048577 0x100001
    via 10.10.16.193   0 1048577 0x100001
    via 10.10.16.192   0 1048577 0x100001

*May 27 17:07:03.478: RT: closer admin distance for 10.10.16.224, flushing 2 routes
*May 27 17:07:03.478: RT: add 10.10.16.224/32 via 10.10.16.194, bgp metric [20/0]
*May 27 17:07:03.478: RT: add 10.10.16.224/32 via 10.10.16.193, bgp metric [20/0]
*May 27 17:07:03.478: RT: add 10.10.16.224/32 via 10.10.16.192, bgp metric [20/0]

If we inject some traffic into the VS we can verify how this mechanism distributes traffic across the three Service Engines. Note that the upstream router uses 5-tuple hashing (Source IP, Destination IP, Source Port, Destination Port and Protocol) in its selection algorithm to determine the path among the available equal-cost paths for any given new flow. That means any given flow always sticks to the same path or, in other words, you need some network entropy if you want to achieve a fair distribution among the available paths (i.e. Service Engines).
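The effect of 5-tuple hashing can be illustrated with a minimal sketch. It assumes a generic hash (SHA-256 is only a stand-in, real routers use their own hash function) and a hypothetical client at 192.168.1.10; the next hops are the three Service Engines seen in the routing table above.

```python
import hashlib
from collections import Counter

# Next hops taken from the routing table output above (the three Service Engines).
NEXT_HOPS = ["10.10.16.192", "10.10.16.193", "10.10.16.194"]

def pick_path(src_ip, dst_ip, src_port, dst_port, proto, paths=NEXT_HOPS):
    """Map a flow's 5-tuple to one of the equal-cost paths.

    SHA-256 is an illustrative stand-in, not the router's real hash.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return paths[int.from_bytes(digest[:4], "big") % len(paths)]

# The same flow (hypothetical client 192.168.1.10) always maps to the same path.
flow = ("192.168.1.10", "10.10.16.224", 40000, 80, "tcp")
same_path = pick_path(*flow) == pick_path(*flow)

# Flows that differ only in source port (entropy) spread across the paths.
spread = Counter(
    pick_path("192.168.1.10", "10.10.16.224", port, 80, "tcp")
    for port in range(40000, 41000)
)
print(same_path, dict(spread))
```

A single long-lived connection would therefore load only one Service Engine, whereas many short connections from varying source ports tend to even out.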

Traffic distribution across Service Engine using GUI Analytics

Our resulting topology is shown in the following diagram. Remember you can add extra capacity by scaling out the VS again using the manual method described before, or even configure AutoRebalance to automatically adapt to the traffic or Service Engine health conditions.

Resulting BGP topology after manual Scale Out operation

shardSize

A common problem with traditional LoadBalancer deployment methods is that a new separate VIP is created for each new application (Ingress), so a large number of routable addresses is required. You can also find more conservative approaches that use a single VIP for all Ingresses, but these may have their own issues related to stability and scaling.

AVI proposes a method to automatically shard new ingresses across a small number of VIPs, offering the best of both deployment methods. The number of shards is configurable through the shardSize setting. The shardSize defines the number of VIPs that will be used for new ingresses, as described in the following list:

  • LARGE: 8 shared VIPs
  • MEDIUM: 4 shared VIPs
  • SMALL: 1 shared VIP
  • DEDICATED: 1 non-shared Virtual Service

If not specified, the shardSize value provided in values.yaml is used, which is set to LARGE by default. The choice of shard size depends on the ingress requirements of the Kubernetes cluster. It is recommended to always go with the highest possible number of shard Virtual Services (that is, LARGE) to allow for future growth. Note that you need to adapt the number of available IPs for new services to match the configured shardSize. For example, you cannot use a pool of 6 IPs for a LARGE shardSize, since a minimum of eight is required to create the set of Virtual Services that share the VIPs for new ingresses. If the size of the available pool is smaller than the shardSize, some of the ingresses will fail. Let’s go through the different settings and check how each one changes the way AKO creates the parent objects.
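The pool sizing rule can be expressed as a quick sanity check. This is only a sketch of the arithmetic; the function name is hypothetical and nothing here talks to AKO or the AVI Controller.

```python
# Number of shared VIPs per shardSize, as listed above.
# DEDICATED is left out because it allocates one VIP per ingress instead.
SHARD_VIPS = {"LARGE": 8, "MEDIUM": 4, "SMALL": 1}

def pool_is_large_enough(shard_size, free_ips):
    """Return True if the IP pool can hold every shared VIP for shard_size."""
    return free_ips >= SHARD_VIPS[shard_size]

print(pool_is_large_enough("LARGE", 6))   # False: LARGE needs 8 addresses
print(pool_is_large_enough("MEDIUM", 6))  # True: MEDIUM needs only 4
```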

kubectl edit AviInfraSetting critical.infra
apiVersion: ako.vmware.com/v1alpha1
kind: AviInfraSetting
metadata:
  name: critical.infra
spec:
  network:
    enableRhi: true
    names:
    - REGA_EXT_UPLINKB_3016
  seGroup:
    name: SE-Group-AKO-CRITICAL
  l7Settings:
    shardSize: LARGE

To test how the ingresses are distributed across the shared Virtual Services I have created a simple script that loops to produce dummy ingress objects for a given ClusterIP service. The script is available here. Let’s create a batch of 20 new ingresses to see how it works.

./dummy_ingress.sh 20 apply hello
ingress.networking.k8s.io/hello1 created
service/hello1 created
ingress.networking.k8s.io/hello2 created
service/hello2 created
<skipped>
...
service/hello20 created
ingress.networking.k8s.io/hello20 created

Using kubectl with some filtering and sorting keywords to display only the relevant information, you can see in the output below how the AVI Controller uses up to eight different VS/IPs, ranging from 10.10.16.225 to 10.10.16.232, to accommodate the created ingress objects.

kubectl get ingress --sort-by='.status.loadBalancer.ingress[0].ip' -o='custom-columns=HOSTNAME:.status.loadBalancer.ingress[0].hostname,AVI-VS-IP:.status.loadBalancer.ingress[0].ip'
HOSTNAME                   AVI-VS-IP
hello14.avi.iberia.local   10.10.16.225
hello1.avi.iberia.local    10.10.16.225
hello9.avi.iberia.local    10.10.16.225
hello2.avi.iberia.local    10.10.16.226
hello17.avi.iberia.local   10.10.16.226
hello3.avi.iberia.local    10.10.16.227
hello16.avi.iberia.local   10.10.16.227
hello4.avi.iberia.local    10.10.16.228
hello11.avi.iberia.local   10.10.16.228
hello19.avi.iberia.local   10.10.16.228
hello10.avi.iberia.local   10.10.16.229
hello18.avi.iberia.local   10.10.16.229
hello5.avi.iberia.local    10.10.16.229
hello13.avi.iberia.local   10.10.16.230
hello6.avi.iberia.local    10.10.16.230
hello20.avi.iberia.local   10.10.16.231
hello15.avi.iberia.local   10.10.16.231
hello8.avi.iberia.local    10.10.16.231
hello7.avi.iberia.local    10.10.16.232
hello12.avi.iberia.local   10.10.16.232

As you can see in the AVI GUI, up to eight new Virtual Services have been created that will be used to distribute the new ingresses.

Shared Virtual Services using a LARGE shardSize (8 shared VS)
Virtual Service showing several pools that uses same VIP
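To get an intuition for how hostnames end up spread over a fixed number of shared Virtual Services, here is a minimal sketch of hash-based sharding. CRC32 is an arbitrary illustrative choice and is not necessarily the function AKO uses, so the resulting placement will not match the output above.

```python
import zlib

def shard_for(hostname, shard_count):
    """Pick a shard index for a hostname.

    CRC32 is an illustrative choice, not necessarily the hash AKO uses.
    """
    return zlib.crc32(hostname.encode()) % shard_count

hosts = [f"hello{i}.avi.iberia.local" for i in range(1, 21)]

# Try the shard counts behind LARGE (8), MEDIUM (4) and SMALL (1).
for shard_count in (8, 4, 1):
    shards = {}
    for host in hosts:
        shards.setdefault(shard_for(host, shard_count), []).append(host)
    sizes = {index: len(members) for index, members in sorted(shards.items())}
    print(shard_count, "shards ->", sizes)
```

The important property is that the mapping is deterministic: a given hostname always lands on the same shard as long as the shard count does not change.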

Now let’s change the AviInfraSetting object and set the shardSize to MEDIUM. You will probably need to reload the AKO controller for this change to take effect. Once done, you can see how the distribution has changed and the ingresses are now distributed across a set of four VIPs ranging from 10.10.16.225 to 10.10.16.228.

kubectl get ingress --sort-by='.status.loadBalancer.ingress[0].ip' -o='custom-columns=HOSTNAME:.status.loadBalancer.ingress[0].hostname,AVI-VS-IP:.status.loadBalancer.ingress[0].ip'
HOSTNAME                   AVI-VS-IP
hello14.avi.iberia.local   10.10.16.225
hello10.avi.iberia.local   10.10.16.225
hello5.avi.iberia.local    10.10.16.225
hello1.avi.iberia.local    10.10.16.225
hello18.avi.iberia.local   10.10.16.225
hello9.avi.iberia.local    10.10.16.225
hello6.avi.iberia.local    10.10.16.226
hello17.avi.iberia.local   10.10.16.226
hello13.avi.iberia.local   10.10.16.226
hello2.avi.iberia.local    10.10.16.226
hello16.avi.iberia.local   10.10.16.227
hello12.avi.iberia.local   10.10.16.227
hello7.avi.iberia.local    10.10.16.227
hello3.avi.iberia.local    10.10.16.227
hello15.avi.iberia.local   10.10.16.228
hello11.avi.iberia.local   10.10.16.228
hello4.avi.iberia.local    10.10.16.228
hello20.avi.iberia.local   10.10.16.228
hello8.avi.iberia.local    10.10.16.228
hello19.avi.iberia.local   10.10.16.228

You can verify that only four Virtual Services are now available and only four VIPs will be used to expose our ingress objects.

Shared Virtual Services using a MEDIUM shardSize (4 shared VS)

The smaller the shardSize, the higher the density of ingresses per VIP, as you can see in the following screenshot.

Virtual Service showing a higher number of pools that uses same VIP
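The density can also be computed directly from the kubectl output. The sketch below transcribes the MEDIUM shardSize listing shown above (ingress numbers grouped by VIP) and averages the number of ingresses per VIP.

```python
# Ingress numbers per VIP, transcribed from the MEDIUM shardSize output above.
medium = {
    "10.10.16.225": [14, 10, 5, 1, 18, 9],
    "10.10.16.226": [6, 17, 13, 2],
    "10.10.16.227": [16, 12, 7, 3],
    "10.10.16.228": [15, 11, 4, 20, 8, 19],
}

def density(vip_to_ingresses):
    """Average number of ingresses sharing each VIP."""
    total = sum(len(members) for members in vip_to_ingresses.values())
    return total / len(vip_to_ingresses)

print(density(medium))  # 20 ingresses over 4 VIPs -> 5.0
```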

If you use the SMALL shardSize, all the applications will use a single external VIP.

kubectl get ingress --sort-by='.status.loadBalancer.ingress[0].ip' -o='custom-columns=HOSTNAME:.status.loadBalancer.ingress[0].hostname,AVI-VS-IP:.status.loadBalancer.ingress[0].ip'
HOSTNAME                   AVI-VS-IP
hello1.avi.iberia.local    10.10.16.225
hello10.avi.iberia.local   10.10.16.225
hello11.avi.iberia.local   10.10.16.225
hello12.avi.iberia.local   10.10.16.225
hello13.avi.iberia.local   10.10.16.225
hello14.avi.iberia.local   10.10.16.225
hello15.avi.iberia.local   10.10.16.225
hello16.avi.iberia.local   10.10.16.225
hello17.avi.iberia.local   10.10.16.225
hello18.avi.iberia.local   10.10.16.225
hello19.avi.iberia.local   10.10.16.225
hello2.avi.iberia.local    10.10.16.225
hello20.avi.iberia.local   10.10.16.225
hello3.avi.iberia.local    10.10.16.225
hello4.avi.iberia.local    10.10.16.225
hello5.avi.iberia.local    10.10.16.225
hello6.avi.iberia.local    10.10.16.225
hello7.avi.iberia.local    10.10.16.225
hello8.avi.iberia.local    10.10.16.225
hello9.avi.iberia.local    10.10.16.225

You can verify that a single Virtual Service is now available and therefore a single VIP will be used to expose our ingress objects.

Shared Virtual Services using a SMALL shardSize (1 single shared VS)

The last option for the shardSize is DEDICATED, which in fact disables VIP sharing and creates a new VIP for each new ingress object. First delete the twenty ingresses/services we created before, using the same script but now with the delete keyword as shown below.

./dummy_ingress.sh 20 delete hello
service "hello1" deleted
ingress.networking.k8s.io "hello1" deleted
service "hello2" deleted
ingress.networking.k8s.io "hello2" deleted
service "hello3" deleted
<Skipped>
...
service "hello20" deleted
ingress.networking.k8s.io "hello20" deleted

Now let’s create five new ingress/services using again the custom script.

./dummy_ingress.sh 5 apply hello
service/hello1 created
ingress.networking.k8s.io/hello1 created
service/hello2 created
ingress.networking.k8s.io/hello2 created
service/hello3 created
ingress.networking.k8s.io/hello3 created
service/hello4 created
ingress.networking.k8s.io/hello4 created
service/hello5 created
ingress.networking.k8s.io/hello5 created

As you can see, a new IP address is now allocated for each new ingress, so there is no VIP sharing in place.

kubectl get ingress --sort-by='.status.loadBalancer.ingress[0].ip' -o='custom-columns=HOSTNAME:.status.loadBalancer.ingress[0].hostname,AVI-VS-IP:.status.loadBalancer.ingress[0].ip'
HOSTNAME                  AVI-VS-IP
hello5.avi.iberia.local   10.10.16.225
hello1.avi.iberia.local   10.10.16.226
hello4.avi.iberia.local   10.10.16.227
hello3.avi.iberia.local   10.10.16.228
hello2.avi.iberia.local   10.10.16.229

You can verify in the GUI how a new VS is created for each ingress. The name used for the VS indicates that it is using a dedicated scheme for this particular ingress.

Virtual Services using a DEDICATED shardSize (1 new dedicated VS per new ingress)

Remember you can use custom AviInfraSetting objects to selectively set the shardSize according to your application needs.

Gateway API for customizing L4 LoadBalancer resources

As mentioned before, to provide customized information for a particular L4 LoadBalancer resource we need to switch to the Services API. To allow AKO to use the Gateway API we need to enable it through one of the configuration settings in the values.yaml file used by the helm chart that deploys the AKO component.

servicesAPI: true 
# Flag that enables AKO in services API mode: https://kubernetes-sigs.github.io/service-apis/

Set the servicesAPI flag to true and redeploy the AKO release. You can use the simple ako_reload.sh script, available here, to delete and recreate the existing AKO release automatically after changing the above flag.

./ako_reload.sh
"ako" already exists with the same configuration, skipping
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "ako" chart repository
Update Complete. ⎈Happy Helming!⎈
release "ako-1621926773" uninstalled
NAME: ako-1622125803
LAST DEPLOYED: Thu May 27 14:30:05 2021
NAMESPACE: avi-system
STATUS: deployed
REVISION: 1

The AKO implementation uses the following Gateway API resources:

  • GatewayClass: used to aggregate a group of Gateway objects and to point to specific parameters of the load balancing implementation. AKO identifies GatewayClasses that point to ako.vmware.com/avi-lb as the controller.
  • Gateway: defines multiple services as backends. It uses matching labels to select the Services that need to be implemented in the actual load balancing solution.

This diagram summarizes the different objects and how they map together:

AVI LoadBalancer Infrastructure Customization using Gateway API and labels matching

Let’s start by creating a new GatewayClass object as defined in the following yaml. Save it in a yaml file or simply paste the following code using stdin.

cat <<EOF | kubectl apply -f -
apiVersion: networking.x-k8s.io/v1alpha1
kind: GatewayClass
metadata:
  name: critical-gwc
spec:
  controller: ako.vmware.com/avi-lb
  parametersRef:
    group: ako.vmware.com
    kind: AviInfraSetting
    name: critical.infra
EOF

Output: 
gatewayclass.networking.x-k8s.io/critical-gwc created

Now define the Gateway object, including the labels we will use to select the application we are using as backend. Some backend-related parameters such as protocol and port need to be defined. The GatewayClass defined previously is referenced using the spec.gatewayClassName key.

cat <<EOF | kubectl apply -f -
apiVersion: networking.x-k8s.io/v1alpha1
kind: Gateway
metadata:
  name: avi-alb-gw
  namespace: default
spec: 
  gatewayClassName: critical-gwc    
  listeners: 
  - protocol: TCP 
    port: 80 
    routes: 
      selector: 
       matchLabels: 
        ako.vmware.com/gateway-namespace: default 
        ako.vmware.com/gateway-name: avi-alb-gw
      group: v1 
      kind: Service
EOF

Output:
gateway.networking.x-k8s.io/avi-alb-gw created
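The label matching used by the listener selector follows the usual kubernetes matchLabels semantics: an object is selected when it carries every key/value pair listed in the selector. A minimal sketch of that rule, where the extra tier label and the second label set are hypothetical examples:

```python
def selects(match_labels, object_labels):
    """A matchLabels selector matches when every pair it lists is present
    on the object; extra labels on the object are ignored."""
    return all(object_labels.get(key) == value for key, value in match_labels.items())

# Selector taken from the Gateway listener definition above.
gateway_selector = {
    "ako.vmware.com/gateway-namespace": "default",
    "ako.vmware.com/gateway-name": "avi-alb-gw",
}

labelled = dict(gateway_selector, tier="web")  # hypothetical extra label
unlabelled = {"app": "hello"}                  # hypothetical service without gateway labels

print(selects(gateway_selector, labelled))    # True
print(selects(gateway_selector, unlabelled))  # False
```

This is why the Service needs both gateway labels: a Service carrying only one of them would not be selected.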

As soon as we create the Gateway resource, AKO will call the AVI Controller to create this new object, even when there are no actual services associated yet. In the AVI GUI you can see how the Virtual Service is created and takes the name of the gateway resource. This is a namespace-scoped resource, so you should be able to create the same gateway definition in a different namespace. A new IP address has been selected from the AVI IPAM as well.

Virtual Service for Gateway resource used for LoadBalancer type objects.

Now we can define the LoadBalancer service. We need to add the corresponding labels as they are used to link the backend to the gateway. Use the command below that also includes the deployment declaration for our service.

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: hello
  labels:
    ako.vmware.com/gateway-namespace: default 
    ako.vmware.com/gateway-name: avi-alb-gw
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: hello
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello
spec:
  replicas: 3
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
      - name: hello-kubernetes
        image: paulbouwer/hello-kubernetes:1.7
        ports:
        - containerPort: 8080
        env:
        - name: MESSAGE
          value: "MESSAGE: Running in port 8080!!"
EOF

Once applied, AKO will translate the new changes in the Gateway API related objects and will call the AVI API to patch the corresponding Virtual Service object according to the new settings. In this case the Gateway external IP is allocated, as seen in the following output.

kubectl get service
NAME         TYPE           CLUSTER-IP       EXTERNAL-IP    PORT(S)        AGE
hello        LoadBalancer   10.106.124.101   10.10.16.227   80:30823/TCP   9s
kubernetes   ClusterIP      10.96.0.1        <none>         443/TCP        99d

You can explore the AVI GUI to see how the L4 Load Balancer object has been realized in a Virtual Service.

L4 Virtual Service realization using AKO and Gateway API

And obviously we can browse to the external IP address and check if the service is actually running and is reachable from the outside.

An important benefit of this is the ability to share the same external VIP to expose different L4 services. You can easily add a new listener definition that exposes TCP port 8080 and points to the same backend hello application, as shown below:

cat <<EOF | kubectl apply -f -
apiVersion: networking.x-k8s.io/v1alpha1
kind: Gateway
metadata:
  name: avi-alb-gw
  namespace: default
spec: 
  gatewayClassName: critical-gwc    
  listeners: 
  - protocol: TCP 
    port: 8080 
    routes: 
      selector: 
       matchLabels: 
        ako.vmware.com/gateway-namespace: default 
        ako.vmware.com/gateway-name: avi-alb-gw
      group: v1 
      kind: Service
  - protocol: TCP 
    port: 80 
    routes: 
      selector: 
       matchLabels: 
        ako.vmware.com/gateway-namespace: default 
        ako.vmware.com/gateway-name: avi-alb-gw
      group: v1 
      kind: Service
EOF

Describe the new gateway object to see the status of the resource

kubectl describe gateway avi-alb-gw
Name:         avi-alb-gw
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  networking.x-k8s.io/v1alpha1
Kind:         Gateway
Metadata:
<Skipped>
Status:
  Addresses:
    Type:   IPAddress
    Value:  10.10.16.227
  Conditions:
    Last Transition Time:  1970-01-01T00:00:00Z
    Message:               Waiting for controller
    Reason:                NotReconciled
    Status:                False
    Type:                  Scheduled
  Listeners:
    Conditions:
      Last Transition Time:  2021-06-03T08:30:11Z
      Message:
      Reason:                Ready
      Status:                True
      Type:                  Ready
    Port:                    8080
    Protocol:
    Conditions:
      Last Transition Time:  2021-06-03T08:30:11Z
      Message:
      Reason:                Ready
      Status:                True
      Type:                  Ready
    Port:                    80
    Protocol:
Events:                      <none>

And kubectl get services shows that the same external IP address is being shared.

kubectl get services
NAME         TYPE           CLUSTER-IP       EXTERNAL-IP    PORT(S)          AGE
hello        LoadBalancer   10.106.124.101   10.10.16.227   80:31464/TCP     35m
hello8080    LoadBalancer   10.107.152.148   10.10.16.227   8080:31990/TCP   7m23s

The AVI GUI represents the new Virtual Service object with two different Pool Groups as shown in the screen capture below

AVI Virtual Service object representation of the Gateway resource

And you can see how the same Virtual Service is proxying both 8080 and 80 TCP ports simultaneously

Virtual Service object including two listeners

For predictability reasons it can be useful to pick a specific IP address from the available range instead of using the AVI IPAM automated allocation process. You can specify the desired IP address by including the spec.addresses definition as part of the gateway object configuration. Changing the IP address requires a complete gateway recreation. First delete the gateway:

kubectl delete gateway avi-alb-gw
gateway.networking.x-k8s.io "avi-alb-gw" deleted

And now recreate it adding the addresses definition as shown below

cat <<EOF | kubectl apply -f -
apiVersion: networking.x-k8s.io/v1alpha1
kind: Gateway
metadata:
  name: avi-alb-gw
  namespace: default
spec: 
  gatewayClassName: critical-gwc
  addresses:
  - type: IPAddress
    value: 10.10.16.232  
  listeners: 
  - protocol: TCP 
    port: 8080 
    routes: 
      selector: 
       matchLabels: 
        ako.vmware.com/gateway-namespace: default 
        ako.vmware.com/gateway-name: avi-alb-gw
      group: v1 
      kind: Service
  - protocol: TCP 
    port: 80 
    routes: 
      selector: 
       matchLabels: 
        ako.vmware.com/gateway-namespace: default 
        ako.vmware.com/gateway-name: avi-alb-gw
      group: v1 
      kind: Service
EOF

From the AVI GUI you can now see how the selected IP address has been configured in the Virtual Service that maps to the Gateway kubernetes resource.

L4 Virtual Service realization using AKO and Gateway API

This concludes this article. Stay tuned for new content.

AVI for K8s Part 9: Customizing Ingress Pools using HTTPRule CRDs

In the previous article we went through the different available options to add extra customization for our delivered applications using the HostRule CRD on the top of the native kubernetes objects.

Now it’s time to explore another interesting CRD called HTTPRule, which can be used as a complementary object that dictates the treatment applied to the traffic sent towards the backend servers. We will tune some key properties to control configuration settings such as load balancing algorithm, persistence, health monitoring or re-encryption.

Exploring the HTTPRule CRD

The HTTPRule CRD general definition looks like this:

apiVersion: ako.vmware.com/v1alpha1
kind: HTTPRule
metadata:
   name: my-http-rule
   namespace: purple-l7
spec:
  fqdn: foo.avi.internal
  paths:
  - target: /foo
    healthMonitors:
    - my-health-monitor-1
    - my-health-monitor-2
    loadBalancerPolicy:
      algorithm: LB_ALGORITHM_CONSISTENT_HASH
      hash: LB_ALGORITHM_CONSISTENT_HASH_SOURCE_IP_ADDRESS
    tls: ## This is a re-encrypt to pool
      type: reencrypt # Mandatory [re-encrypt]
      sslProfile: avi-ssl-profile
      destinationCA:  |-
        -----BEGIN CERTIFICATE-----
        [...]
        -----END CERTIFICATE-----

In the following sections we will go through these specifications one by one to understand how each affects the behaviour of the load balancer. As a very first step we need a testbed application in the form of a secure ingress object. This time I will use the kuard application, which is useful for testing and troubleshooting. You can find information about kuard here.

kubectl create deployment kuard --image=gcr.io/kuar-demo/kuard-amd64:1 --replicas=6

Now expose the application by creating a ClusterIP service listening on port 80 and targeting port 8080, which is the one used by kuard.

kubectl expose deployment kuard --port=80 --target-port=8080
service/kuard exposed

The secure ingress definition requires a secret resource in kubernetes. An easy way to generate the required cryptographic material is by using a simple script I created, available here. Just copy the script, make it executable and launch it as shown below using your own data.

./create_secret.sh ghost /C=ES/ST=Madrid/CN=kuard.avi.iberia.local default

If all goes well you should have a new kubernetes TLS secret object that you can verify using kubectl commands as shown below.

kubectl describe secret kuard-secret
Name:         kuard-secret
Namespace:    default
Labels:       <none>
Annotations:  <none>

Type:  kubernetes.io/tls

Data
====
tls.crt:  574 bytes
tls.key:  227 bytes

Create a secure ingress yaml definition including the certificate, name, ports and rest of relevant specifications.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kuard
  labels:
    app: kuard
    app: gslb
spec:
  tls:
  - hosts:
    - kuard.avi.iberia.local
    secretName: kuard-secret
  rules:
    - host: kuard.avi.iberia.local
      http:
        paths:
        - path: /
          pathType: Prefix
          backend:
            service:
              name: kuard
              port:
                number: 80

If everything went well, you will see the beautiful graphical representation of the declared ingress state in the AVI Controller GUI.

And we can check the app is up and running by loading the page at https://kuard.avi.iberia.local in my case.

Now we are ready to go, so let’s start playing with the CRD definitions.

healthMonitors

Health monitoring is a key element of an Application Delivery Controller because it is the subsystem that tracks the current status of the real servers that will eventually respond to client requests. The health monitor can operate at different levels: it could be a simple L3 ICMP echo request to check whether the backend server is alive, an L4 TCP SYN to verify that the server is listening on a specific TCP port, or even an L7 HTTP probe to check whether the server responds with specific body content or headers. Sometimes it might be interesting to add some extra verification to ensure our system is responding as expected, or even to make the health status of a particular server depend on environmental variables such as Time of Day or Day of Week. The health monitor can use python, perl or shell scripting to create very sophisticated checks.

To test how it works I have created a very simple script that parses the server response and decides whether the server is healthy. To do so I send a curl request and try to match (grep) a specific string within the server response. If the script returns any data the server is considered ALIVE, whereas if no data is returned the system declares the server as DOWN. For this specific case, just as an example, I will use the Health Monitor to exclude certain kubernetes worker nodes in a rudimentary way, based on the IP included in the response sent by the kuard application. In this case, only the servers running at an IP starting with 10.34.3 will be considered ALIVE.

Navigate to Templates > Health Monitor > CREATE and create a new Health Monitor that we will call KUARD-SHELL.

Remember that all the data plane related tasks, including health monitoring, are performed from the Service Engines. So it’s always a good idea to verify manually from the Service Engine whether the health monitor is working as expected. Let’s identify the Service Engine that is realizing our Virtual Service.

Log in to the AVI controller CLI and connect to the Service Engine using the attach command.

[admin:10-10-10-33]: > attach serviceengine s1_ako-se-kmafa

Discover the network namespace ID, which is usually avi_ns1.

admin@s1-ako-se-vyrvl:~$ ip netns
avi_ns1 (id: 0)

Open a bash shell in that specific namespace. The admin password will be required.

admin@s1-ako-se-vyrvl:~$ sudo ip netns exec avi_ns1 bash
[sudo] password for admin:
root@s1-ako-se-kmafa:/home/admin#

From this shell you can now mimic the health-monitoring probes to validate your actual server health manually and for script debugging. Get the IP address assigned to your pods using kubectl get pods and check the reachability and actual responses as seen by the Service Engines.

kubectl get pod -o custom-columns="NAME:metadata.name,IP:status.podIP" -l app=kuard
NAME                   IP
kuard-96d556c9-k2bfd   10.34.1.131
kuard-96d556c9-k6r99   10.34.3.204
kuard-96d556c9-nwxhm   10.34.3.205

In my case I have selected 10.34.3.204, which has been assigned to one of the kuard application pods. Now curl the application to see the actual server response, as shown below:

curl -s http://10.34.3.204:8080
<!doctype html>

<html lang="en">
<head>
  <meta charset="utf-8">

  <title>KUAR Demo</title>

  <link rel="stylesheet" href="/static/css/bootstrap.min.css">
  <link rel="stylesheet" href="/static/css/styles.css">

  <script>
var pageContext = {"hostname":"kuard-96d556c9-g9sb4","addrs":["10.34.3.206"],"version":"v0.8.1-1","versionColor":"hsl(18,100%,50%)","requestDump":"GET / HTTP/1.1\r\nHost: 10.34.3.204:8080\r\nAccept: */*\r\nUser-Agent: curl/7.69.0","requestProto":"HTTP/1.1","requestAddr":"10.10.14.22:56830"}
  </script>
</head>

<... skipped> 
</html>

Using the returned body you can now define your own health monitor. In this example, we want to declare alive only the pods running on the worker node whose allocated podCIDR is 10.34.3.0/24. A simple way to do it is to use grep and try to find a match with the “10.34.3” string.

root@s1-ako-se-kmafa:/home/admin# curl -s http://10.34.3.204:8080 | grep "10.34.3"
var pageContext = {"hostname":"kuard-96d556c9-g9sb4","addrs":["10.34.3.206"],"version":"v0.8.1-1","versionColor":"hsl(18,100%,50%)","requestDump":"GET / HTTP/1.1\r\nHost: 10.34.3.204:8080\r\nAccept: */*\r\nUser-Agent: curl/7.69.0","requestProto":"HTTP/1.1","requestAddr":"10.10.14.22:56968"}

You can also verify that there is no match for pods in any other podCIDR that does not start with 10.34.3. Take 10.34.1.131 as the pod IP and you should not see any output.

root@s1-ako-se-kmafa:/home/admin# curl -s http://10.34.1.131:8080 | grep "10.34.3"
<NO OUTPUT RECEIVED>

Now that we have done some manual validation we are good to go. Using IP and PORT as input variables, we can formulate our simple custom health monitor with the piece of code below.

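#!/bin/bash
# IP and PORT are the input variables for the server being probed.
# Any output means ALIVE; no output means DOWN.
curl -s "http://${IP}:${PORT}" | grep "10\.34\.3\."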

Paste the above script in the Script Code section of our custom KUARD-SHELL Health-Monitor
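Since the health monitor can also be scripted in python, the same decision could be sketched as follows. This covers only the decision logic applied to a response body; how the monitor receives the server IP and port, fetches the page and reports the result is deliberately left out, and the function names are hypothetical. Parsing the addrs field, rather than grepping the whole page, avoids matching the prefix somewhere else in the response such as the echoed Host header.

```python
import json
import re

def pod_addrs(body):
    """Extract the "addrs" list from kuard's pageContext JSON blob."""
    match = re.search(r"var pageContext = (\{.*\})", body)
    return json.loads(match.group(1))["addrs"] if match else []

def is_alive(body, prefix="10.34.3."):
    """Healthy only if the pod reports an address inside the wanted podCIDR."""
    return any(addr.startswith(prefix) for addr in pod_addrs(body))

# Abridged copy of the kuard response shown earlier.
sample = (
    'var pageContext = {"hostname":"kuard-96d556c9-g9sb4",'
    '"addrs":["10.34.3.206"],"version":"v0.8.1-1"}'
)
other = sample.replace("10.34.3.206", "10.34.1.131")

print(is_alive(sample))  # True: address is inside 10.34.3.0/24
print(is_alive(other))   # False: address is in another podCIDR
```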

And now push the configuration through the HTTPRule CRD, adding the healthMonitors lines shown below and sending it to the Kubernetes API using kubectl apply as usual.

apiVersion: ako.vmware.com/v1alpha1
kind: HTTPRule
metadata:
   name: kuard
   namespace: default
spec:
  fqdn: kuard.avi.iberia.local
  paths:
  - target: /
    healthMonitors:
    - KUARD-SHELL

As a first step, verify in the Pool Server configuration that the new Health Monitor has been configured.

Navigate to the Server tab within our selected pool and you should see a screen like the one shown below. According to our custom health monitor, only pods running at 10.34.3.X are shown as green whereas pods running in any other podCIDR are shown as red (dead).

Now let’s scale our deployment to eight replicas to see if the behaviour is consistent.

kubectl scale deployment kuard --replicas=8
deployment.apps/kuard scaled

This example illustrates how you can attach a custom health monitor to influence the way the health of the backend servers is verified, using sophisticated scripting.

loadBalancerPolicy

The heart of a load balancer is its ability to effectively distribute traffic across the available healthy servers. AVI provides a number of algorithms, each with characteristics that may suit a different use case. Currently, the following values are supported for the load balancer policy:

  • LB_ALGORITHM_CONSISTENT_HASH
  • LB_ALGORITHM_CORE_AFFINITY
  • LB_ALGORITHM_FASTEST_RESPONSE
  • LB_ALGORITHM_FEWEST_SERVERS
  • LB_ALGORITHM_FEWEST_TASKS
  • LB_ALGORITHM_LEAST_CONNECTIONS
  • LB_ALGORITHM_LEAST_LOAD
  • LB_ALGORITHM_NEAREST_SERVER
  • LB_ALGORITHM_RANDOM
  • LB_ALGORITHM_ROUND_ROBIN
  • LB_ALGORITHM_TOPOLOGY

A full description of existing load balancing algorithms and how they work is available here.

The default algorithm is Least Connections, which takes into account the number of existing connections on each of the servers to decide where to send the next request. To verify the operation of the current LB algorithm you can use a simple one-line shell script and some text processing. This is an example for the kuard application, but adapt it according to your needs and the expected server response.

while true; do echo "Response received from POD at " $(curl -k https://kuard.avi.iberia.local -s | grep "addrs" | awk -F ":" '/1/ {print $3}' | awk -F "," '/1/ {print $1}'); sleep 1; done
Response received from POD at  ["10.34.3.42"]
Response received from POD at  ["10.34.3.42"]
Response received from POD at  ["10.34.3.42"]
Response received from POD at  ["10.34.3.42"]
Response received from POD at  ["10.34.3.42"]
Response received from POD at  ["10.34.3.42"]

As you can see, the response is always received from the same server, running in this case at 10.34.3.42. Now we will change it to LB_ALGORITHM_ROUND_ROBIN to see how it works.

kubectl edit HTTPRule kuard

apiVersion: ako.vmware.com/v1alpha1
kind: HTTPRule
metadata:
  name: kuard
  namespace: default
spec:
  fqdn: kuard.avi.iberia.local
  paths:
  - target: / 
    healthMonitors:
    - KUARD-SHELL
    loadBalancerPolicy:
      algorithm: LB_ALGORITHM_ROUND_ROBIN

If you repeat the same test you can see how the responses are now being distributed in a round-robin fashion across all the existing backend servers (i.e. pods).

while true; do echo "Response received from POD at " $(curl -k https://kuard.avi.iberia.local -s | grep addrs | awk -F ":" '/1/ {print $3}' | awk -F "," '/1/ {print $1}'); sleep 1; done
Response received from POD at  ["10.34.3.204"]
Response received from POD at  ["10.34.3.208"]
Response received from POD at  ["10.34.3.207"]
Response received from POD at  ["10.34.3.205"]
Response received from POD at  ["10.34.3.204"]
Response received from POD at  ["10.34.3.208"]
Response received from POD at  ["10.34.3.207"]
Response received from POD at  ["10.34.3.205"]
Response received from POD at  ["10.34.3.204"]
Response received from POD at  ["10.34.3.208"]
Response received from POD at  ["10.34.3.207"]
Response received from POD at  ["10.34.3.205"]
Response received from POD at  ["10.34.3.204"]
Response received from POD at  ["10.34.3.208"]
Response received from POD at  ["10.34.3.207"]
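
The visual check above can also be scripted. What follows is only a minimal sketch, not part of the original test: extract_pod_ip and tally_pods are hypothetical helper names, and the parsing assumes the kuard page embeds the pod address as "addrs":["<ip>"], which is what the loop output above suggests.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not part of kuard or AVI).

# Print the first IPv4 address found in the "addrs" field of the page on stdin.
# Assumes the page contains something like: "addrs":["10.34.3.204"]
extract_pod_ip() {
  grep -oE '"addrs":[[:space:]]*\[[[:space:]]*"[0-9.]+"' \
    | head -n 1 \
    | grep -oE '[0-9]+(\.[0-9]+){3}'
}

# Read "Response received from POD at ..." lines on stdin and print
# one "<pod-ip> <count>" line per pod, sorted by address.
tally_pods() {
  grep -oE '[0-9]+(\.[0-9]+){3}' | sort | uniq -c | awk '{ print $2, $1 }'
}

# Example usage against the lab FQDN (20 requests, then a per-pod summary):
#   for i in $(seq 1 20); do
#     echo "Response received from POD at $(curl -ks https://kuard.avi.iberia.local | extract_pod_ip)"
#   done | tally_pods
```

With round robin in place the per-pod counts should come out roughly equal, while a single pod taking every request points to a different algorithm or to persistence being active.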

An easy way to verify the traffic distribution is using AVI Analytics. Click on Server IP Address and you should see how the client requests are distributed evenly across the available servers following the round-robin algorithm.

You can play with other available methods to select the best algorithm according to your needs.

applicationPersistence

The HTTPRule CRD can also be used to express application persistence for our application. Session persistence ensures that, at least for the duration of the session or for a given amount of time, the client will reconnect to the same server. This is especially important when servers maintain session information locally. There are different options to ensure persistence. You can find a full description of the available Server Persistence options in AVI here.

We will use the method based on HTTP Cookie to achieve the required persistence. With this persistence method, AVI Service Engines (SEs) will insert an HTTP cookie into a server’s first response to a client. Note that HTTP cookie persistence requires no configuration changes on the back-end servers. HTTP persistence cookies created by AVI have no impact on existing server cookies or behavior.

Let’s create our own profile. Navigate to Templates > Profiles > Persistence > CREATE and define the COOKIE-PERSISTENCE-PROFILE. The cookie name is an arbitrary name. I will use here MIGALLETA as the cookie name as shown below:

Edit the HTTPRule to push the configuration to our Pool as shown below:

kubectl edit HTTPRule kuard

apiVersion: ako.vmware.com/v1alpha1
kind: HTTPRule
metadata:
  name: kuard
  namespace: default
spec:
  fqdn: kuard.avi.iberia.local
  paths:
  - target: / 
    healthMonitors:
    - KUARD-SHELL
    loadBalancerPolicy:
      algorithm: LB_ALGORITHM_ROUND_ROBIN
    applicationPersistence: COOKIE-PERSISTENCE-PROFILE

The AVI GUI shows how the new configuration has been successfully applied to our Pool.

To verify how the cookie-based persistence works let’s run some tests with curl. Browsers send the received cookie back in subsequent requests during the session lifetime, but curl does not store or resend cookies unless you tell it to. That means server persistence will not work as expected unless you explicitly reuse the received cookie. In fact, if you repeat the same test we used to verify the load-balancing algorithm you will see the same round robin in action.

while true; do echo "Response received from POD at " $(curl -k https://kuard.avi.iberia.local -s | grep addrs | awk -F ":" '/1/ {print $3}' | awk -F "," '/1/ {print $1}'); sleep 1; done
Response received from POD at  ["10.34.3.208"]
Response received from POD at  ["10.34.3.207"]
Response received from POD at  ["10.34.3.205"]
Response received from POD at  ["10.34.3.204"]
Response received from POD at  ["10.34.3.208"]
Response received from POD at  ["10.34.3.207"]

We need to save the received cookie and then reuse it during the session. To save the cookies received from the AVI load balancer, use the following command, which writes them to the mycookie file:

curl -k https://kuard.avi.iberia.local -c mycookie

As expected, the server has sent a cookie with the name MIGALLETA and an encrypted payload that contains the back-end server IP address and port. The payload is encrypted with AES-256. When a client makes a subsequent HTTP request, it includes the cookie, which the SE uses to direct the request to the same server, so there is no need to maintain in-memory session tables in the Service Engines. To see the actual cookie, just display the content of the mycookie file.

cat mycookie
# Netscape HTTP Cookie File
# https://curl.haxx.se/docs/http-cookies.html
# This file was generated by libcurl! Edit at your own risk.

kuard.avi.iberia.local  FALSE   /       TRUE    0       MIGALLETA       029390d4b1-d684-4e4e2X85YaqGAIGwwilIc5zjXcplMYncHJMGZRVobEXRvqRWOuM7paLX4al2rWwQ5IJB8
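
Instead of copying the cookie value by hand, it can be read from the cookie jar. The sketch below assumes the Netscape cookie file layout written by curl -c (seven tab-separated fields, with the cookie name in the sixth and its value in the seventh); cookie_value is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of cookie NAME stored in cookie jar FILE.
# Usage: cookie_value FILE NAME
cookie_value() {
  # Comment lines in the jar contain no tabs, so their sixth field is empty
  # and they never match the requested name.
  awk -F '\t' -v name="$2" '$6 == name { print $7 }' "$1"
}

# Example usage with the jar saved above:
#   curl -ks https://kuard.avi.iberia.local --cookie "MIGALLETA=$(cookie_value mycookie MIGALLETA)"
```

curl can also read the jar directly with -b mycookie, which sends every matching cookie stored in the file.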

Now repeat the same loop, but note that the curl command has been modified to send the cookie contents with the --cookie option as shown below.

while true; do echo "Response received from POD at " $(curl -k https://kuard.avi.iberia.local --cookie MIGALLETA=029390d4b1-d684-4e4e2X85YaqGAIGwwilIc5zjXcplMYncHJMGZRVobEXRvqRWOuM7paLX4al2rWwQ5IJB8 -s | grep addrs | awk -F ":" '/1/ {print $3}' | awk -F "," '/1/ {print $1}'); sleep 1; done
Response received from POD at  ["10.34.3.205"]
Response received from POD at  ["10.34.3.205"]
Response received from POD at  ["10.34.3.205"]
Response received from POD at  ["10.34.3.205"]
Response received from POD at  ["10.34.3.205"]
Response received from POD at  ["10.34.3.205"]
Response received from POD at  ["10.34.3.205"]
Response received from POD at  ["10.34.3.205"]

The server persistence is now achieved. You can easily verify it using the AVI Analytics as shown below:

Just select a transaction. Note the Persistence Used is displayed as true and a Persistence Session ID has been assigned indicating this session will persist in the same backend server.

Now click on the View All Headers and you should be able to see the cookie received from the client and sent to the end server. The service engine decodes the payload content to persist the session with the original backend server.

tls

The tls setting is used to express the reencryption of the traffic between the Load Balancer and the backend servers. This can be used in environments where clear-text communication channels are not allowed in order to meet regulatory requirements such as PCI DSS. To try this out, we will switch to a different application that uses HTTPS as the transport protocol in the ServiceEngine-to-pod segment.

We will create a custom docker image based on the Apache httpd server, enable TLS and use our own certificates. The first step is to create the cryptographic material needed to enable HTTPS. Create a private key, then a Certificate Signing Request, and finally self-sign the request using the private key to produce an X.509 public certificate. The steps are shown below:

# Generate Private Key and save in server.key file
openssl ecparam -name prime256v1 -genkey -noout -out server.key
# Generate a Cert Signing Request using a custom Subject and save into server.csr file
openssl req -new -key server.key -out server.csr -subj /C=ES/ST=Madrid/CN=server.internal.lab
# Self-sign the CSR and create an X.509 cert in server.crt
openssl x509 -req -days 365 -in server.csr -signkey server.key -out server.crt

Now get the apache configuration file using the following command, which runs a temporary container from the httpd image, prints the default httpd.conf and saves it to a local my-httpd.conf file.

docker run --rm httpd:2.4 cat /usr/local/apache2/conf/httpd.conf > my-httpd.conf

Edit my-httpd.conf (the local copy of /usr/local/apache2/conf/httpd.conf) and uncomment the following lines by removing the hash symbol at the beginning of each one:

...
LoadModule socache_shmcb_module modules/mod_socache_shmcb.so
...
LoadModule ssl_module modules/mod_ssl.so
...
Include conf/extra/httpd-ssl.conf
...
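
The same edit can be scripted with sed. This is a sketch under one assumption: each of the three directives appears in the file with a single leading hash and no extra whitespace, as in the stock httpd:2.4 configuration. enable_httpd_ssl is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: uncomment the three SSL-related directives in a copy of httpd.conf.
# Usage: enable_httpd_ssl FILE
enable_httpd_ssl() {
  # Edit through a temporary file so the same command works with GNU and BSD sed,
  # then write back in place to keep the original file permissions.
  _tmp="$(mktemp)" || return 1
  sed -e 's|^#\(LoadModule socache_shmcb_module \)|\1|' \
      -e 's|^#\(LoadModule ssl_module \)|\1|' \
      -e 's|^#\(Include conf/extra/httpd-ssl\.conf\)|\1|' \
      "$1" > "$_tmp" && cat "$_tmp" > "$1"
  _rc=$?
  rm -f "$_tmp"
  return "$_rc"
}

# Example usage:
#   enable_httpd_ssl my-httpd.conf
#   grep -E '^(LoadModule (ssl|socache_shmcb)_module|Include conf/extra/httpd-ssl\.conf)' my-httpd.conf
```

The grep in the usage example should print the three directives without the leading hash if the edit succeeded.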

Create a simple Dockerfile to COPY the created certificates server.crt and server.key into /usr/local/apache2/conf/ as well as the custom config file with SSL enabling options.

FROM httpd:2.4
COPY ./my-httpd.conf /usr/local/apache2/conf/httpd.conf
COPY ./server.crt /usr/local/apache2/conf
COPY ./server.key /usr/local/apache2/conf

Build the new image. Use your own Docker Hub id and log in first with docker login to interact with Docker Hub from the CLI. In this case my Docker Hub id is jhasensio and the image below is publicly available if you want to reuse it.

sudo docker build -t jhasensio/httpd:2.4 .
Sending build context to Docker daemon  27.14kB
Step 1/4 : FROM httpd:2.4
 ---> 39c2d1c93266
Step 2/4 : COPY ./my-httpd.conf /usr/local/apache2/conf/httpd.conf
 ---> fce9c451f72e
Step 3/4 : COPY ./server.crt /usr/local/apache2/conf
 ---> ee4f1a446b78
Step 4/4 : COPY ./server.key /usr/local/apache2/conf
 ---> 48e828f52951
Successfully built 48e828f52951
Successfully tagged jhasensio/httpd:2.4

Log in to your Docker account and push the image to Docker Hub.

sudo docker push jhasensio/httpd:2.4
The push refers to repository [docker.io/jhasensio/httpd]
e9cb228edc5f: Pushed
9afaa685c230: Pushed
66eaaa491246: Pushed
98d580c48609: Mounted from library/httpd
33de34a890b7: Mounted from library/httpd
33c6c92714e0: Mounted from library/httpd
15fd28211cd0: Mounted from library/httpd
02c055ef67f5: Mounted from library/httpd
2.4: digest: sha256:230891f7c04854313e502e2a60467581569b597906318aa88b243b1373126b59 size: 1988

Now you can use the created image as part of your deployment. Create a deployment resource as usual using the yaml file below. Note that the Pod will be listening on port 443.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpd
  labels:
    app: httpd
spec:
  replicas: 3
  selector:
    matchLabels:
      app: httpd
  template:
    metadata:
      labels:
        app: httpd
    spec:
      containers:
      - name: httpd
        image: jhasensio/httpd:2.4
        ports:
        - containerPort: 443

Now create the ClusterIP service and expose it using a secure ingress object. A tls secret object called httpd-secret must already exist in kubernetes for this configuration to work. You can generate this secret object using a simple script available here.

apiVersion: v1
kind: Service
metadata:
  name: httpd
spec:
  ports:
   - name: https
     port: 443
     targetPort: 443
  type: ClusterIP
  selector:
    app: httpd
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: httpd
  labels:
    app: httpd
    app: gslb
spec:
  tls:
  - hosts:
    - httpd.avi.iberia.local
    secretName: httpd-secret
  rules:
    - host: httpd.avi.iberia.local
      http:
        paths:
        - path: /
          pathType: Prefix
          backend:
            service:
              name: httpd
              port:
                number: 443

Verify the pod IP assignment using kubectl get pod and some filtering as shown below

kubectl get pod -o custom-columns="NAME:metadata.name,IP:status.podIP" -l app=httpd
NAME                     IP
httpd-5cffd677d7-clkmm   10.34.3.44
httpd-5cffd677d7-hr2q8   10.34.1.39
httpd-5cffd677d7-qtjcw   10.34.1.38

Create a new HTTPRule object in a yaml file and apply it using the kubectl apply command. Note that we have changed the application to test TLS reencryption, so a new FQDN is needed to link the HTTPRule object with the new application. It’s a good idea to change the health monitor to System-HTTPS instead of the default System-HTTP. We can also refer to our own SSL Profile, which defines the TLS negotiation and cipher suites.

apiVersion: ako.vmware.com/v1alpha1
kind: HTTPRule
metadata:
  name: httpd
  namespace: default
spec:
  fqdn: httpd.avi.iberia.local
  paths:
  - target: /
    tls:
      type: reencrypt
      sslProfile: CUSTOM_SSL_PROFILE
    healthMonitors:
    - System-HTTPS

Now we will verify whether our httpd pods are actually using https to serve the content. A nice trick to troubleshoot inside the pod network is to use a temporary pod with a prepared image that has the required network tools preinstalled. An example of such an image is the netshoot image available here. The following command creates a temporary pod and executes a bash session for troubleshooting purposes. The pod will be removed as soon as you exit the ad-hoc shell.

kubectl run tmp-shell --rm -i --tty --image nicolaka/netshoot -- /bin/bash

Now you can test the pod from inside the cluster to check if our SSL setup is actually working as expected. Using curl from the temporary shell try to connect to one of the allocated pod IPs.

bash-5.1# curl -k https://10.34.3.44 -v
*   Trying 10.34.3.44:443...
* Connected to 10.34.3.44 (10.34.3.44) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: C=ES; ST=Madrid; CN=server.internal.lab
*  start date: May 27 10:54:48 2021 GMT
*  expire date: May 27 10:54:48 2022 GMT
*  issuer: C=ES; ST=Madrid; CN=server.internal.lab
*  SSL certificate verify result: self signed certificate (18), continuing anyway.
> GET / HTTP/1.1
> Host: 10.34.3.44
> User-Agent: curl/7.75.0
> Accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Date: Thu, 27 May 2021 12:25:02 GMT
< Server: Apache/2.4.48 (Unix) OpenSSL/1.1.1d
< Last-Modified: Mon, 11 Jun 2007 18:53:14 GMT
< ETag: "2d-432a5e4a73a80"
< Accept-Ranges: bytes
< Content-Length: 45
< Content-Type: text/html
<
<html><body><h1>It works!</h1></body></html>
* Connection #0 to host 10.34.3.44 left intact

Above you can verify how the server is listening on port 443 and how the certificate information presented during the TLS handshake corresponds to our configuration. This time TLS 1.3 has been used to establish the secure negotiation and the AES_256_GCM_SHA384 cipher suite has been used for encryption. Generate some traffic to the https://httpd.avi.iberia.local url and it should display the default apache webpage as displayed below:

Select one of the transactions. This time, according to the configured custom SSL profile, the traffic is using TLS 1.2 as shown below:

To check how our custom HTTPRule has changed the Pool configuration just navigate to Applications > Pool > Edit Pool: S1-AZ1--default-httpd.avi.iberia.local_-httpd. The Enable SSL setting and the selected SSL Profile have now been set to the new values as per the HTTPRule.

You can even specify a custom CA in case you are using CA-issued certificates to validate the backend server identity. We are not testing this because it is pretty straightforward.

destinationCA:  |-
        -----BEGIN CERTIFICATE-----
        [...]
        -----END CERTIFICATE-----
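
Pasting a PEM file into YAML by hand is error-prone because every certificate line must be indented deeper than the key. The sketch below prints a ready-to-paste fragment; pem_to_destination_ca is a hypothetical helper name, and the indentation depth is a parameter (defaulting to 6 spaces as an assumption) because it depends on where the key sits in your HTTPRule.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a "destinationCA: |-" YAML block from a PEM file.
# Usage: pem_to_destination_ca FILE [INDENT]
#   INDENT = number of spaces before the key (default 6, an assumption).
pem_to_destination_ca() {
  awk -v n="${2:-6}" '
    BEGIN { pad = sprintf("%" n "s", ""); print pad "destinationCA: |-" }
    { print pad "  " $0 }   # certificate lines sit two spaces deeper than the key
  ' "$1"
}

# Example usage (ca.crt is a placeholder file name):
#   pem_to_destination_ca ca.crt 6
```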

That concludes this article. I hope you have found it useful to see how to influence the way the AVI load balancer handles the pool configuration to fit your application needs. Now it’s time to explore in the next article how we can take control of some AVI infrastructure parameters using a new CRD: AviInfraSetting.

AVI for K8s Part 8: Customizing L7 Virtual Services using HostRule CRDs

Until now we have been using standard kubernetes API resources such as deployments, replicasets, secrets, services, ingresses… etc to define all the required configuration for the integrated load balancing services that the AVI Service Engines eventually provide. Very often the native K8s API is not rich enough to have a corresponding object for advanced configuration in the external integrated system (e.g. the external load balancer in our case), and this is where the Custom Resource Definition or CRD comes into play. A CRD is a common way to extend the K8s API with additional custom schemas. The AKO operator supports some CRD objects to define extra configuration that allows the end user to customize the service even further. Another common method to personalize the required configuration is through the use of annotations or even matchlabels; however, the usage of CRDs is a better approach since, among other benefits, they can be integrated with the native RBAC policies in k8s to add extra access control to these new custom resources.

This guide is based on the testbed scenario represented in above figure and uses a single kubernetes cluster and a single AVI controller. Antrea is selected as CNI inside the Kubernetes cluster.

AKO uses two categories of CRD

  • Layer 7 CRDs.- provides customization for L7 Ingress resources
    • HostRule CRD.- provides extra settings to configure the Virtual Service
    • HTTPRule CRD.- provides extra settings to customize the Pool or backend associated objects
  • Infrastructure CRDs.- provides extra customization for Infrastructure
    • AviInfraSetting CRD.- Defines L4/L7 infrastructure related parameters such as Service Engine Groups, VIP Network… etc

This article will cover in detail the first of them which is the HostRule CRD. The subsequent articles of this series go through HTTPRule and AviInfraSetting CRD.

Upgrading existing AKO Custom Resource Definitions

As mentioned in previous articles we leverage helm3 to install and manage AKO related packages. Note that when we perform a release upgrade, helm3 does not upgrade the CRDs. So, whenever you upgrade a release, run the following command to ensure you are getting the latest version of the CRDs:

helm template ako/ako --version 1.4.2 --include-crds --output-dir $HOME
wrote /home/ubuntu/ako/crds/networking.x-k8s.io_gateways.yaml
wrote /home/ubuntu/ako/crds/ako.vmware.com_hostrules.yaml
wrote /home/ubuntu/ako/crds/ako.vmware.com_aviinfrasettings.yaml
wrote /home/ubuntu/ako/crds/ako.vmware.com_httprules.yaml
wrote /home/ubuntu/ako/crds/networking.x-k8s.io_gatewayclasses.yaml
wrote /home/ubuntu/ako/templates/serviceaccount.yaml
wrote /home/ubuntu/ako/templates/secret.yaml
wrote /home/ubuntu/ako/templates/configmap.yaml
wrote /home/ubuntu/ako/templates/clusterrole.yaml
wrote /home/ubuntu/ako/templates/clusterrolebinding.yaml
wrote /home/ubuntu/ako/templates/statefulset.yaml
wrote /home/ubuntu/ako/templates/tests/test-connection.yaml

Once you have downloaded the CRDs, just apply them using the kubectl apply command.

kubectl apply -f $HOME/ako/crds/
customresourcedefinition.apiextensions.k8s.io/aviinfrasettings.ako.vmware.com created
Warning: resource customresourcedefinitions/hostrules.ako.vmware.com is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
customresourcedefinition.apiextensions.k8s.io/hostrules.ako.vmware.com configured
Warning: resource customresourcedefinitions/httprules.ako.vmware.com is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
customresourcedefinition.apiextensions.k8s.io/httprules.ako.vmware.com configured
customresourcedefinition.apiextensions.k8s.io/gatewayclasses.networking.x-k8s.io created
customresourcedefinition.apiextensions.k8s.io/gateways.networking.x-k8s.io created

Once upgraded, relaunch AKO using the values.yaml according to your setup. For our testbed scenario I will use a set of values you can find here.

Exploring the HostRule CRD

Let’s start with the HostRule CRD, which is used to provide extra configuration for the Virtual Host properties. The virtual host is a logical construct for hosting multiple FQDNs on a single virtual service definition. This allows one VS to share some resources and properties among multiple Virtual Hosts. The CRD object, like any other kubernetes resource, is configured using declarative yaml files and looks like this:

apiVersion: ako.vmware.com/v1alpha1
kind: HostRule
metadata:
  name: my-host-rule
  namespace: red
spec:
  virtualhost:
    fqdn: foo.com # mandatory
    enableVirtualHost: true
    tls: # optional
      sslKeyCertificate:
        name: avi-ssl-key-cert
        type: ref
      sslProfile: avi-ssl-profile
      termination: edge
    httpPolicy: 
      policySets:
      - avi-secure-policy-ref
      overwrite: false
    datascripts:
    - avi-datascript-redirect-app1
    wafPolicy: avi-waf-policy
    applicationProfile: avi-app-ref
    analyticsProfile: avi-analytics-ref
    errorPageProfile: avi-errorpage-ref

Before going through the different settings to check how they affect the Virtual Service configuration we need to create an application for testing. I will create a secure Ingress object to expose a deployment that will run the Ghost application. Ghost is a quite popular app and one of the most versatile open source content management systems. First define the deployment, this time using imperative commands.

kubectl create deployment ghost --image=ghost --replicas=3
deployment.apps/ghost created

Now expose the application in port 2368 which is the port used by the ghost application.

kubectl expose deployment ghost --port=2368 --target-port=2368
service/ghost exposed

The secure ingress definition requires a secret resource in kubernetes. An easy way to generate the required cryptographic material is a simple script wrapping the OpenSSL commands, which is available here. Just copy the script, make it executable and launch it as shown below using your own data.

./create_secret.sh ghost /C=ES/ST=Madrid/CN=ghost.avi.iberia.local default

If all goes well you should have a new kubernetes tls secret object that you can verify by using kubectl commands as shown below

kubectl describe secret ghost
Name:         ghost-secret
Namespace:    default
Labels:       <none>
Annotations:  <none>

Type:  kubernetes.io/tls

Data
====
tls.crt:  570 bytes
tls.key:  227 bytes

Now we can specify the ingress yaml definition including the certificate, name, ports and other relevant attributes.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ghost
  labels:
    app: ghost
    app: gslb
spec:
  tls:
  - hosts:
    - ghost.avi.iberia.local
    secretName: ghost-secret
  rules:
    - host: ghost.avi.iberia.local
      http:
        paths:
        - path: /
          pathType: Prefix
          backend:
            service:
              name: ghost
              port:
                number: 2368

A few seconds after applying it you should see the beautiful graphical representation of the declared ingress state in the AVI Controller GUI.

Virtual Service representation

And naturally, we can check whether the ghost application is up and running by loading the web interface, at https://ghost.avi.iberia.local in my case.

Click on the lock (Not secure) icon in the address bar of the browser and show the certificate. Verify that the ingress is using the certificate from the secret we created in kubernetes.

Now let’s play with the CRD definitions.

enableVirtualHost

The first setting is very straightforward and basically is used as a flag to change the administrative status of the Virtual Service. This is a simple way to delegate the actual status of the ingress service to the kubernetes administrator. To create the HostRule you need to create a yaml file with the following content and apply it using kubectl apply command. The new HostRule object will be named ghost.

apiVersion: ako.vmware.com/v1alpha1
kind: HostRule
metadata:
  name: ghost
  namespace: default
spec:
  virtualhost:
    fqdn: ghost.avi.iberia.local
    enableVirtualHost: true

Once our HostRule resource is created you can explore the actual status by using regular kubectl command line as shown below

kubectl get HostRule ghost -o yaml
apiVersion: ako.vmware.com/v1alpha1
kind: HostRule
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"ako.vmware.com/v1alpha1","kind":"HostRule","metadata":{"annotations":{},"name":"ghost","namespace":"default"},"spec":{"virtualhost":{"enableVirtualHost":true,"fqdn":"ghost.avi.iberia.local"}}}
  creationTimestamp: "2021-05-19T17:51:09Z"
  generation: 1
  name: ghost
  namespace: default
  resourceVersion: "10590334"
  uid: 6dd06a15-33c2-4c9c-970e-ed5a21a81ce6
spec:
  virtualhost:
    enableVirtualHost: true
    fqdn: ghost.avi.iberia.local
status:
  status: Accepted

Now it’s time to toggle the enableVirtualHost key and set it to false to see how it affects our external Virtual Service in the AVI load balancer. The easiest way is using kubectl edit, which will launch your preferred editor (typically vim) to change the definition on the fly.

kubectl edit HostRule ghost
apiVersion: ako.vmware.com/v1alpha1
kind: HostRule
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"ako.vmware.com/v1alpha1","kind":"HostRule","metadata":{"annotations":{},"name":"ghost","namespace":"default"},"spec":{"virtualhost":{"enableVirtualHost":true,"fqdn":"ghost.avi.iberia.local"}}}
  creationTimestamp: "2021-05-19T17:51:09Z"
  generation: 11
  name: ghost
  namespace: default
  resourceVersion: "12449394"
  uid: 6dd06a15-33c2-4c9c-970e-ed5a21a81ce6
spec:
  virtualhost:
    enableVirtualHost: false
    fqdn: ghost.avi.iberia.local
status:
  status: Accepted

Save the file using the classic <Esc>:wq! sequence if you are using the vim editor, and then verify in the AVI GUI how this affects the status of the Virtual Service.

If you click on the Virtual Service and then click on the pencil icon you can see that the Enabled toggle is set to OFF.
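
The same toggle can be applied without opening an editor. This is a sketch that assumes kubectl patch with a JSON merge patch (--type merge), the patch type that works for custom resources; hostrule_enable_patch is a hypothetical helper that only builds the patch document.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the JSON merge patch that sets
# spec.virtualhost.enableVirtualHost to true or false.
hostrule_enable_patch() {
  case "$1" in
    true|false)
      printf '{"spec":{"virtualhost":{"enableVirtualHost":%s}}}\n' "$1"
      ;;
    *)
      echo "usage: hostrule_enable_patch true|false" >&2
      return 1
      ;;
  esac
}

# Example usage against the HostRule created above:
#   kubectl patch hostrule ghost -n default --type merge -p "$(hostrule_enable_patch false)"
#   kubectl get hostrule ghost -n default -o jsonpath='{.spec.virtualhost.enableVirtualHost}'
```

Patching keeps the change scriptable and repeatable, which is convenient if the administrative status has to be flipped from a pipeline rather than by hand.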

sslKeyCertificate

AKO integration uses the cryptographic information stored in the standard secret kubernetes object and automatically pushes that information to the AVI controller using API calls according to the secure ingress specification. If you want to override this setting you can use the sslKeyCertificate key as part of the HostRule specification to provide alternative information that will be used for the associated ingress object. You can specify both the name of the certificate and also the sslProfile to influence the SSL negotiation parameters.

Until now, AKO has been translating standard kubernetes objects such as ingresses, secrets and deployments into AVI configuration items; in other words, AKO was automating all the required configuration in the AVI controller on our behalf. Generally speaking, when using CRDs, the approach is slightly different. Now the AVI Load Balancer administrator must create the required configuration objects in advance so that the kubernetes administrator can consume the defined policies and configurations as they are defined.

Let’s create these required configuration items from the AVI GUI. First we will check the available system certificates. Navigate to Templates > Security > SSL/TLS Certificates. We will use the System-Default-Cert-EC this time.

Similarly, navigate now to Templates > Security > SSL/TLS Profile and create a new SSL Profile. Just for testing purposes, select only insecure versions such as SSL 3.0 and TLS 1.0 as the versions accepted during the TLS handshake.

Once the required configuration items are precreated in the AVI GUI you can reference them in the associated yaml file. Use kubectl apply -f to push the new configuration to the HostRule object.

apiVersion: ako.vmware.com/v1alpha1
kind: HostRule
metadata:
  name: ghost
  namespace: default
spec:
  virtualhost:
    fqdn: ghost.avi.iberia.local
    enableVirtualHost: true
    tls:
      sslKeyCertificate:
        name: System-Default-Cert-EC
        type: ref
      sslProfile: CUSTOM_SSL_PROFILE
      termination: edge

If you navigate to the ghost Virtual Service in the AVI GUI you can verify in the SSL Settings section how the new configuration has been successfully applied.

Additionally, if you use a browser and open the certificate you can see how the Virtual Service is using the System-Default-Cert-EC we have just configured by the HostRule CRD.

To verify the TLS handshaking according to the SSL Profile specification just use curl. Notice how the output shows TLS version 1.0 has been used to establish the secure channel.

curl -vi -k https://ghost.avi.iberia.local
*   Trying 10.10.15.162:443...
* TCP_NODELAY set
* Connected to ghost.avi.iberia.local (10.10.15.162) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.0 (IN), TLS handshake, Certificate (11):
* TLSv1.0 (IN), TLS handshake, Server key exchange (12):
* TLSv1.0 (IN), TLS handshake, Server finished (14):
* TLSv1.0 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.0 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.0 (OUT), TLS handshake, Finished (20):
* TLSv1.0 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.0 / ECDHE-ECDSA-AES256-SHA
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: CN=System Default EC Cert
*  start date: Feb 23 19:02:20 2021 GMT
*  expire date: Feb 21 19:02:20 2031 GMT
*  issuer: CN=System Default EC Cert
*  SSL certificate verify result: self signed certificate (18), continuing anyway.
> GET / HTTP/1.1
> Host: ghost.avi.iberia.local
> User-Agent: curl/7.67.0
> Accept: */*

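When the check has to be repeated after every profile change, the relevant line can be pulled out of the verbose output automatically. This is a minimal sketch: negotiated_tls is a hypothetical helper name, and it assumes curl reports the handshake result in a line worded like the "SSL connection using" line above, which may differ in other curl versions or TLS backends.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<protocol> <cipher>" from curl -v output on stdin.
# Assumes a line like "* SSL connection using TLSv1.0 / ECDHE-ECDSA-AES256-SHA".
negotiated_tls() {
  sed -n 's|^\* SSL connection using \([^ ]*\) / \([^ ]*\).*$|\1 \2|p'
}

# Example usage (verbose output goes to stderr, so redirect it into the pipe):
#   curl -skv -o /dev/null https://ghost.avi.iberia.local 2>&1 | negotiated_tls
```
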
httpPolicy

The AVI HTTP Policy is a feature that allows advanced customization of network layer security, HTTP security, HTTP requests, and HTTP responses. A policy may be used to control security, client request attributes, or server response attributes. Policies are composed of matches and actions. If the match condition is satisfied then AVI performs the corresponding action.

A full description of the power of the httpPolicy is available in the AVI documentation page here.

The configuration of an httpPolicy is not as easy as that of the other AVI configuration elements because the httppolicyset object is neither shared nor explicitly populated in the AVI GUI, which means you can only share policy sets and attach multiple policy sets to a VS through the CLI/API.

By default, AKO creates an HTTP Policy Set using the API when a new ingress object is created, and it will be unique to the VS but, as mentioned, it is not shown in the GUI as part of the VS definition.

Let’s try to define the HTTP policy. Open an SSH connection to the AVI Controller IP address, log in and launch the AVI CLI shell by issuing the shell command. The prompt will change, indicating you are now accessing the full AVI CLI. Now configure the new httpPolicySet. This time we will create a policy to add traffic control for security purposes. Let’s define a rule with a MATCH statement that matches any request whose Host header equals ghost.avi.iberia.local; if the condition is met, the associated ACTION will be a rate limit allowing only up to ten connections per second. For any traffic out of contract AVI will send a local response with a 429 code. To configure it just paste the command lines below.

[admin:10-10-10-33]: configure httppolicyset MY_RATE_LIMIT
 http_security_policy
  rules
   name RATE_LIMIT_10CPS
   index 0
   match
    host_hdr
     match_criteria HDR_EQUALS
     match_case insensitive
     value ghost.avi.iberia.local
     exit
    exit
   action
    action http_security_action_rate_limit
    rate_profile
     rate_limiter
      count 10
      period 1
      burst_sz 0
      exit
     action
      type rl_action_local_rsp
      status_code http_local_response_status_code_429
      exit
     exit
    exit
   exit
  exit
 save

After saving, a summary page is displayed showing the resulting configuration:


[admin:10-10-10-33]: httppolicyset>  save
+------------------------+----------------------------------------------------+
| Field                  | Value                                              |
+------------------------+----------------------------------------------------+
| uuid                   | httppolicyset-6a33d859-d823-4748-9701-727fa99345b5 |
| name                   | MY_RATE_LIMIT                                      |
| http_security_policy   |                                                    |
|   rules[1]             |                                                    |
|     name               | RATE_LIMIT_10CPS                                   |
|     index              | 0                                                  |
|     enable             | True                                               |
|     match              |                                                    |
|       host_hdr         |                                                    |
|         match_criteria | HDR_EQUALS                                         |
|         match_case     | INSENSITIVE                                        |
|         value[1]       | ghost.avi.iberia.local                             |
|     action             |                                                    |
|       action           | HTTP_SECURITY_ACTION_RATE_LIMIT                    |
|       rate_profile     |                                                    |
|         rate_limiter   |                                                    |
|           count        | 10                                                 |
|           period       | 1 sec                                              |
|           burst_sz     | 0                                                  |
|         action         |                                                    |
|           type         | RL_ACTION_LOCAL_RSP                                |
|           status_code  | HTTP_LOCAL_RESPONSE_STATUS_CODE_429                |
| is_internal_policy     | False                                              |
| tenant_ref             | admin                                              |
+------------------------+----------------------------------------------------+

You can also query the AVI API by browsing to the following URL: https://site1-avi.regiona.iberia.local/api/httppolicyset?name=MY_RATE_LIMIT. For this to work you first need to open a session to the AVI GUI in the same browser so that the API requests are authorized.

Now that the httppolicyset configuration item has been created, you can attach it to the Virtual Service using the HostRule object as previously explained. Edit the yaml file and apply the new configuration, or edit it inline using the kubectl edit command.
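If you need to look up several policy sets, the lookup URL can be built programmatically. The following is only a small sketch that assembles the same URL used in this article; the function name is made up for illustration, and authentication is left out because it depends on how you access your controller.

```python
from urllib.parse import urlencode, urlunsplit


def httppolicyset_url(controller: str, name: str) -> str:
    """Build the API URL that filters httppolicyset objects by name."""
    query = urlencode({"name": name})
    return urlunsplit(("https", controller, "/api/httppolicyset", query, ""))


if __name__ == "__main__":
    # Controller FQDN and policy set name are the ones used in this article.
    print(httppolicyset_url("site1-avi.regiona.iberia.local", "MY_RATE_LIMIT"))
```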

kubectl edit HostRule ghost
apiVersion: ako.vmware.com/v1alpha1
kind: HostRule
metadata:
  name: ghost
  namespace: default
spec:
  virtualhost:
    enableVirtualHost: true
    fqdn: ghost.avi.iberia.local
    tls:
      sslKeyCertificate:
        name: System-Default-Cert-EC
        type: ref
      sslProfile: CUSTOM_SSL_PROFILE
      termination: edge
    httpPolicy: 
      policySets:
      - MY_RATE_LIMIT
      overwrite: false
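As an alternative to editing the object interactively, the same change can be expressed as a JSON merge patch. The sketch below only generates the patch document for the httpPolicy section shown above; the helper name is hypothetical, and I am assuming you would feed the output to kubectl patch with --type merge (custom resources do not support strategic merge patches). Keep in mind that a merge patch replaces the whole policySets list rather than appending to it.

```python
import json


def http_policy_patch(policy_sets, overwrite=False):
    """Return a JSON merge patch that sets spec.virtualhost.httpPolicy."""
    patch = {
        "spec": {
            "virtualhost": {
                "httpPolicy": {
                    "policySets": list(policy_sets),
                    "overwrite": overwrite,
                }
            }
        }
    }
    return json.dumps(patch)


if __name__ == "__main__":
    # Paste the printed document after -p in:
    #   kubectl patch hostrule ghost --type merge -p '<patch>'
    print(http_policy_patch(["MY_RATE_LIMIT"]))
```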

As soon as you apply the new configuration, navigate to the Virtual Service configuration and click on the Policies tab. A dropdown menu now appears that shows the default Policy Set applied by AKO along with the new custom Policy Set named MY_RATE_LIMIT.

AKO currently creates an httppolicyset that uses objects on the SNI child virtual services to route traffic based on host/path matches. These rules are always at a lower index than the httppolicyset objects specified in the CRD object. If you want to overwrite all httppolicyset objects on an SNI virtual service with the ones specified in the HostRule CRD, set the overwrite flag to true.

To check whether our rate limiting is actually working you can use the Apache Bench tool to inject some traffic into the virtual service. The command below will send 300,000 requests with a concurrency value of 100.

ab -c 100 -n 300000 https://ghost.avi.iberia.local/

Try to access the ghost application using your browser while the test is still in progress and you are likely to receive a 429 Too Many Requests error code, indicating that the rate limiting is working as expected.

You can also verify the rate limiter in action using AVI Analytics. In this case the graph below clearly shows how AVI is sending a 4XX response for 98.7% of the requests.

If you want to dig even deeper, click on any of the 429 responses and you will see that the RATE_LIMIT_10CPS rule is responsible for the response sent to the client.
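To get an intuition for why nearly every request is rejected during the load test, here is a toy model of a limiter with count 10 and period 1. It is a simplified fixed-window sketch written for this article, not a description of how the Service Engine actually implements rate limiting, and the class and function names are hypothetical.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `count` requests per `period` seconds; reject the rest."""

    def __init__(self, count: int = 10, period: int = 1) -> None:
        self.count = count
        self.period = period
        self.hits = defaultdict(int)  # window index -> requests seen so far

    def handle(self, now: float) -> int:
        """Return the HTTP status for a request arriving at time `now` (seconds)."""
        window = int(now // self.period)
        self.hits[window] += 1
        return 200 if self.hits[window] <= self.count else 429


def simulate(rate_per_second: int, seconds: int) -> float:
    """Send evenly spaced requests and return the fraction answered with 429."""
    limiter = FixedWindowLimiter(count=10, period=1)
    total = rate_per_second * seconds
    rejected = sum(
        limiter.handle(i / rate_per_second) == 429 for i in range(total)
    )
    return rejected / total


if __name__ == "__main__":
    # An assumed offered load of 1000 requests per second for 5 seconds.
    print(simulate(1000, 5))
```

With any offered load far above ten requests per second, only a small slice of each one-second window is served and the rest receive the local 429 response, which matches the shape of the analytics graph.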

dataScript

DataScripts are the most powerful mechanism to add extra security and customization to our virtual services. They are composed of any number of function or method calls which can be used to inspect and act on traffic flowing through a virtual service. DataScript functions are exposed via Lua libraries and grouped into modules: string, vs, http, pool, ssl and crypto. You can find dozens of samples at the AVI github site here.

For this example, we will use a simple script to provide message signing using a Hash-based Message Authentication Code (HMAC) mechanism. This is a common method to add extra security to a RESTful API service by signing your message based on a shared secret between the client and the service. For the sake of simplicity we will use an easy script that extracts the host header of the server response and generates a new header with the computed hash of this value. We will use the SHA algorithm to calculate the hash.

Again, in order to attach the DataScript to our application we need to precreate the corresponding configuration item in the AVI controller using any method (GUI, API, CLI). This time we will use the GUI. Navigate to Templates > Scripts > DataScripts > CREATE. Scroll down to the HTTP Response Event Script section and paste the above script, which extracts the Host header and then creates a new HTTP header named SHA1-hash containing the computed SHA hash of the Host header value.

Select a proper name for this script since we need to reference it from the HostRule CRD object. I have named it Compute_Host_HMAC. Now edit the HostRule again and apply the new configuration.
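To make the computation itself concrete, the following Python sketch reproduces offline the kind of value such a script would place in the SHA1-hash header. It is not the DataScript (which is written in Lua); it only illustrates the hashing step. It also shows a keyed HMAC next to the plain digest, since a true HMAC mixes in a shared secret; the secret used here is a made-up placeholder.

```python
import hashlib
import hmac


def sha1_hex(value: str) -> str:
    """Plain (unkeyed) SHA-1 digest of a header value, as lowercase hex."""
    return hashlib.sha1(value.encode("utf-8")).hexdigest()


def hmac_sha1_hex(secret: str, value: str) -> str:
    """Keyed HMAC-SHA1 of the same value; needs a secret shared with the client."""
    return hmac.new(
        secret.encode("utf-8"), value.encode("utf-8"), hashlib.sha1
    ).hexdigest()


if __name__ == "__main__":
    host = "ghost.avi.iberia.local"
    print("SHA1-hash:", sha1_hex(host))
    # "example-shared-secret" is a hypothetical value, for illustration only.
    print("HMAC-SHA1:", hmac_sha1_hex("example-shared-secret", host))
```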

kubectl edit HostRule ghost

apiVersion: ako.vmware.com/v1alpha1
kind: HostRule
metadata:
  name: ghost
  namespace: default
spec:
  virtualhost:
    enableVirtualHost: true
    fqdn: ghost.avi.iberia.local
    tls:
      sslKeyCertificate:
        name: System-Default-Cert-EC
        type: ref
      sslProfile: CUSTOM_SSL_PROFILE
      termination: edge
    httpPolicy: 
      policySets:
      - MY_RATE_LIMIT
      overwrite: false
    datascripts:
    - Compute_Host_HMAC

Once the HostRule has been modified we can verify in the AVI GUI that the new DataScript has been applied. Edit the Virtual Service and go to Policies > DataScripts.

To check whether the DataScript is working as expected, browse to the ghost application to generate some traffic and open one of the transactions.

Now click on the View All Headers link at the right of the screen to inspect the headers. In the Headers sent to server section you should see a new custom header named SHA1-hash containing the computed value of the host header found in the request, as expected according to the DataScript function.

wafPolicy

Web Application Firewall is an additional L7 security layer to protect applications from external attackers. AVI uses a very powerful software WAF solution and provides scalable security, threat detection, and application protection. WAF policy is based on a specific set of rules that protects the application. 

Apart from the classical techniques that use signature matching to check whether an attacker is trying to exploit a known vulnerability with a known attack technique, AVI also uses a sophisticated technology called iWAF that takes advantage of Artificial Intelligence models, with the goal of separating good traffic from potentially dangerous traffic using unsupervised machine learning models. This modern method is very useful not only to alleviate the operational burden that tuning a WAF policy with legacy methods implies, but also to reduce the occurrence of false positives.

As seen in previous examples, we need to predefine a base WAF Policy in AVI so that the kubernetes admins can consume or reference it through the corresponding HostRule CRD specification. Let’s create the WAF Policy. Navigate now to Templates > WAF > WAF Profile > CREATE

Here we can assign GHOST-WAF-PROFILE as the WAF Profile name and adjust other general settings for our policy such as HTTP versions, Methods, Content-Types, Extensions… etc. I will use the default settings for now.

Now we can create our WAF Policy from Templates > WAF > WAF Policy > CREATE. Again we will use default settings and we keep the Detection Mode (just flag the suspicious requests) instead of Enforcement (send a 403 response code and deny access). We will use GHOST-WAF-POLICY as the WAF Policy name and it will be referenced in the HostRule definition.

Now that all the preconfiguration tasks have been completed we are ready to attach the WAF policy by using the corresponding HostRule CRD setting. Edit the existing HostRule object and modify it accordingly as shown below:

kubectl edit HostRule ghost

apiVersion: ako.vmware.com/v1alpha1
kind: HostRule
metadata:
  name: ghost
  namespace: default
spec:
  virtualhost:
    enableVirtualHost: true
    fqdn: ghost.avi.iberia.local
    tls:
      sslKeyCertificate:
        name: System-Default-Cert-EC
        type: ref
      sslProfile: CUSTOM_SSL_PROFILE
      termination: edge
    httpPolicy: 
      policySets:
      - MY_RATE_LIMIT
      overwrite: false
    datascripts:
    - Compute_Host_HMAC
    wafPolicy: GHOST-WAF-POLICY

As soon as the new settings are applied a new shield icon appears next to the green rounded symbol that represents the Virtual Service object in the AVI GUI. The shield confirms that a WAF policy has been attached to this particular VS.

If you navigate to the Virtual Service properties you can verify the WAF Policy has been configured as expected.

Just to play a little bit with the Positive Learning Module of the AVI iWAF feature, click on the Learning tab and drag the button to enable the WAF Learning. A warning indicating a Learning Group is required appears as shown below.

Click on Create Learning Group to create this configuration that allows the system to create the baseline of the known-good behaviour according to the learnt traffic.

Assign a new name such as Ghost-PSM-Group and enable the Learning Group checkbox. Leave the remaining settings at their defaults.

Return to the previous screen and adjust some settings to speed up the learning process as shown below:

The parameters we are tuning above are specifically:

  • Sampling = 100 %: All requests are analyzed and used to train the ML model.
  • Update_interval = 1: Time for the Service Engine to collect data before sending it to the learning module.
  • min_hits_to_learn = 10: Specifies the minimum number of occurrences required to consider a hit relevant. A lower value allows learning to happen faster. Default is 10k.
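The effect of the min_hits_to_learn threshold can be pictured with a toy counter. This is purely an illustration of the thresholding idea with 100 % sampling (every request is counted); it says nothing about how the AVI learning module is implemented internally, and the paths and the function name are hypothetical.

```python
from collections import Counter


def learned_locations(requests, min_hits_to_learn=10):
    """Return the paths seen at least `min_hits_to_learn` times, sorted."""
    hits = Counter(requests)
    return sorted(path for path, n in hits.items() if n >= min_hits_to_learn)


if __name__ == "__main__":
    # Hypothetical traffic: two frequently visited paths and one rare path.
    traffic = ["/"] * 25 + ["/about/"] * 10 + ["/rare/"] * 3
    print(learned_locations(traffic))  # ['/', '/about/']
```

With the default of 10k the same traffic would teach the model nothing, which is why lowering the value makes sense in a lab.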

The WAF Policy is now ready to learn and will autopromote rules according to the observed traffic. In a production environment it can take some time to collect enough samples to build a good policy. To produce some traffic and get our application reasonably ready before going to the production stage, it is recommended to perform an active scan of our web application. Here we will use one of the most popular scanners, OWASP Zed Attack Proxy (a.k.a. ZAP Proxy). You can find more information about this tool at the official ZAP site here. Perform a quick scan of our application from the command line as shown below.

/usr/share/zaproxy/zap.sh -cmd -quickurl https://ghost.avi.iberia.local -quickprogress
Found Java version 11.0.11-ea
Available memory: 7966 MB
Using JVM args: -Xmx1991m
Ignoring legacy log4j.properties file, backup already exists.
Accessing URL
Using traditional spider
Active scanning
[===============     ] 79% /

After some scanning we can explore the discovered locations (paths) that our web application uses. These discovered locations will be used to understand how legitimate traffic should behave and will be the input for our AI-based classifier. Given the right amount of data, this helps the system gain accuracy in distinguishing known-good behaviour from anomalies and take action without any manual tweaking of the WAF Policies.

Additionally, the ZAP active scanner attempts to find potential vulnerabilities by using known attacks against the specified target. From the analytics page you can now see the WAF Tags that are associated with the scanner activities and are used to classify the different attack techniques observed.

You can also see how the active scanner attempts match specific WAF signatures.

And if you want to go deeper you can pick one of the flagged requests to see the specific matching signature.

You can also create exceptions if needed, for instance in case of false positives.

Note how a new WAF tab is now available as part of the embedded analytics. If you click on it you can see some insights related to the WAF attack historical trends as well as more specific details.

Lastly, enable the enforcement mode in our WAF Policy.

Open one of the FLAGGED requests detected during the active scanning while in Detection mode and replicate the same request from the browser. In this case I have chosen one of the observed XSS attack attempts using the above URL. If you try to navigate to the target, the WAF engine will now block the access and generate a 403 Forbidden response back to the client as shown below.

applicationProfile

Application profiles are used to change the behavior of virtual services based on the application type. By default the system uses the System-Secure-HTTP profile with common settings, including the SSL Everywhere feature set that ensures you use best-in-class security methods for HTTPS such as HSTS, secure cookies and X-Forwarded-Proto, among others.

To test how we can use application profiles from the HostRule CRD object, preconfigure a new Application Profile that we will call GHOST-HTTPS-APP-PROFILE. As an example I am tuning the compression setting here and checking the Remove Accept Encoding Header option for the traffic sent to the backend server. This is a method for offloading content compression to the AVI Load Balancer in order to reduce the processing load on the end server.

Push the configuration to the Virtual Service by adding the relevant information to our HostRule CRD using kubectl edit command as shown:

kubectl edit HostRule ghost

apiVersion: ako.vmware.com/v1alpha1
kind: HostRule
metadata:
  name: ghost
  namespace: default
spec:
  virtualhost:
    enableVirtualHost: true
    fqdn: ghost.avi.iberia.local
    tls:
      sslKeyCertificate:
        name: System-Default-Cert-EC
        type: ref
      sslProfile: CUSTOM_SSL_PROFILE
      termination: edge
    httpPolicy: 
      policySets:
      - MY_RATE_LIMIT
      overwrite: false
    datascripts:
    - Compute_Host_HMAC
    wafPolicy: GHOST-WAF-POLICY
    applicationProfile: GHOST-HTTPS-APP-PROFILE

As soon as the new configuration is pushed to the CRD, AKO will patch the VS with the new Application Profile setting as you can verify in the GUI.

Generate some traffic, select a recent transaction and show All Headers. Now you can see how the compression-related settings specified in the Accept-Encoding header received from the client are suppressed and rewritten with an identity value, meaning no encoding.
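The header rewrite described above can be summarized in a few lines. This is only a sketch of the observable effect on the headers sent to the backend, with a hypothetical function name; it is not how the Service Engine implements the option.

```python
def headers_for_backend(client_headers: dict) -> dict:
    """Copy the client headers, forcing Accept-Encoding to 'identity'."""
    rewritten = {
        name: value
        for name, value in client_headers.items()
        if name.lower() != "accept-encoding"
    }
    rewritten["Accept-Encoding"] = "identity"
    return rewritten


if __name__ == "__main__":
    # Example client headers; the Accept-Encoding value is an assumed one.
    client = {"Host": "ghost.avi.iberia.local", "Accept-Encoding": "gzip, deflate, br"}
    print(headers_for_backend(client))
```

Because the backend never sees gzip or similar encodings offered, it answers uncompressed and the load balancer can take care of compressing the response towards the client.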

analyticsProfile

Since each application is different, it may be necessary to modify the analytics profile to set the threshold for a satisfactory client experience, to omit certain errors, or to configure an external collector to send the analytics to. This specific setting can be attached to any of the applications deployed from kubernetes. As in previous examples, we need to preconfigure the relevant items to be able to reference them from the CRD. In this case we can create a custom analyticsProfile that we will call GHOST-ANALYTICS-PROFILE by navigating to Templates > Profiles > Analytics. Now define an external destination to send our logs to via syslog.

As usual, edit the custom ghost HostRule object and add the corresponding lines.

kubectl edit HostRule ghost

apiVersion: ako.vmware.com/v1alpha1
kind: HostRule
metadata:
  name: ghost
  namespace: default
spec:
  virtualhost:
    enableVirtualHost: true
    fqdn: ghost.avi.iberia.local
    tls:
      sslKeyCertificate:
        name: System-Default-Cert-EC
        type: ref
      sslProfile: CUSTOM_SSL_PROFILE
      termination: edge
    httpPolicy: 
      policySets:
      - MY_RATE_LIMIT
      overwrite: false
    datascripts:
    - Compute_Host_HMAC
    wafPolicy: GHOST-WAF-POLICY
    applicationProfile: GHOST-HTTPS-APP-PROFILE
    analyticsProfile: GHOST-ANALYTICS-PROFILE

Once done, the Virtual Service will populate the Analytic Profile setting as per our HostRule specification as shown below:

If you have access to the syslog server you can see how AVI is now streaming the transactions via syslog. Notice the amount of metrics AVI analytics produces, as seen below. You can use a BI tool of your choice for further processing and dashboarding with great granularity.

May 26 17:40:01 AVI-CONTROLLER S1-AZ1--ghost.avi.iberia.local[0]: {"adf":false,"significant":0,"udf":false,"virtualservice":"virtualservice-cd3bab2c-1091-4d31-956d-ef0aee80bbc6","report_timestamp":"2021-05-26T17:40:01.744394Z","service_engine":"s1-ako-se-kmafa","vcpu_id":0,"log_id":5082,"client_ip":"10.10.15.128","client_src_port":45254,"client_dest_port":443,"client_rtt":1,"ssl_version":"TLSv1.2","ssl_cipher":"ECDHE-ECDSA-AES256-GCM-SHA384","sni_hostname":"ghost.avi.iberia.local","http_version":"1.0","method":"HEAD","uri_path":"/","user_agent":"avi/1.0","host":"ghost.avi.iberia.local","etag":"W/\"5c4a-r+7DuBoSJ6ifz7nS1cluKPsY5VI\"","persistent_session_id":3472328296598305370,"response_content_type":"text/html; charset=utf-8","request_length":83,"cacheable":true,"http_security_policy_rule_name":"RATE_LIMIT_10CPS","http_request_policy_rule_name":"S1-AZ1--default-ghost.avi.iberia","pool":"pool-6bf69a45-7f07-4dce-8e4e-7081136b31bb","pool_name":"S1-AZ1--default-ghost.avi.iberia.local_-ghost","server_ip":"10.34.3.3","server_name":"10.34.3.3","server_conn_src_ip":"10.10.14.20","server_dest_port":2368,"server_src_port":38549,"server_rtt":2,"server_response_length":288,"server_response_code":200,"server_response_time_first_byte":52,"server_response_time_last_byte":52,"response_length":1331,"response_code":200,"response_time_first_byte":52,"response_time_last_byte":52,"compression_percentage":0,"compression":"NO_COMPRESSION_CAN_BE_COMPRESSED","client_insights":"","request_headers":65,"response_headers":2060,"request_state":"AVI_HTTP_REQUEST_STATE_SEND_RESPONSE_HEADER_TO_CLIENT","all_request_headers":{"User-Agent":"avi/1.0","Host":"ghost.avi.iberia.local","Accept":"*/*"},"all_response_headers":{"Content-Type":"text/html; charset=utf-8","Content-Length":23626,"Connection":"close","X-Powered-By":"Express","Cache-Control":"public, max-age=0","ETag":"W/\"5c4a-r+7DuBoSJ6ifz7nS1cluKPsY5VI\"","Vary":"Accept-Encoding","Date":"Wed, 26 May 2021 17:40:02 GMT","Strict-Transport-Security":"max-age=31536000; includeSubDomains"},"headers_sent_to_server":{"X-Forwarded-For":"10.10.15.128","Host":"ghost.avi.iberia.local","Connection":"keep-alive","User-Agent":"avi/1.0","Accept":"*/*","X-Forwarded-Proto":"https","SHA1-hash":"8d8bc49ef49ac3b70a059c85b928953690700a6a"},"headers_received_from_server":{"X-Powered-By":"Express","Cache-Control":"public, max-age=0","Content-Type":"text/html; charset=utf-8","Content-Length":23626,"ETag":"W/\"5c4a-r+7DuBoSJ6ifz7nS1cluKPsY5VI\"","Vary":"Accept-Encoding","Date":"Wed, 26 May 2021 17:40:02 GMT","Connection":"keep-alive","Keep-Alive":"timeout=5"},"server_connection_reused":true,"vs_ip":"10.10.15.162","waf_log":{"status":"PASSED","latency_request_header_phase":210,"latency_request_body_phase":477,"latency_response_header_phase":19,"latency_response_body_phase":0,"rules_configured":true,"psm_configured":true,"application_rules_configured":false,"allowlist_configured":false,"allowlist_processed":false,"rules_processed":true,"psm_processed":true,"application_rules_processed":false},"request_id":"Lv-scVJ-2Ivw","servers_tried":1,"jwt_log":{"is_jwt_verified":false}}

Use your favourite log collector tool to extract the different fields contained in the syslog file and you can get nice graphics very easily. As an example, using vRealize Log Insight you can see the events sent by AVI via syslog over time.

This other example shows the average backend server RTT over time grouped by server IP (i.e. pod).

Or even this one, which shows the percentage of requests across the pods.
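If you do not have a log platform at hand, the same kind of aggregation can be done with a few lines of code. The sketch below assumes each syslog line carries a single JSON document after the syslog prefix, as in the sample above, and averages server_rtt per server_ip. The records in the example are trimmed to three fields, and the second one is invented to make the average visible.

```python
import json
from collections import defaultdict


def parse_avi_syslog(line: str) -> dict:
    """Return the JSON payload that follows the syslog prefix on one line."""
    return json.loads(line[line.index("{"):])


def average_server_rtt(lines):
    """Average the server_rtt field per server_ip across the given lines."""
    samples = defaultdict(list)
    for line in lines:
        record = parse_avi_syslog(line)
        samples[record["server_ip"]].append(record["server_rtt"])
    return {ip: sum(values) / len(values) for ip, values in samples.items()}


if __name__ == "__main__":
    prefix = "May 26 17:40:01 AVI-CONTROLLER S1-AZ1--ghost.avi.iberia.local[0]: "
    sample = [
        prefix + '{"server_ip":"10.34.3.3","server_rtt":2,"response_code":200}',
        # Hypothetical second record, not taken from the real log.
        prefix + '{"server_ip":"10.34.3.3","server_rtt":4,"response_code":200}',
    ]
    print(average_server_rtt(sample))  # {'10.34.3.3': 3.0}
```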

errorPageProfile

The last configurable parameter so far is the errorPageProfile, which can be used to produce custom error page responses that add relevant information for tracing issues, or simply to provide a cool error page to your end users. As with previous settings, the first step is to preconfigure the custom Error Page Profile using the GUI. Navigate to Templates > Error Page and create a new profile that we will call GHOST-ERROR-PAGE.

We will create a custom page to warn users that they are trying to access the website using a forbidden method. When this happens a 403 code is generated and a customized web page can be returned to the user. I have used a cool page that displays the Forbidden City. The HTML code is available here.

Once the Error Page profile has been created, it is time to reference it to customize our application using the HostRule CRD as shown below:

kubectl edit HostRule ghost

apiVersion: ako.vmware.com/v1alpha1
kind: HostRule
metadata:
  name: ghost
  namespace: default
spec:
  virtualhost:
    enableVirtualHost: true
    fqdn: ghost.avi.iberia.local
    tls:
      sslKeyCertificate:
        name: System-Default-Cert-EC
        type: ref
      sslProfile: CUSTOM_SSL_PROFILE
      termination: edge
    httpPolicy: 
      policySets:
      - MY_RATE_LIMIT
      overwrite: false
    datascripts:
    - Compute_Host_HMAC
    wafPolicy: GHOST-WAF-POLICY
    applicationProfile: GHOST-HTTPS-APP-PROFILE
    analyticsProfile: GHOST-ANALYTICS-PROFILE
    errorPageProfile: GHOST-ERROR-PAGE

Verify that the new configuration has been successfully applied to our Virtual Service.

Now repeat the XSS attack attempt as shown in the wafPolicy section above and you can see how a beautiful custom message appears instead of the boring static 403 Forbidden shown before.

This concludes the article. The next one will cover the customization of the backend/pool associated with our application and how you can also influence the load balancing algorithm, persistence, reencryption and other fun stuff.

AVI for K8s Part 7: Adding GSLB leader-follower hierarchy for extra availability

Now it is time to make our architecture even more robust by leveraging the GSLB capabilities of AVI. We will create a distributed DNS model in which the GSLB objects are distributed and synced across the different sites. The neutral DNS site will remain as the Leader but we will use the other AVI Controllers already in place to help service DNS requests as well as to provide extra availability to the whole architecture.

GSLB Hierarchy with a Leader and two Active Followers

Define GSLB Followers

To allow the other AVI Controllers to take part in DNS resolution (in GSLB terms, to turn the AVI Controllers into Active members), a DNS Virtual Service must also be defined on them in the same way. Remember that in the AVI Controller at Site1 we defined two separate VRFs to accommodate the two different k8s clusters. For consistency we also used an independent VRF for Site2, even though it was not a requirement since we were handling a single k8s cluster. Do not fall into the temptation of reusing one of the existing VRFs to create the DNS Virtual Service we will use for GSLB Services!! Remember that AMKO will automatically create Health-Monitors to verify the status of the Virtual Services so it can ensure reachability before answering DNS queries. If we place the Virtual Service in one of the VRFs, the Health-Monitor will not be able to reach the services in the other VRF because they have isolated routing tables. When a VRF has been used by the AKO integration, the system will not allow you to define static IP routes in that VRF to implement a kind of route leaking to reach other VRFs. This constraint would leave the GSLB-related Health-Monitors in that VRF unable to reach services external to the VRF, so any service outside it would be declared as DOWN. The solution is to place the DNS VS in the global VRF and define a default gateway as per your particular network topology.

Define a new network of your choice for IPAM and for DNS VS and SE placement. In my case I have selected 10.10.22.0/24.

Repeat the same process for Site2. In this case I will use the network 10.10.21.0/24. The resulting DNS VS configuration is shown below.

The last step is the IP routing configuration to allow Health-Monitors to reach the target VS they need to monitor. Until now we haven’t defined IP routing information for the Virtual Services. The Virtual Service just returns the traffic for incoming requests to the same L2 MAC address it found as source MAC in the ingress packet. This ensures that the traffic will return using the same path without the need for an IP routing lookup to determine the next hop to reach the originating client IP address. Now that we are implementing a Health-Monitor mechanism, we need to configure where to send traffic directed to monitored VSs that are placed outside the local network, so that the Health-Monitor can succeed in its role. In the diagram above, the Health-Monitor will use the default gateway 10.10.22.1 to send outgoing traffic directed to other networks.

AVI Health-Monitor Leaving traffic using IP Default Gateway
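The forwarding decision behind this is the usual on-link versus off-link check. As a minimal sketch using the Site1 addressing from this article (the function name is made up for illustration):

```python
import ipaddress


def next_hop(destination: str, local_network: str, default_gateway: str) -> str:
    """Return where a packet for `destination` is sent first."""
    if ipaddress.ip_address(destination) in ipaddress.ip_network(local_network):
        return destination  # on-link: delivered directly at L2
    return default_gateway  # off-link: handed to the default gateway


if __name__ == "__main__":
    # 10.10.21.50 is the Site2 DNS VS address that appears later in the DNS logs.
    print(next_hop("10.10.21.50", "10.10.22.0/24", "10.10.22.1"))  # 10.10.22.1
    print(next_hop("10.10.22.40", "10.10.22.0/24", "10.10.22.1"))  # 10.10.22.40
```

Without a default gateway in the VRF, the first case has nowhere to go, which is exactly why monitors for remote Virtual Services would fail.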

For the returning traffic, the Virtual Service just sends the traffic to the L2 observed as source in the incoming request to save IP routing lookup. There is no need to define the default gateway in the Service Engine to ensure the traffic returns using the same path.

VS Response traffic to AVI Health-Monitor using L2 information observed in incoming request

To complete the configuration go to the AVI Controller at Site1 and define a Default Gateway for the Global VRF. Use 10.10.22.1 as the default gateway in this case. Go to Infrastructure > Routing > Static Route and create a new static route.

Default gateway for GSLB Health-Monitoring at Site1

Repeat the process for AVI Controller at Site2 and define a Default Gateway for the Global VRF. In this case the default gateway for the selected 10.10.21.0/24 network is 10.10.21.1.

Default gateway for GSLB Health-Monitoring at Site2

As these two services will act as followers there is no need to define anything else because the rest of the configuration and the GSLB objects will be pushed from the GSLB Leader as part of the syncing process.

Move now to the GUI at the GSLB Leader to add the two sites. Go to Infrastructure > GSLB and Add New Site. Populate the fields as shown below

Click Save and Set DNS Virtual Services

Repeat the same process for the Site2 GSLB and once completed the GSLB Configuration should display the status of the three sites. The table indicates the role, the IP address, the DNS VSes we are using for syncing the GSLB objects and the current status.

GSLB Active Members Status

Now move to one of the follower sites and verify if the GSLB has been actually synced. Go to Applications > GSLB Services and you should be able to see any of the GSLB services that were created from AMKO in the GSLB Leader site.

GSLB object syncing at Follower site

If you click on the object you should get the following green icons indicating the health-monitors created by AMKO are now reaching the monitored Virtual Services.

GSLB Service green status

For your information, if you had placed the Follower DNS VS in one of the existing VRFs you would get the following result. In the depicted case some of the monitors are failing and are marked in red. Only the local VS is declared as UP (green) whilst any VS outside the DNS VRF is declared as DOWN (red) due to the network connectivity issues. As you can see:

  • The Health-Monitor at the GSLB site perceives the three VSs as up. The DNS VS has been placed in the default VRF so there are no constraints.
  • The Health-Monitor at the GSLB-Site1 site perceives only the local VRF Virtual Services as up; the external-vrf VS is declared as down.
  • Similarly, the Health-Monitor at the GSLB-Site2 site perceives only its local VS as up; the other two external VSs are not seen, so they are declared as down.

Having completed the setup, whenever a new Ingress or LoadBalancer service is created with the appropriate label or namespace used as selector in any of the three clusters under AMKO scope, an associated GSLB service will be created automatically by AMKO in the GSLB Leader site. Subsequently the AVI GSLB subsystem will be in charge of replicating this new GSLB service to the other GSLB Followers to create this nice distributed system.

Configuring Zone Delegation

However, remember that we have configured the DNS to forward the queries directed to our delegated zone avi.iberia.local towards an NS that pointed only to the DNS Virtual Service at the GSLB Leader site. Obviously we need to change the current local DNS configuration to include the new DNS services at the Follower sites as part of the Zone Delegation.

First of all, configure the DNS Service at the follower sites to be authoritative for the domain avi.iberia.local in order to have a consistent configuration across the three DNS sites.

SOA Configuration for DNS Virtual Service

Also set the behaviour for Invalid DNS Query processing to send an NXDOMAIN for invalid queries.

Create an A record pointing to the IP address of each follower DNS site.

g-dns-site1 A Record pointing to the IP Address assigned to the Virtual Service

Repeat the same process for DNS at Site2

g-dns-site2 A Record pointing to the IP Address assigned to the Virtual Service

Now click on the properties for the delegated zone

Windows Server Zone Delegation Properties

Now click Add to configure the subsequent NS entries for our Zone Delegation setup.

Adding follower GSLB sites for Zone Delegation

Repeat the same for the g-dns-site2.iberia.local virtual service and you will get this configuration.

Zone Delegation with three alternative NameServers for avi.iberia.local

The delegated zone should display this ordered list of NS entries, which will be used sequentially to forward the FQDN queries for the domain avi.iberia.local.

In my test the MS DNS apparently uses the NS records in the order they appear in the list to forward queries. In theory, the algorithm used to distribute traffic among the different NameServer entries should be Round-Robin.

# First query is sent to 10.10.24.186 (DNS VS IP @ GSLB Leader Site)
21/12/2020 9:44:39 1318 PACKET  000002DFAC51A530 UDP Rcv 192.168.170.10  12ae   Q [2001   D   NOERROR] A      (5)hello(3)avi(6)iberia(5)local(0)
21/12/2020 9:44:39 1318 PACKET  000002DFAC1CB560 UDP Snd 10.10.24.186    83f8   Q [0000       NOERROR] A      (5)hello(3)avi(6)iberia(5)local(0)
21/12/2020 9:44:39 1318 PACKET  000002DFAAF7A9D0 UDP Rcv 10.10.24.186    83f8 R Q [0084 A     NOERROR] A      (5)hello(3)avi(6)iberia(5)local(0)
21/12/2020 9:44:39 1318 PACKET  000002DFAC51A530 UDP Snd 192.168.170.10  12ae R Q [8081   DR  NOERROR] A      (5)hello(3)avi(6)iberia(5)local(0)

# Subsequent queries use the same IP for forwarding
21/12/2020 9:44:51 1318 PACKET  000002DFAAF7A9D0 UDP Rcv 192.168.170.10  c742   Q [2001   D   NOERROR] A      (5)hello(3)avi(6)iberia(5)local(0)
21/12/2020 9:44:51 1318 PACKET  000002DFAC653CC0 UDP Snd 10.10.24.186    c342   Q [0000       NOERROR] A      (5)hello(3)avi(6)iberia(5)local(0)
21/12/2020 9:44:51 1318 PACKET  000002DFAD114950 UDP Rcv 10.10.24.186    c342 R Q [0084 A     NOERROR] A      (5)hello(3)avi(6)iberia(5)local(0)
21/12/2020 9:44:51 1318 PACKET  000002DFAAF7A9D0 UDP Snd 192.168.170.10  c742 R Q [8081   DR  NOERROR] A      (5)hello(3)avi(6)iberia(5)local(0)
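If you want to check which NameServer the local DNS is really forwarding to without reading the debug log by eye, a small parser can help. The following is only a minimal sketch: the regular expression is my own guess based on the PACKET lines shown above (it is not an official log grammar), and the function names are hypothetical.

```python
import re

# Pattern inferred from the PACKET lines in this article (assumption, not an
# official format description of the Windows DNS debug log).
PACKET_RE = re.compile(
    r"^(?P<date>\S+)\s+(?P<time>\S+)\s+\S+\s+PACKET\s+\S+\s+"
    r"(?P<proto>UDP|TCP)\s+(?P<direction>Snd|Rcv)\s+(?P<remote>\S+)\s+"
    r"(?P<xid>[0-9a-fA-F]+)\s+(?P<response>R\s+)?Q\s+\[[^\]]*\]\s+"
    r"(?P<qtype>\S+)\s+(?P<qname>\S+)\s*$"
)


def decode_qname(raw):
    """Turn '(5)hello(3)avi(6)iberia(5)local(0)' into 'hello.avi.iberia.local'."""
    labels = re.findall(r"\(\d+\)([^()]*)", raw)
    return ".".join(label for label in labels if label)


def forwarded_queries(lines):
    """Return (time, server, fqdn) for every query the DNS sent upstream."""
    result = []
    for line in lines:
        match = PACKET_RE.match(line.strip())
        if not match:
            continue  # comments and non-PACKET lines are skipped
        # A forwarded query is a packet that was sent and is not a response.
        if match["direction"] == "Snd" and match["response"] is None:
            result.append(
                (match["time"], match["remote"], decode_qname(match["qname"]))
            )
    return result
```

Feeding the log excerpt to forwarded_queries() returns one tuple per forwarded query, so it is easy to see when the forwarding target changes from one NameServer to the next.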

Disable the DNS Virtual Service at the GSLB Leader site by clicking on the Enabled slider button in the Edit Virtual Service: g-dns window as shown below

Disabling g-dns DNS Service at GSLB Leader Site

Only after disabling the service does the local DNS try to use the second NameServer as specified in the configuration of the DNS Zone Delegation

# Query is now sent to 10.10.22.40 (DNS VS IP @ GSLB Follower Site1)
21/12/2020 9:48:56 1318 PACKET  000002DFACF571C0 UDP Rcv 192.168.170.10  4abc   Q [2001   D   NOERROR] A      (5)hello(3)avi(6)iberia(5)local(0)
21/12/2020 9:48:56 1318 PACKET  000002DFAB203990 UDP Snd 10.10.22.40     2899   Q [0000       NOERROR] A      (5)hello(3)avi(6)iberia(5)local(0)
21/12/2020 9:48:56 1318 PACKET  000002DFAAC730C0 UDP Rcv 10.10.22.40     2899 R Q [0084 A     NOERROR] A      (5)hello(3)avi(6)iberia(5)local(0)
21/12/2020 9:48:56 1318 PACKET  000002DFACF571C0 UDP Snd 192.168.170.10  4abc R Q [8081   DR  NOERROR] A      (5)hello(3)avi(6)iberia(5)local(0)

Similarly, do the same at Site1 by disabling the g-dns-site1 DNS Virtual Service

Disabling g-dns-site1 DNS Service at GSLB Follower at Site1

Note how the DNS is forwarding the queries to the IP Address of the DNS at site 2 (10.10.21.50) as shown below

21/12/2020 9:51:09 131C PACKET  000002DFAC927220 UDP Rcv 192.168.170.10  f6b1   Q [2001   D   NOERROR] A      (5)hello(3)avi(6)iberia(5)local(0)
# DNS tries to forward again to the DNS VS IP Address of the Leader
21/12/2020 9:51:09 131C PACKET  000002DFAD48F890 UDP Snd 10.10.24.186    3304   Q [0000       NOERROR] A      (5)hello(3)avi(6)iberia(5)local(0)
# After the timeout it falls back to the DNS VS IP address of Site2
21/12/2020 9:51:13 0BB8 PACKET  000002DFAD48F890 UDP Snd 10.10.21.50     3304   Q [0000       NOERROR] A      (5)hello(3)avi(6)iberia(5)local(0)
21/12/2020 9:51:13 131C PACKET  000002DFAAF17920 UDP Rcv 10.10.21.50     3304 R Q [0084 A     NOERROR] A      (5)hello(3)avi(6)iberia(5)local(0)
21/12/2020 9:51:13 131C PACKET  000002DFAC927220 UDP Snd 192.168.170.10  f6b1 R Q [8081   DR  NOERROR] A      (5)hello(3)avi(6)iberia(5)local(0)

Datacenter blackout simulation analysis

Test 1: GSLB Leader Blackout

To verify the robustness of the architecture let’s simulate the blackout of each of the Availability Zones / DataCenters to see how the system reacts. We will now configure AMKO to split traffic evenly across Datacenters. Edit the global-gdp object using Octant or the kubectl edit command.

kubectl edit globaldeploymentpolicies.amko.vmware.com global-gdp -n avi-system

# Locate the trafficSplit section
trafficSplit:
  - cluster: s1az1
    weight: 5
  - cluster: s1az2
    weight: 5
  - cluster: s2
    weight: 5
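To get a feeling for what these weights mean, here is a minimal sketch that converts the trafficSplit section into the share of answers each cluster should receive. It assumes the weights act as simple relative ratios (weight divided by the sum of all weights); the function name is hypothetical and this is not AMKO code.

```python
from fractions import Fraction


def expected_share(traffic_split):
    """Map each cluster to its expected fraction of the DNS answers."""
    total = sum(entry["weight"] for entry in traffic_split)
    if total <= 0:
        raise ValueError("at least one cluster needs a positive weight")
    return {
        entry["cluster"]: Fraction(entry["weight"], total)
        for entry in traffic_split
    }


# Same values as the trafficSplit section above.
traffic_split = [
    {"cluster": "s1az1", "weight": 5},
    {"cluster": "s1az2", "weight": 5},
    {"cluster": "s2", "weight": 5},
]

for cluster, share in expected_share(traffic_split).items():
    print(f"{cluster}: {float(share):.1%}")
```

With equal weights every cluster gets one third of the answers, whatever the absolute value of the weight is.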

Remember to change the default TTL from 30 to 2 seconds to speed up the test process

while true; do curl -m 2 http://hello.avi.iberia.local -s | grep MESSAGE; sleep 2; done
# The traffic is evenly distributed across the three k8s clusters
  MESSAGE: This service resides in SITE2
  MESSAGE: This Service resides in SITE2
  MESSAGE: This service resides in SITE1 AZ2
  MESSAGE: This service resides in SITE1 AZ2
  MESSAGE: This service resides in SITE1 AZ2
  MESSAGE: This service resides in SITE1 AZ1
  MESSAGE: This service resides in SITE1 AZ1
  MESSAGE: This service resides in SITE2
  MESSAGE: This service resides in SITE1 AZ2
  MESSAGE: This service resides in SITE1 AZ2
  MESSAGE: This service resides in SITE1 AZ1
  MESSAGE: This service resides in SITE1 AZ1
  MESSAGE: This service resides in SITE1 AZ1
  MESSAGE: This service resides in SITE2
  MESSAGE: This service resides in SITE1 AZ2
  MESSAGE: This service resides in SITE1 AZ2
  MESSAGE: This service resides in SITE1 AZ1
  MESSAGE: This service resides in SITE1 AZ1
  MESSAGE: This service resides in SITE2
  MESSAGE: This service resides in SITE2
  MESSAGE: This service resides in SITE1 AZ2
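Counting the answers by eye gets tedious quickly, so a tiny helper to tally the MESSAGE lines per site can be handy. This is just a sketch that assumes the output format shown above; tally_sites is a hypothetical name.

```python
from collections import Counter


def tally_sites(output):
    """Count how many MESSAGE lines were served by each site."""
    counts = Counter()
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("MESSAGE:"):
            # Keep only the site label and ignore case differences in the text.
            site = line.lower().split("resides in", 1)[-1].strip().upper()
            counts[site] += 1
    return counts


# Example: save the loop output to a file and tally it afterwards.
# with open("/tmp/curl_output.txt") as handle:
#     print(tally_sites(handle.read()))
```

The resulting counters can be compared against the configured weights to confirm that the split is roughly even.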

With this baseline we will now simulate a blackout condition in the first GSLB site as depicted in the picture below:

To simulate the blackout just disconnect the AVI Controller vNICs at Site1 as well as the vNICs of the Service Engines from vCenter…

If you go to Infrastructure > GSLB, after five minutes the original GSLB Leader site appears as down with the red icon.

Also the GSLB Service hello.avi.iberia.local appears as down from the GSLB Leader site perspective, as you can tell below.

The DataPlane has not been affected yet because the Local DNS is using the remaining NameServer entries that point to the DNS VS at Site1 and Site2, so the FQDN resolution is not affected at all either.

  MESSAGE: This service resides in SITE1 AZ1
  MESSAGE: This service resides in SITE2
  MESSAGE: This service resides in SITE2
  MESSAGE: This service resides in SITE1 AZ2
  MESSAGE: This service resides in SITE1 AZ2
  MESSAGE: This service resides in SITE1 AZ1
  MESSAGE: This service resides in SITE1 AZ1
  MESSAGE: This service resides in SITE2
  MESSAGE: This service resides in SITE2

Let’s create a new gslb ingress service from any of the clusters to see how this affects AMKO, which is in charge of sending instructions to the GSLB site to create the GSLB Services. I will use a new yaml file that creates the hackazon application. You can find the sample yaml file here.

kubectl apply -f hackazon_secure_ingress_gslb.yaml
deployment.apps/hackazon created
service/hackazon created
ingress.networking.k8s.io/hackazon created

The AMKO captures this event and tries to call the API of the GSLB Leader but it fails, as you can see below:

kubectl logs -f amko-0 -n avi-system

E1221 18:05:50.919312       1 avisession.go:704] Failed to invoke API. Error: Post "https://10.10.20.42//api/healthmonitor": dial tcp 10.10.20.42:443: connect: no route to host

The existing GSLB objects will keep working even when the leader is not available, but the AMKO operation has been disrupted. The only way to restore full operation is to promote one of the follower sites to Leader. The procedure is well documented here. You would also need to change the AMKO integration settings to point to the new Leader instead of the old one.

If you restore now the connectivity to the affected site by connecting again the vNICs of both the AVI controller and the Service Engines located at GSLB Leader site, after some seconds you will see how the hackazon service is now created

You can test the hackazon application to verify not only the DNS resolution but also the datapath. Point your browser to http://hackazon.avi.iberia.local and you would get the hackazon page.

Conclusion: the GSLB Leader Site is in charge of AMKO objects realization. If we lose connectivity to this site, GSLB operation will be disrupted and no more GSLB objects will be created. DataPath connectivity is not affected provided that proper DNS Zone Delegation is configured at the DNS for the delegated zone. AMKO will reattempt the synchronization with the AVI controller until the Site is available. You can manually promote one of the Follower sites to Leader in order to restore full AMKO operation.

Test 2: Site2 (AKO only) Site Blackout

Now we will simulate a blackout condition in the Site2 GSLB site as depicted in the picture below:

As you can imagine, this condition stops connectivity to the Virtual Services at Site2. But we need to ensure we are not sending incoming requests towards this site that is now down, otherwise it might become a blackhole. The GSLB should be smart enough to detect the loss of connectivity and react accordingly.

After some seconds the health-monitors declare the Virtual Services at Site2 as dead and this is also reflected in the status of the GSLB pool member for that particular site.

After some minutes the GSLB service at site 2 is also declared as down so the syncing is stopped.

The speed of the recovery of the DataPath is tied to the timers associated with the health-monitors for the GSLB services that AMKO created automatically. You can explore the specific settings used by AMKO to create the Health-Monitor object by clicking the pencil next to the Health-Monitor definition in the GSLB Service. You will get the following settings window

As you can see, by default the Health-Monitor sends a health-check every 10 seconds. It waits up to 4 seconds before declaring a timeout and requires 3 Failed Checks to declare the service as Down. It can therefore take some seconds for the whole system to converge and cope with the failed site state. I have slightly changed the test loop to perform both a dig resolution and a curl that fetches the http server content.
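As a rough back-of-the-envelope estimate of how long the blackhole can last, the timers can be combined as in the sketch below. This is a simplified model of my own, not the documented behaviour of the AVI health-monitor: it assumes probes are sent at a fixed interval, that each probe fails exactly when its timeout expires, and that the site dies either just before a probe (best case) or just after one (worst case).

```python
def detection_window(interval_s, timeout_s, failed_checks):
    """Return (best, worst) seconds until the member is declared down.

    Simplified model (assumption): probes fire every interval_s seconds and
    a probe against a dead site fails after timeout_s seconds.
    """
    # Best case: the site dies right before a probe is sent.
    best = (failed_checks - 1) * interval_s + timeout_s
    # Worst case: the site dies right after a probe was answered.
    worst = failed_checks * interval_s + timeout_s
    return best, worst


best, worst = detection_window(interval_s=10, timeout_s=4, failed_checks=3)
print(f"Member declared down between {best} and {worst} seconds after the failure")
```

On top of that window, clients may keep using a cached answer for up to the TTL of the record, which is one more reason to lower the TTL during the tests.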

while true; do dig hello.avi.iberia.local +noall +answer; curl -m 2 http://hello.avi.iberia.local -s | grep MESSAGE; sleep 1; done

# Normal behavior: Traffic is evenly distributed among the three clusters
hello.avi.iberia.local. 1       IN      A       10.10.25.46
  MESSAGE: This service resides in SITE1 AZ1
hello.avi.iberia.local. 0       IN      A       10.10.25.46
  MESSAGE: This service resides in SITE1 AZ1
hello.avi.iberia.local. 2       IN      A       10.10.23.40
  MESSAGE: This service resides in SITE2
hello.avi.iberia.local. 1       IN      A       10.10.23.40
  MESSAGE: This service resides in SITE2
hello.avi.iberia.local. 0       IN      A       10.10.23.40
  MESSAGE: This service resides in SITE2
hello.avi.iberia.local. 2       IN      A       10.10.26.40
  MESSAGE: This service resides in SITE1 AZ2
hello.avi.iberia.local. 0       IN      A       10.10.26.40
  MESSAGE: This service resides in SITE1 AZ2

# Blackout condition created for site 2
hello.avi.iberia.local. 0       IN      A       10.10.25.46
  MESSAGE: This service resides in SITE1 AZ1
hello.avi.iberia.local. 0       IN      A       10.10.25.46

# DNS resolves to VS at Site 2 but no http answer is received
hello.avi.iberia.local. 0       IN      A       10.10.23.40
hello.avi.iberia.local. 1       IN      A       10.10.26.40
  MESSAGE: This service resides in SITE1 AZ2
hello.avi.iberia.local. 0       IN      A       10.10.26.40
  MESSAGE: This service resides in SITE1 AZ2
hello.avi.iberia.local. 2       IN      A       10.10.25.46
  MESSAGE: This service resides in SITE1 AZ1
hello.avi.iberia.local. 1       IN      A       10.10.25.46
  MESSAGE: This service resides in SITE1 AZ1
hello.avi.iberia.local. 0       IN      A       10.10.25.46
  MESSAGE: This service resides in SITE1 AZ1

# Again two more times in a row DNS resolves to VS at Site 2 but no http answer again
hello.avi.iberia.local. 2       IN      A       10.10.23.40
hello.avi.iberia.local. 0       IN      A       10.10.23.40

# Health-Monitor has now declared Site2 VS as down. No more answers. Now traffic is distributed between the two remaining sites
  MESSAGE: This service resides in SITE1 AZ2
hello.avi.iberia.local. 1       IN      A       10.10.26.40
  MESSAGE: This service resides in SITE1 AZ2
hello.avi.iberia.local. 0       IN      A       10.10.26.40
  MESSAGE: This service resides in SITE1 AZ2
hello.avi.iberia.local. 2       IN      A       10.10.25.46
  MESSAGE: This service resides in SITE1 AZ1
hello.avi.iberia.local. 1       IN      A       10.10.25.46
  MESSAGE: This service resides in SITE1 AZ1
hello.avi.iberia.local. 0       IN      A       10.10.25.46

Conclusion: If we lose one of the sites, the related health-monitor will declare the corresponding GSLB services as down and the DNS will stop answering with the associated IP address for the unreachable site. The recovery is fully automatic.

Test 3: Site1 AZ1 (AKO+AMKO) blackout

Now we will simulate the blackout condition of the Site1 AZ1 as depicted below

This is the cluster that owns the AMKO service. As you can guess, the DataPlane will automatically react to the disconnection of the Virtual Services at Site1 AZ1. After a few seconds to allow health-monitors to declare the services as dead, you should see a traffic pattern like the one shown below in which the traffic is sent only to the available sites.

  MESSAGE: This service resides in SITE1 AZ2
  MESSAGE: This service resides in SITE1 AZ2
  MESSAGE: This service resides in SITE1 AZ2
  MESSAGE: This service resides in SITE2
  MESSAGE: This service resides in SITE2
  MESSAGE: This service resides in SITE2
  MESSAGE: This service resides in SITE1 AZ2
  MESSAGE: This service resides in SITE1 AZ2
  MESSAGE: This service resides in SITE1 AZ2
  MESSAGE: This service resides in SITE2
  MESSAGE: This service resides in SITE2
  MESSAGE: This service resides in SITE2
  MESSAGE: This service resides in SITE1 AZ2
  MESSAGE: This service resides in SITE1 AZ2
  MESSAGE: This service resides in SITE1 AZ2
  MESSAGE: This service resides in SITE2

Although the DataPlane has been restored, AMKO is not available to handle the new k8s services that are created or deleted in the remaining clusters so the operation for new objects has been also disrupted. At the time of writing there isn’t any out-of-the-box mechanism to provide extra availability to cope with this specific failure and you need to design a method to ensure AMKO is restored in any of the remaining clusters. Specific Kubernetes backup solutions such as Velero can be used to backup and restore all the AMKO related objects including CRDs and secrets.

The good news is that the AMKO installation is quite straightforward and stateless, so the config is very light. Basically you can just reuse the original values.yaml configuration files and spin up AMKO in any other cluster, provided the required secrets and connectivity are present in the cluster of the recovery site.

As a best practice, it is also recommended to revoke the credentials of the affected site to avoid two controllers overlapping in case connectivity is recovered.

Creating Custom Alarms using ControlScript

ControlScripts are Python-based scripts which execute on the Avi Vantage Controllers. They are initiated by Alert Actions, which themselves are triggered by events within the system. Rather than simply alert an admin that a specific event has occurred, the ControlScript can take specific action, such as altering the Avi Vantage configuration or sending a custom message to an external system, such as telling VMware’s vCenter to scale out more servers if the current servers have reached resource capacity and are incurring lowered health scores.

With basic knowledge of python you can create an integration with external systems. In this example I will create a simple script that consumes an external webhook running in a popular messaging service such as Slack. A webhook (aka web callback or HTTP push API) is a way for an app to provide other applications with real-time information. A webhook delivers data to other applications as it happens, meaning you get data immediately, unlike typical APIs where you would need to poll for data very frequently in order to get it in real time. This makes webhooks much more efficient for both provider and consumer.

The first step is to create a new Incoming Webhook App from Slack. Search for Incoming Webhooks under the App catalog and just Add it.

Depending on your corporate policies you might need to request access from the administrators in advance to authorize this app. Once the authorization has been completed, personalize the Webhook as per your preferences. I am sending the messages that are posted to this Webhook to my personal Channel. The Webhook URL represents the unique URL you need to use to post messages. You can add a Description, a name and even an icon to differentiate them from other regular messages.

Using Postman, or just curl as shown here, you can try to reach your Webhook

message='{"text": "This is a sample message"}'
curl --header "Content-Type: application/json" --request POST --data "$message" https://hooks.slack.com/services/<use-your-own-webhook-here> 
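Since the ControlScript will be written in Python, the same call can also be sketched using only the standard library. This is a minimal example under my own assumptions: build_slack_request and send_slack_message are hypothetical helper names, and the webhook URL is the placeholder you need to replace with your own.

```python
import json
import urllib.request


def build_slack_request(webhook_url, text):
    """Prepare the same POST that the curl command above sends."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack_message(webhook_url, text, timeout=5):
    """Send the message and return the HTTP status code."""
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Usage (replace the placeholder with your own webhook):
# send_slack_message(
#     "https://hooks.slack.com/services/<use-your-own-webhook-here>",
#     "This is a sample message",
# )
```

Splitting the request building from the sending makes it easy to inspect the payload before anything leaves your machine.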

Now that we have learnt to send messages to Slack using a webhook, we can configure some interesting alerts related to the GSLB services we are creating, adding extra customization and triggering a message to our external Slack that will act as a Pager system. Remember we are using the AVI alarm framework to create an external notification, but you have the power of Python in your hands to create more sophisticated event-driven actions.

We will focus on four different key events for our example here. We want to create a notification in those cases:

  • GS_MEMBER_DOWN.- Whenever a Member of the GSLB Pool is no longer available
  • GS_MEMBER_UP.- Whenever a Member of the GSLB Pool is up
  • GS_SERVICE_UP.- Whenever at least one of the GSLB Pool Members is up
  • GS_SERVICE_DOWN.- Whenever all the GSLB Members of the pool are down

We will start with the first Alert that we will call GS_SERVICE_UP. Go to Infrastructure > Alerts > Create. Set the parameters as depicted below.

We want to capture a particular Event and we will trigger the Alert whenever the system defined alert Gs Up occurs.

When the event occurs we will trigger an action that we have called SEND_SLACK_GS_SERVICE_UP. This Action is not populated until you create it by clicking on the pencil icon.

The Alert Action that we will call SEND_SLACK_GS_SERVICE_UP can trigger different notifications to the classical management systems via email, Syslog or SNMP. We are interested here in the ControlScript section. Click on the Pencil Icon and we will create a new ControlScript that we will call SLACK_GS_SERVICE_UP.

Before tuning the message I usually create a base script that will print the arguments that are passed to the ControlScript upon triggering. To do so just configure the script with the following base code.

#!/usr/bin/python
import sys
import json

def parse_avi_params(argv):
    if len(argv) != 2:
        return {}
    script_parms = json.loads(argv[1])
    print(json.dumps(script_parms,indent=3))
    return script_parms

# Main Script (call parse_avi_params to print the alarm contents)
if __name__ == "__main__":
  script_parms = parse_avi_params(sys.argv)

Now generate the Alert in the system. An easy way is to scale in all the deployments to zero replicas to force the Health-Monitor to declare the GSLB as down and then scale out to get the GSLB service up and running again. After some seconds the health monitor declares the Virtual Services as down and the GSLB service will appear as red.

Now scale out one of the services to at least one replica and once the first Pool member is available, the system will declare the GSLB as up (green) again.

The output of the script shown below is a JSON object that contains all the details of the event.

{ "name": "GS_SERVICE_UP-gslbservice-7dce3706-241d-4f87-86a6-7328caf648aa-1608485017.473894-1608485017-17927224", "throttle_count": 0, "level": "ALERT_LOW", "reason": "threshold_exceeded", "obj_name": "hello.avi.iberia.local", "threshold": 1, "events": [ { "event_id": "GS_UP", "event_details": { "se_hm_gs_details": { "gslb_service": "hello.avi.iberia.local" } }, "obj_uuid": "gslbservice-7dce3706-241d-4f87-86a6-7328caf648aa", "obj_name": "hello.avi.iberia.local", "report_timestamp": 1608485017 } ] }

To beautify the output and understand the contents of the alarm message more easily, just paste the contents of the json object in a regular file such as /tmp/alarm.json and parse the output using jq. Now the output should look like this.

cat /tmp/alarm.json | jq '.'
{
  "name": "GS_SERVICE_UP-gslbservice-7dce3706-241d-4f87-86a6-7328caf648aa-1608485017.473894-1608485017-17927224",
  "throttle_count": 0,
  "level": "ALERT_LOW",
  "reason": "threshold_exceeded",
  "obj_name": "hello.avi.iberia.local",
  "threshold": 1,
  "events": [
    {
      "event_id": "GS_UP",
      "event_details": {
        "se_hm_gs_details": {
          "gslb_service": "hello.avi.iberia.local"
        }
      },
      "obj_uuid": "gslbservice-7dce3706-241d-4f87-86a6-7328caf648aa",
      "obj_name": "hello.avi.iberia.local",
      "report_timestamp": 1608485017
    }
  ]
}
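Before writing the final script it can be worth extracting the fields defensively, so that the ControlScript does not crash if an alarm arrives with a slightly different structure. The sketch below only relies on the keys visible in the sample above; whether other events carry the same keys under event_details is an assumption you should verify with the base script. summarize_alert is a hypothetical helper name.

```python
def summarize_alert(params):
    """Collect the fields needed for a notification from the alarm JSON."""
    events = params.get("events") or []
    services = []
    for event in events:
        details = event.get("event_details") or {}
        gs_details = details.get("se_hm_gs_details") or {}
        # Fall back to obj_name when the nested detail is missing.
        name = gs_details.get("gslb_service") or event.get("obj_name")
        if name and name not in services:
            services.append(name)
    return {
        "level": params.get("level", "UNKNOWN"),
        "event_ids": [event.get("event_id") for event in events],
        "gslb_services": services,
    }
```

For the sample alarm this returns the ALERT_LOW level, the GS_UP event and the hello.avi.iberia.local service, which is all we need to compose the message.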

Now you can easily extract the contents of the alarm and create your own message. A sample complete ControlScript for this particular event is shown below including the Slack Webhook Integration.

#!/usr/bin/python
import requests
import os
import sys
import json
requests.packages.urllib3.disable_warnings()

def parse_avi_params(argv):
    if len(argv) != 2:
        return {}
    script_parms = json.loads(argv[1])
    return script_parms

# Main Script entry
if __name__ == "__main__":
  script_parms = parse_avi_params(sys.argv)

  gslb_service=script_parms['events'][0]['event_details']['se_hm_gs_details']['gslb_service']
  message=("GS_SERVICE_UP: The service "+gslb_service+" is now up and running.")
  message_slack={
                 "text": "Alarm Message from NSX ALB",
                 "color": "#00FF00", 
                 "fields": [{
                 "title": "GS_SERVICE_UP",
                 "value": "The service *"+gslb_service+"* is now up and running."
                }]}
  # Display the message in the integrated AVI Alarm system
  print(message)

# Set the webhook_url to the one provided by Slack when you create the
# webhook at https://my.slack.com/services/new/incoming-webhook/
  webhook_url = 'https://hooks.slack.com/services/<use-your-data-here>'

  response = requests.post(
     webhook_url, data=json.dumps(message_slack),
     headers={'Content-Type': 'application/json'}
 )
  if response.status_code != 200:
    raise ValueError(
        'Request to slack returned an error %s, the response is:\n%s'
        % (response.status_code, response.text)
    )

Shut down the Virtual Service by scaling in the deployment to zero replicas, then scale it out again and wait until the alarm appears

GSLB_SERVICE_UP Alarm

And you can see a nice formatted message in your slack app as shown below:

Custom Slack Message for Alert GS UP

Repeat the same process for the rest of the alarms you intend to notify using a webhook and personalize your messages by extracting the required fields from the json file. For your reference you can find a copy of the four ControlScripts I have created here.

Now shut down and reactivate the service to verify how the alarms related to the GSLB services and members of the pool appear in your Slack application as shown below.

Custom Slack Alerts generated from AVI Alerts and ControlScript

That’s all so far regarding AMKO. Stay tuned!

AVI for K8s Part 6: Scaling Out using AMKO and GSLB for Multi-Region services

We have been focused on a single k8s cluster deployment so far. Although a K8s cluster is a highly distributed architecture that improves the application availability by itself, sometimes an extra layer of protection is needed to overcome failures that might affect all the infrastructure in a specific physical zone, such as a power outage or a natural disaster in the failure domain of the whole cluster. A common method to achieve extra availability is to run our applications in independent clusters that are located in different Availability Zones or even in different datacenters located in different cities, regions, countries… etc.

The AMKO facilitates multi-cluster application deployment extending application ingress controllers across multi-region and multi Availability Zone deployments mapping the same application deployed on multiple clusters to a single GSLB service. AMKO calls Avi Controller via API to create GSLB services on the leader cluster which synchronizes with all follower clusters. The general diagram is represented here.
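To picture what "mapping the same application deployed on multiple clusters to a single GSLB service" means, here is a small conceptual sketch. It is not AMKO code: the data structures and the function name are hypothetical, and the cluster names and VIPs are illustrative values taken from the testbed used in this series.

```python
from collections import defaultdict


def build_gslb_services(cluster_ingresses):
    """Group the ingress hostnames of every cluster by FQDN.

    Each FQDN becomes one GSLB service whose pool members are the
    Virtual Service IPs published by the individual clusters.
    """
    services = defaultdict(list)
    for cluster, ingresses in cluster_ingresses.items():
        for fqdn, vip in ingresses:
            services[fqdn].append({"cluster": cluster, "vip": vip})
    return dict(services)


# Illustrative inventory: the same hostname exposed in three clusters.
cluster_ingresses = {
    "s1az1": [("hello.avi.iberia.local", "10.10.25.46")],
    "s1az2": [("hello.avi.iberia.local", "10.10.26.40")],
    "s2": [("hello.avi.iberia.local", "10.10.23.40")],
}

for fqdn, members in build_gslb_services(cluster_ingresses).items():
    print(fqdn, [member["vip"] for member in members])
```

The key idea is that the FQDN is the grouping key: one hostname, one GSLB service, and as many pool members as clusters exposing it.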

AMKO is an Avi pod running in the Kubernetes GSLB leader cluster and in conjunction with AKO, AMKO facilitates multicluster application deployment. We will use the following building blocks to extend our single site to create a testbed architecture that will help us to verify how AMKO actually works.

TestBed for AMKO test for Active-Active split Datacenters and MultiAZ

As you can see, the above picture represents a comprehensive georedundant architecture. I have deployed two clusters on the left side (Site1) that share the same AVI Controller and are split in two separate Availability Zones, let’s say Site1 AZ1 and Site1 AZ2. The AMKO operator will be deployed in the Site1 AZ1 cluster. On the right side we have another cluster with a dedicated AVI Controller. On the top we have also created a “neutral” site with a dedicated controller that will act as the GSLB leader and will resolve the DNS queries from external clients that are trying to reach our exposed FQDNs. As you can tell, each of the kubernetes clusters has its own AKO component and will publish its external services in a different FrontEnd subnet: Site1 AZ1 will publish the services at the 10.10.25.0/24 network, Site1 AZ2 will publish the services using the 10.10.26.0/24 network and finally Site2 will publish its services using the 10.10.23.0/24 network.

Deploy AKO in the remaining K8S Clusters

AMKO works in conjunction with AKO. Basically, AKO captures the Ingress and LoadBalancer configuration occurring at the K8S cluster and calls the AVI API to translate the observed configuration into an external LoadBalancer implementation, whereas AMKO is in charge of capturing the interesting k8s objects and calling the AVI API to implement GSLB services that provide load-balancing and High-Availability across different k8s clusters. Having said that, before going into the configuration of the AVI GSLB we need to prepare the infrastructure and deploy AKO in all the remaining k8s clusters in the same way we did with the first one, as explained in the previous articles. The configuration yaml files for each of the AKO installations can be found here for your reference.

The parameters selected for the Site1 AZ2 AKO and saved in the site1az2_values.yaml file are shown below

| Parameter | Value | Description |
|---|---|---|
| AKOSettings.disableStaticRouteSync | false | Allow the AKO to create static routes to achieve POD network connectivity |
| AKOSettings.clusterName | S1-AZ2 | A descriptive name for the cluster. Controller will use this value to prefix related Virtual Service objects |
| NetworkSettings.subnetIP | 10.10.26.0 | Network in which to create the Virtual Service Objects at AVI SE. Must be in the same VRF as the backend network used to reach k8s nodes. It must be configured with a static pool or DHCP to allocate IP addresses automatically. |
| NetworkSettings.subnetPrefix | 24 | Mask length associated to the subnetIP for Virtual Service Objects at SE. |
| NetworkSettings.vipNetworkList: - networkName | AVI_FRONTEND_3026 | Name of the AVI Network object that will be used to place the Virtual Service objects at AVI SE. |
| L4Settings.defaultDomain | avi.iberia.local | This domain will be used to place the LoadBalancer service types in the AVI SEs. |
| ControllerSettings.serviceEngineGroupName | S1-AZ2-SE-Group | Name of the Service Engine Group that the AVI Controller uses to spin up the Service Engines |
| ControllerSettings.controllerVersion | 20.1.5 | Controller API version |
| ControllerSettings.controllerIP | 10.10.20.43 | IP Address of the AVI Controller. In this case it is shared with the Site1 AZ1 k8s cluster |
| avicredentials.username | admin | Username to get access to the AVI Controller |
| avicredentials.password | password01 | Password to get access to the AVI Controller |

values.yaml for AKO at Site1 AZ2

Similarly, the selected values for the AKO at Site2 are listed below

| Parameter | Value | Description |
|---|---|---|
| AKOSettings.disableStaticRouteSync | false | Allow the AKO to create static routes to achieve POD network connectivity |
| AKOSettings.clusterName | S2 | A descriptive name for the cluster. Controller will use this value to prefix related Virtual Service objects |
| NetworkSettings.subnetIP | 10.10.23.0 | Network in which to create the Virtual Service Objects at AVI SE. Must be in the same VRF as the backend network used to reach k8s nodes. It must be configured with a static pool or DHCP to allocate IP addresses automatically. |
| NetworkSettings.subnetPrefix | 24 | Mask associated to the subnetIP for Virtual Service Objects at SE. |
| NetworkSettings.vipNetworkList: - networkName | AVI_FRONTEND_3023 | Name of the AVI Network object that will be used to place the Virtual Service objects at AVI SE. |
| L4Settings.defaultDomain | avi.iberia.local | This domain will be used to place the LoadBalancer service types in the AVI SEs. |
| ControllerSettings.serviceEngineGroupName | S2-SE-Group | Name of the Service Engine Group that the AVI Controller uses to spin up the Service Engines |
| ControllerSettings.controllerVersion | 20.1.5 | Controller API version |
| ControllerSettings.controllerIP | 10.10.20.44 | IP Address of the AVI Controller. In this case it is dedicated to the Site2 k8s cluster |
| avicredentials.username | admin | Username to get access to the AVI Controller |
| avicredentials.password | password01 | Password to get access to the AVI Controller |

values.yaml for AKO at Site2

As a reminder, each cluster is made up of a single master and two worker nodes and we will use Antrea as CNI. To deploy Antrea we need to assign a CIDR block to allocate IP addresses for POD networking needs. The following table lists the allocated CIDR per cluster.

| Cluster Name | POD CIDR Block | CNI | # Master | # Workers |
|---|---|---|---|---|
| Site1-AZ1 | 10.34.0.0/18 | Antrea | 1 | 2 |
| Site1-AZ2 | 10.34.64.0/18 | Antrea | 1 | 2 |
| Site2 | 10.34.128.0/18 | Antrea | 1 | 2 |

Kubernetes Cluster allocated CIDRs
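The table allocates a distinct block per cluster. If you pick your own ranges, a quick sanity check that they do not overlap can save some troubleshooting later. The following is a minimal sketch using the Python standard library; find_overlaps is a hypothetical helper name.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidrs):
    """Return the pairs of cluster names whose POD CIDR blocks overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


# POD CIDR blocks from the table above.
pod_cidrs = {
    "Site1-AZ1": "10.34.0.0/18",
    "Site1-AZ2": "10.34.64.0/18",
    "Site2": "10.34.128.0/18",
}

overlaps = find_overlaps(pod_cidrs)
print("Overlapping blocks:", overlaps if overlaps else "none")
```

ipaddress.ip_network also raises a ValueError if a block has host bits set, so a mistyped prefix is caught at the same time.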

GSLB Leader base configuration

Now that AKO is deployed in all the clusters, let’s start with the GSLB Configuration. Before launching the GSLB Configuration, we need to create some base configuration at the AVI controller located at the top of the diagram shown at the beginning in order to prepare it to receive the dynamically created GSLB services. GSLB is a very powerful feature included in the AVI Load Balancer. A comprehensive explanation around GSLB can be found here. Note that in the proposed architecture we will define the AVI controller located at a neutral site as the Leader Active Site, meaning this site will be responsible totally or partially for the following key functions:

  1. Definition and ongoing synchronization/maintenance of the GSLB configuration
  2. Monitoring the health of configuration components
  3. Optimizing application service for clients by providing GSLB DNS responses to their FQDN requests based on the GSLB algorithm configured
  4. Processing of application requests

To create the base config at the Leader site we need to do some steps. GSLB will act at the end of the day as an “intelligent” DNS responder. That means we need to create a Virtual Service at the Data Plane (e.g. at the Service Engines) to answer the DNS queries coming from external clients. To do so, the very first step is to define a Service Engine Group and a DNS Virtual Service. Log into the GSLB Leader AVI Controller GUI and create the Service Engine Group. As shown in previous articles after Controller installation you need to create the Cloud (vcenter in our case) and then select the networks and the IP ranges that will be used for the Service Engine Placement. The intended diagram is represented below. The DNS service will pick up an IP address of the subnet 10.10.24.0 to create the DNS service.

GSLB site Service Engine Placement

As explained in previous articles we need to create the Service Engine Group, and then create a new Virtual Service using Advanced Setup. After these tasks are completed the AVI controller will spin up a new Service Engine and place it in the corresponding networks. If everything worked well, after a couple of minutes we should have a green DNS application in the AVI dashboard like this:

DNS Virtual Service

Some details of the created DNS virtual service can be displayed by hovering over the Virtual Service g-dns object. Note the assigned IP Address is 10.10.24.186. This is the IP that will actually respond to DNS queries. The service port is, in this case, 53, which is the well-known port for DNS.

AVI DNS Virtual Service detailed configuration

DNS Zone Delegation

In a typical enterprise setup, a user has a Local pair of DNS configured that will receive the DNS queries and will be in charge of mainitiing the local domain DNS records and will also forward the requests for those domains that cannot be resolved locally (tipically to the DNS of the internet provider).

DNS gives you the option to split the namespace of the local domains into different DNS zones using a special configuration called Zone Delegation. This setup is useful when you want to delegate the management of part of your DNS namespace to another location. In our particular case AVI will be in charge of DNS resolution for the Virtual Services that we are exposing to the Internet by means of AKO. The local DNS will remain in charge of the local domain iberia.local, and a Zone Delegation will instruct the local DNS to forward the DNS queries for the new zone to its authoritative DNS servers.

In our case we will create a delegated zone for the local subdomain avi.iberia.local. All the name resolution queries for that particular DNS namespace will be sent to the AVI DNS Virtual Service. I am using Windows Server DNS here, so I will show you how to configure a Zone Delegation using this specific DNS implementation. There are equivalent processes for doing this using Bind or other popular DNS software.
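To make the delegation rule concrete, the following minimal Python sketch shows which names would fall under the delegated zone. It is only an illustration of the suffix matching that a delegation implies, not how Windows Server DNS is implemented; the function name and the sample queries are hypothetical.

```python
def in_delegated_zone(fqdn: str, zone: str) -> bool:
    """Return True when fqdn is the zone apex or a name below it."""
    name = fqdn.rstrip(".").lower().split(".")
    suffix = zone.rstrip(".").lower().split(".")
    # Compare whole labels, so "xavi.iberia.local" does not match "avi.iberia.local".
    return len(name) >= len(suffix) and name[-len(suffix):] == suffix


DELEGATED_ZONE = "avi.iberia.local"  # subdomain delegated to the AVI DNS Virtual Service

for query in ("test.avi.iberia.local", "www.iberia.local", "xavi.iberia.local"):
    target = "AVI DNS Virtual Service" if in_delegated_zone(query, DELEGATED_ZONE) else "local DNS"
    print(f"{query} -> {target}")
```

Only names that end in the full label sequence avi.iberia.local are handed over to the AVI DNS Virtual Service; everything else in iberia.local stays with the local DNS.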

The first step is to create a regular DNS A record in the local zone that points to the IP of the Virtual Service that is actually serving DNS in AVI. In our case we defined a DNS Virtual Service called g-dns and the allocated IP address was 10.10.24.186. Just add a New A Record as shown below.

Now, create a New Delegation. Click on the local domain, right click and select the New Delegation option.

Windows Server Zone Delegation setup

A wizard is launched to assist you in the configuration process.

New Delegation Wizard

Specify the name for the delegated domain. In this case we are using avi. This will create a delegation for the avi.iberia.local subdomain.

Zone Delegation domain configuration

Next step is to specify the server that will serve the request for this new zone. In this case we will use the g-dns.iberia.local fqdn that we created previously and that resolves to the IP address of the AVI DNS Virtual Service.

Windows Server Zone Delegation. Adding DNS Server for the delegated Zone.

If you enter the information and click Resolve, an error appears indicating that the target server is not authoritative for this domain.

If we look into the AVI logs we can see that the virtual service has received a special query type called SOA (Start of Authority), which is used to verify whether the DNS service is authoritative for a particular domain. AVI answers with NXDOMAIN, which means it is not configured to act as authoritative server for avi.iberia.local.

If you want AVI to be authoritative for a particular domain, just edit the DNS Virtual Service and click on the pencil at the right of the Application Profile > System-DNS menu.

In the Domain Names/Subdomains section add the Domain Name. The configured domain name will be authoritatively serviced by our DNS Virtual Service. For this domain, AVI will send the SOA parameters in the answer section of the response when a SOA type query is received.

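Once done, you can query the DNS Virtual Service for the domain. Note that dig sends an A type query by default, so the status is still NXDOMAIN, but the response now carries the aa (authoritative answer) flag and the SOA record of the zone in the authority section.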

dig avi.iberia.local @10.10.24.186

; <<>> DiG 9.16.1-Ubuntu <<>> avi.iberia.local @10.10.24.186
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 10856
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;avi.iberia.local.              IN      A

;; AUTHORITY SECTION:
avi.iberia.local.       30      IN      SOA     g-dns.iberia.local. johernandez\@iberia.local. 1 10800 3600 86400 30

;; Query time: 4 msec
;; SERVER: 10.10.24.186#53(10.10.24.186)
;; WHEN: Sat Dec 19 09:41:43 CET 2020
;; MSG SIZE  rcvd: 123
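When repeating this check across several zones it can help to read the two relevant header fields programmatically. The sketch below is a hypothetical helper that extracts the status and the flags from dig's text output; it assumes the output format shown above and is fed here with the two header lines copied from that capture.

```python
import re


def parse_dig_header(output: str) -> dict:
    """Extract the response status and header flags from dig's text output."""
    status = re.search(r"status: (\w+)", output)
    flags = re.search(r"flags: ([a-z ]*);", output)
    return {
        "status": status.group(1) if status else None,
        "flags": flags.group(1).split() if flags else [],
    }


# Header lines copied from the dig capture above.
SAMPLE = """\
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 10856
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
"""

header = parse_dig_header(SAMPLE)
authoritative = "aa" in header["flags"]
print(header["status"], "authoritative" if authoritative else "non-authoritative")
```

The presence of aa in the flags is what tells us the AVI DNS Virtual Service now considers itself authoritative for the zone.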

In the same way, you can see how the Windows DNS Service now validates the server information because, as shown above, it is responding to the SOA query type, indicating that it is authoritative for the intended avi.iberia.local delegated domain.

New Delegation Wizard. DNS Target server validated

If we explore the logs now we can see how our AVI DNS Virtual Service is sending a NOERROR message when a SOA query for the domain avi.iberia.local is received. This tells the upstream DNS server that this is a legitimate server to forward queries to when someone tries to resolve an FQDN that belongs to the delegated domain. Although using SOA is a kind of best practice, the MSFT DNS server will send queries directed to the delegated domain towards the downstream configured DNS servers even if it is not getting a SOA response for that particular delegated domain.

SOA NOERROR Answer

As you can see, the Zone Delegation process simply consists of creating a special Name Server (NS) type record that points to our DNS Virtual Service, so that it is used when a DNS query for avi.iberia.local is received.

NS Entries for the delegated zone

To test the delegation we can create a dummy record. Edit the DNS Virtual Service clicking the pencil icon and go to Static DNS Records tab. Then create a new DNS record such as test.avi.iberia.local and set an IP address of your choice. In this case 10.10.24.200.

In case you need extra debugging and want to go deeper into how the local DNS server is actually handling the DNS queries, you can always enable debugging in the MSFT DNS. Open the DNS application from Windows Server, click on your server, go to Action > Properties and then click on the Debug Logging tab. Select Log packets for debugging. Also specify a File Path and Name in the Log File section at the bottom.

Windows Server DNS Debug Logging window

Now it’s time to test how everything works together. Using the dig tool from a local client configured to use the local DNS servers in which we have created the Zone Delegation, try to resolve the test.avi.iberia.local FQDN.

 dig test.avi.iberia.local

; <<>> DiG 9.16.1-Ubuntu <<>> test.avi.iberia.local
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20425
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;test.avi.iberia.local.         IN      A

;; ANSWER SECTION:
test.avi.iberia.local.  29      IN      A       10.10.24.200

;; Query time: 7 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Fri Dec 18 22:59:17 CET 2020
;; MSG SIZE  rcvd: 66

Open the log file you have defined for debugging in Windows DNS Server and look for the interesting query (you can use your favourite editor and search for some strings to locate the logs). The following recursive events shown in the log correspond to the expected behaviour for a Zone Delegation.

# Query Received from client for test.avi.iberia.local
18/12/2020 22:59:17 1318 PACKET  000002DFAB069D40 UDP Rcv 192.168.145.5   d582   Q [0001   D   NOERROR] A      (4)test(3)avi(6)iberia(5)local(0)

# Query sent to the NS for avi.iberia.local zone at 10.10.24.186
18/12/2020 22:59:17 1318 PACKET  000002DFAA1BB560 UDP Snd 10.10.24.186    4fc9   Q [0000       NOERROR] A      (4)test(3)avi(6)iberia(5)local(0)

# Answer received from 10.10.24.186 which is the g-dns Virtual Service
18/12/2020 22:59:17 1318 PACKET  000002DFAA052170 UDP Rcv 10.10.24.186    4fc9 R Q [0084 A     NOERROR] A      (4)test(3)avi(6)iberia(5)local(0)

# Response sent to the originating client
18/12/2020 22:59:17 1318 PACKET  000002DFAB069D40 UDP Snd 192.168.145.5   d582 R Q [8081   DR  NOERROR] A      (4)test(3)avi(6)iberia(5)local(0)

Note how the ID of the AVI DNS response is 20425, as shown below, which corresponds to 4fc9 in hexadecimal, the transaction ID shown in the log trace of the MS DNS Server above.
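The correlation between the hexadecimal transaction ID and its decimal form, as well as the length-prefixed notation used for names in the debug log, can be checked with a small sketch. The parser below is a hypothetical helper that assumes the log line layout shown in the trace above; it is not a complete parser for the Windows DNS debug log format.

```python
import re


def parse_dns_debug_line(line: str) -> dict:
    """Pull direction, peer IP, transaction ID and query name out of one PACKET line."""
    head = re.search(r"\b(Snd|Rcv)\s+(\S+)\s+([0-9a-fA-F]{4})\b", line)
    encoded = re.search(r"((?:\(\d+\)[\w-]*)+)\s*$", line)
    # Names are logged as length-prefixed labels, e.g. (4)test(3)avi(6)iberia(5)local(0).
    labels = re.findall(r"\(\d+\)([\w-]+)", encoded.group(1))
    return {
        "direction": head.group(1),
        "peer": head.group(2),
        "xid_hex": head.group(3),
        "xid_dec": int(head.group(3), 16),
        "name": ".".join(labels),
    }


# Line copied from the trace above (answer received from the g-dns Virtual Service).
LINE = ("18/12/2020 22:59:17 1318 PACKET  000002DFAA052170 UDP Rcv 10.10.24.186    "
        "4fc9 R Q [0084 A     NOERROR] A      (4)test(3)avi(6)iberia(5)local(0)")

record = parse_dns_debug_line(LINE)
print(record["direction"], record["peer"], record["xid_hex"], record["xid_dec"], record["name"])
```

The decimal value obtained from 4fc9 is the number to look for in the AVI DNS log entry.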

DNS Query

GSLB Leader Configuration

Now that the DNS Zone delegation is done, let’s move to the GSLB AVI controller again to create the GSLB Configuration. If we go to Infrastructure and GSLB note how the GSLB status is set to Off.

GSLB Configuration

Click on the pencil icon to turn the service on and populate the fields as in the example below. You need to specify a GSLB Subdomain that matches the intended DNS zone in which you will create the virtual services, in this case avi.iberia.local. Then click Save and Set DNS Virtual Services.

New GSLB Configuration

Now select the DNS Virtual Service we created before and pick the subdomains in which we are going to create the GSLB Services from AMKO.

Add GSLB and DNS Virtual Services

Save the config and you will get this screen indicating that the GSLB service for the avi.iberia.local subdomain is up and running.

GSLB Site Configuration Status

AMKO Installation

The installation of AMKO is quite similar to the AKO installation. It’s important to note that AMKO assumes it has connectivity to the k8s Master API servers of all the clusters across the deployment. That means all the configs and status across the different k8s clusters will be monitored from a single AMKO that will reside, in our case, in the Site1 AZ1 k8s cluster. As in the case of AKO we will run the AMKO pod in a dedicated namespace. We will also use a namespace called avi-system for this purpose. Ensure the namespace exists before deploying AMKO, otherwise use kubectl to create it.

kubectl create namespace avi-system

As you may know if you are familiar with k8s, to get access to the API of the k8s cluster we need a kubeconfig file that contains connection information as well as the credentials needed to authenticate our sessions. The default configuration file is located at ~/.kube/config on the master node and is referred to as the kubeconfig file. In this case we will need a kubeconfig file containing multi-cluster access. There is a tutorial on how to create the kubeconfig file for multicluster access in the official AMKO github located at this site.

The contents of my kubeconfig file will look like this. You can easily identify different sections such as clusters, contexts and users.

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: <ca-data.crt>
    server: https://10.10.24.160:6443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: kubernetes-admin
  name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin
  user:
    client-certificate-data: <client-data.crt>
    client-key-data: <client-key.key>

Using the information above and combining the information extracted from the three individual kubeconfig files we can create a customized multi-cluster config file. Replace certificates and keys with your specific kubeconfig files information and also choose representative names for the contexts and users. A sample version of my multicluster kubeconfig file can be accessed here for your reference.

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: <Site1-AZ1-ca.cert> 
    server: https://10.10.24.160:6443
  name: Site1AZ1
- cluster:
    certificate-authority-data: <Site1-AZ2-ca.cert> 
    server: https://10.10.23.170:6443
  name: Site1AZ2
- cluster:
    certificate-authority-data: <Site2-ca.cert> 
    server: https://10.10.24.190:6443
  name: Site2

contexts:
- context:
    cluster: Site1AZ1
    user: s1az1-admin
  name: s1az1
- context:
    cluster: Site1AZ2
    user: s1az2-admin
  name: s1az2
- context:
    cluster: Site2
    user: s2-admin
  name: s2

kind: Config
preferences: {}

users:
- name: s1az1-admin
  user:
    client-certificate-data: <s1az1-client.cert> 
    client-key-data: <s1az1-client.key> 
- name: s1az2-admin
  user:
    client-certificate-data: <s1az2-client.cert> 
    client-key-data: <s1az2-client.key>
- name: s2-admin
  user:
    client-certificate-data: <site2-client.cert> 
    client-key-data: <site2-client.key>

Save the multi-cluster config file as gslb-members. To verify there are no syntax problems in our file, and provided there is connectivity to the API server of each cluster, we can try to read the created file using kubectl as shown below.

kubectl --kubeconfig gslb-members config get-contexts
CURRENT   NAME         CLUSTER    AUTHINFO      NAMESPACE
          s1az1        Site1AZ1   s1az1-admin
          s1az2        Site1AZ2   s1az2-admin
          s2           Site2      s2-admin
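The manual merge described above can also be sketched in Python. This is a minimal illustration that works on kubeconfig data already parsed into dictionaries; the helper names and placeholder credentials are hypothetical, and each source file is assumed to contain exactly one cluster and one user, as in the single-cluster example shown earlier.

```python
def merge_kubeconfigs(sources):
    """Merge single-cluster kubeconfig dicts into one multi-cluster dict.

    sources maps each new context name to (cluster_name, user_name, kubeconfig).
    """
    merged = {
        "apiVersion": "v1",
        "kind": "Config",
        "preferences": {},
        "clusters": [],
        "contexts": [],
        "users": [],
    }
    for context_name, (cluster_name, user_name, cfg) in sources.items():
        # Each source file is assumed to hold exactly one cluster and one user.
        merged["clusters"].append({"name": cluster_name, "cluster": cfg["clusters"][0]["cluster"]})
        merged["users"].append({"name": user_name, "user": cfg["users"][0]["user"]})
        merged["contexts"].append(
            {"name": context_name, "context": {"cluster": cluster_name, "user": user_name}}
        )
    return merged


def single_cluster(server, tag):
    """Stand-in for one kubeconfig as it would look after being parsed from YAML."""
    return {
        "clusters": [{"name": "kubernetes", "cluster": {
            "certificate-authority-data": f"<{tag}-ca.cert>", "server": server}}],
        "users": [{"name": "kubernetes-admin", "user": {
            "client-certificate-data": f"<{tag}-client.cert>",
            "client-key-data": f"<{tag}-client.key>"}}],
    }


merged = merge_kubeconfigs({
    "s1az1": ("Site1AZ1", "s1az1-admin", single_cluster("https://10.10.24.160:6443", "s1az1")),
    "s1az2": ("Site1AZ2", "s1az2-admin", single_cluster("https://10.10.23.170:6443", "s1az2")),
    "s2": ("Site2", "s2-admin", single_cluster("https://10.10.24.190:6443", "s2")),
})
for ctx in merged["contexts"]:
    print(ctx["name"], ctx["context"]["cluster"], ctx["context"]["user"])
```

In a real workflow the three dictionaries would come from a YAML parser (for instance PyYAML's yaml.safe_load, assuming that package is installed) and the merged result would be written back as the gslb-members file.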

It is very common, and also useful, to manage the three clusters from a single operations server and change among different contexts to operate each of the clusters centrally. To do so just place this new multi-cluster kubeconfig file in the default path where kubectl looks for the kubeconfig file, which is $HOME/.kube/config. Once done you can easily change between contexts just by using kubectl and the use-context keyword. In the example below we are switching to context s2; note how kubectl get nodes then lists the nodes in the cluster at Site2.

kubectl config use-context s2
Switched to context "s2".

kubectl get nodes
NAME                 STATUS   ROLES    AGE   VERSION
site2-k8s-master01   Ready    master   60d   v1.18.10
site2-k8s-worker01   Ready    <none>   60d   v1.18.10
site2-k8s-worker02   Ready    <none>   60d   v1.18.10

Switch now to the target cluster in which AMKO is going to be installed. In our case the context for accessing that cluster is s1az1, which corresponds to the cluster located at Site1 in Availability Zone 1. Once switched, we will generate a k8s generic secret object named gslb-config-secret that AMKO will use to get access to the three clusters in order to watch for the required k8s LoadBalancer and Ingress service type objects.

kubectl config use-context s1az1
Switched to context "s1az1".

kubectl create secret generic gslb-config-secret --from-file gslb-members -n avi-system
secret/gslb-config-secret created

Now it’s time to install AMKO. First you have to add a new repo that points to the url in which AMKO helm chart is published.

helm repo add amko https://avinetworks.github.io/avi-helm-charts/charts/stable/amko

If we search in the repository we can see the latest version available, in this case 1.4.1.

helm search repo

NAME     	CHART VERSION    	APP VERSION      	DESCRIPTION
amko/amko	1.4.1	            1.4.1	            A helm chart for Avi Multicluster Kubernetes Operator

The AMKO base config is created using a yaml file that contains the required configuration items. To get a sample file with default values, run the following command.

helm show values amko/amko --version 1.4.1 > values_amko.yaml

Now edit the values_amko.yaml file that will be the configuration base of our Multicluster operator. The following table shows some of the specific values for AMKO.

| Parameter | Value | Description |
|---|---|---|
| configs.controllerVersion | 20.1.5 | Release version of the AVI Controller |
| configs.gslbLeaderController | 10.10.20.42 | IP address of the AVI Controller that will act as GSLB Leader |
| configs.memberClusters | clusterContext: “s1az1”, “s1az2”, “s2” | Specify the contexts used in the gslb-members file to reach the K8s API in the different k8s clusters |
| gslbLeaderCredentials.username | admin | Username to get access to the AVI Controller API |
| gslbLeaderCredentials.password | password01 | Password to get access to the AVI Controller API |
| gdpConfig.appSelector.label | app: gslb | All the services that contain a label field matching app: gslb will be considered by AMKO. A namespace selector can also be used for this purpose |
| gdpConfig.matchClusters | s1az1, s1az2, s2 | List of clusters to which the GlobalDeploymentPolicy object will be applied |
| gdpConfig.trafficSplit | traffic split ratio (see the yaml file below for syntax) | Defines how DNS answers are distributed across clusters |

values_amko.yaml parameters for AMKO

The full values_amko.yaml I am using as part of this lab is shown below and can also be found here for your reference. Remember to use the same context names as specified in the gslb-members multicluster kubeconfig file we used to create the secret object, otherwise it will not work.

# Default values for amko.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

replicaCount: 1

image:
  repository: avinetworks/amko
  pullPolicy: IfNotPresent

configs:
  gslbLeaderController: "10.10.20.42"
  controllerVersion: "20.1.2"
  memberClusters:
    - clusterContext: "s1az1"
    - clusterContext: "s1az2"
    - clusterContext: "s2"
  refreshInterval: 1800
  logLevel: "INFO"

gslbLeaderCredentials:
  username: admin
  password: Password01

globalDeploymentPolicy:
  # appSelector takes the form of:
  appSelector:
   label:
      app: gslb

  # namespaceSelector takes the form of:
  # namespaceSelector:
  #  label:
  #     ns: gslb

  # list of all clusters that the GDP object will be applied to, can take
  # any/all values
  # from .configs.memberClusters
  matchClusters:
    - "s1az1"
    - "s1az2"
    - "s2"

  # list of all clusters and their traffic weights, if unspecified,
  # default weights will be
  # given (optional). Uncomment below to add the required trafficSplit.
  trafficSplit:
    - cluster: "s1az1"
      weight: 6
    - cluster: "s1az2"
      weight: 4
    - cluster: "s2"
      weight: 2

serviceAccount:
  # Specifies whether a service account should be created
  create: true
  # Annotations to add to the service account
  annotations: {}
  # The name of the service account to use.
  # If not set and create is true, a name is generated using the fullname template
  name:

resources:
  limits:
    cpu: 250m
    memory: 300Mi
  requests:
    cpu: 100m
    memory: 200Mi

service:
  type: ClusterIP
  port: 80

persistentVolumeClaim: ""
mountPath: "/log"
logFile: "amko.log"

After customizing the values_amko.yaml file we can now install AMKO through helm.

helm install amko/amko --generate-name --version 1.4.1 -f values_amko.yaml --namespace avi-system

The installation creates an AMKO pod and also GSLBConfig and GlobalDeploymentPolicy custom resource objects that contain the configuration. It is important to note that any change to the GlobalDeploymentPolicy object is handled at runtime and does not require a full restart of the AMKO pod. As an example, you can change on the fly how the traffic is split across different clusters just by editing the corresponding object.

Let’s use Octant to explore the new objects created by the AMKO installation. First, we need to change the namespace since all the related objects have been created within the avi-system namespace. At the top of the screen switch to avi-system.

If we go to Workloads, we can easily identify the pods in the avi-system namespace. In this case, apart from ako-0, which is also running in this cluster, amko-0 appears as you can see in the screen below.

Browse to Custom Resources and you can identify the two Custom Resource Definitions that the AMKO installation has created. The first one is globaldeploymentpolicies.amko.vmware.com and it contains an object called global-gdp. This is the object used at runtime to change the policies that dictate how AMKO behaves, such as the labels we are using to select the interesting k8s services and the load balancing split ratio among the different clusters. At the moment of writing the only available algorithm to split traffic across clusters using AMKO is weighted round robin, but other methods such as GeoRedundancy are currently roadmapped and will be available soon.

In the second CRD called gslbconfigs.amko.vmware.com we can find an object named gc-1 that displays the base configuration of the AMKO service. The only parameter we can change at runtime without restarting AMKO is the log level.

Alternatively, if you prefer the command line, you can always edit the custom resource object through a regular kubectl edit command as shown below.

kubectl edit globaldeploymentpolicies.amko.vmware.com global-gdp -n avi-system
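For scripted changes, a non-interactive alternative to kubectl edit is a merge patch. The Python sketch below only builds the patch document and prints a kubectl patch command line; it does not contact any cluster. The spec.trafficSplit field path is an assumption that mirrors the trafficSplit section of the values file, so verify it against the output of kubectl get for the global-gdp object before relying on it. The 6:3:1 weights are just sample values.

```python
import json
import shlex


def traffic_split_patch(weights: dict) -> str:
    """Build a JSON merge patch that sets the per-cluster weights.

    Assumption: the GDP object keeps the weights under spec.trafficSplit,
    using the same cluster/weight keys as the values file.
    """
    patch = {"spec": {"trafficSplit": [
        {"cluster": cluster, "weight": weight} for cluster, weight in weights.items()
    ]}}
    return json.dumps(patch)


patch = traffic_split_patch({"s1az1": 6, "s1az2": 3, "s2": 1})
print("kubectl patch globaldeploymentpolicies.amko.vmware.com global-gdp "
      "-n avi-system --type merge -p " + shlex.quote(patch))
```

Keep in mind that a JSON merge patch replaces lists as a whole, so the patch should always carry the complete set of clusters and not only the one whose weight changes.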

Creating Multicluster K8s Ingress Service

Before adding extra complexity to the GSLB architecture let’s try to create our first multicluster Ingress Service. For this purpose I will use another kubernetes application called hello-kubernetes whose declarative yaml file is posted here. The application presents a simple web interface and it will use the MESSAGE environment variable in the yaml file definition to specify a message that will appear in the http response. This will be very helpful to identify which server is actually serving the content at any given time.

The full yaml file shown below defines the Deployment, the Service and the Ingress.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello
spec:
  replicas: 3
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
      - name: hello-kubernetes
        image: paulbouwer/hello-kubernetes:1.7
        ports:
        - containerPort: 8080
        env:
        - name: MESSAGE
          value: "MESSAGE: This service resides in Site1 AZ1"
---
apiVersion: v1
kind: Service
metadata:
  name: hello
spec:
  type: ClusterIP
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: hello
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: hello
  labels:
    app: gslb
spec:
  rules:
    - host: hello.avi.iberia.local
      http:
        paths:
        - path: /
          backend:
            serviceName: hello
            servicePort: 80

Note we are passing a MESSAGE variable to the container that will be used to display the text “MESSAGE: This service resides in Site1 AZ1”. Note also that in the metadata section of the Ingress we have defined a label with the value app: gslb. This setting will be used by AMKO to select that ingress service and create the corresponding GSLB configuration at the AVI Controller.
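The selection logic can be pictured with a tiny Python model. This is a simplified illustration of label matching under the appSelector configured earlier, not AMKO's actual implementation; the function name and the second sample label set are hypothetical.

```python
def selected_by_app_selector(object_labels: dict, selector_label: dict) -> bool:
    """Return True when every key/value of the selector is present in the object's labels."""
    return all(object_labels.get(key) == value for key, value in selector_label.items())


APP_SELECTOR = {"app": "gslb"}  # same label as in the appSelector section of the values file

ingress_labels = {"app": "gslb"}    # labels of the hello Ingress defined above
other_labels = {"app": "internal"}  # hypothetical object without the GSLB label

print(selected_by_app_selector(ingress_labels, APP_SELECTOR))
print(selected_by_app_selector(other_labels, APP_SELECTOR))
```

Note that the label lives on the Ingress object, not on the Deployment or the Service, whose app: hello labels are only used to bind pods and service together.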

Let’s apply the hello.yaml file with the contents shown above using kubectl

kubectl apply -f hello.yaml
 service/hello created
 deployment.apps/hello created
 ingress.networking.k8s.io/hello created

We can inspect the events at the AMKO pod to understand the dialogue with the AVI Controller API.

kubectl logs -f amko-0 -n avi-system


# A new ingress object has been detected with the appSelector set
2020-12-18T19:00:55.907Z        INFO    k8sobjects/ingress_object.go:295        objType: ingress, cluster: s1az1, namespace: default, name: hello/hello.avi.iberia.local, msg: accepted because of appSelector
2020-12-18T19:00:55.907Z        INFO    ingestion/event_handlers.go:383 cluster: s1az1, ns: default, objType: INGRESS, op: ADD, objName: hello/hello.avi.iberia.local, msg: added ADD/INGRESS/s1az1/default/hello/hello.avi.iberia.local key


# A Health-Monitor object to monitor the health state of the VS behind this GSLB service
2020-12-18T19:00:55.917Z        INFO    rest/dq_nodes.go:861    key: admin/hello.avi.iberia.local, hmKey: {admin amko--http--hello.avi.iberia.local--/}, msg: HM cache object not found
2020-12-18T19:00:55.917Z        INFO    rest/dq_nodes.go:648    key: admin/hello.avi.iberia.local, gsName: hello.avi.iberia.local, msg: creating rest operation for health monitor
2020-12-18T19:00:55.917Z        INFO    rest/dq_nodes.go:425    key: admin/hello.avi.iberia.local, queue: 0, msg: processing in rest queue
2020-12-18T19:00:56.053Z        INFO    rest/dq_nodes.go:446    key: admin/hello.avi.iberia.local, msg: rest call executed successfully, will update cache
2020-12-18T19:00:56.053Z        INFO    rest/dq_nodes.go:1085   key: admin/hello.avi.iberia.local, cacheKey: {admin amko--http--hello.avi.iberia.local--/}, value: {"Tenant":"admin","Name":"amko--http--hello.avi.iberia.local--/","Port":80,"UUID":"healthmonitor-5f4cc076-90b8-4c2e-934b-70569b2beef6","Type":"HEALTH_MONITOR_HTTP","CloudConfigCksum":1486969411}, msg: added HM to the cache

# A new GSLB object named hello.avi.iberia.local is created. The associated IP address for resolution is 10.10.25.46. The weight for the Weighted Round Robin traffic distribution is also set. A Health-Monitor named amko--http--hello.avi.iberia.local is also attached for health-monitoring
2020-12-18T19:00:56.223Z        INFO    rest/dq_nodes.go:1161   key: admin/hello.avi.iberia.local, cacheKey: {admin hello.avi.iberia.local}, value: {"Name":"hello.avi.iberia.local","Tenant":"admin","Uuid":"gslbservice-7dce3706-241d-4f87-86a6-7328caf648aa","Members":[{"IPAddr":"10.10.25.46","Weight":6}],"K8sObjects":["INGRESS/s1az1/default/hello/hello.avi.iberia.local"],"HealthMonitorNames":["amko--http--hello.avi.iberia.local--/"],"CloudConfigCksum":3431034976}, msg: added GS to the cache
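Since the value field of these log entries is plain JSON, it is easy to extract when you want to check what AMKO has pushed without opening the GUI. The following sketch is a hypothetical helper that assumes the log layout shown above (a value: field followed by a msg: field); the sample line is the last entry of the trace.

```python
import json


def cache_value(log_line: str) -> dict:
    """Return the JSON document found between 'value: ' and the trailing ', msg:' field."""
    payload = log_line.split("value: ", 1)[1].rsplit(", msg:", 1)[0]
    return json.loads(payload)


# Last log entry shown above, split across several string literals for readability.
LOG_LINE = (
    '2020-12-18T19:00:56.223Z        INFO    rest/dq_nodes.go:1161   '
    'key: admin/hello.avi.iberia.local, cacheKey: {admin hello.avi.iberia.local}, '
    'value: {"Name":"hello.avi.iberia.local","Tenant":"admin",'
    '"Uuid":"gslbservice-7dce3706-241d-4f87-86a6-7328caf648aa",'
    '"Members":[{"IPAddr":"10.10.25.46","Weight":6}],'
    '"K8sObjects":["INGRESS/s1az1/default/hello/hello.avi.iberia.local"],'
    '"HealthMonitorNames":["amko--http--hello.avi.iberia.local--/"],'
    '"CloudConfigCksum":3431034976}, msg: added GS to the cache'
)

gs = cache_value(LOG_LINE)
for member in gs["Members"]:
    print(gs["Name"], member["IPAddr"], member["Weight"])
```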

Now go to the AVI Controller acting as GSLB Leader and browse to Applications > GSLB Services.

Click on the pencil icon to explore the configuration AMKO has created upon creation of the Ingress service. An application named hello.avi.iberia.local with a Health-Monitor has been created as shown below:

Scrolling down you will find a new GSLB pool has been defined as well.

Click on the pencil icon to see other properties.

Finally, you get the IPv4 entry that the GSLB service will use to answer external queries. This IP address was obtained from the Ingress service external IP Address property at the source site that, by the way, was allocated by the integrated IPAM in the AVI Controller in that site.

If you go to the Dashboard, from Applications > Dashboard > View GSLB Services. You can see a representation of the GSLB object hello.avi.iberia.local that has a GSLB pool called hello.avi.iberia.local-10 that at the same time has a Pool Member entry with the IP address 10.10.25.46 that corresponds to the allocated IP for our hello-kubernetes service.

If you open a browser and go to http://hello.avi.iberia.local you will get the content of the hello-kubernetes application. Note how the MESSAGE environment variable we passed appears as part of the content the web server is sending us. In this case the message indicates that the service we are accessing resides in SITE1 AZ1.

Now it’s time to create the corresponding services in the remaining clusters to convert the single site application into a multi-AZ, multi-region application. Just change context using kubectl and apply the yaml file, changing the MESSAGE variable to “MESSAGE: This service resides in SITE1 AZ2” for the hello-kubernetes app at Site1 AZ2.

kubectl config use-context s1az2
 Switched to context "s1az2".
kubectl apply -f hello_s1az2.yaml
 service/hello created
 deployment.apps/hello created
 ingress.networking.k8s.io/hello created

Similarly, do the same for Site2, now using “MESSAGE: This service resides in SITE2”. The hello.yaml files for each cluster can be found here.

kubectl config use-context s2
 Switched to context "s2".
kubectl apply -f hello_s2.yaml
 service/hello created
 deployment.apps/hello created
 ingress.networking.k8s.io/hello created

When done you can go to the GSLB Service and verify there are new entries in the GSLB Pool. It can take some time to declare the system up and show it in green while the health-monitor is checking the availability of the application just created.

After some seconds the three new systems should show a comforting green color as an indication of the current state.

GSLB Pool members for the three clusters showing up status

If you explore the configuration of the newly created service you can see the assigned IP address for the new Pool members as well as the Ratio that has been configured according to the AMKO trafficSplit parameter. For Site1 AZ2 the assigned IP address is 10.10.26.40 and the ratio has been set to 4 as declared in the AMKO policy.

Pool Member properties for GSLB service at Site1 AZ2

In the same way, for the Site 2 the assigned IP address is 10.10.23.40 and the ratio has been set to 2 as dictated by AMKO.

Pool Member properties for GSLB service at Site2

If you go to Dashboard and display the GSLB Service, you can get a global view of the GSLB Pool and its members

GSLB Service representation

Testing GSLB Service

Now it’s time to test the GSLB Service. If you open a browser and refresh periodically you can see how the MESSAGE changes, indicating we are reaching the content at different sites thanks to the load balancing algorithm implemented in the GSLB service. For example, at some point you would see this message, which means we are reaching the HTTP service at Site1 AZ2.

And also this message indicating the service is being served from SITE2.

A smarter way to verify the behavior is to create a simple script that does the “refresh” task for us programmatically, so we can analyze how the system is answering our external DNS requests. Before starting we need to change the TTL for our service to accelerate the local DNS cache expiration. This is useful for testing purposes but is not a good practice for a production environment. In this case we will configure the GSLB service hello.avi.iberia.local to serve the DNS answers with a TTL equal to 2 seconds.

TTL setting for DNS resolution

Let’s create a single line infinite loop using shell scripting to send a curl request every two seconds to the intended URL at http://hello.avi.iberia.local. We will grep the MESSAGE string to display the line of the HTTP response that tells us which of the three sites is actually serving the content. Remember we are using a Weighted Round Robin algorithm to achieve load balancing, which is why the frequency of the different messages is not the same, as you can see below.

while true; do curl -m 2 http://hello.avi.iberia.local -s | grep MESSAGE; sleep 2; done
  MESSAGE: This service resides in SITE1 AZ1
  MESSAGE: This service resides in SITE1 AZ1
  MESSAGE: This service resides in SITE1 AZ1
  MESSAGE: This service resides in SITE1 AZ2
  MESSAGE: This service resides in SITE1 AZ2
  MESSAGE: This service resides in SITE2
  MESSAGE: This Service resides in SITE2
  MESSAGE: This service resides in SITE1 AZ2
  MESSAGE: This service resides in SITE1 AZ2
  MESSAGE: This service resides in SITE1 AZ1
  MESSAGE: This service resides in SITE1 AZ1
  MESSAGE: This service resides in SITE1 AZ1
  MESSAGE: This service resides in SITE1 AZ1
  MESSAGE: This service resides in SITE1 AZ1
  MESSAGE: This service resides in SITE1 AZ1
  MESSAGE: This service resides in SITE1 AZ2
  MESSAGE: This service resides in SITE1 AZ2
  MESSAGE: This service resides in SITE2
  MESSAGE: This service resides in SITE2
  MESSAGE: This service resides in SITE1 AZ2
  MESSAGE: This service resides in SITE1 AZ2
  MESSAGE: This service resides in SITE1 AZ1
  MESSAGE: This service resides in SITE1 AZ1
  MESSAGE: This service resides in SITE1 AZ1
  MESSAGE: This service resides in SITE1 AZ1
  MESSAGE: This service resides in SITE1 AZ1
  MESSAGE: This service resides in SITE1 AZ1
  MESSAGE: This service resides in SITE1 AZ2
  MESSAGE: This service resides in SITE1 AZ2
  MESSAGE: This service resides in SITE2
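To compare an output like this with the configured weights, the lines can be tallied and set against the share each cluster should receive. The sketch below is a small illustrative helper, with hypothetical function names, that uses the 6:4:2 weights from the trafficSplit section and a short made-up sample in the same format as the loop output.

```python
from collections import Counter


def expected_share(weights: dict) -> dict:
    """Fraction of DNS answers each cluster should receive for the given weights."""
    total = sum(weights.values())
    return {cluster: weight / total for cluster, weight in weights.items()}


def tally(lines) -> Counter:
    """Count responses per site from lines such as 'MESSAGE: This service resides in SITE2'."""
    counts = Counter()
    for line in lines:
        if "resides in" in line:
            counts[line.split("resides in", 1)[1].strip().upper()] += 1
    return counts


WEIGHTS = {"SITE1 AZ1": 6, "SITE1 AZ2": 4, "SITE2": 2}  # trafficSplit from the values file

# Short illustrative sample; in practice feed the captured output of the curl loop.
sample = [
    "MESSAGE: This service resides in SITE1 AZ1",
    "MESSAGE: This service resides in SITE1 AZ1",
    "MESSAGE: This service resides in SITE1 AZ1",
    "MESSAGE: This service resides in SITE1 AZ2",
    "MESSAGE: This service resides in SITE1 AZ2",
    "MESSAGE: This service resides in SITE2",
]

observed = tally(sample)
shares = expected_share(WEIGHTS)
for site, share in shares.items():
    print(f"{site}: expected {share:.1%}, observed {observed[site] / len(sample):.1%}")
```

In the 30 responses captured above, SITE1 AZ1, SITE1 AZ2 and SITE2 appear 15, 10 and 5 times respectively, which matches the configured 6:4:2 ratio.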

If we go to the AVI log of the related DNS Virtual Service we can see how the sequence of responses follows the Weighted Round Robin algorithm as well.

Round Robin Resolution

Additionally, I have created a script here that runs the infinite loop for you and shows some well formatted and coloured information about DNS resolution and HTTP response, which can be very helpful for testing and demos. It works with the hello-kubernetes application but you can easily modify it to fit your needs. The script needs the URL and the interval of the loop as input parameters.

./check_dns.sh hello.avi.iberia.local 2

A sample output is shown below for your reference.

check_dns.sh sample output

Exporting AVI Logs for data visualization and trafficSplit analysis

As you have already noticed during this series of articles, NSX Advanced Load Balancer stands out for its rich embedded Analytics engine that helps you visualize all the activity in your application. There are still times when you prefer to export the data for further analysis using a Business Intelligence tool of your choice. As an example I will show you a very simple way to verify the traffic split distribution across the three datacenters by exporting the raw logs. That way we will check whether the traffic really fits the ratio we have configured for load balancing (6:3:1 in this case).

Remember that the trafficSplit setting can be changed at runtime just by editing the YAML associated with the global-gdp object that AMKO created. Using Octant we can easily browse to the avi-system namespace, go to Custom Resource > globaldeploymentpolicies.amko.vmware.com, click on the global-gdp object and then on the YAML tab. From here, modify the weight assigned to each cluster as per your preference, click UPDATE and you are done.

Changing the custom resource global-gdp object to 6:3:1

Whenever you change this setting, AMKO refreshes the configuration of the whole GSLB object to reflect the new changes. This resets the TTL value to the default setting of 30 seconds. If you want to repeat the test to verify the new trafficSplit distribution, make sure you change the TTL to a lower value such as 1 second to speed up the expiration of the DNS cache.

A very good tool to send DNS traffic is dnsperf, which is available here. This DNS performance tool reads input files describing DNS queries and sends those queries to DNS servers to measure performance. In our case we have just one GSLB Service so far, so queryfile.txt contains a single line with the FQDN under test and the type of query. In this case we will send type A queries. The contents of the file are shown below.

cat queryfile.txt
hello.avi.iberia.local. A

We will start by sending 10000 queries to our DNS. The -d option points to the query file, and the -s option sets the DNS server IP. Here we send the queries directly to the IP of the virtual service under test, to make sure we are measuring the performance of the AVI DNS and not that of the parent domain DNS that is delegating the DNS zone. To specify the number of queries use the -n option, which instructs dnsperf to iterate over the same file the desired number of times. When the test finishes, it displays the observed performance metrics.

dnsperf -d queryfile.txt -s 10.10.24.186 -n 10000
DNS Performance Testing Tool
Version 2.3.4
[Status] Command line: dnsperf -d queryfile.txt -s 10.10.24.186 -n 10000
[Status] Sending queries (to 10.10.24.186)
[Status] Started at: Wed Dec 23 19:40:32 2020
[Status] Stopping after 10000 runs through file
[Timeout] Query timed out: msg id 0
[Timeout] Query timed out: msg id 1
[Timeout] Query timed out: msg id 2
[Timeout] Query timed out: msg id 3
[Timeout] Query timed out: msg id 4
[Timeout] Query timed out: msg id 5
[Timeout] Query timed out: msg id 6
[Status] Testing complete (end of file)

Statistics:

  Queries sent:         10000
  Queries completed:    9994 (99.94%)
  Queries lost:         12 (0.06%)

  Response codes:       NOERROR 9994 (100.00%)
  Average packet size:  request 40, response 56
  Run time (s):         0.344853
  Queries per second:   28963.065422

  Average Latency (s):  0.001997 (min 0.000441, max 0.006998)
  Latency StdDev (s):   0.001117

From the data above you can see that the performance figures are pretty good. With a single vCPU, the Service Engine has responded to 10,000 queries at a rate of almost 30,000 queries per second. When the dnsperf test is completed, go to the Logs section of the DNS VS to check how the logs show up in the GUI.

g-dns AVI Logs with Throttling enabled

As you can see, only a very small fraction of the expected logs is shown in the console. The reason is that the collection of client logs is throttled at the Service Engines. Throttling is just a rate-limiting mechanism to save resources and is implemented as a number of logs collected per second. Any excess logs in a given second are dropped.

Throttling is controlled by two sets of properties: (i) throttles specified in the analytics policy of a virtual service, and (ii) throttles specified in the Service Engine group of a Service Engine. Each set has a throttle property for each type of client log. A client log of a specific type could be dropped because of throttles for that type in either of the above two sets.

You can modify the log throttling at the virtual service level by editing Virtual Service:g-dns and clicking on the Analytics tab. In the Client Log Settings, disable log throttling for the Non-significant Logs by setting the value to zero as shown below:

Log Throttling for Non-significant logs at the Virtual Service

In the same way, you can also modify the log throttling settings at the Service Engine Group level. Edit the Service Engine Group associated with the DNS Service, click on the Advanced tab and go to the Log Collection and Streaming Settings. Set the Non-significant Log Throttle to 0 Logs/Second, which means no throttle is applied.

Log Throttling for Non-significant logs at the Service Engine Group
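The interaction between the two throttle levels can be pictured with a small model. The Python sketch below is only a simplified illustration of the behaviour described above, not AVI code: it assumes a log is dropped as soon as either the virtual service throttle or the Service Engine Group throttle is exceeded, and that a value of 0 disables the throttle at that level. The function names and the sample figures are hypothetical.

```python
def effective_throttle(vs_limit, se_group_limit):
    """Return the logs-per-second cap for one log type, or None if unthrottled.

    A limit of 0 means "no throttle" at that level.
    """
    active = [limit for limit in (vs_limit, se_group_limit) if limit > 0]
    return min(active) if active else None

def logs_kept(logs_per_second, vs_limit, se_group_limit):
    """Number of logs collected in one second; anything above the cap is dropped."""
    cap = effective_throttle(vs_limit, se_group_limit)
    return logs_per_second if cap is None else min(logs_per_second, cap)

# Hypothetical figures: 100 queries per second hitting the DNS virtual service.
print(logs_kept(100, vs_limit=10, se_group_limit=0))   # only the VS throttle is active
print(logs_kept(100, vs_limit=0, se_group_limit=0))    # both throttles disabled
```

This is why both settings have to be set to zero before every query of the test shows up in the logs.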

Be careful when applying these settings in a production environment! Repeat the test and explore the logs again. Hover the mouse over the bar that shows the traffic and notice how the system is now getting all the logs into the AVI Analytics console.

Let’s do some data analysis to verify whether the configured trafficSplit ratio is actually working as expected. We will use dnsperf again, but now we will rate-limit the number of queries using the -Q option. We will send 5000 queries overall at a rate of 100 queries per second.

dnsperf -d queryfile.txt -s 10.10.24.186 -n 5000 -Q 100

This time we can use the log export capabilities of the AVI Controller. Select the period of logs you want to export and click on the Export button to get all 5000 logs.

Selection of interesting traffic logs and exportation

Now you have a CSV file containing all the analytics data within scope. There are many options to process the file. I am showing here a rudimentary way using the Google Sheets application. If you have a Google account, just point your browser to https://docs.google.com/spreadsheets. Now create a blank spreadsheet and go to File > Import as shown below.

Click the Upload tab and browse for the CSV file you have just downloaded. Once the upload process has completed, import the data contained in the file using the default options as shown below.

CSV importation

Google Sheets automatically organizes our comma-separated values into columns, producing a fairly big spreadsheet with all the data as shown below.

DNS Traffic Logs SpreadSheet

Locate the column dns_ips, which contains the responses that the DNS Virtual Service sends when queried for the dns_fqdn hello.avi.iberia.local. Select the full column by clicking on the header of the dns_ips field, in this case the column marked as BJ, as shown below:

And now let Google Sheets do the trick for us. Google Sheets has some automatic exploration capabilities that suggest useful visualization graphics for the selected data. Just locate the Explore button at the bottom right of the spreadsheet.

Automatic Data Exploration in Google Sheets

When completed, Google Sheets offers the following visualization graphs.

Google Sheet Automatic Exploration

For the purpose of describing how the traffic is split across datacenters, the pie chart or the frequency histogram can be very useful. Add the suggested graphs to the spreadsheet and, after some minor customization to show values and change the title, you get these nice graphics. The observed distribution fits the expected 6:3:1 ratio.
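If you prefer to skip the spreadsheet, the same tally can be produced with a few lines of Python. This is a minimal sketch that assumes the exported CSV has a dns_ips column holding one answer per row; the inline sample data and its IP addresses are made up so that the example is self-contained, and a real export may need extra parsing of that column.

```python
import csv
import io
from collections import Counter

def split_distribution(csv_text, column="dns_ips"):
    """Return {answer: percentage} for the given column of an exported log CSV."""
    rows = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[column] for row in rows if row.get(column))
    total = sum(counts.values())
    return {answer: round(100 * n / total, 1) for answer, n in counts.items()}

# Tiny stand-in for the exported file; the addresses are invented for the example.
sample = "dns_fqdn,dns_ips\n" + "".join(
    f"hello.avi.iberia.local,{ip}\n"
    for ip in ["192.0.2.1"] * 6 + ["192.0.2.2"] * 3 + ["192.0.2.3"] * 1
)
print(split_distribution(sample))
```

To use it on a real export, read the downloaded file into a string and pass it to split_distribution instead of the sample.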

A use case for the trafficSplit feature might be the implementation of a Canary Deployment strategy. With a canary deployment, you deploy new application code in a small part of the production infrastructure. Once the application is signed off for release, only a few users are routed to it, which minimizes the impact of any errors. I will change the ratio to simulate a Canary Deployment by directing just a small portion of the traffic to Site2, which would be the site where the new code is deployed. I will change the weights to 20:20:1 as shown below, which keeps an even distribution between the two Site1 clusters. With this setting, the theoretical share of traffic sent to the canary would be 1/(20+20+1) = 2.4%. Let’s see how it goes.

  trafficSplit:
  - cluster: s1az1
    weight: 20
  - cluster: s1az2
    weight: 20
  - cluster: s2
    weight: 1
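The expected share for each cluster follows directly from the weights. The short Python sketch below just reproduces that arithmetic for the trafficSplit values above; the helper name is an arbitrary choice for the example.

```python
def expected_share(weights):
    """Convert trafficSplit weights into the theoretical percentage per cluster."""
    total = sum(weights.values())
    return {cluster: 100 * weight / total for cluster, weight in weights.items()}

shares = expected_share({"s1az1": 20, "s1az2": 20, "s2": 1})
for cluster, pct in shares.items():
    print(f"{cluster}: {pct:.2f}%")
```

With 5000 queries, a share of 1/41 corresponds to roughly 122 answers pointing to the canary site, which gives a reference figure to compare against the exported logs.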

Remember that every time we change this setting in AMKO the full GSLB service is refreshed, including the TTL setting. Set the TTL to 1 for the GSLB service again to speed up the expiration of the DNS cache and repeat the test.

dnsperf -d queryfile.txt -s 10.10.24.186 -n 5000 -Q 100

If you export and process the logs in the same way you will get the following results

trafficSplit ratio for Canary Deployment Use Case

Invalid DNS Query Processing and curl

I have created this section because, although it may seem irrelevant, this setting can cause unexpected behavior depending on the application we use to establish the HTTP sessions. The AVI DNS Virtual Service has two different settings to respond to invalid DNS queries. You can see the options by going to the System-DNS profile attached to our Virtual Service.

The most typical setting for Invalid DNS Query Processing is to configure the server to “Respond to unhandled DNS requests”, which actively sends an NXDOMAIN answer for those queries that cannot be resolved by the AVI DNS.

Let’s give the second option a try, which is “Drop unhandled DNS requests”. After configuring and saving it, if you use curl to open an HTTP connection to the target site, in this case hello.avi.iberia.local, you will notice that it takes some time to receive the answer from our server.

curl hello.avi.iberia.local
<  after a few seconds... > 
<!DOCTYPE html>
<html>
<head>
    <title>Hello Kubernetes!</title>
    <link rel="stylesheet" type="text/css" href="/css/main.css">
    <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Ubuntu:300" >
</head>
<body>
  <div class="main">
    <img src="/images/kubernetes.png"/>
    <div class="content">
      <div id="message">
  MESSAGE: This service resides in SITE1 AZ1
</div>
<div id="info">
  <table>
    <tr>
      <th>pod:</th>
      <td>hello-6b9797894-m5hj2</td>
    </tr>
    <tr>
      <th>node:</th>
      <td>Linux (4.15.0-128-generic)</td>
    </tr>
  </table>
</body>

If we look into the AVI log for the request, we can see that it has been served within a few milliseconds, so it seems there is nothing wrong with the Virtual Service itself.

But if we look a little bit deeper by capturing traffic at the client side we can see what has happened.

As you can see in the traffic capture above, the following packets have been sent as part of the attempt to establish a connection using curl to the intended URL at hello.avi.iberia.local:

  1. The curl client sends a type A DNS request asking for the FQDN hello.avi.iberia.local
  2. The curl client sends a second DNS request of type AAAA (asking for an IPv6 resolution) for the same FQDN hello.avi.iberia.local
  3. The DNS answers some milliseconds later with the type A IP address resolution = 10.10.25.46
  4. Five seconds later, since curl has not received an answer for the type AAAA query, curl retries by sending both the type A and type AAAA queries one more time.
  5. The DNS answers again very quickly with the type A IP address resolution = 10.10.25.46
  6. Finally the DNS sends a Server Failure indicating there is no response for the type AAAA query for hello.avi.iberia.local
  7. Only after this does the curl client start the HTTP connection to the URL

As you can see, the fact that the AVI DNS server is dropping the traffic causes the curl implementation to wait up to 9 seconds until the timeout is reached. We can avoid this behaviour by changing the setting in the AVI DNS Virtual Service.
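The delay can also be observed without a packet capture by timing the name resolution itself. The following Python sketch is an approximation under some assumptions: it relies on the system resolver through socket.getaddrinfo, which typically issues both A and AAAA queries when no address family is given and only A queries when restricted to IPv4. The function name is hypothetical, and curl may use a different resolver backend, so the timings are indicative only.

```python
import socket
import time

def time_resolution(host, family=socket.AF_UNSPEC):
    """Resolve `host` and return (elapsed_seconds, sorted unique addresses)."""
    start = time.monotonic()
    infos = socket.getaddrinfo(host, 80, family=family, type=socket.SOCK_STREAM)
    elapsed = time.monotonic() - start
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    addresses = sorted({info[4][0] for info in infos})
    return elapsed, addresses
```

Calling time_resolution("hello.avi.iberia.local") and then time_resolution("hello.avi.iberia.local", socket.AF_INET) from a client in the lab should show the first call taking several seconds while the unanswered AAAA query times out, and the second one returning almost immediately.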

Configure again the AVI DNS VS to “Respond to unhandled DNS requests” as shown below.

Now we can check how the behaviour has changed.

As you can see above, curl receives an immediate answer from the DNS indicating that there is no AAAA record for this domain, so curl can proceed with the connection.

For the AAAA record type, AVI now actively responds with an empty answer as shown below.

You can also check the behaviour using dig and querying for the AAAA record for this particular FQDN and you will get a NOERROR answer as shown below.

dig AAAA hello.avi.iberia.local

; <<>> DiG 9.16.1-Ubuntu <<>> AAAA hello.avi.iberia.local
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 60975
;; flags: qr aa; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
;; QUESTION SECTION:
;hello.avi.iberia.local.                IN      AAAA

;; Query time: 4 msec
;; SERVER: 10.10.0.10#53(10.10.0.10)
;; WHEN: Mon Dec 21 09:03:05 CET 2020
;; MSG SIZE  rcvd: 51

Summary

We now have a fair understanding of how AMKO actually works, along with some techniques for testing and troubleshooting. Now it is time to explore the AVI GSLB capabilities for creating a more complex GSLB hierarchy and distributing the DNS tasks among different AVI controllers. The AMKO code is rapidly evolving, and new features have been incorporated to add extra control over the GSLB configuration, such as changing the DNS algorithm. Stay tuned for further articles that will cover these new functions.

AVI for K8s Part 5: Deploying K8s secure Ingress type services

In this section we will focus on secure ingress services, which are the most common and sensible way to publish our services externally. As mentioned in previous sections, the ingress is a kubernetes object that can be used to provide load balancing, SSL termination and name-based virtual hosting. We will use the hackazon application from the previous sections to continue with our tests, but now we will move from HTTP to HTTPS for delivering the content.

Dealing with Securing Ingresses in K8s

We can modify the Ingress yaml file definition to turn the ingress into a secure ingress service by enabling TLS.

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: hackazon
  labels:
    app: hackazon
spec:
  tls:
  - hosts:
    - hackazon.avi.iberia.local
    secretName: hackazon-secret
  rules:
    - host: hackazon.avi.iberia.local
      http:
        paths:
        - path: /
          backend:
            serviceName: hackazon
            servicePort: 80

There are some new items compared with the insecure ingress definition file we discussed in the previous section. Note how the spec contains a tls field that has some attributes, including the hostname, and note also that there is a secretName definition. The rules section is pretty much the same as in the insecure ingress yaml file.

The secretName field must point to a new type of kubernetes object called a secret. A secret in kubernetes is an object that contains a small amount of sensitive data such as a password, a token, or a key. There is a specific type of secret used for storing a certificate and its associated cryptographic material, which are typically used for TLS. This data is primarily used with TLS termination of the Ingress resource, but may be used with other resources or directly by a workload. When using this type of Secret, the tls.key and tls.crt keys must be provided in the data (or stringData) field of the Secret configuration, although the API server doesn’t actually validate the values for each key. To create a secret we can use the kubectl create secret command. The general syntax is shown below:

kubectl create secret tls my-tls-secret \
  --cert=path/to/cert/file \
  --key=path/to/key/file
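For reference, the object this command produces is an ordinary Secret of type kubernetes.io/tls whose data field holds the Base64-encoded certificate and key. The Python sketch below builds an equivalent manifest as a plain dictionary; the helper name and the placeholder PEM strings are made up for the example, and real PEM file contents would take their place.

```python
import base64
import json

def tls_secret_manifest(name, cert_pem, key_pem, namespace="default"):
    """Build a kubernetes.io/tls Secret manifest as a plain dict."""
    def encode(text):
        # Values under `data` must be Base64-encoded strings.
        return base64.b64encode(text.encode()).decode()

    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "kubernetes.io/tls",
        "metadata": {"name": name, "namespace": namespace},
        "data": {"tls.crt": encode(cert_pem), "tls.key": encode(key_pem)},
    }

# Placeholder strings stand in for the contents of the real PEM files.
manifest = tls_secret_manifest("my-tls-secret", "CERT-PEM-HERE", "KEY-PEM-HERE")
print(json.dumps(manifest, indent=2))
```

Since JSON is accepted by kubectl apply, the printed document could be saved to a file and applied as-is, although the kubectl create secret tls command remains the simpler route.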

The public/private key pair must exist beforehand. The public key certificate for --cert must be PEM encoded (Base64-encoded DER format) and match the given private key for --key. The private key must be in what is commonly called PEM private key format and unencrypted. We can easily generate a private key and a cert file by using the OpenSSL tools. The first step is creating the private key. I will use an elliptic curve key based on the prime256v1 curve, selected with the -name parameter of openssl ecparam. For more information about elliptic curve cryptography click here.

openssl ecparam -name prime256v1 -genkey -noout -out hackazon.key

The contents of the created hackazon.key file should look like this:

-----BEGIN EC PRIVATE KEY-----
MHcCAQEEIGXaF7F+RI4CU0MHa3MbI6fOxp1PvxhS2nxBEWW0EOzJoAoGCCqGSM49AwEHoUQDQgAE0gO2ZeHeZWBiPdOFParWH6Jk15ITH5hNzy0kC3Bn6yerTFqiPwF0aleSVXF5JAUFxJYNo3TKP4HTyEEvgZ51Q==
-----END EC PRIVATE KEY-----

In the second step we will create a Certificate Signing Request (CSR). We need to specify the certificate parameters we want to include in the public-facing certificate. We will use a single-line command to create the CSR. The CSR is the method to request a public key certificate for an existing private key so, as you can imagine, we have to provide the hackazon.key file to generate the CSR.

openssl req -new -key hackazon.key -out hackazon.csr -subj "/C=ES/ST=Madrid/L=Pozuelo/O=Iberia Lab/OU=Iberia/CN=hackazon.avi.iberia.local"

The content of the created hackazon.csr file should look like this:

-----BEGIN CERTIFICATE REQUEST-----
MIIBNjCB3AIBADB6MQswCQYDVQQGEwJFUzEPMA0GA1UECAwGTWFkcmlkMRAwDgYDVQQHDAdQb3p1ZWxvMRMwEQYDVQQKDApJYmVyaWEgTGFiMQ8wDQYDVQQLDAZJYmVyaWExIjAgBgNVBAMMGWhhY2them9uLmF2aS5pYmVyaWEubG9jYWwwWTATBgcqhkjOPQIBBggqhkjOPQMBBwNCAATSA7Zl4d5lYGI904U9qtYfomTXkhMfmE3PLSTMLcGfrJ6tMWqI/AXRqV5JVcXkkBQXElg2jdMo/gdPIQS+BnnVoAAwCgYIKoZIzj0EAwIDSQAwRgIhAKt5AvKJ/DvxYcgUQZHK5d7lIYLYOULIWxnVPiKGNFuGAiEA3Ul99dXqon+OGoKBTujAHpOw8SA/Too1Redgd6q8wCw=
-----END CERTIFICATE REQUEST-----

Next, we need to sign the CSR. For a production environment it is highly recommended to use a public Certification Authority to sign the request. For lab purposes we will self-sign the CSR using the private key file created before.

openssl x509 -req -days 365 -in hackazon.csr -signkey hackazon.key -out hackazon.crt
Signature ok
subject=C = ES, ST = Madrid, L = Pozuelo, O = Iberia Lab, OU = Iberia, CN = hackazon.avi.iberia.local
Getting Private key

The output file hackazon.crt contains the new certificate encoded in PEM (Base64) format and it should look like this:

-----BEGIN CERTIFICATE-----
MIIB7jCCAZUCFDPolIQwTC0ZFdlOc/mkAZpqVpQqMAoGCCqGSM49BAMCMHoxCzAJBgNVBAYTAkVTMQ8wDQYDVQQIDAZNYWRyaWQxEDAOBgNVBAcMB1BvenVlbG8xEzARBgNVBAoMCkliZXJpYSBMYWIxDzANBgNVBAsMBkliZXJpYTEiMCAGA1UEAwwZaGFja2F6b24uYXZpLmliZXJpYS5sb2NhbDAeFw0yMDEyMTQxODExNTdaFw0yMTEyMTQxODExNTdaMHoxCzAJBgNVBAYTAkVTMQ8wDQYDVQQIDAZNYWRyaWQxEDAOBgNVBAcMB1BvenVlbG8xEzARBgNVBAoMCkliZXJpYSBMYWIxDzANBgNVBAsMBkliZXJpYTEiMCAGA1UEAwwZaGFja2F6b24uYXZpLmliZXJpYS5sb2NhbDBZMBMGByqGSM49AgEGCCqGSM49AwEHA0IABNIDtmXh3mVgYj3ThT2q1h+iZNeSEx+YTc8tJMwtwZ+snq0xaoj8BdGpXklVxeSQFBcSWDaN0yj+B08hBL4GedUwCgYIKoZIzj0EAwIDRwAwRAIgcLjFh0OBm4+3CYekcSG86vzv7P0Pf8Vm+y73LjPHg3sCIH4EfNZ73z28GiSQg3n80GynzxMEGG818sbZcIUphfo+
-----END CERTIFICATE-----
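To make the relationship between PEM and DER concrete, the following Python sketch strips the BEGIN/END lines and Base64-decodes what is left. It is a minimal illustration rather than a full PEM parser, and the five bytes used for the round trip are an arbitrary stand-in for a real certificate.

```python
import base64
import ssl

def pem_to_der(pem_text):
    """Drop the BEGIN/END armor lines and Base64-decode the body into DER bytes."""
    lines = (line.strip() for line in pem_text.splitlines())
    body = "".join(line for line in lines if line and not line.startswith("-----"))
    return base64.b64decode(body)

# Arbitrary bytes for the round trip; a real certificate's DER begins with 0x30.
fake_der = bytes([0x30, 0x03, 0x02, 0x01, 0x00])
pem = ssl.DER_cert_to_PEM_cert(fake_der)
print(pem)
print(pem_to_der(pem) == fake_der)
```

Running pem_to_der on the contents of hackazon.crt would return the binary DER form of the certificate shown above.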

We can also decode the content of the X.509 certificate by using the openssl tools to check that it actually matches our subject definition.

openssl x509 -in hackazon.crt -text -noout
Certificate:
    Data:
        Version: 1 (0x0)
        Serial Number:
            33:e8:94:84:30:4c:2d:19:15:d9:4e:73:f9:a4:01:9a:6a:56:94:2a
        Signature Algorithm: ecdsa-with-SHA256
        Issuer: C = ES, ST = Madrid, L = Pozuelo, O = Iberia Lab, OU = Iberia, CN = hackazon.avi.iberia.local
        Validity
            Not Before: Dec 14 18:11:57 2020 GMT
            Not After : Dec 14 18:11:57 2021 GMT
        Subject: C = ES, ST = Madrid, L = Pozuelo, O = Iberia Lab, OU = Iberia, CN = hackazon.avi.iberia.local
        Subject Public Key Info:
            Public Key Algorithm: id-ecPublicKey
                Public-Key: (256 bit)
                pub:
                    04:d2:03:b6:65:e1:de:65:60:62:3d:d3:85:3d:aa:
                    d6:1f:a2:64:d7:92:13:1f:98:4d:cf:2d:24:cc:2d:
                    c1:9f:ac:9e:ad:31:6a:88:fc:05:d1:a9:5e:49:55:
                    c5:e4:90:14:17:12:58:36:8d:d3:28:fe:07:4f:21:
                    04:be:06:79:d5
                ASN1 OID: prime256v1
                NIST CURVE: P-256
    Signature Algorithm: ecdsa-with-SHA256
         30:44:02:20:70:b8:c5:87:43:81:9b:8f:b7:09:87:a4:71:21:
         bc:ea:fc:ef:ec:fd:0f:7f:c5:66:fb:2e:f7:2e:33:c7:83:7b:
         02:20:7e:04:7c:d6:7b:df:3d:bc:1a:24:90:83:79:fc:d0:6c:
         a7:cf:13:04:18:6f:35:f2:c6:d9:70:85:29:85:fa:3e

Finally once we have the cryptographic material created, we can go ahead and create the secret object we need using regular kubectl command line. In our case we will create a new tls secret that we will call hackazon-secret using our newly created cert and private key files.

kubectl create secret tls hackazon-secret --cert hackazon.crt --key hackazon.key
secret/hackazon-secret created

I have created a simple but useful script, available here, that puts all these steps together. You can copy the script and customize it at your convenience. Make it executable and invoke it simply by adding a friendly name, the subject and the namespace as input parameters. The script will do all the work for you.

./create-secret.sh my-site /C=ES/ST=Madrid/CN=my-site.example.com default
      
      Step 1.- EC Prime256 v1 private key generated and saved as my-site.key

      Step 2.- Certificate Signing Request created for CN=/C=ES/ST=Madrid/CN=my-site.example.com
Signature ok
subject=C = ES, ST = Madrid, CN = my-site.example.com
Getting Private key

      Step 3.- X.509 certificated created for 365 days and stored as my-site.crt

secret "my-site-secret" deleted
secret/my-site-secret created
      
      Step 4.- A TLS secret named my-site-secret has been created in current context and default namespace

Certificate:
    Data:
        Version: 1 (0x0)
        Serial Number:
            56:3e:cc:6d:4c:d5:10:e0:99:34:66:b9:3c:86:62:ac:7e:3f:3f:63
        Signature Algorithm: ecdsa-with-SHA256
        Issuer: C = ES, ST = Madrid, CN = my-site.example.com
        Validity
            Not Before: Dec 16 15:40:19 2020 GMT
            Not After : Dec 16 15:40:19 2021 GMT
        Subject: C = ES, ST = Madrid, CN = my-site.example.com
        Subject Public Key Info:
            Public Key Algorithm: id-ecPublicKey
                Public-Key: (256 bit)
                pub:
                    04:6d:7b:0e:3d:8a:18:af:fc:91:8e:16:7b:15:81:
                    0d:e5:68:17:80:9f:99:85:84:4d:df:bc:ae:12:9e:
                    f4:4a:de:00:85:c1:7e:69:c0:58:9a:be:90:ff:b2:
                    67:dc:37:0d:26:ae:3e:19:73:78:c2:11:11:03:e2:
                    96:61:80:c3:77
                ASN1 OID: prime256v1
                NIST CURVE: P-256
    Signature Algorithm: ecdsa-with-SHA256
         30:45:02:20:38:c9:c9:9b:bc:1e:5c:7b:ae:bd:94:17:0e:eb:
         e2:6f:eb:89:25:0b:bf:3d:c9:b3:53:c3:a7:1b:9c:3e:99:28:
         02:21:00:f5:56:b3:d3:8b:93:26:f2:d4:05:83:9d:e9:15:46:
         02:a7:67:57:3e:2a:9f:2c:be:66:50:82:bc:e8:b7:c0:b8
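The sequence that the script automates can be summarized as an ordered list of commands. The Python sketch below only assembles those commands from the ones used earlier in this section; it is not the content of create-secret.sh, which may differ. The function name, the delete step with --ignore-not-found and the -n namespace flag are assumptions added to mirror the sample output above.

```python
import shlex

def secret_workflow(name, subject, namespace="default", days=365):
    """Return the commands (as argv lists) for key -> CSR -> self-signed cert -> secret."""
    key, csr, crt = f"{name}.key", f"{name}.csr", f"{name}.crt"
    secret = f"{name}-secret"
    return [
        ["openssl", "ecparam", "-name", "prime256v1", "-genkey", "-noout", "-out", key],
        ["openssl", "req", "-new", "-key", key, "-out", csr, "-subj", subject],
        ["openssl", "x509", "-req", "-days", str(days), "-in", csr, "-signkey", key, "-out", crt],
        # Assumption: remove any previous secret so the create step does not fail.
        ["kubectl", "delete", "secret", secret, "-n", namespace, "--ignore-not-found"],
        ["kubectl", "create", "secret", "tls", secret, "--cert", crt, "--key", key, "-n", namespace],
    ]

for argv in secret_workflow("my-site", "/C=ES/ST=Madrid/CN=my-site.example.com"):
    print(shlex.join(argv))
```

The example only prints the commands. To run them, each argv list could be passed to subprocess.run(argv, check=True) on a machine where openssl and kubectl are installed.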

Once created we can see the new object using the Octant GUI as displayed below:

We can also display the yaml definition for that particular secret if required.

Once we have the secret ready to use, let’s apply the secure ingress yaml file definition. The full yaml including the Deployment and the ClusterIP service definition can be accessed here.

kubectl apply -f hackazon_secure_ingress.yaml

As soon as the yaml file is pushed to the kubernetes API, AKO translates this ingress configuration into API calls to the AVI controller in order to realize the different configuration elements in the external load balancer. That also includes uploading the k8s secret resource that we created before, in the form of a new certificate that will be used to secure the traffic directed to this Virtual Service. This time we have changed the debugging level of AKO to DEBUG, which outputs a huge amount of information. I have selected some key messages that will help us understand what is happening under the hood.

Exploring AKO Logs for Secure Ingress Creation

# An HTTP to HTTPS Redirection Policy has been created and attached to the parent Shared L7 Virtual service
2020-12-16T11:16:38.337Z        DEBUG   rest/dequeue_nodes.go:1213      The HTTP Policies rest_op is [{"Path":"/api/macro","Method":"POST","Obj":{"model_name":"HTTPPolicySet","data":{"cloud_config_cksum":"2197663401","created_by":"ako-S1-AZ1","http_request_policy":{"rules":[{"enable":true,"index":0,"match":{"host_hdr":{"match_criteria":"HDR_EQUALS","value":["hackazon.avi.iberia.local"]},"vs_port":{"match_criteria":"IS_IN","ports":[80]}},"name":"S1-AZ1--Shared-L7-0-0","redirect_action":{"port":443,"protocol":"HTTPS","status_code":"HTTP_REDIRECT_STATUS_CODE_302"}}]},"name":"S1-AZ1--Shared-L7-0","tenant_ref":"/api/tenant/?name=admin"}},"Tenant":"admin","PatchOp":"","Response":null,"Err":null,"Model":"HTTPPolicySet","Version":"20.1.2","ObjName":""}

# The new object is being created. The certificate and private key is uploaded to the AVI Controller. The yaml contents are parsed to create the API POST call
2020-12-16T11:16:39.238Z        DEBUG   rest/dequeue_nodes.go:1213      The HTTP Policies rest_op is [{"Path":"/api/macro","Method":"POST","Obj":{"model_name":"SSLKeyAndCertificate","data":{"certificate":{"certificate":"-----BEGIN CERTIFICATE-----\nMIIB7jCCAZUCFDPolIQwTC0ZFdlOc/mkAZpqVpQqMAoGCCqGSM49BAMCMHoxCzAJ\nBgNVBAYTAkVTMQ8wDQYDVQQIDAZNYWRyaWQxEDAOBgNVBAcMB1BvenVlbG8xEzAR\nBgNVBAoMCkliZXJpYSBMYWIxDzANBgNVBAsMBkliZXJpYTEiMCAGA1UEAwwZaGFj\na2F6b24uYXZpLmliZXJpYS5sb2NhbDAeFw0yMDEyMTQxODExNTdaFw0yMTEyMTQx\nODExNTdaMHoxCzAJBgNVBAYTAkVTMQ8wDQYDVQQIDAZNYWRyaWQxEDAOBgNVBAcM\nB1BvenVlbG8xEzARBgNVBAoMCkliZXJpYSBMYWIxDzANBgNVBAsMBkliZXJpYTEi\nMCAGA1UEAwwZaGFja2F6b24uYXZpLmliZXJpYS5sb2NhbDBZMBMGByqGSM49AgEG\nCCqGSM49AwEHA0IABNIDtmXh3mVgYj3ThT2q1h+iZNeSEx+YTc8tJMwtwZ+snq0x\naoj8BdGpXklVxeSQFBcSWDaN0yj+B08hBL4GedUwCgYIKoZIzj0EAwIDRwAwRAIg\ncLjFh0OBm4+3CYekcSG86vzv7P0Pf8Vm+y73LjPHg3sCIH4EfNZ73z28GiSQg3n8\n0GynzxMEGG818sbZcIUphfo+\n-----END CERTIFICATE-----\n"},"created_by":"ako-S1-AZ1","key":"-----BEGIN EC PRIVATE KEY-----\nMHcCAQEEIGXaF7F+RI4CU0MHa3MbI6fOxp1PvxhS2nxBEWW0EOzJoAoGCCqGSM49\nAwEHoUQDQgAE0gO2ZeHeZWBiPdOFParWH6Jk15ITH5hNzy0kzC3Bn6yerTFqiPwF\n0aleSVXF5JAUFxJYNo3TKP4HTyEEvgZ51Q==\n-----END EC PRIVATE 
KEY-----\n","name":"S1-AZ1--hackazon.avi.iberia.local","tenant_ref":"/api/tenant/?name=admin","type":"SSL_CERTIFICATE_TYPE_VIRTUALSERVICE"}},"Tenant":"admin","PatchOp":"","Response":null,"Err":null,"Model":"SSLKeyAndCertificate","Version":"20.1.2","ObjName":""},{"Path":"/api/macro","Method":"POST","Obj":{"model_name":"Pool","data":{"cloud_config_cksum":"1651865681","cloud_ref":"/api/cloud?name=Default-Cloud","created_by":"ako-S1-AZ1","health_monitor_refs":["/api/healthmonitor/?name=System-TCP"],"name":"S1-AZ1--default-hackazon.avi.iberia.local_-hackazon","service_metadata":"{\"namespace_ingress_name\":null,\"ingress_name\":\"hackazon\",\"namespace\":\"default\",\"hostnames\":[\"hackazon.avi.iberia.local\"],\"svc_name\":\"\",\"crd_status\":{\"type\":\"\",\"value\":\"\",\"status\":\"\"},\"pool_ratio\":0,\"passthrough_parent_ref\":\"\",\"passthrough_child_ref\":\"\"}","sni_enabled":false,"ssl_profile_ref":"","tenant_ref":"/api/tenant/?name=admin","vrf_ref":"/api/vrfcontext?name=VRF_AZ1"}},"Tenant":"admin","PatchOp":"","Response":null,"Err":null,"Model":"Pool","Version":"20.1.2","ObjName":""},{"Path":"/api/macro","Method":"POST","Obj":{"model_name":"PoolGroup","data":{"cloud_config_cksum":"2962814122","cloud_ref":"/api/cloud?name=Default-Cloud","created_by":"ako-S1-AZ1","implicit_priority_labels":false,"members":[{"pool_ref":"/api/pool?name=S1-AZ1--default-hackazon.avi.iberia.local_-hackazon","ratio":100}],"name":"S1-AZ1--default-hackazon.avi.iberia.local_-hackazon","tenant_ref":"/api/tenant/?name=admin"}},"Tenant":"admin","PatchOp":"","Response":null,"Err":null,"Model":"PoolGroup","Version":"20.1.2","ObjName":""},

# An HTTP Policy is defined so that requests whose Host header matches hackazon.avi.iberia.local and whose path begins with / are switched to the corresponding pool group
{"Path":"/api/macro","Method":"POST","Obj":{"model_name":"HTTPPolicySet","data":{"cloud_config_cksum":"1191528635","created_by":"ako-S1-AZ1","http_request_policy":{"rules":[{"enable":true,"index":0,"match":{"host_hdr":{"match_criteria":"HDR_EQUALS","value":["hackazon.avi.iberia.local"]},"path":{"match_criteria":"BEGINS_WITH","match_str":["/"]}},"name":"S1-AZ1--default-hackazon.avi.iberia.local_-hackazon-0","switching_action":{"action":"HTTP_SWITCHING_SELECT_POOLGROUP","pool_group_ref":"/api/poolgroup/?name=S1-AZ1--default-hackazon.avi.iberia.local_-hackazon"}}]},"name":"S1-AZ1--default-hackazon.avi.iberia.local_-hackazon","tenant_ref":"/api/tenant/?name=admin"}},"Tenant":"admin","PatchOp":"","Response":null,"Err":null,"Model":"HTTPPolicySet","Version":"20.1.2","ObjName":""}]

If we take a look at the AVI GUI we can see the new elements that have been realized to create the desired configuration.

Exploring Secure Ingress realization at AVI GUI

First of all, AVI represents the secure ingress object as an independent Virtual Service. Actually, AKO creates an SNI child virtual service with the name S1-AZ1--hackazon.avi.iberia.local, linked to the parent shared virtual service S1-AZ1--Shared-L7-0, to represent the new secure hostname. The SNI virtual service is used to bind the hostname to an sslkeycert object. The sslkeycert object is used to terminate the secure traffic on the AVI service engine. In our example above, the secretName field points to the secret hackazon-secret that is associated with the hostname hackazon.avi.iberia.local. AKO parses the attached secret object and creates the corresponding sslkeycert object in Avi. Note that the SNI virtual service does not get created if the secret does not exist as a Kubernetes secret resource.

From the Dashboard, if we click on the virtual service and then hover over it, we can see some of the properties that have been attached to our secure Virtual Service object. For example, note that the associated SSL certificate is S1-AZ1--hackazon.avi.iberia.local, and there is also an HTTP Request Policy with 1 rule that has been automatically added upon ingress creation.

If we click on the pencil icon we can see that this new Virtual Service object is a Child object whose parent object corresponds to S1-AZ1--Shared-L7-0, as mentioned before.

We can also verify that the attached SSL certificate corresponds to the newly created object pushed from AKO, as we saw in the debugging trace earlier.

If we go to Templates > Security > SSL/TLS Certificates we can open the newly created certificate and even click on export to explore the private key and the certificate.

If we compare the key and the certificate with our generated private key and certificate, they must be identical.

AKO also creates an HTTPPolicySet rule to route the terminated traffic to the appropriate pool that corresponds to the host/path specified in the rules section of our Ingress object. If we go to Policies > HTTP Request we can see a rule applied to our Virtual Service with a matching section that finds a match if the Host HTTP header equals hackazon.avi.iberia.local AND the path of the URL begins with “/”. If this is the case, the request will be directed to the Pool Group S1-AZ1--default-hackazon.avi.iberia.local_-hackazon that contains the endpoints (pods) that have been created in our k8s deployment.

As a bonus, AKO also creates a useful HTTP to HTTPS redirection policy on the shared virtual service (parent to the SNI child) for this specific secure hostname, to avoid any clear-text traffic flowing in the network. At the client browser, this produces an automatic redirection of any HTTP request arriving on the insecure port (TCP port 80) to HTTPS (TCP port 443).
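The combined effect of the two rules can be summarized in a few lines of Python. This is a simplified model built from the match criteria visible in the AKO log above (host header equals, port 80, path begins with “/”), not a representation of how the Service Engine evaluates policies. The function name and the assumption that the redirect preserves the original path are mine.

```python
SECURE_HOST = "hackazon.avi.iberia.local"
POOL_GROUP = "S1-AZ1--default-hackazon.avi.iberia.local_-hackazon"

def evaluate(host, port, path):
    """Return the (action, target) that the two AKO-created rules would select."""
    if host != SECURE_HOST:
        return ("no-match", None)
    # Rule on the parent shared VS: requests on port 80 get a 302 redirect to HTTPS.
    if port == 80:
        # Assumption: the redirect keeps the original path.
        return ("redirect-302", f"https://{host}{path}")
    # Rule on the SNI child VS: host matches and the path begins with "/".
    if port == 443 and path.startswith("/"):
        return ("select-pool-group", POOL_GROUP)
    return ("no-match", None)

print(evaluate("hackazon.avi.iberia.local", 80, "/"))
print(evaluate("hackazon.avi.iberia.local", 443, "/cart"))
```

The first call models the clear-text request that gets redirected, and the second one models the HTTPS request that is switched to the pool group.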

Capturing traffic to dissect the SSL transaction

The full sequence of events triggered (excluding DNS resolution) by a client that initiates a request to the non-secure service at http://hackazon.avi.iberia.local is represented in the following sequence diagram.

To see how this happens from an end user perspective, just try to access the virtual service using the insecure port (TCP 80) at the URL http://hackazon.avi.iberia.local with a browser. We can see how our request is automatically redirected to the secure port (TCP 443) at https://hackazon.avi.iberia.local. A certificate warning appears, indicating that the certificate in use cannot be verified by our local browser unless we add this self-signed certificate to a local certificate store.

Unsafe Certificate Warning Message

If we proceed to the site, we can open the certificate used to encrypt the connection and identify all the parameters that we used to create the k8s secret object.

A capture of the traffic from the client will show how the HTTP to HTTPS redirection policy is implemented using a 302 Moved Temporarily HTTP code that instructs our browser to redirect the request to an alternate URI located at https://hackazon.avi.iberia.local.
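To make the redirection concrete, the following Python sketch builds the kind of response a client would see in the capture. It is a simplified illustration: build_redirect is a hypothetical helper, and the exact set of headers returned by the Service Engine may differ from the minimal ones shown here.

```python
def build_redirect(host: str, path: str = "/") -> str:
    """Build a raw HTTP response that redirects a clear-text request to HTTPS."""
    location = f"https://{host}{path}"
    return (
        "HTTP/1.1 302 Moved Temporarily\r\n"
        f"Location: {location}\r\n"   # where the browser must retry the request
        "Content-Length: 0\r\n"
        "\r\n"
    )


print(build_redirect("hackazon.avi.iberia.local"))
```

The browser reads the Location header and repeats the request against TCP port 443, which is where the TLS negotiation described next begins.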

The first packet that starts the TLS Negotiation is the Client Hello. The browser uses an extension of the TLS protocol called Server Name Indication (SNI) that is commonly used and widely supported and allows the terminating device (in this case the Load Balancer) to select the appropriate certificate to secure the TLS channel and also to route the request to the desired associated virtual service. In our case the TLS negotiation uses hackazon.avi.iberia.local as SNI. This allows the AVI Service Engine to route the subsequent HTTPS requests after TLS negotiation completion to the right SNI Child Virtual Service.
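If you want to check that the SNI really travels in clear text inside the Client Hello, the standard Python ssl module can generate one in memory without any network connection. This is a minimal sketch: client_hello_bytes is a hypothetical helper name, and the handshake is deliberately left unfinished because there is no server on the other side.

```python
import ssl


def client_hello_bytes(server_name: str) -> bytes:
    """Return the raw ClientHello a TLS client would send for server_name."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    incoming = ssl.MemoryBIO()   # bytes received from the server (none here)
    outgoing = ssl.MemoryBIO()   # bytes the client wants to send
    tls = context.wrap_bio(incoming, outgoing, server_hostname=server_name)
    try:
        tls.do_handshake()       # writes the ClientHello, then waits for a reply
    except ssl.SSLWantReadError:
        pass                     # expected: there is no server to answer
    return outgoing.read()


hello = client_hello_bytes("hackazon.avi.iberia.local")
# The hostname is present as plain bytes in the SNI extension
print(b"hackazon.avi.iberia.local" in hello)
```

This is the same field the Service Engine reads to pick the certificate and the SNI child virtual service before any encrypted data is exchanged.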

If we explore the logs generated by our connection we can see the HTTPS headers, which also show the SNI Hostname (left section of image below) received from the client as well as other relevant parameters. If we capture this traffic from the client we won’t be able to see these headers since they are encrypted inside the TLS payload. AVI is able to decode and see inside the payload because it is terminating the TLS connection acting as a proxy.

As you can notice, AVI natively provides very rich analytics. However, if we need even deeper visibility, AVI has the option to fully capture the traffic as seen by the Service Engines. We can access it from Operations > Traffic Capture.

Click the pencil icon and select the virtual service, set the Size of Packets to zero to capture the full packet length and also make sure the Capture Session Key is checked. Then click Start Capture at the bottom of the window.

If we generate traffic from our browser we can see how the packet counter increases. We can stop the capture at any time just by clicking on the green slider icon.

The capture is then prepared and, after a few seconds (depending on the size of the capture), it is ready to download.

When done, click on the download icon at the right to download the capture file.

The capture is a tar file that includes two files: a pcapng file that contains the traffic capture and a txt file that includes the key of the session and will allow us to decrypt the payload of the TLS packets. You can use the popular wireshark to open the capture. We need to specify the key file to wireshark prior to opening the capture file. If using the wireshark version for MacOS simply go to Wireshark > Preferences. Then in the Preferences window select TLS under the protocol menu and browse to select the key txt file for our capture.

Once selected, click ok and we can now open the capture pcapng file and locate one of the TLS1.2 packets in the displayed capture…

At the bottom of the screen note how the Decrypted TLS option appears.

Now we can see in the bottom pane some decrypted information that in this case seems to be an HTTP 200 OK response that contains some readable headers.

An easier way to see the contents of the TLS session is using the Follow > TLS Stream option. Just select one of the TLS packets and right click to show the contextual menu.

We can now see the full conversation in a single window. Note how the HTTP Headers that are part of the TLS payload are now fully readable. Also note that the Body section of the HTTP packet has been encoded using gzip. This is the reason we cannot read the contents of this particular section.

If you are interested in unzipping the Body section of the packet to see its content just go to File > Export Objects > HTTP and locate the packet number of your interest. Note that the content type that now appears is the uncompressed content type (e.g. text/html) and not gzip.
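The same decompression can be done by hand on a body saved from the decrypted stream. The sketch below uses the standard gzip module; decode_body is a hypothetical helper and the sample body is made up for the example, it is not taken from the capture.

```python
import gzip


def decode_body(body: bytes, content_encoding: str = "") -> str:
    """Return the text of an HTTP body, gunzipping it when Content-Encoding says so."""
    if content_encoding.strip().lower() == "gzip":
        body = gzip.decompress(body)
    return body.decode("utf-8", errors="replace")


# Made-up sample: compress a small HTML page, then decode it again
compressed = gzip.compress(b"<html><body>hackazon</body></html>")
print(decode_body(compressed, "gzip"))
```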

Now we have seen how to create secure Ingress K8s services using AKO in a single kubernetes cluster. It’s time to explore beyond the local cluster and move to the next level, looking at multicluster services.

AVI for K8s Part 4: Deploying AVI K8s insecure Ingress Type Services

Introducing the Ingress Object

In this section we will focus on a specific K8s resource called Ingress. The ingress is just another k8s object that manages external access to the services in a cluster, typically HTTP(S). The ingress resource exposes HTTP and HTTPS routes from outside the cluster and points to services within the cluster. Traffic routing is controlled by rules that are defined as part of the ingress specification.

An Ingress may be configured to provide k8s-deployed applications with externally-reachable URLs, load balance traffic, terminate SSL / TLS, and offer name-based virtual hosting. The ingress controller (AKO in our case) is responsible for fulfilling the Ingress with the external AVI Service Engines to help handle the traffic. An Ingress service does not expose arbitrary ports or protocols and is always related to HTTP/HTTPS traffic. Exposing services other than HTTP/HTTPS like a Database or a Syslog service to the internet typically uses a service of type NodePort or LoadBalancer.

To create the ingress we will use a declarative yaml file instead of kubectl imperative commands this time, since this is the usual way in a production environment and gives us the chance to understand and modify the service definition just by changing the yaml plain text. In this case I am using Kubernetes 1.18 and this is what a typical ingress definition looks like:

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: myservice
spec:
  rules:
  - host: myservice.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: myservice
          servicePort: 80

As with any other kubernetes declarative file, we need apiVersion, kind and metadata to define the resource. The ingress spec contains all the rules needed to configure our AVI Load Balancer, in this case the protocol http, the name of the host (must be a resolvable DNS name) and the routing information such as the path and the backend that is actually terminating the traffic.
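To make the structure of the spec concrete, here is a small Python sketch that flattens the rules of the definition above into a host to path/backend mapping, which is roughly the information an ingress controller needs to extract. The host_path_map function is a hypothetical helper written for this article, not part of AKO, and it assumes the v1beta1 field names (serviceName, servicePort) used in this post.

```python
def host_path_map(ingress: dict) -> dict:
    """Flatten spec.rules of a v1beta1 Ingress into {host: [(path, service, port)]}."""
    result = {}
    for rule in ingress.get("spec", {}).get("rules", []):
        host = rule.get("host", "")
        for entry in rule.get("http", {}).get("paths", []):
            backend = entry["backend"]
            result.setdefault(host, []).append(
                (entry.get("path", "/"), backend["serviceName"], backend["servicePort"])
            )
    return result


# Same content as the YAML definition above, written as a Python dict
INGRESS = {
    "apiVersion": "networking.k8s.io/v1beta1",
    "kind": "Ingress",
    "metadata": {"name": "myservice"},
    "spec": {
        "rules": [
            {
                "host": "myservice.example.com",
                "http": {
                    "paths": [
                        {"path": "/", "backend": {"serviceName": "myservice", "servicePort": 80}}
                    ]
                },
            }
        ]
    },
}

print(host_path_map(INGRESS))
```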

AKO needs a service of type ClusterIP (default service type) acting as backend to send the ingress requests to. In a similar way the deployment and the service k8s resources can be also defined declaratively by using a corresponding yaml file. Let’s define a deployment of an application called hackazon. Hackazon is an intentionally vulnerable machine that pretends to be an online store and that incorporates some technologies that are currently used: an AJAX interface, a realistic e-commerce workflow and even a RESTful API for a mobile application. The deployment and service definition will look like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hackazon
  labels:
    app: hackazon
spec:
  replicas: 3
  selector:
    matchLabels:
      app: hackazon
  template:
    metadata:
      labels:
        app: hackazon
    spec:
      containers:
        - image: mutzel/all-in-one-hackazon:postinstall
          name: hackazon
          ports:
            - containerPort: 80
              name: http
---
apiVersion: v1
kind: Service
metadata:
  name: hackazon
spec:
  selector:
    app: hackazon
  ports:
  - port: 80
    targetPort: 80

As you can see above, in a single file we are describing the Deployment with several configuration elements such as the number of replicas, the container image we are deploying, the port… etc. Also at the bottom of the file you can see the Service definition that will create an abstraction called ClusterIP that will represent the set of pods under the hackazon deployment.

Once the yaml file is created we can launch the configuration by using the kubectl apply command.

kubectl apply -f hackazon_deployment_service.yaml
deployment.apps/hackazon created
service/hackazon created

Now we can check the status of our services using kubectl get commands to verify what objects have been created in our cluster. Note that the Cluster IP is using an internal IP address and it’s only reachable internally.

kubectl get pods
NAME                       READY   STATUS    RESTARTS   AGE
hackazon-b94df7bdc-4d7bd   1/1     Running   0          66s
hackazon-b94df7bdc-9pcxq   1/1     Running   0          66s
hackazon-b94df7bdc-h2dm4   1/1     Running   0          66s

kubectl get services
NAME         TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)       AGE
hackazon     ClusterIP   10.99.75.86   <none>        80/TCP      78s

At this point I would like to introduce, just to add some extra fun, an interesting graphical tool for kubernetes cluster management called Octant that can be easily deployed and is freely available at https://github.com/vmware-tanzu/octant. Octant can be easily installed in the OS of your choice. Before using it you need to have access to a healthy k8s cluster. You can check it by using the cluster-info command. The output should show something like this:

kubectl cluster-info                                      
Kubernetes master is running at https://10.10.24.160:6443
KubeDNS is running at https://10.10.24.160:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

Using Octant as K8s dashboard

Once the above requirement is fulfilled you just need to install and execute octant using the instructions provided on the octant website. The tool is accessed via web at http://127.0.0.1:7777. You can easily check the Deployment, Pods and ReplicaSets status from Workloads > Overview.

Octant dashboard showing K8s workload information in a graphical UI

And also you can verify the status of the ClusterIP service we have created from Discovery and Load Balancing > Services

Octant dashboard showing K8s services

Once Octant is deployed, let’s move to the ingress service. In this case we will use the following yaml file to declare the ingress service that will expose our application.

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: hackazon
  labels:
    app: hackazon
spec:
  rules:
    - host: hackazon.avi.iberia.local
      http:
        paths:
        - path: /
          backend:
            serviceName: hackazon
            servicePort: 80

I will use the Apply YAML option at the top bar of the Octant Interface to push the configuration into the K8s API. When we press the Apply button a message confirming an Ingress service has been created appears as a top bar in the foreground screen of the UI.

Octant Ingress YAML

After applying, we can see how our new Ingress object has been created and, as you can see, our AKO integration must have worked since we have an external IP address assigned from our frontend subnet at 10.10.25.46, which is an indication of a successful dialogue between the AKO controller and the API endpoint of the AVI Controller.

Octant is a great tool that provides a nice representation of how the different k8s objects are related to each other. If we click on our hackazon service and go to the Resource Viewer option this is the graphical view of services, replicaset, ingress, deployment, pods… etc.

Resource viewer of the hackazon service displayed from Octant UI

Now let’s move to the AKO piece. As mentioned AKO will act as an ingress controller and it should translate the resources of kind Ingress into the corresponding external Service Engine (Data Path) configuration that will cope with the traffic needs.

Exploring AKO Logs for Ingress Creation

If we look into the logs the AKO pod has produced we can notice that the following relevant events have occurred:

# A new ingress object is created. Attributes such as hostname, port, path are passed in the API Request
2020-12-14T13:19:20.316Z        INFO    nodes/validator.go:237  key: Ingress/default/hackazon, msg: host path config from ingress: {"PassthroughCollection":null,"TlsCollection":null,"IngressHostMap":{"hackazon.avi.iberia.local":[{"ServiceName":"hackazon","Path":"/","Port":80,"PortName":"","TargetPort":0}]}}

# An existing VS object called S1-AZ1--Shared-L7-0 will be used as a parent object for hosting this new Virtual Service
2020-12-14T13:19:20.316Z        INFO    nodes/dequeue_ingestion.go:321  key: Ingress/default/hackazon, msg: ShardVSPrefix: S1-AZ1--Shared-L7-
2020-12-14T13:19:20.316Z        INFO    nodes/dequeue_ingestion.go:337  key: Ingress/default/hackazon, msg: ShardVSName: S1-AZ1--Shared-L7-0

# A new server Pool will be created 
2020-12-14T13:19:20.316Z        INFO    nodes/avi_model_l7_hostname_shard.go:37 key: Ingress/default/hackazon, msg: Building the L7 pools for namespace: default, hostname: hackazon.avi.iberia.local
2020-12-14T13:19:20.316Z        INFO    nodes/avi_model_l7_hostname_shard.go:47 key: Ingress/default/hackazon, msg: The pathsvc mapping: [{hackazon / 80 100  0}]
2020-12-14T13:19:20.316Z        INFO    nodes/avi_model_l4_translator.go:245    key: Ingress/default/hackazon, msg: found port match for port 80

# The pool is populated with the endpoints (Pods) that will act as pool members for that pool. 
2020-12-14T13:19:20.316Z        INFO    nodes/avi_model_l4_translator.go:263    key: Ingress/default/hackazon, msg: servers for port: 80, are: [{"Ip":{"addr":"10.34.1.5","type":"V4"},"ServerNode":"site1-az1-k8s-worker02"},{"Ip":{"addr":"10.34.1.6","type":"V4"},"ServerNode":"site1-az1-k8s-worker02"},{"Ip":{"addr":"10.34.2.6","type":"V4"},"ServerNode":"site1-az1-k8s-worker01"}]
2020-12-14T13:19:20.317Z        INFO    objects/avigraph.go:42  Saving Model :admin/S1-AZ1--Shared-L7-0


# The IP address 10.10.25.46 has been allocated for the k8s ingress object
2020-12-14T13:19:21.162Z        INFO    status/ing_status.go:133        key: admin/S1-AZ1--Shared-L7-0, msg: Successfully updated the ingress status of ingress: default/hackazon old: [] new: [{IP:10.10.25.46 Hostname:hackazon.avi.iberia.local}]
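When there are many of these lines it can help to split them into fields with a script. The following Python sketch is based only on the layout visible in the log lines above (timestamp, level, source file, then an optional "key: ..., msg: ..." part); parse_ako_line is a hypothetical helper and the log format is not guaranteed to stay the same across AKO versions.

```python
import json


def parse_ako_line(line: str) -> dict:
    """Split an AKO log line into timestamp, level, source, key, msg and JSON payload."""
    timestamp, level, source, rest = line.split(None, 3)
    key, sep, msg = rest.partition(", msg: ")
    if not sep:                        # line without a "key: ..., msg: ..." part
        key, msg = "", rest
    elif key.startswith("key: "):
        key = key[len("key: "):]
    payload = None
    start = msg.find("{")
    if start != -1:
        try:
            payload = json.loads(msg[start:])
        except ValueError:             # Go-style structs in the log are not JSON
            payload = None
    return {"timestamp": timestamp, "level": level, "source": source,
            "key": key, "msg": msg, "payload": payload}


# First log line shown above
SAMPLE = ('2020-12-14T13:19:20.316Z        INFO    nodes/validator.go:237  '
          'key: Ingress/default/hackazon, msg: host path config from ingress: '
          '{"PassthroughCollection":null,"TlsCollection":null,"IngressHostMap":'
          '{"hackazon.avi.iberia.local":[{"ServiceName":"hackazon","Path":"/",'
          '"Port":80,"PortName":"","TargetPort":0}]}}')

parsed = parse_ako_line(SAMPLE)
print(parsed["key"], parsed["payload"]["IngressHostMap"])
```

The function expects real log lines; the explanatory lines starting with # in the listing above are annotations and should be skipped before parsing.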


Exploring Ingress realization at AVI GUI

Now we can explore the AVI Controller to see how these API calls from AKO are reflected in the GUI.

For insecure ingress objects, AKO uses a sharding scheme, which means some configuration will be shared across a single object aiming to save public IP addressing space. The configuration objects that are created in the SE are listed here:

  • A Shared parent Virtual Service object is created. The name is derived from <cluster_name>--Shared-L7-<ID>. In this case the cluster name is set in the values.yaml file and corresponds to S1-AZ1, and the allocated ID is 0.
    • A Pool Group Object that contains a single Pool Member. The Pool Group Name is also derived from the cluster name: <cluster_name>--hostname
    • A priority label that is associated with the Pool Group with the name host/path. In this case hackazon.avi.iberia.local/
    • An associated DataScript object to interpret the host/path combination of the incoming request and the pool will be chosen based on the priority label

You can check the automatically created DataScript in Templates > Scripts > DataScript. The content is shown below. Basically it extracts the host and the path from the incoming http request and selects the corresponding pool group.

host = avi.http.get_host_tokens(1)
path = avi.http.get_path_tokens(1)
if host and path then
  lbl = host.."/"..path
else
  lbl = host.."/"
end
avi.poolgroup.select("S1-AZ1--Shared-L7-0", string.lower(lbl) )
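For readers not familiar with Lua, the label computation can be rendered in Python as follows. This is a rough sketch under two assumptions that are not stated in this article: that get_host_tokens(1) returns the whole host name and that get_path_tokens(1) returns the path without its leading slash (or nothing for “/”). The pool names in the example table are hypothetical, and how AVI treats a label with no exact match is not covered here.

```python
from typing import Dict, Optional


def priority_label(host: str, path: str) -> str:
    """Build the lower-cased "host/path" label used to pick a pool in the group."""
    trimmed = path.lstrip("/")   # assumption: tokens exclude the leading "/"
    label = f"{host}/{trimmed}" if host and trimmed else f"{host}/"
    return label.lower()


def select_pool(label_to_pool: Dict[str, str], host: str, path: str) -> Optional[str]:
    """Look up the pool whose priority label equals the computed label."""
    return label_to_pool.get(priority_label(host, path))


# Hypothetical pool names keyed by the priority labels described above
POOLS = {
    "hackazon.avi.iberia.local/": "pool-hackazon",
    "kuard.avi.iberia.local/": "pool-kuard",
}

print(select_pool(POOLS, "Hackazon.AVI.iberia.local", "/"))
```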

By the way, note that the Shared Virtual Service object is displayed in yellow. The color reflects a composite health score obtained from several factors. If we hover the mouse over the Virtual Service object we can see two factors that are influencing this score of 72 and the yellow color. In this case there is a 20-point penalty due to the fact that this is an insecure virtual service and also a decrement of 5 related to a resource penalty associated with the fact that this is a very young service (just created). These metrics are used by the system to determine the optimal path of the traffic in case there are different options to choose from.

Let’s create a new ingress using the following YAML file. This time we will use the kuard application. The content of the yaml file that defines the Deployment, Service and Ingress objects is shown below:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kuard
  labels:
    app: kuard
spec:
  replicas: 3
  selector:
    matchLabels:
      app: kuard
  template:
    metadata:
      labels:
        app: kuard
    spec:
      containers:
        - image: gcr.io/kuar-demo/kuard-amd64:1
          name: kuard
          ports:
            - containerPort: 8080
              name: http
---
apiVersion: v1
kind: Service
metadata:
  name: kuard
spec:
  selector:
    app: kuard
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: kuard
  labels:
    app: kuard
spec:
  rules:
    - host: kuard.avi.iberia.local
      http:
        paths:
        - path: /
          backend:
            serviceName: kuard
            servicePort: 80

Once applied using the kubectl apply -f command we can see how a new Pool has been created under the same shared Virtual Service object.

As you can see the two objects are sharing the same IP address. This is very useful to save public IP addresses. The DataScript will be in charge of routing the incoming requests to the right place.

One last verification. Let’s try to resolve the hostnames using the integrated DNS in AVI. Note how both queries resolve to the same IP address since we are sharing the Virtual Service object. There are other options to share the parent VS among the different ingress services. The default option is sharding by hostname but you can define a sharding scheme based on the namespace as well.

dig hackazon.avi.iberia.local @10.10.25.40 +noall +answer
  hackazon.avi.iberia.local. 5    IN      A       10.10.25.46

dig kuard.avi.iberia.local @10.10.25.40 +noall +answer
  kuard.avi.iberia.local. 5    IN      A       10.10.25.46
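If you prefer to verify this from a script, the answer lines can be parsed with a few lines of Python. The sketch below works on the text of the two dig answers shown above pasted into a string; parse_dig_answers is a hypothetical helper and it only understands A records in the five-column layout printed by +noall +answer.

```python
def parse_dig_answers(output: str) -> dict:
    """Map each owner name to its A record address from `dig +noall +answer` output."""
    records = {}
    for line in output.splitlines():
        fields = line.split()
        # Expected layout: name, TTL, class, type, address
        if len(fields) == 5 and fields[2] == "IN" and fields[3] == "A":
            records[fields[0].rstrip(".")] = fields[4]
    return records


DIG_OUTPUT = """\
  hackazon.avi.iberia.local. 5    IN      A       10.10.25.46
  kuard.avi.iberia.local. 5    IN      A       10.10.25.46
"""

records = parse_dig_answers(DIG_OUTPUT)
# Both ingress hostnames should share the address of the parent VS
print(records, len(set(records.values())) == 1)
```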

The final step is to open a browser and check if our applications are actually working. If we point our browser to the FQDN at http://hackazon.avi.iberia.local we can see how the web application is launched.

We can do the same for the other application by pointing at http://kuard.avi.iberia.local

Note that the browsing activity for both applications that share the same Virtual Service construct will appear under the same Analytics related to the S1-AZ1--Shared-L7-0 parent VS object.

If we need to focus on just one of the applications we can filter using, for example, the Host Header attribute in the Log Analytics ribbon located at the right of the Virtual Services > S1-AZ1--Shared-L7-0 > Logs screen.

If we click on the hackazon.avi.iberia.local Host we can see all the logs related to the hackazon site.

That’s all for now for the insecure objects. Let’s move into the next section to explore the secure ingress services.

AVI for K8s Part 3: Deploying AVI K8s LoadBalancer Type Services

Creating first LoadBalancer Object

Once the reachability is done now it’s time to create our first kubernetes Services. AKO is a Kubernetes operator implemented as a Pod that will be watching the kubernetes service objects of type Load Balancer and Ingress and it will configure them accordingly in the Service Engine to serve the traffic. Let’s focus on the LoadBalancer service for now. A LoadBalancer is a common way in kubernetes to expose L4 services (non-http) to the external world.

Let’s create the first service using some kubectl imperative commands. First we will create a simple deployment using kuard which is a popular app used for testing and will use a container image as per the kubectl command below. After creating the deployment we can see k8s starting the pod creation process.

kubectl create deployment kuard --image=gcr.io/kuar-demo/kuard-amd64:1
deployment.apps/kuard created

kubectl get pods
 NAME                    READY   STATUS       RESTARTS   AGE
 kuard-74684b58b8-hmxrs    1/1   Running      0          3s

As you can see the scheduler has decided to place the newly created pod in the worker node site1-az1-k8s-worker02 and the IP 10.34.1.8 has been allocated.

kubectl describe pod kuard-74684b58b8-hmxrs
 Name:         kuard-74684b58b8-hmxrs
 Namespace:    default
 Priority:     0
 Node:         site1-az1-k8s-worker02/10.10.24.162
 Start Time:   Thu, 03 Dec 2020 17:48:01 +0100
 Labels:       app=kuard
               pod-template-hash=74684b58b8
 Annotations:  <none>
 Status:       Running
 IP:           10.34.1.8
 IPs:
  IP:           10.34.1.8

Remember this network is not routable from the outside unless we create a static route pointing to the node IP address as next-hop. This configuration task is done for us automatically by AKO as explained in the previous article. If we want to expose our kuard deployment externally, we need to create a LoadBalancer service. As usual, we will use kubectl imperative commands to do so. In this case kuard listens on port 8080.

kubectl expose deployment kuard --port=8080 --type=LoadBalancer

Let’s try to see what is happening under the hood by debugging the AKO pod. The following events have been triggered by AKO as soon as we create the new LoadBalancer service. We can show them using kubectl logs ako-0 -n avi-system

kubectl logs ako-0 -n avi-system
# AKO detects the new k8s object and triggers the VS creation
2020-12-11T09:44:23.847Z        INFO    nodes/dequeue_ingestion.go:135  key: L4LBService/default/kuard, msg: service is of type loadbalancer. Will create dedicated VS nodes

# A set of attributes and configurations will be used for VS creation 
# including Network Profile, ServiceEngineGroup, Name of the service ... 
# naming will be derived from the cluster name set in values.yaml file
2020-12-11T09:44:23.847Z        INFO    nodes/avi_model_l4_translator.go:97     key: L4LBService/default/kuard, msg: created vs object: {"Name":"S1-AZ1--default-kuard","Tenant":"admin","ServiceEngineGroup":"S1-AZ1-SE-Group","ApplicationProfile":"System-L4-Application","NetworkProfile":"System-TCP-Proxy","PortProto":[{"PortMap":null,"Port":8080,"Protocol":"TCP","Hosts":null,"Secret":"","Passthrough":false,"Redirect":false,"EnableSSL":false,"Name":""}],"DefaultPool":"","EastWest":false,"CloudConfigCksum":0,"DefaultPoolGroup":"","HTTPChecksum":0,"SNIParent":false,"PoolGroupRefs":null,"PoolRefs":null,"TCPPoolGroupRefs":null,"HTTPDSrefs":null,"SniNodes":null,"PassthroughChildNodes":null,"SharedVS":false,"CACertRefs":null,"SSLKeyCertRefs":null,"HttpPolicyRefs":null,"VSVIPRefs":[{"Name":"S1-AZ1--default-kuard","Tenant":"admin","CloudConfigCksum":0,"FQDNs":["kuard.default.avi.iberia.local"],"EastWest":false,"VrfContext":"VRF_AZ1","SecurePassthoughNode":null,"InsecurePassthroughNode":null}],"L4PolicyRefs":null,"VHParentName":"","VHDomainNames":null,"TLSType":"","IsSNIChild":false,"ServiceMetadata":{"namespace_ingress_name":null,"ingress_name":"","namespace":"default","hostnames":["kuard.default.avi.iberia.local"],"svc_name":"kuard","crd_status":{"type":"","value":"","status":""},"pool_ratio":0,"passthrough_parent_ref":"","passthrough_child_ref":""},"VrfContext":"VRF_AZ1","WafPolicyRef":"","AppProfileRef":"","HttpPolicySetRefs":null,"SSLKeyCertAviRef":""}


# A new pool is created using the existing endpoints in K8s that represent the deployment
2020-12-11T09:44:23.848Z        INFO    nodes/avi_model_l4_translator.go:124    key: L4LBService/default/kuard, msg: evaluated L4 pool values :{"Name":"S1-AZ1--default-kuard--8080","Tenant":"admin","CloudConfigCksum":0,"Port":8080,
"TargetPort":0,"PortName":"","Servers":[{"Ip":{"addr":"10.34.1.8","type":"V4"},"ServerNode":"site1-az1-k8s-worker02"}],"Protocol":"TCP","LbAlgorithm":"","LbAlgorithmHash":"","LbAlgoHostHeader":"","IngressName":"","PriorityLabel":"","ServiceMetadata":{"namespace_ingress_name":null,"ingress_name":"","namespace":"","hostnames":null,"svc_name":"","crd_status":{"type":"","value":"","status":""},"pool_ratio":0,"passthrough_parent_ref":"","passthrough_child_ref":""},"SniEnabled":false,"SslProfileRef":"","PkiProfile":null,"VrfContext":"VRF_AZ1"}

If we move to the Controller GUI we can notice how a new Virtual Service has been automatically provisioned.

The red color indicates that the virtual service still needs a Service Engine to perform its function in the Data Plane. If you hover the mouse over the Virtual Service object a notification is shown confirming that it is waiting for the SE to be deployed.

VS State whilst Service Engine is being provisioned

The AVI controller will ask the infrastructure cloud provider (vCenter in this case) to create this virtual machine automatically.

SE automatic creation in vSphere infrastructure

After a couple of minutes, the new Service Engine that belongs to our Service Engine Group is ready and has been plugged automatically into the required networks. In our example, because we are using a two-arm deployment, the SE needs a vnic interface to reach the backend network and also a frontend vnic interface to answer external ARP requests coming from the clients. Remember IPAM is one of the integrated services that AVI provides so the Controller will allocate all the needed IP addresses automatically on our behalf.

After some minutes, the VS turns green. We can expand the new VS to visualize the related objects such as the VS, the server pool, the backend network and the k8s endpoints (pods) that will be used as members of the server pool. We can also see the name of the SE in which the VS is currently running.

As you probably know, a deployment resource has an associated replicaset controller that is used, as its name implies, to control the number of individual replicas for a particular deployment. We can use kubectl commands to scale our deployment in or out just by changing the number of replicas. As you can guess, AKO needs to be aware of any changes in the deployment so that they are reflected accordingly in the AVI Virtual Service realization at the Service Engines. Let’s scale out our deployment.

kubectl scale deployment/kuard --replicas=5
 deployment.apps/kuard scaled

This will create new pods that will act as endpoints for the same service. The new endpoints become members of the server pool that is part of the AVI Virtual Service object, as shown below in the graphical representation.

Virtual Service of a LoadBalancer type application scaling out

AVI as DNS Resolver for created objects

The DNS is another integrated service that AVI performs so, once the Virtual Service is ready it should register the name against the AVI DNS. If we go to Applications > Virtual Service > local-dns-site1 in the DNS Records tab we can see the new DNS record added automatically.

If we query the DNS asking for kuard.default.avi.iberia.local we get the IP address allocated to the Virtual Service:

dig kuard.default.avi.iberia.local @10.10.25.44 +noall +answer
 kuard.default.avi.iberia.local. 5 IN    A       10.10.25.43
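Looking at the AKO log earlier in this section (VS S1-AZ1--default-kuard, pool S1-AZ1--default-kuard--8080) and at the FQDN we have just resolved, the names seem to follow a simple pattern. The Python sketch below reproduces it; keep in mind the pattern is inferred from this lab output only, l4_object_names is a hypothetical helper, and other AKO versions or settings may name objects differently.

```python
def l4_object_names(cluster: str, namespace: str, service: str,
                    port: int, domain: str) -> dict:
    """Reproduce the names seen in this lab for a LoadBalancer type service."""
    vs_name = f"{cluster}--{namespace}-{service}"
    return {
        "virtual_service": vs_name,
        "pool": f"{vs_name}--{port}",
        "fqdn": f"{service}.{namespace}.{domain}",
    }


names = l4_object_names("S1-AZ1", "default", "kuard", 8080, "avi.iberia.local")
print(names)
```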

In the same way, if we scale-in the deployment to zero replicas using the same method described above, it should have also an effect in the Virtual Service. We can see how it turns again into red and how the pool has no members inasmuch as no k8s endpoints are available.

Virtual Service representation when replicaset = 0

And hence if we query for the FQDN, we should receive a NXDOMAIN answer indicating that the server is unable to resolve that name. Note how a SOA response indicates that the DNS server you are querying is authoritative for this particular domain though.

 dig nginx.default.avi.iberia.local @10.10.25.44
 ; <<>> DiG 9.16.1-Ubuntu <<>> kuard.default.avi.iberia.local @10.10.25.44
 ;; global options: +cmd
 ;; Got answer:
 ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 59955
 ;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
 ;; WARNING: recursion requested but not available
 ;; OPT PSEUDOSECTION:
 ; EDNS: version: 0, flags:; udp: 512
 ;; QUESTION SECTION:
 ;nginx.default.avi.iberia.local.        IN      A
 ;; AUTHORITY SECTION:
 kuard.default.avi.iberia.local. 30 IN   SOA     site1-dns.iberia.local. [email protected]. 1 10800 3600 86400 30
 ;; Query time: 0 msec
 ;; SERVER: 10.10.25.44#53(10.10.25.44)
 ;; WHEN: Tue Nov 17 14:43:35 CET 2020
 ;; MSG SIZE  rcvd: 138
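The two details worth checking in that output, the NXDOMAIN status and the authoritative answer (aa) flag, can also be extracted programmatically. This is a small sketch using regular expressions over the header lines shown above; dig_status is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple


def dig_status(output: str) -> Tuple[Optional[str], bool]:
    """Return (status, authoritative) parsed from the header of a dig response."""
    status = re.search(r"status: (\w+)", output)
    flags = re.search(r"flags: ([a-z ]+);", output)
    authoritative = "aa" in flags.group(1).split() if flags else False
    return (status.group(1) if status else None, authoritative)


# Header lines copied from the dig output above
HEADER = """\
 ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 59955
 ;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
 ; EDNS: version: 0, flags:; udp: 512
"""

print(dig_status(HEADER))
```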

Let’s scale out our deployment again to have 2 replicas.

kubectl scale deployment/kuard --replicas=2
 deployment.apps/kuard scaled

Last, let’s verify that the L4 Load Balancing service is actually working. We can try to open the URL in our preferred browser. Take into account that your configured DNS must be able to forward queries for the default.avi.iberia.local DNS zone for name resolution to succeed. This can be achieved easily by configuring a Zone Delegation in your existing local DNS.

Exploring AVI Analytics

One of the most interesting features of using AVI as a LoadBalancer is the rich analytics the product provides. A simple way to generate synthetic traffic is using the locust tool written in python. You need python and pip3 to get locust running. You can find instructions about locust installation here. We can create a simple file to mimic user activity. In this case let’s simulate users browsing the “/” path. The contents of locustfile_kuard.py would be something like this.

import resource

from locust import HttpUser, between, task

# Raise the open-file limit so locust can keep many connections open
resource.setrlimit(resource.RLIMIT_NOFILE, (9999, 9999))


class QuickstartUser(HttpUser):
    # Each simulated user waits between 5 and 9 seconds between tasks
    wait_time = between(5, 9)

    @task(1)
    def index_page(self):
        # Simulate a user browsing the "/" path
        self.client.get("/")

We can now launch the locust app using the command line below. This generates traffic for 100 minutes (--run-time 100m) sending GET / requests to the URL http://10.10.25.43:8080. The tool will show some traffic statistics in the stdout.

locust -f locustfile_kuard.py --headless --logfile /var/local/locust.log -r 100 -u 200 --run-time 100m --host=http://10.10.25.43:8080
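It is useful to have a rough idea of the load these parameters produce before looking at the analytics. The sketch below does the arithmetic for -u 200 users, a spawn rate of -r 100 users per second and the 5 to 9 second wait time of the locustfile. It assumes the response time is negligible compared with the wait time, so take the result as an upper bound estimate rather than a measurement; locust_estimate is a hypothetical helper.

```python
def locust_estimate(users: int, spawn_rate: float,
                    wait_min: float, wait_max: float) -> dict:
    """Rough ramp-up time and steady-state request rate for a locust run."""
    ramp_up_seconds = users / spawn_rate
    mean_wait = (wait_min + wait_max) / 2   # between() draws the wait uniformly
    # Assumption: response time is negligible compared with the wait time
    requests_per_second = users / mean_wait
    return {"ramp_up_seconds": ramp_up_seconds,
            "requests_per_second": requests_per_second}


estimate = locust_estimate(users=200, spawn_rate=100, wait_min=5, wait_max=9)
print(estimate)
```

With these values all the users are spawned in about 2 seconds and the steady load stays below 30 requests per second, which is a light load for a single Service Engine.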

In order to see the user activity logs we need to enable the Non-significant logs under the properties of the created S1-AZ1--default-kuard Virtual Service. You also need to set the Metric Update Frequency to Real Time Metrics and set it to 0 mins to speed up the process of getting activity logs into the GUI.

Analytics Settings for the VS

After this, we can enjoy the powerful analytics provided by AVI SE.

Logs Analytics for L4 Load Balancer Virtual Service

For example, we can diagnose easily some common issues like retransmissions or timeouts for certain connections or some deviations in the end to end latency.

We can also see how the traffic is being distributed across the different PODs. We can go to Log Analytics at the right of the screen and then, if we click on Server IP Address, we get this table showing the traffic distribution.

Using Analytics to get traffic distribution across PODs

And also how the traffic is evolving over time.

Analytics dashboard

Now that we have a clear picture of how AKO works for LoadBalancer type services, let’s move to the next level to explore the ingress type services.

AVI for K8s Part 2: Installing AVI Kubernetes Operator

AVI Ingress Solution Elements

After setting up the AVI configuration now it’s time to move into the AVI Kubernetes Operator. AKO will communicate with the AVI Controller using its API and will realize for us the LoadBalancer and Ingress type services, translating the desired state of these K8s services into AVI Virtual Services that will run in the external Service Engines. The AKO deployment consists of the following components:

  • The AVI Controller
    • Manages Lifecycle of Service Engines
    • Provides centralized Analytics
  • The Service Engines (SE)
    • Host the Virtual Services for K8s Ingress and LoadBalancer
    • Handle the Virtual Services Data Plane
  • The Avi Kubernetes Operator
    • Provides Ingress-Controller capability within the K8s Cluster
    • Monitors ingress and loadbalancer K8s objects and translates them into AVI configuration via API
    • Runs as a Pod in the K8S cluster

The following figure represents the network diagram for the different elements that make up the AKO integration in Site1 AZ1.

Detailed network topology for Site1 AZ1

Similarly, the diagram below represents Availability Zone 2. As you can see, the AVI Controller (Control/Management Plane) is shared between both AZs in the same Site whereas the Data Plane (i.e. Service Engines) remains separated in different VMs and isolated from a network perspective.

Detailed network topology for Site1 AZ2

I am using here a vanilla Kubernetes based on the 1.18 release. Each cluster is made up of a single master and two worker nodes and we will use Antrea as CNI. Antrea is a cool kubernetes networking solution intended to be Kubernetes native. It operates at Layer 3/4 to provide networking and security services for a Kubernetes cluster. You can find more information about Antrea and how to install it here. To install Antrea you need to assign a CIDR block to provide IP addresses to the PODs. In my case I have selected two CIDR blocks as per the table below:

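| Cluster Name | POD CIDR Block | CNI    | # Master | # Workers |
|--------------|----------------|--------|----------|-----------|
| site1-az1    | 10.34.0.0/18   | Antrea | 1        | 2         |
| site1-az2    | 10.34.64.0/18  | Antrea | 1        | 2         |

Kubernetes Cluster CIDR block for POD networking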
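
As a quick sanity check of this sizing, the sketch below counts how many per-node blocks fit in each cluster block. It assumes every node receives a /24 out of the cluster /18, which matches the PodCIDR values shown later in this post; the function name node_blocks is just illustrative.

```shell
# Number of per-node pod blocks that fit in a cluster pod CIDR.
# Usage: node_blocks <cluster-prefix-length> <node-prefix-length>
node_blocks() {
  echo $(( 1 << ($2 - $1) ))
}

# Assumption: every node gets a /24 carved out of the cluster /18
node_blocks 18 24
```

With these numbers each cluster block has room for 64 nodes. The two /18 blocks in the table cover 10.34.0.0 to 10.34.63.255 and 10.34.64.0 to 10.34.127.255, so they do not overlap.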

Before starting, the cluster must be in a Ready status. We can check the current status of our k8s cluster using kubectl commands. To be able to operate a kubernetes cluster using kubectl command line you need a kubeconfig file that contains the authentication credentials needed to gain access via API to the desired cluster. An easy way to gain access is jumping into the Master node and assuming a proper kubeconfig file is at $HOME/.kube/config, you can check the status of your kubernetes cluster nodes at Site1 AZ1 using kubectl as shown below.

kubectl get nodes
 NAME                     STATUS   ROLES    AGE   VERSION
 site1-az1-k8s-master01   Ready    master   29d   v1.18.10
 site1-az1-k8s-worker01   Ready    <none>   29d   v1.18.10
 site1-az1-k8s-worker02   Ready    <none>   29d   v1.18.10

In a similar way you can ssh to the master node at Site1 AZ2 cluster and check the status of that particular cluster.

kubectl get nodes
 NAME                     STATUS   ROLES    AGE   VERSION
 site1-az2-k8s-master01   Ready    master   29d   v1.18.10
 site1-az2-k8s-worker01   Ready    <none>   29d   v1.18.10
 site1-az2-k8s-worker02   Ready    <none>   29d   v1.18.10
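
If you prefer not to eyeball the STATUS column, a small filter can do the check for you. This is only a sketch: all_nodes_ready is a hypothetical helper name, and it simply treats any STATUS other than exactly Ready (including a cordoned Ready,SchedulingDisabled node) as not ready.

```shell
# Read "kubectl get nodes" output on stdin and fail if any node is not Ready.
all_nodes_ready() {
  awk '
    # skip empty lines and the header, report every node whose STATUS is not "Ready"
    NF && $1 != "NAME" && $2 != "Ready" { print "not ready: " $1 " (" $2 ")"; bad = 1 }
    END { exit bad + 0 }
  '
}

# Sample input copied from the output above
all_nodes_ready <<'EOF' && echo "all nodes Ready"
NAME                     STATUS   ROLES    AGE   VERSION
site1-az2-k8s-master01   Ready    master   29d   v1.18.10
site1-az2-k8s-worker01   Ready    <none>   29d   v1.18.10
site1-az2-k8s-worker02   Ready    <none>   29d   v1.18.10
EOF
```

Against a live cluster the input would come from kubectl get nodes piped into the function.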

Understanding pod reachability

As mentioned the Virtual Service hosted in the Service Engines will act as the frontend for exposing our K8s external services. On the other hand, we need to ensure that the Service Engines reach the pod networks to complete the Data Path. Generally the pod network is a non-routable network used internally to provide pod-to-pod connectivity and therefore is not reachable from the outside. As you can imagine, we have to find the way to allow external traffic to come in to accomplish the Load Balancing function.

One common way to do this is to use a k8s feature called NodePorts. NodePort exposes the service on each Node’s IP at a static port and you can connect to the NodePort service outside the cluster by requesting <NodeIP>:<NodePort>. This is a fixed port to a service and it is in the range of 30000–32767. With this feature you can contact any of the workers in the cluster using the allocated port in order to reach the desired deployment (application) behind that exposed service. Note that you use NodePort without knowing where (i.e. in which worker node) the Pods for that service are actually running.

With NodePort in mind, let’s figure out how our AVI external load balancer would work in an environment in which we use NodePort to expose our applications. Imagine a deployment like the one represented in the picture below. There are two sample deployments: hackazon and kuard. The hackazon one has just one pod replica whereas the kuard deployment has two replicas. The k8s scheduler has decided to place the pods as represented in the figure. At the top of the diagram you can see how our external Service Engine would expose the corresponding virtual services in the FrontEnd Network and create a server pool made up of each of the NodePort services. In this case, for the hackazon.avi.iberia.local virtual service a three-member server pool would be created, distributing traffic to 10.10.24.161:32222, 10.10.24.162:32222 and 10.10.24.163:32222. The traffic would be distributed evenly across the pool even though the actual Pod is running only on Worker 01. In addition, since the NodePort is just an abstraction of the actual Deployment, as long as one Pod is up and running the NodePort would appear to be up from a health-check perspective. The same would happen with the kuard.avi.iberia.local virtual service.
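
To make the point concrete, the pool in this scenario is just every node IP paired with the same NodePort, with no knowledge of where the pods run. A minimal sketch, using the addresses and port from the hackazon example (nodeport_pool_members is a made-up helper name, not an AVI or AKO command):

```shell
# Build the pool members a NodePort-based integration would use:
# every node IP paired with the same NodePort, wherever the pods actually run.
# Usage: nodeport_pool_members <nodePort> <nodeIP>...
nodeport_pool_members() {
  port="$1"
  shift
  for ip in "$@"; do
    printf '%s:%s\n' "$ip" "$port"
  done
}

# Values taken from the hackazon scenario described above
nodeport_pool_members 32222 10.10.24.161 10.10.24.162 10.10.24.163
```

Nothing in that list says which node hosts the single hackazon pod, which is exactly the limitation of this approach.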

As you can see, the previous approach cannot take into account how the actual PODs behind the exposed service are distributed across the k8s cluster, which can lead to inefficient east-west traffic among K8s worker nodes. Also, since we are exposing a service and not the actual endpoint (the POD), we cannot take advantage of some interesting features such as POD health-monitoring or what is sometimes a requirement: server persistence.

NodePort-based reachability is still an option, but the AKO integration proposes a much better alternative that overcomes the previous limitations. Since the worker nodes are able to forward IPv4 packets and because the CNI knows the IP addressing range assigned to every K8s node, we can predict the full range of IP addresses the PODs will take once created.

You can check the CIDR block that the Antrea CNI has allocated to each of the nodes in the cluster using kubectl describe:

kubectl describe node site1-az1-k8s-worker01
 Name:               site1-az1-k8s-worker01
 Roles:              
 Labels:             beta.kubernetes.io/arch=amd64
                     beta.kubernetes.io/os=linux
                     kubernetes.io/arch=amd64
                     kubernetes.io/hostname=site1-az1-k8s-worker01
                     kubernetes.io/os=linux

 Addresses:
   InternalIP:  10.10.24.161
   Hostname:    site1-az1-k8s-worker01
< ... skipped output ... >
 PodCIDR:                      10.34.2.0/24
 PodCIDRs:                     10.34.2.0/24
< ... skipped output ... >

Another fancy way to get this info is by using json format. Using jq tool you can parse the output and get the info you need using a single-line command like this:

kubectl get nodes -o json | jq '[.items[] | {name: .metadata.name, podCIDRS: .spec.podCIDR, NodeIP: .status.addresses[0].address}]'
 [
   {
     "name": "site1-az1-k8s-master01",
     "podCIDRS": "10.34.0.0/24",
     "NodeIP": "10.10.24.160"
   },
   {
     "name": "site1-az1-k8s-worker01",
     "podCIDRS": "10.34.2.0/24",
     "NodeIP": "10.10.24.161"
   },
   {
     "name": "site1-az1-k8s-worker02",
     "podCIDRS": "10.34.1.0/24",
     "NodeIP": "10.10.24.162"
   }
 ]

To sum up, in order to achieve IP reachability to the podCIDR networks the idea is to create a set of static routes using the NodeIP as next-hop to reach the PodCIDR assigned to every individual kubernetes node: for example, a route to 10.34.2.0/24 pointing to the next-hop 10.10.24.161 to reach the PODs at site1-az1-k8s-worker01, and so on. Of course one of the AKO functions is to achieve this in a programmatic way, so this will be one of the first actions AKO performs at bootup.
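
The mapping from nodes to routes can be sketched in a few lines of shell. This is not what AKO runs internally, just an illustration of the idea: pod_routes is a hypothetical helper that reads "name podCIDR nodeIP" lines, and the "route ... via ..." output format is made up for readability.

```shell
# Turn "name podCIDR nodeIP" lines into the static routes needed for pod reachability.
# On a live cluster the input could come from something like (adapt as needed):
#   kubectl get nodes --no-headers \
#     -o 'custom-columns=NAME:.metadata.name,CIDR:.spec.podCIDR,IP:.status.addresses[0].address'
pod_routes() {
  awk 'NF >= 3 { printf "route %s via %s (%s)\n", $2, $3, $1 }'
}

# Sample input taken from the jq output above
pod_routes <<'EOF'
site1-az1-k8s-master01 10.34.0.0/24 10.10.24.160
site1-az1-k8s-worker01 10.34.2.0/24 10.10.24.161
site1-az1-k8s-worker02 10.34.1.0/24 10.10.24.162
EOF
```

Each output line corresponds to one static route: one per node, with the node IP as next-hop.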

AVI Kubernetes Operator (AKO) Installation

AKO will run as a pod in a dedicated namespace called avi-system that we will create. Currently AKO is packaged as a Helm chart. Helm uses a packaging format called charts for creating kubernetes objects. A chart is a collection of files that describe a related set of Kubernetes resources. We need to install Helm before deploying AKO.

There are different methods to install Helm. Since I am using ubuntu here I will use the snap package manager method which is the easiest.

sudo snap install helm --classic
 helm 3.4.1 from Snapcrafters installed

The next step is to add the AVI AKO repository, which includes the AKO helm chart, to our local helm.

helm repo add ako https://projects.registry.vmware.com/chartrepo/ako
"ako" has been added to your repositories

Now we can search the available helm charts at the repository just added before as shown below.

helm search repo
NAME                    CHART VERSION   APP VERSION     DESCRIPTION
ako/ako                 1.4.2           1.4.2           A helm chart for AKO
ako/ako-operator        1.3.1           1.3.1           A Helm chart AKOO
ako/amko                1.4.1           1.4.1           A helm chart for AMKO

Next step is to create a new k8s namespace named avi-system in which we will place the AKO Pod.

kubectl create namespace avi-system
namespace/avi-system created

We have to pass some configuration to the AKO Pod. This is done by means of a values.yaml file in which we need to populate the corresponding configuration parameters that will allow AKO to communicate with AVI Controller among other things. The full list of values and description can be found here. You can get a default values.yaml file using following commands:

helm show values ako/ako --version 1.4.2 > values.yaml

Now open the values.yaml file and change the values as shown in the table below to match our particular environment in the Site 1 AZ1 k8s cluster. You can find the values.yaml file I am using here, just for reference.

| Parameter | Value | Description |
|-----------|-------|-------------|
| AKOSettings.disableStaticRouteSync | false | Allow the AKO to create static routes to achieve POD network connectivity |
| AKOSettings.clusterName | S1-AZ1 | A descriptive name for the cluster. Controller will use this value to prefix related Virtual Service objects |
| NetworkSettings.subnetIP | 10.10.25.0 | Network in which to create the Virtual Service objects at AVI SE. Must be in the same VRF as the backend network used to reach k8s nodes. It must be configured with a static pool or DHCP to allocate IP addresses automatically. |
| NetworkSettings.subnetPrefix | 24 | Mask length associated to the subnetIP for Virtual Service objects at SE. |
| NetworkSettings.vipNetworkList: networkName | AVI_FRONTEND_3025 | Name of the AVI Network object that will be used to place the Virtual Service objects at AVI SE. |
| L4Settings.defaultDomain | avi.iberia.local | This domain will be used to place the LoadBalancer service types in the AVI SEs. |
| ControllerSettings.serviceEngineGroupName | S1-AZ1-SE-Group | Name of the Service Engine Group that AVI Controller uses to spin up the Service Engines |
| ControllerSettings.controllerVersion | 20.1.2 | Controller API version |
| ControllerSettings.controllerIP | 10.10.20.43 | IP Address of the AVI Controller |
| avicredentials.username | admin | Username to get access to the AVI Controller |
| avicredentials.password | password01 | Password to get access to the AVI Controller |

values.yaml for AKO

Save the values.yaml in a local file. The next step is to install the AKO component through helm, passing the version and the values.yaml as input parameters:

helm install ako/ako --generate-name --version 1.4.2 -f values.yaml -n avi-system
 NAME: ako-1605611539
 LAST DEPLOYED: Tue Jun 06 12:12:20 2021
 NAMESPACE: avi-system
 STATUS: deployed
 REVISION: 1

We can list the deployed chart using the helm list command within the avi-system namespace:

 helm list -n avi-system
 NAME              NAMESPACE   REVISION  STATUS      CHART       APP
 ako-1605611539    avi-system  1         deployed    ako-1.4.2   1.4.2

This chart will create all the k8s resources needed by AKO to perform its functions. The main resource is the pod. We can check the status of the AKO pod using kubectl commands.

kubectl get pods -n avi-system
NAME    READY   STATUS    RESTARTS   AGE
ako-0   1/1     Running   0          5m45s

In case we experience problems (e.g. STATUS is stuck in ContainerCreating or RESTARTS shows a large number of restarts) we can always use standard kubectl commands such as kubectl logs or kubectl describe pod for troubleshooting and debugging.
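
A quick way to spot those two symptoms is to filter the kubectl get pods output. The sketch below is only an illustration: unhealthy_pods is a hypothetical helper, and the default threshold of 5 restarts is an arbitrary choice.

```shell
# Read "kubectl get pods" output on stdin and print pods that look unhealthy:
# STATUS other than Running, or RESTARTS above a threshold (default 5).
unhealthy_pods() {
  awk -v max="${1:-5}" '
    $1 == "NAME" { next }    # skip the header line
    NF && ($3 != "Running" || $4 + 0 > max + 0) { print $1, $3, "restarts=" $4 }
  '
}

# Sample input: the AKO pod stuck while its container is being created
unhealthy_pods 5 <<'EOF'
NAME    READY   STATUS              RESTARTS   AGE
ako-0   0/1     ContainerCreating   0          5m45s
EOF
```

On a real cluster you would feed it with kubectl get pods -n avi-system and then run kubectl describe pod or kubectl logs on whatever it reports.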

If we need to update the values.yaml we must delete and recreate the AKO resources by means of helm. I have created a simple restart script named ako-reload.sh that can be found here. It lists the existing AKO helm release, deletes it and recreates it using the values.yaml file in the current directory. This saves some time and also keeps us up to date with the latest application version, because it picks the most recent version of the ako chart available in the AKO repository. The values.yaml file must be in the same path for the script to work.

#!/bin/bash
# Add the AKO helm repo (helm skips it if it is already configured) and refresh it
helm repo add ako https://projects.registry.vmware.com/chartrepo/ako
helm repo update

# Get the newest AKO chart version (column 2 of "helm search repo"),
# which is the value that "helm install --version" expects
chartVersion=$(helm search repo | grep ako/ako | grep -v operator | awk '{print $2}')

# Get the release name of the currently deployed chart
akoRelease=$(helm list -n avi-system | grep ako | awk '{print $1}')

# Delete the existing helm release and install a new one
helm delete "$akoRelease" -n avi-system
helm install ako/ako --generate-name --version "$chartVersion" -f values.yaml --namespace avi-system
Make the script executable and simply run it each time you want to refresh the AKO installation. If this is not the first time we execute the script, note how the first message warns us that the repo we are adding was already added; just ignore it.

chmod +x ako_reload.sh
./ako_reload.sh
"ako" already exists with the same configuration, skipping
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "ako" chart repository
Update Complete. ⎈Happy Helming!⎈
release "ako-1622738990" uninstalled
NAME: ako-1623094629
LAST DEPLOYED: Mon Jun  7 19:37:11 2021
NAMESPACE: avi-system
STATUS: deployed
REVISION: 1

To verify that everything is running properly and that the communication with the AVI Controller has been successfully established, we can check whether the static routes in the VRF have been populated to attain the required pod reachability as mentioned before. It is also interesting to inspect the AKO application using standard kubectl logs in order to see how the different events and API calls occur.

For example, we can see how in the first step AKO discovers the AVI Controller infrastructure and the type of cloud integration (vCenter). It also discovers the VRF in which it has to create the routes to achieve Pod reachability. In this case the VRF is inferred from the properties of the selected AVI_FRONTEND_3025 network (remember this is the parameter NetworkSettings.vipNetworkList we have used in our values.yaml configuration file) at the AVI Controller and corresponds to VRF_AZ1 as shown below:

kubectl logs -f ako-0 -n avi-system
INFO    cache/controller_obj_cache.go:2558      Setting cloud vType: CLOUD_VCENTER
INFO    cache/controller_obj_cache.go:2686      Setting VRF VRF_AZ1 found from network AVI_FRONTEND_3025

A little further down we can see how AKO creates the static routes in the AVI Controller to obtain POD reachability.

INFO   nodes/avi_vrf_translator.go:64  key: Node/site1-az1-k8s-worker02, Added vrf node VRF_AZ1
INFO   nodes/avi_vrf_translator.go:65  key: Node/site1-az1-k8s-worker02, Number of static routes 3

As you can guess, the AVI GUI should now reflect this configuration. If we go to Infrastructure > Routing > Static Routes we should see how three new routes have been created in the desired VRF to direct traffic towards the PodCIDR networks allocated to each node by the CNI. The backend IP address of each node is used as next-hop.
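
The same check can be done from the command line by pulling the route count out of the AKO log. This is a sketch that relies on the exact log wording shown above ("Number of static routes N"), which may differ between AKO versions; static_route_count is a hypothetical helper name.

```shell
# Print the last "Number of static routes" value found in AKO log lines read on stdin.
static_route_count() {
  awk '/Number of static routes/ { n = $NF } END { if (n != "") print n }'
}

# Sample input: the log line shown above
printf '%s\n' 'INFO   nodes/avi_vrf_translator.go:65  key: Node/site1-az1-k8s-worker02, Number of static routes 3' | static_route_count
```

Fed with kubectl logs ako-0 -n avi-system, the number should match the number of nodes in the cluster, three in this lab.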

We will complete the AKO configuration for the second k8s cluster at a later stage since we will focus on a single cluster for now. Once the reachability is in place, it’s time to move to the next level and start creating the k8s resources.

AVI for K8S Part 1: Preparing AVI Infrastructure

The very first step to start using NSX Advanced Load Balancer (a.k.a. AVI Networks) is to prepare the infrastructure. The envisaged topology is represented in the figure below. I will simulate an environment with two K8S clusters that might represent two availability zones (AZ) in the same site. Strictly speaking an availability zone must be a unique physical location within a region equipped with independent power, cooling, and networking. For the sake of simplicity we will simulate that condition over a single vCenter Datacenter and under the very same physical infrastructure. I will focus on a single-region multi-AZ scenario that will evolve to a multi-region one in subsequent parts of this blog series.

Multi-AvailabilityZone Architecture for AVI AKO

AVI proposes a very modern Load Balancing architecture in which the Control Plane (AVI Controller) is separated from the Data Plane (AVI Service Engines). The Data Plane is created on demand as you create Virtual Services. To spin up the Service Engines in an automated fashion, the AVI Controller uses a Cloud Provider entity that provides the compute resources to bring the Data Plane up. This architectural model in which the brain is centralized fits very well with VMware’s Virtual Cloud Network strategy around Modern Network Solutions: “any app, any platform, any device”, which aims to extend network services (load balancing in this case) universally to virtually anywhere, regardless of where our application lives and what cloud provider we are using.

Step 1: AVI Controller Installation

AVI Controller installation is quite straightforward. If you are using vSphere you just need the OVA file to install the Controller VM, deploying it from the vCenter client. Deploy a new VM from the OVF template in the desired infrastructure.

AVI OVF Deployment

Complete all the steps with your particular requirements such as Cluster, Folder, Storage, Networks… etc. In the final step there is a Customize template page to create some base configuration of the virtual machine. The minimum requirements for the AVI Controller are 8 vCPU, 24 GB vRAM and 128 GB vHDD.

AVI OVF Customization Template

When the deployment is ready power on the created Virtual Machine and wait some minutes till the boot process completes, then connect to the Web interface at https://<AVI_ip_address> using the Management Interface IP Address you selected above.

AVI Controller setup wizard 1

Add the Network information, DNS, NTP… etc as per your local configuration.

AVI Controller setup wizard 2

Next you will get to the Orchestrator Integration page. We are using here VMware vSphere so click the arrow in the VMware tile to proceed with vCenter integration.

AVI Controller setup wizard 3

Populate the username, password and FQDN of the vCenter server.

AVI Controller setup wizard 4

Select write mode and leave the rest of the configuration with the default values.

AVI Controller setup wizard 5

Select the Management Network that will be used for Service Engine to establish connectivity with the AVI Controller. If using Static you need to define Subnet, Address Pool and the Default Gateway.

AVI Controller setup wizard 6

The final step asks if we want to support multiple Tenants. We will use a single tenant model so far. The name of the tenant will be admin.

AVI Controller setup wizard 7

Once the initial wizard is complete we should be able to get into the GUI, go to Infrastructure > Clouds and click on the + symbol at the right of the Default-Cloud (this is the default name assigned to our vCenter integration). You should be able to see a green status icon showing the integration has succeeded, as well as the configuration parameters.

Now that the AVI Controller has been installed and the base cloud integration is done, let’s go through the steps needed to complete the configuration. Note: At the time of writing this article the AKO integration is supported on vCenter full-access and the only supported networks for Service Engine placement are PortGroup (VLAN-backed) based. Check the Release Notes here regularly.

Step 2: IPAM and DNS

AVI is a Swiss Army knife solution that can provide not only load-balancing capabilities but also can cover other important peripheral services such as IPAM and DNS. The IPAM is needed to assign IP addressing automatically when a new Virtual Service is created and the DNS module will register the configured Virtual Service FQDN in an internal DNS service that can be queried allowing server name resolution. We need to attach an IPAM and DNS profile to the Cloud vCenter integration in order to activate those services.

From the AVI GUI we go to Templates > IPAM/DNS Profiles > CREATE > DNS Profile and name it DNS_Default for example.

I will use avi.iberia.local as my default domain. Another important setting is the TTL. The DNS TTL (time to live) tells the DNS resolver how long to cache an answer before requesting a new one. The shorter the TTL, the shorter the time the resolver holds that information in its cache. The TTL therefore affects the query volume (i.e. traffic) that will be directed to the DNS Virtual Service. For records that rarely change, such as MX, the TTL normally ranges from 3600 to 86400 seconds. For dynamic services it’s best to keep the TTL a little bit shorter. Values shorter than 30 seconds are typically not honored by most recursive servers and the results might not be favorable in the long run. We will keep 30 seconds as the default for now.
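
To get a feel for how the TTL drives query volume, here is a back-of-the-envelope sketch. It assumes a single caching resolver that honors the TTL and a record that is requested continuously, so the resolver has to refresh it once per TTL; real traffic depends on how many resolvers and clients are involved.

```shell
# Upper bound on queries per day that ONE caching resolver sends for ONE record,
# assuming it honors the TTL (given in seconds) and the record is always in demand.
max_queries_per_day() {
  echo $(( 86400 / $1 ))
}

for ttl in 30 3600 86400; do
  printf 'TTL %5s s: at most %s queries/day per resolver\n' "$ttl" "$(max_queries_per_day "$ttl")"
done
```

With the 30 second default that is up to 2880 refreshes per day and resolver, compared with 24 for a TTL of one hour.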

Similarly now we go to Templates > IPAM/DNS Profiles > CREATE > IPAM Profile

Since we will use VRFs to isolate both K8s clusters we check the “Allocate IP in VRF” option. There’s no need to add anything else at this stage.

Step 3: Configure the Cloud

Now it’s time to attach these profiles to the Cloud integration with vCenter. From the AVI GUI: Infrastructure > Default-Cloud > Edit (pencil icon).

Next assign the just created DNS and IPAM Profile in the corresponding section IPAM/DNS at the bottom of the window. The State Based DNS Registration is an option to allow the DNS service to monitor the operational state of the VIPs and create/delete the DNS entries correspondingly.

We also need to check the Management Network as we defined it in the AVI Controller Installation. This network is intended for control plane and management functions so there’s no need to place it in any of the VRFs that we will use for the Data Path. In our case we will use a network which corresponds to a vCenter Port Group called REGIONB_MGMT_3020 as defined during the initial setup wizard. In my case I have allocated a small range of 6 IPs since this is a test environment and a low number of SEs will be spun up. Adjust according to your environment.

Step 4: Define VRFs and Networks

When multiple K8S clusters are in place it’s a requirement to use VRFs as a method of isolating the different clusters from a routing perspective. Note that the automatic discovery process of networks (e.g. PortGroups) in the compute manager (vCenter in this case) will place them into the default VRF, which is the global VRF. In order to achieve isolation we need to assign the discovered networks manually to the corresponding VRFs. In this case I will use two VRFs: VRF_AZ1 for resources that will be part of AZ1 and VRF_AZ2 for resources that will be part of AZ2. The envisaged network topology (showing only Site 1 AZ1) once any SE is spun up will look like this:

From the AVI GUI go to Infrastructure > Routing > VRF Context > Create and set a new VRF with the name VRF_AZ1.

Now, having in mind our allocated networks for FRONTEND and BACKEND as in the previous network topology figure, we have to identify the corresponding PortGroups discovered by the AVI Controller as part of the vCenter Cloud Integration. If we go to Infrastructure > Networks we can see the full list of discovered networks (port groups) as well as their current subnets.

In this case the PortGroup for the front-end network (i.e. where we expose the Virtual Services externally) is named AVI_FRONTEND_3025. If we edit that particular entry using the pencil icon we can assign the Routing Context (VRF) and, since I am not using DHCP in my network, I will manually assign an IP Address Pool. The controller will pick one of the free addresses to plug the vNIC of the SE into the corresponding network. Note: we are using here a two-arm deployment in which the frontend network is separated from the backend network (the network for communicating with backend servers), but there is a one-arm variant that is also supported.

For the backend network we need to do the same configuration changing the Network to REGIONB_VMS_3024 in this case.

Similarly we have to repeat the process with the other VRF, completing the configuration as per the table below:

| Network Name | Routing Context | IP Subnet | IP Address Pool | Purpose |
|--------------|-----------------|-----------|-----------------|---------|
| AVI_FRONTEND_3025 | VRF_AZ1 | 10.10.25.0/24 | 10.10.25.40-10.10.25.59 | VIPs for Site 1 AZ1 |
| REGIONB_VMS_3024 | VRF_AZ1 | 10.10.24.0/24 | 10.10.24.164-10.10.24.169 | SE backend connectivity |
| AVI_FRONTEND_3026 | VRF_AZ2 | 10.10.26.0/24 | 10.10.26.40-10.20.25.59 | VIPs for Site 1 AZ2 |
| REGIONB_VMS_3023 | VRF_AZ2 | 10.10.23.0/24 | 10.10.23.40-10.20.25.59 | SE backend connectivity |

Network, VRFs, subnets and pools for SE Placement.
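
An address pool only makes sense if both of its ends fall inside the subnet of the network it belongs to, and a mistyped octet is easy to miss when filling in several networks by hand. The following sketch checks that condition with plain awk arithmetic; pool_in_subnet is a hypothetical helper, not part of any AVI tooling, and it handles IPv4 only.

```shell
# Succeeds (exit 0) when both ends of an IPv4 pool fall inside the given subnet.
# Usage: pool_in_subnet <network> <prefix-length> <pool-start> <pool-end>
pool_in_subnet() {
  awk -v net="$1" -v len="$2" -v lo="$3" -v hi="$4" '
    # convert a dotted-quad address into a single number
    function ip2int(ip,   o) { split(ip, o, "."); return ((o[1] * 256 + o[2]) * 256 + o[3]) * 256 + o[4] }
    BEGIN {
      size  = 2 ^ (32 - len)                       # addresses in the subnet
      first = ip2int(net) - ip2int(net) % size     # network address
      last  = first + size - 1                     # last address of the subnet
      exit !(ip2int(lo) >= first && ip2int(hi) <= last && ip2int(lo) <= ip2int(hi))
    }'
}

# Backend pool of Site 1 AZ1 from the table above
pool_in_subnet 10.10.24.0 24 10.10.24.164 10.10.24.169 && echo "pool fits in subnet"
```

The function returns a non-zero status when the pool leaves the subnet or when the start address is higher than the end address, so it can be used directly in an if statement.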

Step 5: Define Service Engine Groups

The Service Engine Group is a logical group with a set of configuration and policies that will be used by the Service Engines as a base configuration. The Service Engine Group dictates the High Availability Mode, the size of the Service Engines and the metric update frequency among many other settings. Each AVI Kubernetes Operator will own a Service Engine Group to deploy the related k8s services. Since we are integrating two separate k8s clusters we need to define a corresponding Service Engine Group for each of the AKOs. From the AVI GUI go to Infrastructure > Service Engine Group > CREATE and define the following suggested properties.

| Setting | Value | Tab |
|---------|-------|-----|
| Service Engine Group Name | S1-AZ1-SE-Group | Basic Settings |
| Metric Update Frequency | Real-Time Metrics Checked, 0 min | Basic Settings |
| High Availability Mode | Elastic HA / N+M (buffer) | Basic Settings |
| Service Engine Name Prefix | s1az1 | Advanced |
| Service Engine Folder | AVI K8S/Site1 AZ1 | Advanced |
| Buffer Service Engines | 0 | Advanced |

Service Engine Group Definition for Site 1 AZ1

Similarly let’s create a second Service Engine Group for the other k8s cluster:

| Setting | Value | Tab |
|---------|-------|-----|
| Service Engine Group Name | S1-AZ2-SE-Group | Basic Settings |
| Metric Update Frequency | Real-Time Metrics Checked, 0 min | Basic Settings |
| High Availability Mode | Elastic HA / N+M (buffer) | Basic Settings |
| Service Engine Name Prefix | s1az2 | Advanced |
| Service Engine Folder | AVI K8S/Site1 AZ2 | Advanced |
| Buffer Service Engines | 0 | Advanced |

Service Engine Group Definition for Site 1 AZ2

Step 6: Define Service Engine Groups for DNS Service

These Service Engine Groups will be used as the configuration base for the k8s related services such as LoadBalancer and Ingress. However, remember we also need to implement a DNS service so that clients trying to access our exposed services can resolve their FQDNs. As a best practice, an extra Service Engine Group is used to host the DNS related Virtual Services. In this case we will use similar settings for this purpose.

| Setting | Value | Tab |
|---------|-------|-----|
| Service Engine Group Name | DNS-SE-Group | Basic Settings |
| Metric Update Frequency | Real-Time Metrics Checked, 0 min | Basic Settings |
| High Availability Mode | Elastic HA / N+M (buffer) | Basic Settings |
| Service Engine Name Prefix | dns | Advanced |
| Service Engine Folder | AVI K8S/Site1 AZ1 | Advanced |
| Buffer Service Engines | 0 | Advanced |

Service Engine Group Definition for the DNS Service

Once done, we can now define our first Virtual Service to serve the DNS queries. Let’s go to Applications > Dashboard > CREATE VIRTUAL SERVICE > Advanced Setup. To keep it simple I will reuse in this case the Frontend Network at AZ1 to place the DNS service and, therefore, the VRF_AZ1. You can choose a dedicated VRF or even the global VRF with the required Network and Pools.

Since we are using the integrated AVI IPAM we don’t need to worry about IP address allocation. We just need to select the Network in which we want to deploy the DNS Virtual Service and the system will take one free IP from the defined pool. Once set up and in a ready state, the name of the Virtual Service will be used to create a DNS record of type A that dynamically registers the name in the integrated DNS Service.

Since we are creating a Service that will answer DNS Queries, we have to change the Application Profile at the right of the Settings TAB, from the default System-HTTP to the System-DNS which is a DNS default specific profile.

We can tell how the Service Port has now changed from default 80 for System-HTTP to UDP 53 which, as you might know, is the well-known UDP port to listen to DNS queries.

Now if we click on Next till the Step 4: Advanced tab, we will define the SE Group that the system uses when spinning up the service engine. We will select the DNS-SE-Group we have just created for this purpose. Remember that we are not creating a Virtual Service to balance across a farm of DNS servers, which is a different story; we are using the embedded DNS service in AVI, so there’s no need to assign a pool of servers to our DNS Virtual Service.

For testing purposes, in the last configuration step let’s create a test DNS record such as test.avi.iberia.local.

Once done the AVI Controller will communicate with vCenter to deploy the needed SE. Note that the prefix of the SE matches the Service Engine Name Prefix we defined in the Service Engine Group settings. The VM will be placed in the corresponding folder as per the Service Engine Folder setting within the Service Engine Group configuration.

In the Applications > Dashboard section the

After a couple of minutes we can check the status of the just created Service Engine from the GUI in Infrastructure > Service Engine. Hovering the mouse over the SE name at the top of the screen we can see some properties such as the Uptime, the assigned Management IP, Management Network, Service Engine Group and the physical host the VM is running on.

Also if we click in the In-use Interface List at the bottom we can see the IP address assigned to the VM

The IP assigned to the VM is not the IP assigned to the DNS VS itself. You can check the assigned IP for the local-dns-site1 VS from the Applications > Virtual Services page.

Last step is instructing the AVI controller to use the just created DNS VS when receiving DNS queries. This is done from Administration > Settings > DNS Service and we will select the local-dns-site1 service.

We can now query the A record test.avi.iberia.local using dig.

seiberia@k8sopsbox:~$ dig test.avi.iberia.local @10.10.25.44
 ; <<>> DiG 9.16.1-Ubuntu <<>> test.avi.iberia.local @10.10.25.44
 ;; global options: +cmd
 ;; Got answer:
 ;; WARNING: .local is reserved for Multicast DNS

 ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 60053
 ;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
 ;; WARNING: recursion requested but not available
 ;; OPT PSEUDOSECTION:
 ; EDNS: version: 0, flags:; udp: 512
 ;; QUESTION SECTION:
 ;test.avi.iberia.local.         IN      A
 ;; ANSWER SECTION:
 test.avi.iberia.local.  30      IN      A       10.10.10.10
 ;; Query time: 8 msec
 ;; SERVER: 10.10.25.44#53(10.10.25.44)
 ;; WHEN: Mon Nov 16 23:31:23 CET 2020
 ;; MSG SIZE  rcvd: 66
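
If you only care about the answer itself, for instance to confirm that the record is served with the 30 second TTL configured in the DNS profile, the dig output can be reduced to name, TTL and address. A small sketch (dig_a_records is a hypothetical helper that just filters text, it does not run dig for you):

```shell
# Print "name ttl address" for every A record in the ANSWER section of dig output read on stdin.
dig_a_records() {
  # answer lines look like: <name> <ttl> IN A <address>; lines starting with ";" are comments
  awk '$1 !~ /^;/ && $3 == "IN" && $4 == "A" { print $1, $2, $5 }'
}

# Sample input copied from the dig output above
dig_a_records <<'EOF'
 ;; QUESTION SECTION:
 ;test.avi.iberia.local.         IN      A
 ;; ANSWER SECTION:
 test.avi.iberia.local.  30      IN      A       10.10.10.10
EOF
```

Against the live service the input would be the output of dig test.avi.iberia.local @10.10.25.44 piped into the function.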

And remember, one of the coolest features of AVI is the rich analytics. This is also the case for the DNS service. As you can see we have rich traceability of the DNS activity. Below you can see what a trace of a DNS query looks like. Virtual Services > local-dns-site1 > Logs (tick the Non-Significant Logs radio button)…

At this point any new Virtual Service will register its name and its allocated IP address in the DNS Service dynamically as an A record. Once the AVI configuration is done, it’s time to move to the next level and start deploying AKO in the k8s cluster.


© 2025 SDefinITive
