Accessing KubeVirt VMs Outside Kubernetes - Part 2


Previously in Part 1, I mentioned that Kube-OVN is more powerful than Multus-CNI, but what makes it so?

Accessing KubeVirt VMs Outside Kubernetes - Part 1
By default, a KubeVirt VM is only connected to the container network and is inaccessible from outside. In Part 1 we exposed it by creating a Kubernetes Service of type LoadBalancer, making sure the Service selector matches the VM's domain label and defining the desired ports for external access.
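As a quick recap, a minimal Service of that kind might look like this (a sketch, assuming the VM carries the kubevirt.io/domain: testing label and exposes SSH; the Service name is hypothetical):

# loadbalancer-svc.yaml (hypothetical file name)
apiVersion: v1
kind: Service
metadata:
  name: testing-ssh
spec:
  type: LoadBalancer
  selector:
    kubevirt.io/domain: testing   # must match the VM template's domain label
  ports:
  - name: ssh
    port: 22
    targetPort: 22
    protocol: TCP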

With the Multus approach, each node needs a dedicated NIC for the external network, and that single NIC becomes a single point of failure (SPOF) for the worker or compute node. Maybe we can solve that by creating a bond interface from two physical interfaces.

But bonding means each node requires yet another NIC to connect to the outside network, which is extremely costly. If a typical VM orchestration setup requires 1 NIC per worker node for external network access, this setup would require 2 NICs per node. So, if you have 50 worker nodes, you'd need 100 NICs, compared to just 50 NICs in the traditional setup. This doubles the hardware requirement.

Worse, without bonding, a single NIC failure renders all the pods and VMs on that node inaccessible.

That issue can be resolved with SDN solutions like OVN.

Create OVN External Gateway

Create a ConfigMap named ovn-external-gw-config in the kube-system namespace:

apiVersion: v1
kind: ConfigMap
metadata:
  name: ovn-external-gw-config
  namespace: kube-system
data:
  type: "centralized"
  enable-external-gw: "true"
  external-gw-nodes: "jah-k3s-h100-001,jah-k3s-h100-002,jah-k3s-h100-003"
  external-gw-nic: "enp8s0"
  external-gw-addr: "10.4.19.1/24"
  nic-ip: "10.4.19.254/24"
  nic-mac: "16:52:f3:13:6a:25"

• enable-external-gw: Whether to enable the SNAT and EIP functions.
• type: centralized or distributed; the default is centralized. If distributed is used, every node in the cluster needs a NIC with the same name to perform the gateway function.
• external-gw-nodes: In centralized mode, the names of the nodes performing the gateway role, comma separated.
• external-gw-nic: The name of the NIC on the node that performs the gateway role.
• external-gw-addr: The IP and mask of the physical network gateway.
• nic-ip, nic-mac: The IP and MAC assigned to the logical gateway port; both must be unoccupied on the physical subnet.

I set type to centralized because I only want a few nodes to act as external gateways; if you want every node to be an external gateway, choose distributed instead. For this case I use three gateway nodes, listed in external-gw-nodes: jah-k3s-h100-001, jah-k3s-h100-002, and jah-k3s-h100-003. enable-external-gw is set to true to enable the EIP/FIP functions, and nic-ip defines the address of the logical gateway port that bridges my physical network with the OVN network.
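Assuming the manifest above is saved as ovn-external-gw-config.yaml (the file name is up to you), apply it and Kube-OVN should pick up the change:

kubectl apply -f ovn-external-gw-config.yaml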

Confirm the Configuration Takes Effect

Check the OVN-NB status to confirm that the external logical switch exists and that the correct address and gateway chassis are bound to the ovn-cluster-external logical router port.

root@jah-k3s-ctrl-001:~# kubectl ko nbctl show
switch 967436fa-c091-4340-b440-9ae33b3c98cc (external)
    port localnet.external
        type: localnet
        addresses: ["unknown"]
    port external-ovn-cluster
        type: router
        router-port: ovn-cluster-external
router 862b5126-73ae-4173-bf53-1a1350fdfebb (ovn-cluster)
    port ovn-cluster-external
        mac: "16:52:f3:13:6a:25"
        networks: ["10.4.19.254/24"]
        gateway chassis: [f4af51d5-595d-4138-9e6d-410a707a6da9 06c2cd08-4c5e-459d-8b80-8ddf0b98153a 1e49fb9a-aa0d-4077-b953-ed21d8499df1]
    port ovn-cluster-ovn-default
        mac: "3e:68:47:12:a1:41"
        networks: ["10.253.0.1/16"]
    port ovn-cluster-join
        mac: "42:25:27:76:f6:20"
        networks: ["100.64.0.1/17"]
    nat 8795fcb2-9a12-418c-8db7-2608d76f8e61
        external ip: "10.4.19.101"
        logical ip: "10.253.0.83"
        type: "dnat_and_snat"

Check the OVS status to confirm that the corresponding NIC is bridged into the br-external bridge:

root@jah-k3s-ctrl-001:~# kubectl ko vsctl jah-k3s-h100-001 show
29613471-5654-4e0b-8e61-b48f6648824a
    Bridge br-external
        Port enp8s0
            Interface enp8s0
        Port patch-localnet.external-to-br-int
            Interface patch-localnet.external-to-br-int
                type: patch
                options: {peer=patch-br-int-to-localnet.external}
        Port br-external
            Interface br-external
                type: internal
    Bridge br-int
        fail_mode: secure
        datapath_type: system
        Port br-int
            Interface br-int
                type: internal
    ....................
    ovs_version: "3.1.6"

From OVN's perspective, the ovn-cluster-external port has been created with the same IP and MAC address we specified in the ConfigMap. Can we ping it from outside the network?

===============================Outside=============================== ====================================Worker==============================================
root@raja-nfs:~/kubevirt-vm/ovn-ext-gw# ping -c3 10.4.19.254         │ root@jah-k3s-h100-001:~# tcpdump -i enp8s0 icmp
PING 10.4.19.254 (10.4.19.254) 56(84) bytes of data.                 │ tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
64 bytes from 10.4.19.254: icmp_seq=1 ttl=254 time=0.436 ms          │ listening on enp8s0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
64 bytes from 10.4.19.254: icmp_seq=2 ttl=254 time=0.399 ms          │ 09:13:14.393120 IP 10.4.19.29 > 10.4.19.254: ICMP echo request, id 82, seq 1, length 64
64 bytes from 10.4.19.254: icmp_seq=3 ttl=254 time=0.470 ms          │ 09:13:14.393401 IP 10.4.19.254 > 10.4.19.29: ICMP echo reply, id 82, seq 1, length 64
                                                                     │ 09:13:15.421829 IP 10.4.19.29 > 10.4.19.254: ICMP echo request, id 82, seq 2, length 64
--- 10.4.19.254 ping statistics ---                                  │ 09:13:15.422061 IP 10.4.19.254 > 10.4.19.29: ICMP echo reply, id 82, seq 2, length 64
3 packets transmitted, 3 received, 0% packet loss, time 2053ms       │ 09:13:16.445761 IP 10.4.19.29 > 10.4.19.254: ICMP echo request, id 82, seq 3, length 64
rtt min/avg/max/mdev = 0.399/0.435/0.470/0.029 ms                    │ 09:13:16.446085 IP 10.4.19.254 > 10.4.19.29: ICMP echo reply, id 82, seq 3, length 64

Great! The IP is reachable and can be pinged from outside.

Operational Test: Create a VM

Let's create a KubeVirt VM:

# vm.yaml
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: testing
spec:
  runStrategy: Always
  template:
    metadata:
      labels:
        kubevirt.io/size: small
        kubevirt.io/domain: testing
    spec:
      domain:
        cpu:
          cores: 2
        devices:
          disks:
          - name: disk0
            disk:
              bus: virtio
          - name: cloudinit
            cdrom:
              bus: sata
              readonly: true
          rng: {}
          interfaces:
          - name: default
            masquerade: {}
        resources:
          requests:
            memory: 1024Mi
      terminationGracePeriodSeconds: 15
      networks:
      - name: default
        pod: {}
      volumes:
      - name: disk0
        dataVolume:
          name: testing
      - name: cloudinit
        cloudInitNoCloud:
          userData: |-
            #cloud-config
            chpasswd:
              list: |
                ubuntu:ubuntu
              expire: False
            disable_root: false
            ssh_pwauth: True
            ssh_authorized_keys:
              - ssh-rsa AAAA...
  dataVolumeTemplates:
    - metadata:
        name: testing
      spec:
        storage:
          storageClassName: csi-rbd-sc
          accessModes:
          - ReadWriteMany
          resources:
            requests:
              storage: 10Gi
        source:
          registry:
            url: docker://quay.io/containerdisks/ubuntu:22.04

Apply the manifest and wait for the VM to boot, then check its status:

kubectl get vmi,vm testing
NAME                                         AGE   PHASE     IP            NODENAME           READY
virtualmachineinstance.kubevirt.io/testing   1h   Running   10.253.0.83   jah-k3s-h100-003   True

NAME                                 AGE   STATUS    READY
virtualmachine.kubevirt.io/testing   1h   Running   True

Attach EIP

The EIP or SNAT rules configured for a Pod can be adjusted dynamically via kubectl or other tools; remember to remove the ovn.kubernetes.io/routed annotation to trigger the routing change.

kubectl get pod -o wide
NAME                            READY   STATUS    RESTARTS   AGE     IP            NODE               NOMINATED NODE   READINESS GATES
virt-launcher-testing-4vmxf     2/2     Running   0          16h     10.253.0.83   jah-k3s-h100-003   <none>           1/1

kubectl annotate pod virt-launcher-testing-4vmxf ovn.kubernetes.io/eip=10.4.19.101 --overwrite
kubectl annotate pod virt-launcher-testing-4vmxf ovn.kubernetes.io/routed-

Once the EIP or SNAT takes effect, the ovn.kubernetes.io/routed annotation will be added back.
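Note that annotating the virt-launcher pod directly won't survive pod re-creation. One way to make the EIP persistent (a sketch, assuming KubeVirt propagates VM template annotations to the virt-launcher pod so Kube-OVN re-applies the EIP whenever the pod is recreated) is to patch the annotation into the VM template:

# Hypothetical: persist the EIP by putting the annotation in the VM template,
# so every virt-launcher pod created for this VM requests the same EIP.
kubectl patch vm testing --type=merge \
  -p '{"spec":{"template":{"metadata":{"annotations":{"ovn.kubernetes.io/eip":"10.4.19.101"}}}}}'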

Recheck from OVN Perspective

root@jah-k3s-ctrl-001:~# kubectl ko nbctl show
switch f6adab24-f39f-4d34-9226-0d38f9375331 (ovn-default)
    .........................

switch 967436fa-c091-4340-b440-9ae33b3c98cc (external)
    port localnet.external
        type: localnet
        addresses: ["unknown"]
    port external-ovn-cluster
        type: router
        router-port: ovn-cluster-external
router 862b5126-73ae-4173-bf53-1a1350fdfebb (ovn-cluster)
    port ovn-cluster-external
        mac: "16:52:f3:13:6a:25"
        networks: ["10.4.19.254/24"]
        gateway chassis: [f4af51d5-595d-4138-9e6d-410a707a6da9 06c2cd08-4c5e-459d-8b80-8ddf0b98153a 1e49fb9a-aa0d-4077-b953-ed21d8499df1]
    port ovn-cluster-ovn-default
        mac: "3e:68:47:12:a1:41"
        networks: ["10.253.0.1/16"]
    port ovn-cluster-join
        mac: "42:25:27:76:f6:20"
        networks: ["100.64.0.1/17"]
    nat 8795fcb2-9a12-418c-8db7-2608d76f8e61
        external ip: "10.4.19.101"
        logical ip: "10.253.0.83"
        type: "dnat_and_snat"
root@jah-k3s-ctrl-001:~# kubectl ko nbctl lr-nat-list ovn-cluster
TYPE             GATEWAY_PORT          EXTERNAL_IP        EXTERNAL_PORT    LOGICAL_IP          EXTERNAL_MAC         LOGICAL_PORT
dnat_and_snat                          10.4.19.101                         10.253.0.83

OVN is working as expected and has created the dnat_and_snat rule for the EIP. Time to take it for a test drive!

============================================VM======================================================== ============================================Outside=================================================
user@sandbox:~/k3s$ virtctl console testing                                                           │root@raja-nfs:~# ping -c3 10.4.19.101
Successfully connected to testing console. The escape sequence is ^]                                  │PING 10.4.19.101 (10.4.19.101) 56(84) bytes of data.
                                                                                                      │64 bytes from 10.4.19.101: icmp_seq=1 ttl=62 time=2.55 ms
ubuntu@testing:~$ ping -c3 10.4.19.1                                                                  │64 bytes from 10.4.19.101: icmp_seq=2 ttl=62 time=0.965 ms
PING 10.4.19.1 (10.4.19.1) 56(84) bytes of data.                                                      │64 bytes from 10.4.19.101: icmp_seq=3 ttl=62 time=0.854 ms
64 bytes from 10.4.19.1: icmp_seq=1 ttl=62 time=1.74 ms                                               │
64 bytes from 10.4.19.1: icmp_seq=2 ttl=62 time=0.853 ms                                              │--- 10.4.19.101 ping statistics ---
64 bytes from 10.4.19.1: icmp_seq=3 ttl=62 time=0.478 ms                                              │3 packets transmitted, 3 received, 0% packet loss, time 2003ms
                                                                                                      │rtt min/avg/max/mdev = 0.854/1.457/2.553/0.776 ms
--- 10.4.19.1 ping statistics ---                                                                     │root@raja-nfs:~# 
3 packets transmitted, 3 received, 0% packet loss, time 2003ms                                        │root@raja-nfs:~# ssh [email protected] hostname
rtt min/avg/max/mdev = 0.478/1.024/1.742/0.530 ms                                                     │testing
ubuntu@testing:~$                                                                                     │root@raja-nfs:~# ssh [email protected] whoami
ubuntu@testing:~$ ip a                                                                                │ubuntu
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000           │root@raja-nfs:~# ssh [email protected] ip a
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00                                             │1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 127.0.0.1/8 scope host lo                                                                    │    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
       valid_lft forever preferred_lft forever                                                        │    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host                                                                          │       valid_lft forever preferred_lft forever
       valid_lft forever preferred_lft forever                                                        │    inet6 ::1/128 scope host 
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc fq_codel state UP group default qlen 1000 │       valid_lft forever preferred_lft forever
    link/ether fe:71:36:c8:d0:04 brd ff:ff:ff:ff:ff:ff                                                │2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc fq_codel state UP group default qlen 1000
    inet 10.0.2.2/24 metric 100 brd 10.0.2.255 scope global dynamic enp1s0                            │    link/ether fe:71:36:c8:d0:04 brd ff:ff:ff:ff:ff:ff
       valid_lft 86254361sec preferred_lft 86254361sec                                                │    inet 10.0.2.2/24 metric 100 brd 10.0.2.255 scope global dynamic enp1s0
    inet6 fe80::fc71:36ff:fec8:d004/64 scope link                                                     │       valid_lft 86254426sec preferred_lft 86254426sec
       valid_lft forever preferred_lft forever                                                        │    inet6 fe80::fc71:36ff:fec8:d004/64 scope link 
ubuntu@testing:~$                                                                                     │       valid_lft forever preferred_lft forever

Nice! The EIP/FIP works just like it does in other VM orchestration platforms.

Deep Dive

We have verified that the IP is pingable from both the physical network and the OVN network. But now the real question is: where is that IP actually attached, and when and where does the address translation occur?

Let’s begin by sniffing the interface on jah-k3s-h100-003, since that’s where the pod and VM are located.
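Concretely, this is the same kind of capture shown earlier: tcpdump on the node's external NIC while pinging the EIP from the outside host (enp8s0 is the NIC name in this setup; adjust it for yours):

# On the node being inspected (here jah-k3s-h100-003): capture ICMP on the external NIC
tcpdump -i enp8s0 -n icmp

# From the outside host, in another terminal:
ping -c3 10.4.19.101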

From the terminal output, we can observe that the secondary NIC did not capture any ICMP traffic, even though the pod and VM are running there. Now, let’s switch to jah-k3s-h100-001.

There we have it—traffic flows into jah-k3s-h100-001 and then routes out to jah-k3s-h100-003. This highlights a key advantage of OVN: it can designate specific nodes as external gateways, eliminating the need for every node to have identical interfaces.

With OVN, only a few nodes require a secondary NIC to act as external gateway nodes, making it more efficient. This is a great example of how OVN operates seamlessly without needing additional tools for bridging, tunneling, routing, or NAT-ing.

outside client -> centralized gateway node (OVN) -> VM's node (OVN) -> destination pod/VM

Failover Test

I chose jah-k3s-h100-001, jah-k3s-h100-002, and jah-k3s-h100-003 as "centralized" nodes, meaning all NATing, ingress, and egress processes happen there. Earlier, when I tested pinging the VM, all traffic was routed through jah-k3s-h100-001, even though the pod and VM were running on jah-k3s-h100-003.
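Before pulling a node, you can check which chassis currently holds the gateway role for the external router port; ovn-nbctl's lrp-get-gateway-chassis subcommand (run through the kubectl ko plugin used earlier) lists the bound gateway chassis by priority:

# List the gateway chassis bound to the external router port, highest priority first
kubectl ko nbctl lrp-get-gateway-chassis ovn-cluster-external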

So, let’s begin by shutting down jah-k3s-h100-001. What do you think will happen?
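How the node goes down doesn't matter much for this test; a sketch of what I mean, assuming SSH access to the node, is:

# Cordon the gateway node so nothing new is scheduled there,
# then power it off to simulate a gateway failure
kubectl cordon jah-k3s-h100-001
ssh jah-k3s-h100-001 'sudo shutdown -h now'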

kubectl get node
NAME               STATUS                        ROLES                               AGE   VERSION
jah-k3s-ctrl-001   Ready                         control-plane,etcd,master           19d   v1.28.11+k3s1
jah-k3s-ctrl-002   Ready                         control-plane,etcd,master           19d   v1.28.11+k3s1
jah-k3s-ctrl-003   Ready                         control-plane,etcd,master           19d   v1.28.11+k3s1
jah-k3s-h100-001   NotReady,SchedulingDisabled   worker                              19d   v1.28.11+k3s1
jah-k3s-h100-002   Ready                         worker                              19d   v1.28.11+k3s1
jah-k3s-h100-003   Ready                         worker                              19d   v1.28.11+k3s1
jah-k3s-halb-001   Ready                         infra,worker                        19d   v1.28.11+k3s1
jah-k3s-halb-002   Ready                         infra,worker                        19d   v1.28.11+k3s1
jah-k3s-halb-003   Ready                         infra,worker                        19d   v1.28.11+k3s1
jah-k3s-l40s-001   Ready                         worker                              19d   v1.28.11+k3s1
jah-k3s-l40s-002   Ready                         worker                              19d   v1.28.11+k3s1
jah-k3s-l40s-003   Ready                         worker                              19d   v1.28.11+k3s1
jah-k3s-prmt-001   Ready                         logging,metrics,monitoring,worker   19d   v1.28.11+k3s1
jah-k3s-prmt-002   Ready                         logging,metrics,monitoring,worker   19d   v1.28.11+k3s1
jah-k3s-prmt-003   Ready                         logging,metrics,monitoring,worker   19d   v1.28.11+k3s1

Let's begin by sniffing the interface on jah-k3s-h100-002.

From the terminal output, we can observe that the secondary NIC hasn't captured any ICMP traffic.

Now, let's move on to jah-k3s-h100-003.

There we go, now all the traffic is directed towards jah-k3s-h100-003.

Impressions from Using KubeVirt

One of the advantages of KubeVirt is that you can set up a VM on a container network. KubeVirt and OVN have significantly altered the behavior of VM orchestration. The most notable change is that they have shifted cloud-init from a network metadata service to a CD-ROM disk (the cloudInitNoCloud source used above), completely removing the need for a DHCP server. From a functionality perspective, especially in terms of networking, KubeVirt+OVN can offer nearly the same capabilities as other VM orchestration tools like OpenStack, Proxmox, and VMware.

KubeVirt technology addresses the needs of development teams that have adopted or want to adopt Kubernetes but possess existing Virtual Machine-based workloads that cannot be easily containerized. KubeVirt isn't intended to replace traditional VM orchestration tools just yet. For now, it's more about integrating VMs into Kubernetes environments.

Perhaps some companies will take this concept seriously and build VM orchestration on top of Kubernetes using modified versions of KubeVirt and Kube-OVN in the future.

Reference

EIP and SNAT - Kube-OVN Documentation