This blog is written by Mark Scholman (@markscholman).
This post covers the do's and don'ts of Hyper-V Network Virtualization, specifically when bringing a solution like the one below to Microsoft Azure Pack and/or System Center Virtual Machine Manager. First things first: for those who need the basics of Hyper-V Network Virtualization, I recommend starting with the article here.
This blogpost is based on the following use case:
A customer wants to host their infrastructure at a Service Provider. The Service Provider utilizes Hyper-V Network virtualization, management with System Center Virtual Machine Manager and optionally Windows Azure Pack. The customer currently has the following networks:
- Production Network
- DMZ Network
The customer prefers to bring their own Linux firewall and use it as the default gateway for their networks. The customer network consists of the following subnets:
- Production subnet: 10.10.0.0/24
- DMZ subnet: 10.11.0.0/24
For each subnet, the first possible IP address (normally x.x.x.1) is automatically provisioned as the default gateway. The firewall/gateway VM (MS-TEST-A01) is configured with x.x.x.254 on each NIC. The default gateway inside the firewall itself is set to 10.10.0.1, and this VNet is enabled with an Internet connection and NAT.
Not a very exciting network configuration, you might think. In each virtual machine we will change the default gateway to x.x.x.254 (the IP of the virtual firewall).
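For reference, provisioning these two VM networks through the VMM PowerShell module would look roughly like this (a sketch; the logical network name "Provider-LN" is an assumption about the environment):

```powershell
# Sketch: create the two HNV-isolated VM networks (logical network name assumed)
$logicalNet = Get-SCLogicalNetwork -Name "Provider-LN"

$vnetA = New-SCVMNetwork -Name "VNET-A" -LogicalNetwork $logicalNet `
             -IsolationType "WindowsNetworkVirtualization"
New-SCVMSubnet -Name "Production" -VMNetwork $vnetA `
    -SubnetVLan (New-SCSubnetVLan -Subnet "10.10.0.0/24")

$vnetB = New-SCVMNetwork -Name "VNET-B" -LogicalNetwork $logicalNet `
             -IsolationType "WindowsNetworkVirtualization"
New-SCVMSubnet -Name "DMZ" -VMNetwork $vnetB `
    -SubnetVLan (New-SCSubnetVLan -Subnet "10.11.0.0/24")
```

A static IP pool per subnet (New-SCStaticIPAddressPool) would then hand out addresses, with x.x.x.1 reserved as the default gateway as described above.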
The following image displays two provisioned virtual networks. VNET-A is configured with a gateway:
The virtual machines use the following IP configuration:
- 10.10.0.254/24 -> 10.10.0.1 as default gateway
- 10.10.0.5/24 -> 10.10.0.254 as default gateway
- 10.11.0.5/24 -> 10.11.0.254 as default gateway
- 10.11.0.6/24 -> 10.11.0.254 as default gateway
All servers run Windows Server 2012 R2. My Linux skills are limited, so instead of a Linux firewall I used a virtual machine running Windows Server 2012 R2 with RRAS enabled. The firewalls inside all virtual machines are turned off, just to prevent any issues related to the operating system firewall.
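The RRAS configuration itself was done through the GUI. As a rough equivalent for plain LAN routing between the two NICs, IPv4 forwarding can also be enabled directly (the interface aliases here are assumptions):

```powershell
# Inside MS-TEST-A01: enable IPv4 forwarding on both tenant-facing NICs,
# turning the VM into a simple router (interface aliases are assumptions)
Set-NetIPInterface -InterfaceAlias "Ethernet"   -AddressFamily IPv4 -Forwarding Enabled
Set-NetIPInterface -InterfaceAlias "Ethernet 2" -AddressFamily IPv4 -Forwarding Enabled
```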
First we start with the RRAS server MS-TEST-A01. A ping from this server to hosts on both networks is successful.
For the next test, a ping from MS-TEST-A02 to a host in the other virtual network should be routed through RRAS. The first ping, to the RRAS interface on the same subnet, succeeds. A ping to the RRAS interface on the VNET-B subnet (10.11.0.0/24) fails, as do pings to MS-TEST-B01 and MS-TEST-B02.
Reversing the process and pinging from MS-TEST-B01 to MS-TEST-A01 also fails.
There is no connectivity across subnets; it is only possible to ping the RRAS interface on the same subnet. For a deeper analysis of the traffic, we install Message Analyzer on the RRAS server and repeat the tests.
The ping initiated on MS-TEST-B02 to the gateway interface of MS-TEST-A01 is received.
This validates the gateway settings configured in MS-TEST-B02, but packets that need to be routed to the other subnet never arrive on the gateway interface. What is intercepting the traffic?
The next step is to start Message Analyzer on the Hyper-V host. There, Message Analyzer can display the traffic flow at the point where the host encapsulates it before sending it to other Hyper-V hosts. An example of traffic using NVGRE is discussed later. First, Message Analyzer must be configured for capturing traffic.
Open the File menu and select Capture / Trace.
Choose Local Link Layer and, on the right under Trace Scenario Configuration, click the Configure button:
It is possible to define monitoring per virtual machine. Select the correct virtual machine; in this example, the virtual machine that is unable to connect:
Are you familiar with the slide deck on switch extensions and the flow of packets?
The arrows reflect ingress and egress. Message Analyzer lets you choose a single direction, but note that in troubleshooting scenarios this can be misleading, as you might look at the wrong side. For this scenario all settings are turned on.
Submit the configuration screen and click "Start With" to start the trace.
Initiate a ping to the unresponsive IP address and watch the traffic in the trace.
Drill down and look at the MAC address: it is the MAC address of MS-TEST-A01. Going further down the rabbit hole ends at the ETW layer, with no further information there. What happens when I ping a VM in the same subnet, something that was tested successfully before:
The trace only displays the response traffic. The packet probably fails a check on some value or setting in the lookup records. As you can see in the screenshot below, the Customer ID is different, because each VM network gets its own Customer ID:
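The lookup records in question can be listed on the Hyper-V host with the built-in NetWNV cmdlets (a sketch; run in an elevated prompt on the host):

```powershell
# List the HNV lookup records the host uses to map customer addresses (CA)
# to provider addresses (PA); note the CustomerID value per VM network
Get-NetVirtualizationLookupRecord |
    Select-Object CustomerAddress, VirtualSubnetID, CustomerID, ProviderAddress, MACAddress |
    Format-Table -AutoSize
```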
Next, add the subnet from VNET-B as an extra subnet to VNET-A. That action makes the Customer ID from the picture above the same for both subnets:
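In VMM PowerShell, adding the extra subnet can be sketched like this (object names are assumptions matching the setup described earlier):

```powershell
# Add the DMZ subnet as a second VM subnet inside VNET-A, so both subnets
# live in one VM network and therefore share one CustomerID
$vmNetwork = Get-SCVMNetwork -Name "VNET-A"
New-SCVMSubnet -Name "DMZ" -VMNetwork $vmNetwork `
    -SubnetVLan (New-SCSubnetVLan -Subnet "10.11.0.0/24")
```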
After a reboot of the VMs, I checked the lookup records on the Hyper-V host again. The Customer IDs are now the same:
Ping from MS-TEST-B01 to MS-TEST-A02 and check the results:
That worked! Let's look at the tracing again. Why is the packet never received at the gateway (in the picture on the right)? When I ping the gateway interface on the same subnet it is received, but when the traffic needs to pass through the gateway (RRAS) it never arrives there. As before, though, the traffic is received and answered by MS-TEST-A02.
The next step is to shut down MS-TEST-A01 and do the test again.
The ping is replied to, and in Message Analyzer on the Hyper-V host the trace shows both the echo request and the echo reply:
I also tested changing the gateway to 126.96.36.199; then the connection fails. So there is a dependency on the value of the gateway, but as long as it is in the same subnet, the actual value does not matter.
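The routes HNV itself maintains per routing domain can also be listed on the Hyper-V host, which helps to see what the distributed router considers reachable (a sketch):

```powershell
# List the customer routes HNV knows per routing domain and virtual subnet;
# the distributed router in the vSwitch uses these for reachability decisions
Get-NetVirtualizationCustomerRoute |
    Select-Object RoutingDomainID, VirtualSubnetID, DestinationPrefix, NextHop
```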
Is the lookup record table not only used for NVGRE encapsulation between Hyper-V hosts, but also in the routing mechanism of virtual networking between subnets? I am unable to see how the packet is extracted, filtered and modified, but as soon as the Customer ID is different, the switch drops the packet.
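My working theory can be sketched as a toy model in Python (purely conceptual, not the actual vSwitch implementation; the provider addresses are invented): the switch consults the lookup records for source and destination and drops the packet when their Customer IDs differ, even if the destination is behind the tenant's own gateway.

```python
from dataclasses import dataclass

@dataclass
class LookupRecord:
    customer_address: str  # CA: the VM's IP inside the virtual network
    customer_id: str       # one ID per VM network
    provider_address: str  # PA: the Hyper-V host's fabric IP (invented values)

# Toy lookup table mirroring the two VM networks in this post
records = {
    "10.10.0.5":   LookupRecord("10.10.0.5",   "VNET-A", "192.168.254.11"),
    "10.10.0.254": LookupRecord("10.10.0.254", "VNET-A", "192.168.254.10"),
    "10.11.0.5":   LookupRecord("10.11.0.5",   "VNET-B", "192.168.254.12"),
    "10.11.0.254": LookupRecord("10.11.0.254", "VNET-B", "192.168.254.10"),
}

def switch_forwards(src_ip: str, dst_ip: str) -> bool:
    """Toy model: forward only when both endpoints share a Customer ID."""
    src, dst = records[src_ip], records[dst_ip]
    return src.customer_id == dst.customer_id

# Same VM network: delivered. Different VM networks: silently dropped,
# even though 10.11.0.254 is a NIC on the tenant's own gateway VM.
```

In this model, merging both subnets into one VM network (as done above) gives every record the same Customer ID, which is exactly the point where connectivity started working.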
Finally, I moved all NICs to a traditional VLAN-based network: I created two VLANs and two new subnets, and reconfigured the virtual machines. Everything functioned properly. So is it really something in the virtual network stack that causes this behavior?
As mentioned earlier, I will show a trace with two virtual machines running on two different Hyper-V nodes; here the NVGRE encapsulation is also visible.
The packet uses 192.168.254.0/24, the correct provider address space.
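The provider addresses a host encapsulates between can be verified on the host itself (a sketch):

```powershell
# Show the provider addresses (PA) assigned to this Hyper-V host; NVGRE
# packets between hosts travel between these fabric addresses
Get-NetVirtualizationProviderAddress |
    Select-Object ProviderAddress, PrefixLength, InterfaceIndex
```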
Network virtualization is a great feature for simplifying datacenter networking, but the ONLY way to get traffic in and out of a virtual network is through a Network Virtualization Gateway, whether configured as a site-to-site (S2S) VPN, NAT, or forwarding gateway. The sales slides never said "bring your own router or firewall". If you want a functioning gateway in a network based on network virtualization, use the Network Virtualization Gateway.