There is no denying that the Azure team at Microsoft moves at a very high pace when it comes to introducing new Azure functionality. Public Azure features show up like clockwork every six weeks, and the feature updates for Hyper-V and System Center in the private cloud arrive at a similarly impressive rate. Just a few weeks ago we had to tell customers that Azure Site Recovery was a no-go (for now) if they had boot drives larger than 127GB, more than one network adapter, a fixed IP address, or had already adopted Hyper-V Generation 2 VMs. But customers of the Azure cloud never have to wait very long, and they get a very large say in which features matter most to them. Just take a look at the User Voice for Azure Site Recovery:
This week I was lucky to get a time slot from the Azure team to actually test the new Hyper-V Generation 2 support in our closest Azure datacenter, West Europe, here in Amsterdam. I had already set up Azure Site Recovery (ASR) in preparation for several customers who were planning Disaster Recovery from their onsite datacenters to Azure. Before that I had more or less been ignoring the product, as I didn’t have any use cases or the resources to test it. But with interested customers, things can change very quickly. So I quickly configured a small research environment with one Hyper-V host, a few Generation 1 and 2 VMs, SQL Server 2014 and Virtual Machine Manager 2012 R2 with Update Rollup 5.
I will not detail the complete setup and configuration, as this has already been well documented. Be careful with blogs on ASR that are older than a couple of months, as so much has changed. In this blog I will focus on the new support for Hyper-V Generation 2 VMs. Generation 2 VMs arrived with Windows Server 2012 R2 Hyper-V and made installing VMs a little faster because none of the ancient devices had to be discovered. In addition, replacing the VM BIOS with UEFI made several new features available:
- Secure boot
- DVD Drive hot add/removal
- PXE boot from synthetic network adapter
So let’s go back to the configuration of ASR for Generation 2 VMs. There are now multiple scenarios for ASR. My configuration is based on “Between an on-premises VMM site and Azure”, but several others are available, including VMM site to VMM site (with or without SAN replication), Hyper-V to Azure (without VMM) and VMware site to VMware site (with or without SAN replication), and we can expect a direct VMware-to-Azure scenario before long. Because I had configured ASR some time ago, I had to download an update for my registration key. I also had to refresh both the VMM ASR Provider and the Hyper-V ASR Agent on the Hyper-V host. If I had a seven-year-old child, I could have delegated this task.
The major steps for protecting VMs with ASR are:
- Author Recovery Plan
- Disaster Recovery Drill
After Setup and Configuration, the next step is to bring the Generation 2 VM(s) under protection. This is something you can do entirely from the Azure Portal.
A job starts to enable protection and set up initial replication, and it finishes in just under two minutes. The actual replication is quite time consuming if you don’t have an ExpressRoute connection. I saw no more than 10Mbps of replication traffic. So you can imagine that if you have a 12GB image and dozens or maybe even a hundred VMs to protect, this can take ages. With a serious number of VMs to protect, I’d strongly consider ExpressRoute. Not sure whether this was a limit of my Internet connection, I sent an email to my Azure contact, Anoob Backer, who quickly responded with a registry update to speed up the upload. A blog by Anoob on this topic will soon follow.
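To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It only uses the 12GB image size and the 10Mbps I observed. The function name and the efficiency parameter are my own illustration, and it ignores compression, protocol overhead and the conversion work Azure does behind the scenes.

```python
def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Estimate the hours needed to upload size_gb gigabytes over a link_mbps link.

    Uses decimal units (1 GB = 10**9 bytes, 1 Mbps = 10**6 bits per second).
    efficiency is a hypothetical fudge factor for the share of the link that
    replication traffic actually gets (1.0 means all of it).
    """
    if size_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, link speed > 0, efficiency in (0, 1]")
    bits_to_send = size_gb * 8 * 1000**3
    seconds = bits_to_send / (link_mbps * 1_000_000 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    one_vm = transfer_hours(12, 10)
    print(f"One 12GB VM at 10Mbps: {one_vm:.1f} hours")
    print(f"100 such VMs, one after another: {100 * one_vm / 24:.1f} days")
```

Under these assumptions a single 12GB VM needs roughly 2.7 hours and a hundred of them about 11 days, which is why ExpressRoute becomes interesting quickly.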
If you see that the bandwidth is under-utilized, create a registry entry of type DWORD on the Hyper-V host with the name HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows Azure Backup\Replication\UploadThreadsPerVM and the value 16. The default value the agent uses is 4, so this can make a difference if you are doing lots of initial replications.
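If you have more than one Hyper-V host, you may want to script this. The sketch below only builds the command line for the standard Windows `reg add` tool. It assumes that UploadThreadsPerVM is a DWORD value under the ...\Replication key, which is my reading of the tip above. The helper name is hypothetical, and you would run the resulting command from an elevated prompt on the Hyper-V host.

```python
from typing import List

# Key path and value name taken from the tip above; treating the last path
# segment as the value name is an assumption about how the tip is meant.
KEY_PATH = r"HKLM\SOFTWARE\Microsoft\Windows Azure Backup\Replication"
VALUE_NAME = "UploadThreadsPerVM"


def build_reg_add_command(threads: int = 16) -> List[str]:
    """Return the argument list for `reg add` that sets the upload thread count."""
    if threads < 1:
        raise ValueError("threads must be a positive integer")
    return [
        "reg", "add", KEY_PATH,
        "/v", VALUE_NAME,      # value name
        "/t", "REG_DWORD",     # value type
        "/d", str(threads),    # value data
        "/f",                  # overwrite without prompting
    ]


if __name__ == "__main__":
    # On the Hyper-V host you could pass this list to subprocess.run(..., check=True).
    print(build_reg_add_command())
```

Building the command as a list rather than a single string avoids quoting problems with the spaces in "Windows Azure Backup".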
Here you see that my first Generation 2 VM is being prepared for protection to Azure.
When the job completes, all preparations have been made, but mind you, the initial replication (as well as the conversion process behind the scenes) is still running.
While ASR is busy you can check Hyper-V Manager, which shows that a Hyper-V checkpoint has been created to keep track of the deltas and that the initial replica is being sent. It also gives you an indication of how far it has progressed, which wasn’t very far in my case.
You can also watch progress from the Azure portal. One thing that struck me is that the target VM size selected for protection is A11, which has 16 cores and 112GB of memory. This can be explained by how the source VM is configured: Azure selects the nearest matching VM size, and in my case the VM had unlimited maximum memory configured.
As soon as initial replication has finished, you can confirm this in Hyper-V Manager by looking at Replication Mode, State and Health in the Replication tab of the VM Detail window.
To see the Replication details, right-click the VM, select Replication and View Replication Health.
Now that the VM has been fully replicated, you can also set a number of properties. It is possible to change the display name of the VM replica in Azure, change the VM size to a more realistic size and something that has also been added recently: the configuration of the target IP Address. If you have been using Hyper-V Replica, you’ll recognize the ability to set a fixed IP address on the replica VM.
Before we can do a VM failover, whether it is a test failover, a planned failover or an unplanned failover, we need to create a Recovery Plan. Such a plan can be created for a single VM, for several VMs or for an application.
By clicking Create at the bottom, a new Recovery Plan (RP2) is created which can be run independently from RP1. Protection of both Generation 1 and Generation 2 VMs is possible. By the way, I write Generation in full because one of the Hyper-V Product Managers once ‘reprimanded’ me for using Gen1 and Gen2. Apparently those abbreviations refer to something completely different.
Select the VMs to be included in the Recovery Plan and hit the OK button.
This Recovery Plan can be enhanced with custom actions such as PowerShell scripts to make sure certain prerequisites are met before or after failover. But if you are ready to test, just hit the Test Failover button.
This gives you a choice of selecting a particular Azure Virtual Network for testing purposes. In my case, I created an isolated network but with the same subnet, so the VM would land in the correct network given its failover IP address.
While the Test Failover runs, you can view the job details.
What the previous step has accomplished is to make the replica available as an Azure Virtual Machine, which you can view and start under the Virtual Machines menu in the Azure Portal. You can verify the VM Size, its public IP address and the IP address you set for failover. This all works like a charm.
When the job completes after about 15 minutes, it waits until you have tested the VMs in your Test Failover: does the OS start, does the application run, and can the VM be reached using Remote Desktop? By default you cannot connect to the VM unless you configure one or more endpoints, such as Remote Desktop or Remote PowerShell.
You can finish the test by clicking Complete Test which gives you a chance to make some comments about the Failover Test. Under jobs you can see the entire duration of the Failover Test.
The procedure for a planned or unplanned failover is almost identical. Click Failover at the bottom of the page and select whether you want to initiate a planned failover, which gives you zero data loss because the primary VM is still online, or an unplanned failover, which is only up to date as of the last replication. The replication intervals you can set are the same as with Hyper-V Replica: 30 seconds, 5 minutes or 15 minutes. This can be set under Protected Items, as one of the properties of the VMM Cloud you are protecting.
In this case, we confirm the failover for moving from on-premises to Azure. As you can see it is possible to change direction in case the VM is already running in Azure.
The planned failover looks similar to the test failover, and the job runs for another 15 minutes to finalize the recovery plan.
The difference is that the VM on-premises has received a graceful shutdown and the Azure VM is running and waiting to be connected. Important to note is that planned failover is a zero data loss failover. Unplanned failover will have data loss corresponding to the replication frequency configured.
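To make the link between replication frequency and data loss a bit more tangible, here is a minimal Python sketch. It is a simplification: it assumes the last replication cycle completed successfully and on schedule, and all names in it are hypothetical. The three intervals are the ones ASR offers, as mentioned earlier.

```python
from datetime import datetime, timedelta

# The three replication intervals offered (the same as with Hyper-V Replica).
REPLICATION_INTERVALS = {
    "30s": timedelta(seconds=30),
    "5m": timedelta(minutes=5),
    "15m": timedelta(minutes=15),
}


def data_loss_window(last_replica_at: datetime, failure_at: datetime) -> timedelta:
    """Changes made in this window never reached Azure and are lost."""
    if failure_at < last_replica_at:
        raise ValueError("the failure cannot predate the last replica")
    return failure_at - last_replica_at


def within_interval(last_replica_at: datetime, failure_at: datetime, interval: str) -> bool:
    """True when the loss stays within the configured replication interval."""
    return data_loss_window(last_replica_at, failure_at) <= REPLICATION_INTERVALS[interval]


if __name__ == "__main__":
    last_replica = datetime(2015, 1, 1, 12, 0, 0)
    failure = datetime(2015, 1, 1, 12, 4, 0)
    print(data_loss_window(last_replica, failure))
    print(within_interval(last_replica, failure, "5m"))
```

In practice a replication cycle can run late on a slow link, so treat the configured interval as the best case for your recovery point rather than a guarantee.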
After failover, the VM still uses the same IP address it had before the failover, so if you have configured the Azure networks, site-to-site VPNs or ExpressRoute correctly, the VM is available in Azure just as you are used to. Please note that if you want to keep the subnet and IP address the same as on-premises, the entire subnet has to fail over. You cannot have some of the VMs in the subnet run on-premises and the rest in Azure. In that case you need to prepare a different subnet for the failover VMs and set up routing between the subnets.
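A quick way to check this rule before you build a recovery plan is to list which VMs live in the subnet and compare that with the set you intend to fail over. The Python sketch below does this with the standard ipaddress module. The VM names, addresses and function names are made up for illustration.

```python
import ipaddress
from typing import Dict, Iterable, Set


def vms_in_subnet(vm_ips: Dict[str, str], subnet: str) -> Set[str]:
    """Return the names of the VMs whose IP address lies inside the subnet."""
    network = ipaddress.ip_network(subnet)
    return {name for name, ip in vm_ips.items() if ipaddress.ip_address(ip) in network}


def can_keep_addresses(vm_ips: Dict[str, str], subnet: str, failing_over: Iterable[str]) -> bool:
    """True only when every VM in the subnet is part of the failover set."""
    return vms_in_subnet(vm_ips, subnet) <= set(failing_over)


if __name__ == "__main__":
    # Hypothetical inventory of on-premises VMs and their fixed IP addresses.
    inventory = {"web01": "10.0.1.10", "sql01": "10.0.1.20", "dc01": "10.0.2.5"}
    print(can_keep_addresses(inventory, "10.0.1.0/24", {"web01", "sql01"}))
    print(can_keep_addresses(inventory, "10.0.1.0/24", {"web01"}))
```

If the check fails, some VMs would stay behind in the subnet, and that is the case where you need a different subnet for the failover VMs plus routing between the two.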
The VM on-premises should be in the Off-state and has been prepared for planned failover. Before you can start using the VM after planned failover, you need to Commit this, so you have time to check that all prerequisites are in place. Committing this recovery plan will merge all snapshots.
The other less pleasant option is the unplanned failover, which usually means that your primary site has completely failed, has gone up in flames or has changed into a high-tech swimming pool. The flooded datacenter scenario is unfortunately a risk we have to deal with in the Netherlands, just like in the United States a hurricane can wreak havoc.
On the other hand, once everything has gone back to normal, you can easily fail back to your on-premises datacenter, in this case from Microsoft Azure to the VMM Site of your choice.
The Generation 2 VM has started failback replication, as is shown in Hyper-V Manager.
Because the VMs may either no longer be available, or have to be completely resynchronized, this step can take considerably longer. You can either start synchronizing while the VM in Azure stays online, or shut down the Azure VM and start synchronizing. When done, the VM is stopped in Azure and starts running again in the VMM Cloud.
The reason it took so long (13 hours 3 minutes) was that I left the failover without committing it. So take a look at the data synchronization for a good indication of the time it took to resynchronize the entire 12GB VM under my specific network conditions (which were not perfect).
I pressed the Complete Failover button some time the next day so forget about the 13 hours and 3 minutes.
We have now come full circle: the Generation 2 VM was initially replicated to Azure, was verified by a test failover, went through a planned failover, was reachable in its own subnet with its own IP address, and was failed back using the same recovery plan that serves for the test failover or the unplanned failover. The only thing we did not test was submerging our datacenter in a meter of water and doing some real damage. Everything else was perfectly testable using Azure Site Recovery. What stands out is the following:
- Disaster Recovery has been made really easy to do
- Building business continuity processes around the technical DR process is something you still have to do
- Testing failover can finally be done without bringing everything else to a standstill
- New Azure Site Recovery features are being added as we speak
- Protection for Generation 2 Virtual Machines has been completed (and I got my 2 votes back so I can vote for other ASR functionality)
A cool new piece of functionality is one I actually proposed myself: Site Recovery between Azure datacenters. If you already have production VMs running in Azure, your data is already protected multiple times, either in one datacenter or across multiple datacenters (if you ordered Geo Redundant Storage), but you cannot use ASR to fail over from, let’s say, West Europe to North Europe, that is, between two Azure regions within the same geography. As I could deduce from the response of the ASR Product Team, they have already started planning this new functionality.
Bottom line is: use your votes to make ASR even better than it already is today! If you haven’t done this yet, start exploring Azure Site Recovery. It is not at all difficult to learn and even with a small lab as in my case with 1 Hyper-V host, a VMM Server and a couple of VMs you can start testing it and get the confidence you need to start using this in production.
If you need further help, you probably know how to find me! Also big thanks to Anoob Backer for providing tips and reviewing this blog.