One of the tenants in our INOVATIV Cloud notified us of an inconsistency between the memory usage measured by Windows Azure Pack and the actual memory used by his VMs. Tenant Tom claimed he had used only 13824MB, but Windows Azure Pack reported he had used 25344MB. Apparently, the memory was melting before his eyes. Who or what had taken the missing memory?
This is how the concerned tenant sees his resource allocation in the Windows Azure Pack Service Management Portal. As you can observe, 83% of the memory was consumed, almost double what the tenant had actually used with the virtual machines he had so easily deployed to the privately hosted Windows Azure Pack cloud. “Easy come, easy go,” Tom thought.
A Service Plan in WAP can be configured by the Windows Azure Pack administrator with usage limits for the following resources:
# of Virtual Machines
# of Cores
RAM in MB
Storage in GB
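For reference, these quota values live on the VMM user role that backs the plan and can be inspected or adjusted with PowerShell. A minimal sketch, assuming the VMM 2012 R2 cmdlets Get-SCUserRoleQuota and Set-SCUserRoleQuota; the cloud and user role names are hypothetical:

```powershell
# Inspect the quota backing a tenant plan (names are examples)
$cloud = Get-SCCloud -Name "Tenant Cloud"
$role  = Get-SCUserRole -Name "BronzePlan_UserRole"
Get-SCUserRoleQuota -Cloud $cloud -UserRole $role

# Cap the plan at 10 VMs, 20 cores, 30 GB of RAM and 500 GB of storage
Get-SCUserRoleQuota -Cloud $cloud -UserRole $role |
    Set-SCUserRoleQuota -VMCount 10 -CPUCount 20 -MemoryMB 30720 -StorageGB 500
```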
If you have the Hyper-V Network Virtualization Gateway in place, you can also limit the following virtual network resources:
# of Virtual Networks
# of Site-to-Site VPNs per network
Within Virtual Networks, network usage can be capped for network read/sec and network write/sec, either to keep your most enthusiastic buddies under control or to sell additional network capacity to more demanding tenants.
On further examination, Watson decided to look at several of the other subscriptions to see if there were any similar cases. Why had we never received any complaints before? In fact, only recently our team of consultants had devoured more than half the capacity of a 2-node Hyper-V 2012 R2 cluster, and we started getting the dreaded overcommitted message in Virtual Machine Manager, which by default keeps a cluster reserve of 1. We could have made some concessions to the individual reservation parameters, but that would have been too easy.
To check the cluster reserve, right-click the cluster and select Properties.
In the following figure, the cluster reserve state is healthy, based on a capacity reservation of one cluster node. When we had only two nodes in the cluster, the cluster reserve state had turned unhealthy. The quick but less responsible way out would have been to set the reserve to 0: during a hardware failure or server maintenance there would then be inadequate capacity to start up or live migrate all VMs to the surviving cluster node. VM priority was designed for exactly this situation. VMs with High and Medium priority take precedence over VMs with Low and No Auto Start priority. The default priority of a highly available VM is Medium, which guarantees Live Migration capability in emergency situations. In our case all VMs are set to Medium priority, as all tenants are equal and pay nothing but passion and commitment for their consumed capacity. In a commercial cloud one could charge a higher price for VMs that require High or Medium priority.
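Both knobs can also be set from PowerShell. A sketch, assuming the VMM Set-SCVMHostCluster cmdlet and the Failover Clustering module on the cluster itself; the cluster and VM names are hypothetical:

```powershell
# Set the VMM cluster reserve (the number of node failures to absorb)
Get-SCVMHostCluster -Name "hvcluster01" | Set-SCVMHostCluster -ClusterReserve 1

# Failover Clustering priorities: 3000 = High, 2000 = Medium (the default),
# 1000 = Low, 0 = No Auto Start
(Get-ClusterGroup -Name "TenantVM01").Priority = 3000
```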
Tweaking the reserves of the individual hosts could have given some temporary relief, but in the end would not have made much of a difference; the amounts were already set at sharp levels, so we left them alone. Our tenant cluster serves learning and research labs, but also live demos during major events such as TechDays, TechEd North America, TechEd Europe, System Center Universe and ExpertsLive, and to a lesser extent Innovate and events by Hyper-V.nu and Scug.nl, so we can hardly afford to bring the VMs down for maintenance or suffer the loss of a cluster node. Adding a third cluster node with an additional 32 cores and 192GB of memory brought enough capacity for the time being. We learned that in our practice memory is the most limiting factor, far ahead of cores and storage. Because we have plenty of SMB3 storage, we have not yet set a cap on storage. The available 20Gb/s of Ethernet bandwidth did not need any usage limits either, apart from the Quality of Service minimum values we set on the Management, Live Migration, CSV and SMB tNICs in the management Windows network adapter team. Live migrating VMs from one node to the two other cluster nodes easily consumes 14 to 15Gb/s using the new Live Migration with Compression capability in Windows Server 2012 R2 Hyper-V, which turned out to be faster than Live Migration over SMB because the HP NICs, which are OEM’d from Emulex, do not support RDMA.
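For completeness, this is roughly how those two settings look in PowerShell. A sketch, assuming the team switch was created with -MinimumBandwidthMode Weight; the vNIC names and weights are illustrative, not our production values:

```powershell
# Guarantee each management vNIC a minimum share of the converged 20Gb team
Set-VMNetworkAdapter -ManagementOS -Name "LiveMigration" -MinimumBandwidthWeight 40
Set-VMNetworkAdapter -ManagementOS -Name "CSV" -MinimumBandwidthWeight 10

# Explicitly select compressed Live Migration (the 2012 R2 default) when the
# NICs lack RDMA and SMB Direct is therefore off the table
Set-VMHost -VirtualMachineMigrationPerformanceOption Compression
```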
BACK TO THE PLOT
But let’s not digress too much. While Sherlock Holmes – “I know my methods” – had ample time to survey the crime scene and scrutinize the possible suspects, Watson had carefully examined all Windows Azure Pack subscriptions and was able to compare the usage statistics with the actual usage of the VMs. “I have found additional evidence,” said Watson.
"Excellent!" Holmes cried.
"Elementary," said his astute assistant.
What Watson had found was an extraordinary oversubscription of RAM in one of the other subscriptions. He had discovered the Windows Azure Pack Admin Portal, which was far beyond the Inspector’s imagination. He had never seen anything more spectacular than this.
In fact, Watson had managed to discover the list of all existing subscriptions and had struck upon a suspect by the name of Walter, not to be confused with the meth king Walter White in Breaking Bad.
Walter had mysteriously been able to consume an extraordinary amount of RAM, far beyond his allotted quota: to be precise, 6868% of the assigned 30GB of RAM. How was this possible? “Let me think, Watson!”
Accepting that what Holmes saw was true, he conjectured there must be other instruments with which to examine this mysterious case. Watson, as always a smart Inspector, called out, “Look what I’ve found!” In the direct neighborhood of the tenant cluster, he had run into a management Hyper-V cluster full of VMs with System Center management components. “What! Did you find footprints of the dreaded hound?” exclaimed Holmes. “No such thing, Sir! You will not believe this. I struck upon such powerful tools, the kind of which I usually don’t carry around in my bag.”
What Watson had hit upon was an unattended Virtual Machine Manager console, which coincidentally stood open at the tenant cluster’s Virtual Machine view. It did not take long before Watson discovered the filter to show only the tenant who had purportedly acquired more RAM than he was entitled to.
MORAL OF THE STORY
In fact there was nobody to blame, except that some VMs that had been manually added to the Windows Azure Pack subscription were still configured with Dynamic Memory and the default Maximum Memory value of 1TB. As it turns out, Windows Azure Pack measures memory usage based on the Maximum Memory setting, even though the actual memory demand is much lower and the ceiling has never been reached. In Tom’s case he was using less memory than advertised in his tenant portal, because some of his VMs were set to Dynamic Memory with a conservative maximum, but he was still able to create new VMs because he had not reached his memory quota. Walter, on the other hand, had two VMs with Dynamic Memory and its maximum set to 1TB, and could no longer create new VMs.
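The arithmetic adds up: two VMs with a 1TB (1,048,576MB) maximum account for 2,097,152MB, which against a 30GB (30,720MB) quota is already roughly 6827%; Walter’s remaining VMs make up the difference to 6868%. A quick way to spot such offenders is to total memory the way Windows Azure Pack appears to count it: the Maximum for Dynamic Memory VMs and the Startup for static ones. A sketch using the native Hyper-V cmdlets; the quota value is an example:

```powershell
$quotaMB = 30720  # a 30 GB plan quota, as in Walter's subscription

# Total memory the way Windows Azure Pack appears to count it:
# the Maximum for Dynamic Memory VMs, the Startup for static ones
$chargedMB = (Get-VM | Get-VMMemory | ForEach-Object {
        if ($_.DynamicMemoryEnabled) { $_.Maximum / 1MB } else { $_.Startup / 1MB }
    } | Measure-Object -Sum).Sum

"Charged: {0:N0} MB = {1:N0}% of quota" -f $chargedMB, ($chargedMB / $quotaMB * 100)

# The usual suspects: Dynamic Memory VMs still at the 1 TB default maximum
Get-VM | Get-VMMemory |
    Where-Object { $_.DynamicMemoryEnabled -and $_.Maximum -eq 1TB } |
    Select-Object VMName, @{ n = 'MaximumMB'; e = { $_.Maximum / 1MB } }
```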
All VMs that had been deployed through the tenant Service Management Portal were measured correctly. All VM templates in Virtual Machine Manager were configured with 1 core and a fixed amount of 1792MB of memory. Via the dashboard in the Service Management Portal, the tenant can upsize or downsize the VM’s capacity using any of the VMM hardware profiles made available in the subscribed Service Plan.
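To verify that the hardware profiles offered in a plan really carry fixed memory sizes, the VMM side can be checked quickly. A sketch, assuming the Get-SCHardwareProfile cmdlet and the VMM 2012 R2 property names:

```powershell
# List hardware profiles with their CPU count and memory configuration
Get-SCHardwareProfile |
    Select-Object Name, CPUCount, Memory, DynamicMemoryEnabled, DynamicMemoryMaximumMB
```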
So in conclusion: either use Static Memory, or set the hardware profile to Dynamic Memory with identical values for Startup Memory and Maximum Memory. In both cases tenants get what they paid for. The ingenious design of Hyper-V Dynamic Memory allows a very low setting for Minimum Memory; the default is miraculously set to an extreme low of 8MB. Because we don’t want to risk draining a tenant VM of memory until it is forced to commit suicide, we set Minimum Memory to 256MB.
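Applied to a single VM, the fix might look like this. A sketch with the native Set-VMMemory cmdlet (the VM must be shut down to change these settings); the VM name and the 1792MB size are examples:

```powershell
# Option 1: Static Memory, so the tenant is billed exactly what is configured
Set-VMMemory -VMName "TenantVM01" -DynamicMemoryEnabled $false -StartupBytes 1792MB

# Option 2: Dynamic Memory with Startup equal to Maximum, and a safe 256 MB floor
Set-VMMemory -VMName "TenantVM01" -DynamicMemoryEnabled $true `
    -MinimumBytes 256MB -StartupBytes 1792MB -MaximumBytes 1792MB
```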
“It is a capital mistake to theorize in advance of the facts.”