Recently I was asked to describe the correct procedure for defragmenting Cluster Shared Volumes on a Hyper-V R2 cluster. This is not really a very complicated task but if you have never had the opportunity to give it a try, this blog post will offer you the exact steps using PowerShell.
Let’s start with a case description: the System Center Operations Manager Windows Management Pack is reporting “Logical Disk Fragmentation Level is high” for your Hyper-V R2 servers.
A Cluster Shared Volume (CSV) contains the configuration, virtual hard disk and snapshot files of multiple Hyper-V guests. Fragmentation of the large VHD files in particular deserves your attention.
Fragmentation of these files can become a problem because the disk head has to perform an increasing number of seeks, which lowers throughput and thus the perceived performance of the guest as a whole.
On the other hand, NTFS has become more and more efficient in recent OS versions and fragmentation need not always have a severe impact on performance.
CSV is a distributed orchestration layer on top of NTFS (implemented as a file system filter driver) and for fragmentation it takes advantage of all the NTFS techniques. The advantage of this design is that all disk management tools which have been written for NTFS continue to work, including a variety of defrag tools.
Before writing a file, NTFS searches for enough free space to write the entire file to disk. On a newly formatted disk the VHDs will be written contiguously while all the blocks are written and zeroed out, which can be fairly time consuming. Of course this mainly applies when a fixed VHD format is used, for which 100% of the space of the virtual hard disk is allocated and reserved on the disk. With dynamic VHDs you are much more susceptible to fragmentation because initially only the header and a few control blocks are written to the physical disk. The VHD file grows as more data is written to the virtual hard disk it represents. Suppose you have 10 VMs on the same CSV and each VM has access to one or more dynamic VHDs. With that many files growing side by side on the same volume, the chances of fragmentation increase considerably.
The classic administrator will probably not think twice and start to defragment the disk as quickly as possible. However, when dealing with a fragmented CSV disk, you have to plan in advance. Before you can start the defrag, the CSV has to be placed into maintenance mode.
The best advice I can give you is to start monitoring disk throughput and fragmentation statistics over a longer period of time. In other words get a baseline and check its deviation at set intervals in time.
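A baseline like this can be collected with the built-in Get-Counter cmdlet. The following is only a minimal sketch: the function names are made up for this example, the counter path assumes English counter names, and I have not verified how CSV volumes show up in the LogicalDisk counter set on your cluster.

```powershell
# Sketch: sample disk latency to build a baseline (PowerShell 2.0 compatible).
# Assumption: the English counter path below exists on the node.
function Get-DiskLatencySample {
    param(
        [int]$SampleInterval = 60,  # seconds between samples
        [int]$MaxSamples     = 5    # number of samples to collect
    )
    $counter = '\LogicalDisk(*)\Avg. Disk sec/Transfer'
    Get-Counter -Counter $counter -SampleInterval $SampleInterval -MaxSamples $MaxSamples |
        ForEach-Object {
            $stamp = $_.Timestamp
            # One output object per disk instance per sample
            $_.CounterSamples | ForEach-Object {
                New-Object PSObject -Property @{
                    Timestamp      = $stamp
                    Instance       = $_.InstanceName
                    SecPerTransfer = $_.CookedValue
                }
            }
        }
}

# Percentage by which a new measurement deviates from the baseline value.
function Get-DeviationPercent {
    param([double]$Baseline, [double]$Current)
    if ($Baseline -eq 0) { return 0 }
    return [math]::Round((($Current - $Baseline) / $Baseline) * 100, 1)
}
```

For example, Get-DiskLatencySample | Export-Csv .\disk-baseline.csv -NoTypeInformation stores one series of samples (the file name is just an example), and Get-DeviationPercent compares a later average against that baseline.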
How does Operations Manager monitor the health of the disk? According to Cameron Fuller’s article:
“Disk utilization is determined by the Average Disk Seconds Per Transfer monitor. This monitor’s healthy and critical states are defined as follows:
- Critical state occurs when the average disk seconds per transfer is greater than 50 for 5 minutes (after five samples on a 1-minute schedule)
Fragmentation health is determined by the Logical Disk Fragmentation Level monitor. This monitor’s healthy and warning states are defined as follows:
- Warning state occurs when the percentage of file fragmentation is greater than 10% (This monitor checks health state once a day at 3.00 a.m. on Saturday by default).
The Logical Disk Fragmentation Level monitor also includes a recovery task called Logical Disk Defragmentation, which is disabled by default. This task can automatically run a defragmentation if the drive exceeds the threshold defined for the monitor.
The following picture shows a fragmented logical disk in System Center Operations Manager’s Health Explorer.
See also this article by Kevin Holman on file fragmentation monitoring in Operations Manager.
A key takeaway from this article is that if you want to defrag fragmented local disks automatically, you should be aware that an enabled recovery task will run against both the physical servers and the VMs at roughly the same time. The job might stress the SAN with more I/O than it can handle.
I have not been able to find out whether Operations Manager is able to deal with CSV disks at all. So before thinking about automation, let’s first look at the exact steps to properly defrag one or more Cluster Shared Volumes in a Hyper-V R2 cluster.
Windows Server 2008 R2 has its own defrag command. If you want to try this out manually, open a command prompt with administrator privileges and start with an analysis.
; Analyzing a disk (C: is used as an example volume)
> Defrag C: /A
; Analyzing a disk with additional info and statistics
> Defrag C: /A /U /V
At the end it will show you if the disk requires a defrag or not.
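If you would rather get the analysis as an object than read the defrag output, the Win32_Volume WMI class offers a DefragAnalysis method. This is a minimal sketch for a local volume with a drive letter; the function name is made up for this example and the session must be elevated. I have not tried this against a CSV volume.

```powershell
# Sketch: fragmentation analysis via WMI instead of reading defrag.exe output.
# Assumption: the volume has a drive letter; 'C:' is only an example default.
function Get-FragmentationReport {
    param([string]$DriveLetter = 'C:')

    $volume = Get-WmiObject -Class Win32_Volume -Filter "DriveLetter='$DriveLetter'"
    if (-not $volume) { throw "Volume $DriveLetter not found." }

    # DefragAnalysis returns a return code, a recommendation and a Win32_DefragAnalysis object
    $result = $volume.DefragAnalysis()
    if ($result.ReturnValue -ne 0) {
        throw "DefragAnalysis failed with code $($result.ReturnValue)."
    }

    New-Object PSObject -Property @{
        Volume                    = $DriveLetter
        DefragRecommended         = $result.DefragRecommended
        FilePercentFragmentation  = $result.DefragAnalysis.FilePercentFragmentation
        TotalPercentFragmentation = $result.DefragAnalysis.TotalPercentFragmentation
    }
}
```

Calling Get-FragmentationReport -DriveLetter 'C:' returns a single object whose DefragRecommended property corresponds to the recommendation defrag prints at the end of its analysis.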
Every CSV is a separate volume without a drive letter and all CSVs are logically organized underneath a root directory on disk C: called C:\ClusterStorage. However, when you run a defrag for disk C: it does not defrag the individual CSV volumes. In fact, the CSV has to be placed into maintenance mode for a defrag to run properly. Enabling maintenance mode also puts your guests into Saved State, so choose carefully when to start such a job.
After explaining some of the background of Cluster Shared Volumes, I will detail the manual steps to properly defrag a CSV volume.
CSV, which is enabled in most Hyper-V R2 clusters, is implemented as an NTFS junction point. This is more or less comparable to a mountpoint to which volumes are mounted. CSV uses the C:\ClusterStorage root directory and each volume is placed underneath:
These CSV volumes are used by multiple VMs in the cluster so by just starting a defrag you would not have exclusive access to many of the files.
Another important fact to know is that in a Hyper-V R2 cluster, VMs are able to read and write to their respective VHDs on that CSV simultaneously without intervention of the coordinator node. This is called Direct I/O. Still, every CSV volume has its own owner (or coordinator node). The owner of the disk is capable of performing metadata operations on that disk.
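To see which node currently owns each CSV, and whether a volume is in maintenance or redirected access, the objects returned by Get-ClusterSharedVolume can be flattened into a small overview. This is a sketch; the function name is made up for this example and it has to run on a cluster node with the FailoverClusters module available.

```powershell
# Sketch: list every CSV with its coordinator node, path and current mode.
function Get-CsvOverview {
    Import-Module FailoverClusters

    Get-ClusterSharedVolume | ForEach-Object {
        $csv = $_
        # SharedVolumeInfo holds one entry per volume on the clustered disk
        $csv.SharedVolumeInfo | ForEach-Object {
            New-Object PSObject -Property @{
                Name             = $csv.Name
                OwnerNode        = $csv.OwnerNode.Name
                State            = $csv.State
                Path             = $_.FriendlyVolumeName
                MaintenanceMode  = $_.MaintenanceMode
                RedirectedAccess = $_.RedirectedAccess
            }
        }
    }
}
```

Get-CsvOverview | Format-Table -AutoSize gives a quick view of the owner and mode of every volume before and after you switch maintenance mode on or off.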
Because a defrag involves having exclusive access to a disk, it has to be placed into maintenance. The same applies to chkdsk.
In this picture you can see two VMs spread across two cluster nodes, but still capable of Direct I/O against the CSV disk with its VHD.
Make sure you run these steps in a designated maintenance window, because all guests on a CSV volume are saved and temporarily unavailable while that volume is in maintenance mode.
Step by step
Run the following commands to defrag a CSV volume:
Open a command prompt with Administrator privileges.
; Start PowerShell
> powershell
; Import PowerShell module for Failover Clusters
PS > Import-Module FailoverClusters
; Request list of available CSVs
PS > Get-ClusterSharedVolume
; Request properties of selected CSV volume
PS > Get-ClusterSharedVolume "csv01" | fc *
The VMs and CSVs are still online.
; Enable maintenance mode for specified CSV which puts VMs into Saved State
PS > Suspend-ClusterResource "CSV01" -VolumeName "C:\ClusterStorage\Volume1"
VMs are saved one by one.
; Check status of all CSVs
PS > Get-ClusterSharedVolume
As soon as the CSV is switched into maintenance, a defrag (or chkdsk) command can be given from the coordinator node. If the CSV is not owned by the current node, it can be moved:
; Move CSV to specific cluster node
PS > Move-ClusterSharedVolume "csv01" -Node hv01
Before starting the actual defrag, an analysis can be done first.
; Fragmentation analysis of a CSV
PS > Repair-ClusterSharedVolume C:\ClusterStorage\Volume1 -Defrag -Parameters "/A /U /V"
If the fragmentation is over a certain percentage, the actual defrag can be started.
; Defrag of CSV
PS > Repair-ClusterSharedVolume C:\ClusterStorage\Volume1 -Defrag -Parameters "/H /U /V /X"
Here is a list of all defrag parameters. In the above example we run the operation at normal priority instead of the default low priority (/H), display progress (/U), print verbose output with additional statistics (/V) and perform a free space consolidation on the specified volume (/X).
; Take CSV out of maintenance
PS > Resume-ClusterResource "CSV01" -VolumeName "C:\ClusterStorage\Volume1"
; Check status of CSVs
PS > Get-ClusterSharedVolume
; Restart all guests that were saved as a result of the maintenance
PS > Start-ClusterGroup "[name of guest cluster group]"
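The manual steps above could be wrapped into one function so that the CSV is always taken out of maintenance again, even when the defrag fails. This is a rough sketch and not a tested production script: the function name is made up for this example, it assumes it runs on a cluster node in a maintenance window, and it leaves restarting the guests to you.

```powershell
# Sketch: maintenance mode -> move to this node -> defrag -> resume.
# Assumptions: run from an elevated PowerShell session on a cluster node;
# $Name and $VolumeName match what Get-ClusterSharedVolume reports.
function Invoke-CsvDefrag {
    param(
        [Parameter(Mandatory = $true)][string]$Name,        # e.g. "CSV01"
        [Parameter(Mandatory = $true)][string]$VolumeName,  # e.g. "C:\ClusterStorage\Volume1"
        [string]$DefragParameters = '/H /U /V /X'
    )
    Import-Module FailoverClusters

    # Put the volume into maintenance mode (guests on it go into Saved State)
    Suspend-ClusterResource -Name $Name -VolumeName $VolumeName | Out-Null
    try {
        # The defrag has to be started from the coordinator node
        $csv = Get-ClusterSharedVolume -Name $Name
        if ($csv.OwnerNode.Name -ne $env:COMPUTERNAME) {
            Move-ClusterSharedVolume -Name $Name -Node $env:COMPUTERNAME | Out-Null
        }
        Repair-ClusterSharedVolume -VolumeName $VolumeName -Defrag -Parameters $DefragParameters
    }
    finally {
        # Always take the volume out of maintenance again
        Resume-ClusterResource -Name $Name -VolumeName $VolumeName | Out-Null
    }
}
```

A call would look like Invoke-CsvDefrag -Name "CSV01" -VolumeName "C:\ClusterStorage\Volume1". Passing -DefragParameters "/A /U /V" runs the analysis only. Afterwards the saved guests still have to be started with Start-ClusterGroup.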
An interesting subject is how we can automate all this, either from System Center Operations Manager or possibly even System Center Orchestrator. There may even be third-party tools that deal with Hyper-V R2 Cluster Shared Volumes and automate the defrag process.
If you have any experience in this area please leave a comment about how you solved this.
UPDATE: Commenter Gavin made a great suggestion to place the CSV in Redirected Access Mode. In that case the VMs don’t need to go into Saved State. You should not do this during working hours, as it significantly impacts the performance of the host and its guests. See my answer in the comments for the PowerShell commands that switch a CSV into Redirected Access Mode.