This post is not talking about recovering from a failed data center or Azure Site recovery procedures. No, it’s just “simple” recovery when everything went down for whatever kind of reason. I mean, we are humans and we make mistakes. Even in an environment as Azure they make mistakes. Remember the Storage outage last year? This was by a human making a mistake pushing an update. Sorry… shit happens.
But what if something happened and the power, storage, networking and suddenly the VM’s comes back up and you log in to Azure Pack and an error is on your screen? When we run projects at customers we build from the ground up and “assume” its running forever. In this post I want to discuss the proper way of bring an environment back up and running and what are the critical pointers you need to be aware of.
So let’s start the environment J
START DOMAIN CONTROLLERS
First we need to bring up all the domain controllers for each domain where fabric and azure pack components are running. If an ADFS instance is co-located on the domain controller and is using SQL, check after booting the SQL if the services are started correctly.
START HYPER-V SERVERS
If Hyper-V servers went down start them first.
The first 2 depend on how you have built your environment. You know about chicken and egg story? When you have one physical domain controller start this first. If you have only virtual domain controllers then first start Hyper-V Server. Just be careful when you have virtual domain controllers and your hyper-v server is joined to the domain that host the domain controller, that it can start the vm when booting the first hyper-v node. You would not be the first where I have seen that the cluster couldn’t start because domain was not up and domain couldn’t be brought up because storage and/or cluster couldn’t start. Read More »