Stories of widespread website outages caused by cloud datacenter failures have been in the news this past month. These outages have led many to lament cloud computing and consider bringing services back in-house, as if running websites on their own servers would somehow protect them from having a single point of failure. The real problem is the single point of failure, not a particular cloud provider (AWS and Azure both had large failures caused by human error within weeks of each other) or cloud computing in general. In fact, running your website in the cloud can make mitigating the outage of an entire datacenter straightforward. Similar configurations are possible with other cloud providers, but let’s look at how to solve this problem with Azure Traffic Manager.
To set up this demo, we first deploy a sample website to the East US Azure datacenter and edit it to show that the page is served from the Primary site. Next, we deploy a second copy of the site to the West US Azure datacenter and again edit it to show that it is being served from the Secondary site. We now have a primary site, plus a secondary site in another datacenter that we can use if the primary goes down. The remaining question is how to route traffic correctly between the two sites.
Enter Azure Traffic Manager
To accomplish this in Azure, we set up a Traffic Manager profile to route traffic between the two sites. Azure Traffic Manager is DNS-based routing with a fair amount of configurable intelligence: rather than proxying requests, it answers DNS queries for the profile’s URL with the DNS name of the endpoint a client should use. First, we add the Traffic Manager profile and set it to use the Priority routing method, which sends all traffic to the highest-priority endpoint and fails over to the next endpoint in priority order only when the higher-priority one is unhealthy. Other routing methods are available that route by factors such as weight, performance (latency), or geographic location.
With the Traffic Manager profile created, we need to add endpoints for the Traffic Manager URL to point to. Let’s add an endpoint for each site, giving the Primary site first priority.
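To make the Priority method’s behavior concrete, here is a minimal Python model of how a DNS query could be answered for the two endpoints above. It is a sketch, not Azure’s implementation or SDK: the `Endpoint` class, the `resolve_priority` function, and both hostnames are hypothetical, chosen to mirror this demo. As in Traffic Manager, a lower priority number means the endpoint is tried first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Endpoint:
    # Hypothetical model of a Traffic Manager endpoint; not an Azure SDK type.
    name: str
    target: str      # hostname handed back in the DNS answer
    priority: int    # lower number = higher priority (1 is tried first)
    healthy: bool = True


def resolve_priority(endpoints: list[Endpoint]) -> Optional[str]:
    """Return the target of the highest-priority healthy endpoint, or None."""
    for endpoint in sorted(endpoints, key=lambda e: e.priority):
        if endpoint.healthy:
            return endpoint.target
    # This sketch simply gives up when nothing is healthy; see the
    # Traffic Manager documentation for how the real service handles that case.
    return None


# Hypothetical hostnames for the two App Services in this demo.
endpoints = [
    Endpoint("Primary", "primary-eastus.azurewebsites.net", priority=1),
    Endpoint("Secondary", "secondary-westus.azurewebsites.net", priority=2),
]

print(resolve_priority(endpoints))  # primary-eastus.azurewebsites.net
endpoints[0].healthy = False        # simulate the East US outage
print(resolve_priority(endpoints))  # secondary-westus.azurewebsites.net
```

Sorting by priority on every query keeps the model independent of the order in which endpoints were added, so the Primary wins whenever it is healthy.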
We can now browse to the URL configured in the Traffic Manager profile and see that the page is being served from the Primary site.
To test failover to the Secondary site in the West US datacenter, we stop the Primary App Service in Azure. When we refresh the page, we now see that the website is stopped.
Our Traffic Manager profile is configured to probe the health of each endpoint using the default path, the site’s root URL (/). After waiting a short time for the health probe to notice that the site is down and for the DNS entry to expire from our cache (the DNS time-to-live, or TTL, is configurable in the Traffic Manager profile), we refresh again and see that the page is now being served from the Secondary site.
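How long “a short time” is depends on the profile’s monitoring settings and its DNS TTL. The sketch below gives a rough worst-case estimate under a simplified model, which is an assumption rather than Azure’s documented formula: the outage begins just after a successful probe, the endpoint is marked unhealthy once consecutive failed probes exceed the tolerated number of failures, the last failing probe may take the full probe timeout, and a client may then keep its cached DNS answer for one full TTL. The function name and the example values (30-second probing interval, 3 tolerated failures, 10-second timeout, 60-second TTL) are illustrative; check your own profile’s configuration.

```python
def worst_case_failover_seconds(
    probing_interval: int,
    tolerated_failures: int,
    probe_timeout: int,
    dns_ttl: int,
) -> int:
    """Rough upper bound on seconds until clients reach the secondary site.

    Hypothetical helper based on a simplified model, not an Azure API.
    """
    # The endpoint is marked unhealthy on the first failure beyond the tolerance.
    failed_probes_needed = tolerated_failures + 1
    # Each failing probe starts one interval after the previous one, and the
    # final probe may take the full timeout before it counts as failed.
    detection = failed_probes_needed * probing_interval + probe_timeout
    # Resolvers and browsers may still hold the old answer for one full TTL.
    return detection + dns_ttl


print(worst_case_failover_seconds(30, 3, 10, 60))  # 190
print(worst_case_failover_seconds(10, 1, 5, 30))   # 55
```

The second call shows the trade-off: shorter intervals, fewer tolerated failures, and a lower TTL shrink the failover window, at the cost of more probe and DNS traffic and a higher chance of failing over on a brief hiccup.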
No manual failover is required. Once the health probe detects that a site is down, DNS queries automatically resolve to the next healthy site in priority order. When the primary site comes back up, Traffic Manager resumes routing traffic to it as soon as its health probes start succeeding again.
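The automatic failover and failback can be illustrated with a small simulation that feeds a sequence of probe results for the primary site into a simplified health monitor and records which site DNS would hand out after each probing cycle. This is a hypothetical model, not Traffic Manager’s code. `HealthMonitor` and `simulate` are illustrative names, and the model assumes three things: the secondary stays healthy throughout, the primary is marked unhealthy once consecutive failures exceed a tolerated count of 3, and a single successful probe marks it healthy again.

```python
from dataclasses import dataclass


@dataclass
class HealthMonitor:
    # Hypothetical, simplified probe tracker for a single endpoint.
    tolerated_failures: int = 3
    consecutive_failures: int = 0
    healthy: bool = True

    def record(self, probe_succeeded: bool) -> None:
        if probe_succeeded:
            # Assumption: one successful probe restores the endpoint.
            self.consecutive_failures = 0
            self.healthy = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures > self.tolerated_failures:
                self.healthy = False


def simulate(primary_probes: list[bool]) -> list[str]:
    """Return the site DNS would answer with after each probe cycle."""
    monitor = HealthMonitor()
    answers = []
    for succeeded in primary_probes:
        monitor.record(succeeded)
        # Priority routing: Primary while healthy, otherwise Secondary
        # (the Secondary is assumed to stay healthy).
        answers.append("Primary" if monitor.healthy else "Secondary")
    return answers


# The Primary is up, fails for six probe cycles, then comes back.
probes = [True, True, False, False, False, False, False, False, True, True]
for cycle, answer in enumerate(simulate(probes), start=1):
    print(cycle, answer)
# Cycles 1-5: Primary (failures still within tolerance)
# Cycles 6-8: Secondary
# Cycles 9-10: Primary (failback with no manual step)
```

Note that the first three failed probes do not change the answer. The tolerance keeps a single dropped probe from triggering a failover, which is also why detection is not instantaneous.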
Traffic Manager is one small piece of the puzzle in keeping your sites and services available to users during a disaster. If you’d like Nebbia Technology’s help in identifying and remedying possible deficiencies in your disaster recovery (DR) setup, contact us at firstname.lastname@example.org.