For the past couple of years my world has been consumed by SQL, building the AlwaysOn High Availability and Disaster Recovery technology in SQL Server 2012.
Why are High Availability and Disaster Recovery (HA/DR) strategies required?
With a growing business comes more customers, more revenue (great!!!), and more expectations from customers for the core business to be up and running at all times. This is where DBAs focus on the High Availability and Disaster Recovery strategies to ensure that the backend DBs are always up and running.
The questions DBAs most commonly focus on are: how do I minimize downtime (the recovery time objective, or RTO) and how do I minimize data loss (the recovery point objective, or RPO) in case of planned events (such as patching) or unplanned issues? In other terms, how do I improve my availability SLA? The DBAs have to define the availability SLA of the applications and close on the business requirements, and then choose, implement and monitor the right HA/DR solution that meets those requirements.
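To make the availability SLA concrete, here is a minimal sketch that turns an SLA percentage into a yearly downtime budget. The function name and the sample SLA values are my own illustrations, not anything defined by SQL Server, and the year is assumed to be 365 days.

```python
# Illustrative helper (hypothetical name): availability SLA -> downtime budget.
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def allowed_downtime_minutes(sla_percent: float) -> float:
    """Return the minutes of downtime per year permitted by an availability SLA."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("SLA must be a percentage between 0 and 100")
    return MINUTES_PER_YEAR * (1 - sla_percent / 100)


if __name__ == "__main__":
    # Sample SLA targets, chosen only for illustration.
    for sla in (99.0, 99.9, 99.99):
        print(f"{sla}% availability allows about "
              f"{allowed_downtime_minutes(sla):.1f} minutes of downtime per year")
```

Planned downtime such as patching counts against the same budget as unplanned outages, which is why the budget shrinks so quickly as the SLA gets tighter.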
With SQL Server 2012 we have built two new solutions: AlwaysOn Availability Groups and AlwaysOn Failover Cluster Instances. AlwaysOn Availability Groups is conceptually similar to database mirroring and log shipping, where we (SQL Server technology) move physical logs between non-shared disks across different nodes (or machines). AlwaysOn Failover Cluster Instances is a set of enhancements and investments that we have made to the existing SQL Server Failover Cluster Instance solution, which uses shared disks.
There is a whole heap of new features that we have built into the AlwaysOn Availability Group (AG) solution. For example, if your application depends on multiple DBs, you can add all the DBs to a single Availability Group (AG) and fail over all of them as a single unit. Say you currently have only high availability with 2 nodes (or replicas in AG) at a single local site or datacenter, but your business is growing and you have a business requirement to protect against disasters at your local site. You can add additional nodes to your AG at a Disaster Recovery site using the multiple secondaries feature of AlwaysOn AG. You can have up to 4 secondaries for a primary (i.e. 5 nodes in your AG).
Now say your site is being used even more than expected, and your primary server is running 2 components of your application: a read-write component and a read-only reporting component. With AlwaysOn AG’s Active Secondaries feature, you can move your read-only reporting component (application) to any of the secondaries. You can also move certain operations such as backups to your secondaries.
We have also done a lot of work on improving failover time, adding more support for online operations, supporting Windows Server Core (which requires less patching), building better error detection mechanisms, and many other features that you can see in the video and link below to get a better idea of the solution. We have also made a lot of investments in the tools for configuring, managing and easily monitoring the AlwaysOn AG.
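To illustrate the two ideas above (several DBs failing over as one unit, and one primary with up to 4 secondaries), here is a small toy model. It is only a sketch of the concept: the class, method names, node names and database names are all hypothetical, and none of this is SQL Server's actual API or failover logic.

```python
from dataclasses import dataclass, field
from typing import List

MAX_SECONDARIES = 4  # limit stated above: up to 4 secondaries per primary


@dataclass
class AvailabilityGroup:
    """Toy model of an AG: one primary, several secondaries, a set of databases."""
    name: str
    primary: str
    databases: List[str] = field(default_factory=list)
    secondaries: List[str] = field(default_factory=list)

    def add_secondary(self, node: str) -> None:
        if node == self.primary or node in self.secondaries:
            raise ValueError(f"{node} is already a replica in {self.name}")
        if len(self.secondaries) >= MAX_SECONDARIES:
            raise ValueError(f"{self.name} already has {MAX_SECONDARIES} secondaries")
        self.secondaries.append(node)

    def failover(self, target: str) -> List[str]:
        """Promote a secondary; the old primary becomes a secondary.

        Returns the databases that moved, which is always the whole group.
        """
        if target not in self.secondaries:
            raise ValueError(f"{target} is not a secondary in {self.name}")
        self.secondaries.remove(target)
        self.secondaries.append(self.primary)
        self.primary = target
        return list(self.databases)


if __name__ == "__main__":
    # Hypothetical node and database names.
    ag = AvailabilityGroup("SalesAG", primary="node1", databases=["Orders", "Customers"])
    ag.add_secondary("node2")     # local high-availability replica
    ag.add_secondary("dr-node1")  # replica at the disaster recovery site
    moved = ag.failover("dr-node1")
    print(f"New primary: {ag.primary}; databases moved together: {moved}")
```

The point of the model is that failover operates on the group rather than on individual databases, so an application that depends on several DBs never ends up with them split across nodes.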
This has definitely been one of the fun, exciting, complex, challenging (talk about distributed systems), large and high impact projects that I have worked on at Microsoft. I am very excited that this is en route to be released with SQL Server 2012.