This is a more general post than a technical one. Early in a project I usually get a lot of questions about how we are going to place compute instances in different Availability Domains (ADs) for high availability.
But first, what is an Availability Domain?
A Region consists of one or more Availability Domains. ADs are physically separated, have their own network and don’t share any other infrastructure, so they are very unlikely to fail at the same time. The ADs inside a Region are connected to each other via a low-latency network.
I’ll approach this topic from the OCI perspective, without looking at how you would actually build your application. A typical question in Regions with multiple ADs is: should we put our application servers in different ADs? I usually recommend using only a single AD. Why?
Let’s take a look.
If you put your application servers in multiple ADs, your design might look like this:
You have an OCI Load Balancer, which takes care of HA by itself across multiple ADs; you have multiple application servers in different ADs; and finally you have an Oracle RAC database, which is supported inside a single AD only.
So what would happen if AD1 went down? The Load Balancer would switch to using AD2, and we could potentially scale AD2 and/or AD3 with more application servers, but our RAC database would still be missing.
What if AD2 or AD3 went down? What would be the benefit of running application servers in those ADs when the database is limited to AD1? In addition, you might be introducing extra latency for an application in AD2 connecting to the database in AD1 (this may be a non-issue for many applications, but it should definitely be evaluated). That’s why, when I get these questions, I usually recommend sticking to one AD only.
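The reasoning above can be checked with a small model. This is only a sketch under simplifying assumptions: the Load Balancer is taken to survive any single AD failure (as described above), the service counts as up when at least one application server and the database remain, and the names Design, survives and outcomes are made up for illustration. It does not call any OCI API.

```python
from dataclasses import dataclass

ADS = ["AD1", "AD2", "AD3"]  # illustrative labels for a three-AD Region


@dataclass(frozen=True)
class Design:
    """Where each tier runs. Hypothetical model, not an OCI object."""
    app_server_ads: frozenset
    database_ads: frozenset


def survives(design: Design, failed_ad: str) -> bool:
    """True if an application server and the database are both left."""
    apps_left = design.app_server_ads - {failed_ad}
    dbs_left = design.database_ads - {failed_ad}
    return bool(apps_left) and bool(dbs_left)


def outcomes(design: Design) -> dict:
    """Map each AD to whether the service survives losing that AD."""
    return {ad: survives(design, ad) for ad in ADS}


# Application servers in every AD, RAC database in AD1 only.
multi_ad_apps = Design(frozenset(ADS), frozenset({"AD1"}))
# Everything in AD1.
single_ad = Design(frozenset({"AD1"}), frozenset({"AD1"}))

for name, design in [("multi-AD apps", multi_ad_apps), ("single AD", single_ad)]:
    print(name, outcomes(design))
```

In this model both designs give the same result for every single-AD failure: the service is lost only when AD1 goes down. Spreading just the application servers does not change the outcome as long as the database lives in one AD.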
If you want to add high availability inside the Region, you might want to introduce a standby instance in another AD with Oracle Data Guard.
Optionally you could have a separate Load Balancer and a separate set of application servers in AD2, which would be similar to the multi-region approach in the last picture of this post. But would that be more of a Disaster Recovery setup?
You would also need to think about database connectivity from the application and how it will be handled if AD1 fails. Will the AD2 application servers be enabled on the OCI Load Balancer only when there is a failure?
So let’s take a look at AD1 going down: the Load Balancer switches to AD2, the application servers in AD2 take over and our old standby becomes the primary in AD2. Way better compared to our first option!
I want to highlight that even though all necessary servers/services would be available in this scenario, we would still need to think about DNS, application connectivity etc.: all the pieces that make an application work.
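To make the AD1 failure scenario concrete, here is a minimal sketch of the state change. The Deployment class and the lose_ad and is_serving functions are hypothetical. The Data Guard role transition is reduced to a single assignment, which is a big simplification of the real operation, and DNS and application connectivity are left out entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Deployment:
    """Hypothetical model of the design with a standby in another AD."""
    primary_ad: Optional[str]
    standby_ad: Optional[str]
    app_server_ads: frozenset


def lose_ad(dep: Deployment, failed_ad: str) -> Deployment:
    """Return the deployment as it looks after one AD has failed."""
    primary, standby = dep.primary_ad, dep.standby_ad
    if standby == failed_ad:
        standby = None  # the standby is gone, the primary keeps running
    if primary == failed_ad:
        # Assumption: a surviving standby is promoted to primary.
        primary, standby = standby, None
    return Deployment(primary, standby, dep.app_server_ads - {failed_ad})


def is_serving(dep: Deployment) -> bool:
    """The service needs a primary database and one application server."""
    return dep.primary_ad is not None and bool(dep.app_server_ads)


with_standby = Deployment("AD1", "AD2", frozenset({"AD1", "AD2"}))
without_standby = Deployment("AD1", None, frozenset({"AD1", "AD2"}))

after = lose_ad(with_standby, "AD1")
print(after.primary_ad, is_serving(after))
print(is_serving(lose_ad(without_standby, "AD1")))
```

With the standby in AD2 the model keeps serving after AD1 is lost, with AD2 as the new primary. Without the standby it does not, which is the first design again.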
What about single AD-regions?
A lot of Oracle’s Regions are single-AD Regions, so what should you do with those? If you really want a high availability solution, you can use the design above but do it cross-region. You need to think about DNS and how it gets reassigned if you need to fail over to another Region.
But again, does that make it a Disaster Recovery or an HA solution?
Remember that inside each AD you have three Fault Domains (FDs). RAC nodes get placed in separate FDs by default, and you can make sure your application servers run in different FDs. This way, in case of a backend hardware failure, not all of your servers are impacted and the application is highly available inside the Region!
Fault Domains don’t share the same backend infrastructure, so they use different physical servers, switches etc.
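As a rough illustration of spreading application servers over the three Fault Domains, the sketch below assigns servers round-robin and reports how many would be hit if a single FD failed. The server names and the spread and worst_case_loss helpers are invented for this example. In a real deployment you would pass the chosen Fault Domain when launching each instance rather than compute it like this.

```python
from collections import Counter
from itertools import cycle

# The three Fault Domains inside one AD.
FAULT_DOMAINS = ["FAULT-DOMAIN-1", "FAULT-DOMAIN-2", "FAULT-DOMAIN-3"]


def spread(servers):
    """Assign each server to a Fault Domain in round-robin order."""
    return dict(zip(servers, cycle(FAULT_DOMAINS)))


def worst_case_loss(placement):
    """Largest number of servers that share a single Fault Domain."""
    counts = Counter(placement.values())
    return max(counts.values(), default=0)


placement = spread(["app1", "app2", "app3", "app4"])
for server, fd in placement.items():
    print(server, fd)
print("servers lost if one FD fails, worst case:", worst_case_loss(placement))
```

With four servers the worst case is losing two of them, so at least two keep running after a single Fault Domain failure.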
Summary
This is just a starting point for the design when you think about the overall OCI-side setup. There are obviously a lot of variables in each solution, but this is one of the principles when dealing with Regions, ADs and FDs in OCI.
It’s more common to see a single-AD setup with cross-region Disaster Recovery; inside a Region, a lot of components are already designed to be highly available.
I also skipped a lot of areas such as DNS, application failover, database connections, performance and so on. These all need to be considered and evaluated each time you make design decisions.
Remember also that Availability Domains are mapped differently for each tenancy, so my AD1 could be different from your AD1. It might also be worth separating environments, for example DEV in AD1 and PROD in AD2, rather than running everything in the same AD.
Really awesome post! Thanks for sharing this Simo.
Thank you, Simo for putting this together!