Businesses today always try to deliver the best product to the market, but only a few of them last and become popular. Those that do have an application (whether it is a website, a mobile app, or an online portal) that is highly responsive, always available to meet demand, scalable enough to handle heavy traffic, and robust enough to survive failures on both the hardware and the software side.
This holds not only for application workloads running in production but also for databases and for clusters of servers performing big data processing.
Supporting all the fancy words like Highly Available, Robust, Elastic, Fault Tolerant, and Flexible requires proper planning: how the application should be deployed, where it should be deployed, and which services to use, especially when we host the application on cloud platforms like AWS or GCP.
Choosing between Active-Active and Active-Passive determines which cloud services to use and how to architect the application to take advantage of the features those services provide. For that, understanding what Active-Active and Active-Passive mean is of the utmost importance.
So, let's go through each of them. I will try to explain them as simply as possible so that they reach a wider audience.
Active-Active Architecture
This architecture is the one to choose when the application or business needs to be highly available with minimal downtime (a matter of seconds).
Let's look at this from a cloud perspective, as most businesses do not have the flexibility to support it in their own data centers.
Deploying the application in, say, multiple Availability Zones within a region lets it keep handling requests/traffic even when one Availability Zone goes down due to a hardware issue or a disaster. When the application has users across multiple regions, spreading or duplicating the workload to another region gives an even bigger advantage in case a whole region goes down.
Active-Active means every region serves traffic at the same time, which makes an application running in two or more regions highly available and fault tolerant.
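To make the idea concrete, here is a minimal sketch of how traffic could be spread across active regions. The class name ActiveActiveRouter, its methods, and the region names are hypothetical and only for illustration; in practice a managed load balancer or DNS service would do this job.

```python
from itertools import cycle


class ActiveActiveRouter:
    """Hypothetical router: spreads requests over every healthy region."""

    def __init__(self, regions):
        # Assumes region names are unique. All regions start healthy.
        self.health = {region: True for region in regions}
        self._ring = cycle(regions)

    def mark_down(self, region):
        self.health[region] = False

    def mark_up(self, region):
        self.health[region] = True

    def route(self):
        # Look at each region at most once per call, skipping unhealthy ones.
        for _ in range(len(self.health)):
            region = next(self._ring)
            if self.health[region]:
                return region
        raise RuntimeError("no healthy region available")


router = ActiveActiveRouter(["us-east-1", "eu-west-1"])
first, second = router.route(), router.route()  # one request per region
router.mark_down("us-east-1")                   # simulated regional outage
after_outage = router.route()                   # traffic keeps flowing
```

Because both regions were already serving requests, the outage needs no promotion step: the router simply stops picking the failed region.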
But, as the CAP theorem states, a distributed system cannot guarantee all three of Strong Consistency, High Availability, and Partition Tolerance at once. We have to lose one to keep the other two.
Choosing High Availability and Partition Tolerance when the application is deployed in two separate regions leaves the data eventually consistent.
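Eventual consistency can be pictured with a small sketch. It assumes a "last write wins" rule based on timestamps, which is only one of several ways to reconcile replicas and is my choice for the illustration. RegionReplica and its methods are hypothetical names.

```python
class RegionReplica:
    """Hypothetical in-memory replica of the data held by one region."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        # Last write wins: keep the value with the newest timestamp.
        # Equal timestamps keep the existing value, so a real system
        # would need an extra tie-breaker.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        # Pull every entry from the other region and apply the same rule.
        for key, (timestamp, value) in other.data.items():
            self.write(key, value, timestamp)


us = RegionReplica("us-east-1")
eu = RegionReplica("eu-west-1")
us.write("profile", "old name", timestamp=1)
eu.write("profile", "new name", timestamp=2)

# Before replication, the two regions disagree.
before = (us.read("profile"), eu.read("profile"))

# After both sides exchange their data, they converge on one value.
us.sync_from(eu)
eu.sync_from(us)
after = (us.read("profile"), eu.read("profile"))
```

Both regions stayed available for writes the whole time; the price is the window in which a user could read the old value from one region.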
Use Case:-
Web applications with users in a single region or across multiple regions
Real-time workloads that need quick responses
Real-world examples like Pokémon GO, PUBG, etc.
Advantage:-
High Availability
Highly responsive
No worries during a disaster or regional outage
Disadvantage:-
Application architecture must support this
Increased cost
Higher maintenance effort
Only a few third-party services and cloud services support multi-region or global availability
Active-Passive Architecture
This architectural design trades away some availability and responsiveness. The application is deployed in one zone or region that stays active and handles all incoming requests. If there is a zonal or regional outage, some of the traffic/requests are lost, and there will be some downtime while switching to the passive region and making it active to process further requests.
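The switch from the active to the passive region can be sketched as follows. This is a simplified illustration, not how any particular cloud service implements failover; ActivePassiveFailover, failure_threshold, and the region names are hypothetical.

```python
class ActivePassiveFailover:
    """Hypothetical controller that promotes the passive region
    after several consecutive failed health checks."""

    def __init__(self, active, passive, failure_threshold=3):
        self.active = active
        self.passive = passive
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    def record_health_check(self, healthy):
        """Record one health check of the active region and return
        the region that should receive traffic from now on."""
        if healthy:
            self.consecutive_failures = 0
            return self.active
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            # Promote the passive region; the failed one becomes standby.
            self.active, self.passive = self.passive, self.active
            self.consecutive_failures = 0
        return self.active


failover = ActivePassiveFailover("us-east-1", "us-west-2")
targets = [failover.record_health_check(ok) for ok in (True, False, False, False)]
# The last failed check crosses the threshold and triggers the switch.
```

The requests that arrive between the first failed check and the promotion are the lost traffic and downtime described above. A lower threshold shortens that window but makes a false alarm more likely to trigger a switch.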
Use Case:-
Batch applications
Mostly chosen for DR (Disaster Recovery) planning
Workloads with no critical user requests
Applications with customers from only a specific region
Advantage:-
Cost is minimal as the passive side is used only when needed
Maintenance is low
Most third-party services and cloud services support this
Disadvantage:-
Need to plan ahead and perform proper trial runs before switching from the active to the passive side
A proper understanding of how the system behaves during downtime is critical
Proper SLAs and SLOs must be defined
Conclusion
Both Active-Active and Active-Passive have their own advantages and disadvantages. The choice must be made carefully, based on what the business demands. Once the application has been architected around one of them, switching from Active-Active to Active-Passive won't be an easy task.
When moving to the cloud or opting for managed services, even if the provider states 99.99% in its SLAs and SLOs, a critical business must perform proper chaos testing to determine whether the Active-Active or Active-Passive setup works as expected, because it cannot be tested during an actual disaster or outage.
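To put a number like 99.99% into perspective, here is a rough calculation of how much downtime it still allows, and of what two regions can offer together. The function names are hypothetical, and the second function assumes the regions fail independently of each other, which real outages do not always respect.

```python
def allowed_downtime_minutes(availability_percent, days=365):
    """Minutes of downtime permitted over the period at this availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


def combined_availability(*availability_percents):
    """Availability of redundant regions where any one can serve traffic.
    Assumes failures are independent (an idealization)."""
    all_down = 1.0
    for percent in availability_percents:
        all_down *= 1 - percent / 100
    return 100 * (1 - all_down)


yearly = allowed_downtime_minutes(99.99)      # about 52.56 minutes per year
two_regions = combined_availability(99.9, 99.9)  # about 99.9999 percent
```

These are paper numbers only. They say nothing about whether the failover actually works, which is exactly why the chaos testing mentioned above matters.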