
Know what CAP Theorem is

  • Writer: Vishakh Rameshan
  • Jan 5, 2021
  • 3 min read

We live in a world where technology changes so drastically that we forget to look at what lies underneath and causes it to evolve so quickly. The applications (web/mobile) that existed 10 years ago are no longer available, and end users no longer wait for an app to load or fulfill a request. Instead, people move to a different app, as there are tons of alternatives that can replace it.


User expectations have changed. The perception of how a website, tool or mobile app should look and run has also changed greatly. We want everything to be available 24x7. The revolutionary change in the technology industry came with the introduction of virtual machines and the concept of virtualization, followed by containerization and container technologies.


With cloud computing in front of us, businesses have started to adopt it because of the growing number of consumers and their expectations. Spinning up VMs and deleting them has become a button click. Databases that used to be standalone servers are now distributed. Tools and frameworks for processing big data, such as the Apache Hadoop ecosystem, Kafka and Cassandra, are all distributed in nature.


And so there comes the CAP theorem

CAP theorem states that it is impossible for a distributed system to simultaneously provide more than two out of the following three guarantees:

  • Consistency: Every read receives the most recent write or an error

  • Availability: Every request receives a (non-error) response, without the guarantee that it contains the most recent write

  • Partition Tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes

Let's dive deeper to see what these terms mean.


Consistency


Consistency is about whether the data retrieved from a data store is the latest one. It comes in 2 flavors: Strong Consistency and Eventual Consistency.


Strong Consistency - means that the data is immediately persisted or updated on all the distributed nodes (VMs, or even disks), whether they are on the same rack, in a different availability zone or in a different region. The data fetched afterwards will always be the latest.


Eventual Consistency - means that the data takes time to get persisted on all the distributed nodes. If a retrieval happens during this time, a node that has not yet received the update returns the last data it has (its point-in-time copy), which may be stale.
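
The difference between the two flavors can be sketched with a toy in-memory model. This is only an illustration under simplifying assumptions: the `Replica` and `Cluster` classes and the `sync()` method are hypothetical names invented for this example, not the API of any real database, and "replication" here is just copying a value between Python dictionaries.

```python
class Replica:
    """A hypothetical node holding one copy of a key-value store."""

    def __init__(self):
        self.data = {}


class Cluster:
    """Toy cluster: writes land on replica 0 and are copied to the others."""

    def __init__(self, size, strong):
        self.replicas = [Replica() for _ in range(size)]
        self.strong = strong
        self.pending = []  # writes not yet copied to the other replicas

    def write(self, key, value):
        self.replicas[0].data[key] = value
        if self.strong:
            # Strong consistency: copy to every replica before returning.
            for replica in self.replicas[1:]:
                replica.data[key] = value
        else:
            # Eventual consistency: remember the write and copy it later.
            self.pending.append((key, value))

    def sync(self):
        """Simulate background replication catching up."""
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica.data[key] = value
        self.pending.clear()

    def read(self, key, replica_index):
        return self.replicas[replica_index].data.get(key)


strong = Cluster(size=3, strong=True)
strong.write("balance", 100)
strong.write("balance", 150)
strong_read = strong.read("balance", replica_index=2)   # sees the latest write

eventual = Cluster(size=3, strong=False)
eventual.write("balance", 100)
eventual.sync()
eventual.write("balance", 150)
stale_read = eventual.read("balance", replica_index=2)  # still the older value
eventual.sync()
fresh_read = eventual.read("balance", replica_index=2)  # replica has caught up

print(strong_read, stale_read, fresh_read)
```

In the eventually consistent cluster, the read that happens before `sync()` returns the older point-in-time value, and the same read after `sync()` returns the latest one.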


So if you take the case of a traditional RDBMS, the data is strongly consistent, whereas in a Cassandra DB with its default settings, the data is eventually consistent.


Availability


The applications or services that serve requests (whether SaaS, PaaS or any other distributed system) should be highly available to serve those requests.



This means that in a distributed system where the application/data runs on multiple nodes, the cluster as a whole must remain available to serve traffic/requests even when a few of the nodes crash.


This is typically achieved with horizontal scaling and replication.



Horizontal scaling - means that if the request load is too high for the existing nodes to handle, new nodes must be spun up. If a few nodes crash, new nodes should come up immediately to replace them. Here the configuration of all the nodes (RAM, vCPU etc.) would be identical.
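
As a rough sketch of the scaling decision described above, the number of identical nodes can be derived from the load and the capacity of one node. The function names `nodes_needed` and `nodes_to_add`, and the figures used (250 requests per second per node), are assumptions made up for this example and do not come from any particular autoscaler.

```python
import math


def nodes_needed(requests_per_second, capacity_per_node):
    """Identical nodes required to serve the load (always at least one)."""
    return max(1, math.ceil(requests_per_second / capacity_per_node))


def nodes_to_add(healthy_nodes, requests_per_second, capacity_per_node):
    """How many new nodes to spin up; 0 if the healthy nodes are enough."""
    required = nodes_needed(requests_per_second, capacity_per_node)
    return max(0, required - healthy_nodes)


# Normal load: 5 healthy nodes comfortably serve 1000 requests per second.
steady = nodes_to_add(healthy_nodes=5, requests_per_second=1000, capacity_per_node=250)

# Load spike: the same 5 nodes are no longer enough.
spike = nodes_to_add(healthy_nodes=5, requests_per_second=1800, capacity_per_node=250)

# Two nodes crash under normal load: replacements are needed.
after_crash = nodes_to_add(healthy_nodes=3, requests_per_second=1000, capacity_per_node=250)

print(steady, spike, after_crash)
```

The same function covers both cases from the paragraph: a rise in load and the loss of nodes each show up as a gap between the required and the healthy node count.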


Vertical Scaling - you can map this to existing on-premise database/application servers that run on a single machine: in case the request load increases, the CPU or RAM capacity of that machine is increased.


Replication - a feature available in almost all modern distributed systems, where data is replicated to at least 3 nodes (configurable) to make it highly available: if 2 nodes fail, the data is still available on the 3rd node.
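
A minimal sketch of this idea, assuming a replication factor of 3 on a 5-node cluster: every key is copied to 3 nodes, so a read still succeeds after 2 of those nodes fail. The `ReplicatedStore` class, its methods and the simple placement rule are hypothetical and exist only for illustration; real systems place and repair replicas in far more sophisticated ways.

```python
class ReplicatedStore:
    """Toy key-value store that keeps each key on several nodes."""

    def __init__(self, node_count=5, replication_factor=3):
        self.nodes = [{} for _ in range(node_count)]
        self.alive = [True] * node_count
        self.replication_factor = replication_factor

    def replica_ids(self, key):
        # Deterministic placement: start at a node derived from the key
        # and take the next nodes in order (an assumption of this sketch).
        start = sum(key.encode()) % len(self.nodes)
        return [(start + i) % len(self.nodes)
                for i in range(self.replication_factor)]

    def put(self, key, value):
        for node_id in self.replica_ids(key):
            self.nodes[node_id][key] = value

    def fail(self, node_id):
        self.alive[node_id] = False

    def get(self, key):
        # Serve the read from the first replica that is still alive.
        for node_id in self.replica_ids(key):
            if self.alive[node_id]:
                return self.nodes[node_id][key]
        raise RuntimeError("all replicas holding this key are down")


store = ReplicatedStore()
store.put("order:42", "shipped")
replicas = store.replica_ids("order:42")

# Two of the three replicas fail; the third still serves the data.
store.fail(replicas[0])
store.fail(replicas[1])
value_after_two_failures = store.get("order:42")

# Losing the last replica makes the key unavailable.
store.fail(replicas[2])
try:
    store.get("order:42")
    lost = False
except RuntimeError:
    lost = True

print(value_after_two_failures, lost)
```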


There is a trade-off in these distributed systems, as it is not possible for any distributed system to achieve all 3 to their full potential. Instead, most of them try to achieve 2 out of 3, and the 3rd one to some extent.
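
The trade-off is easiest to see when the network between nodes is broken. The sketch below assumes a single replica that has been cut off from the rest of the cluster while a newer write lands on the other side. The labels "CP" (prefer consistency) and "AP" (prefer availability) are common shorthand rather than terms from this article, and the `IsolatedReplica` class and `read_during_partition` function are hypothetical names for this example.

```python
class IsolatedReplica:
    """Toy replica that may be cut off from the rest of the cluster."""

    def __init__(self, mode, value):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = value
        self.cut_off = False

    def read(self):
        if self.cut_off and self.mode == "CP":
            # Consistency first: refuse to answer rather than risk stale data.
            raise ConnectionError("cannot confirm the latest write")
        # Availability first (or no partition): answer with the local copy.
        return self.value


def read_during_partition(mode):
    replica = IsolatedReplica(mode, value="v1")
    # The partition starts; a newer value "v2" is written on the other
    # side of the network and never reaches this replica.
    replica.cut_off = True
    try:
        return replica.read()
    except ConnectionError:
        return "error"


cp_result = read_during_partition("CP")  # gives up availability
ap_result = read_during_partition("AP")  # gives up consistency (stale "v1")

print(cp_result, ap_result)
```

Under these assumptions, the consistency-first replica answers with an error, while the availability-first replica answers with data that is no longer the most recent write, which matches the definitions of Consistency and Availability given earlier.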

