
The 5 V's that make data BIG DATA

  • Writer: Vishakh Rameshan
  • Jan 16, 2021
  • 2 min read

The Internet has been with us for a long time, and the number of websites, applications, and products has grown tremendously, with new inventions bringing more products to market. The days are gone when we used manual methods to perform our daily tasks; instead, people rely more and more on technology. IoT devices are one such example.


With so many devices, tools, applications, and websites being launched, storage has struggled to keep up: the data we once stored in KBs on floppy disks soon outgrew the CDs and DVDs of MBs and GBs, and even HDDs of TBs are no longer enough for our needs.


The industry has shifted gears: banking, healthcare, and retail have all become data-driven, creating insights from the huge amounts of data accumulated over time. Companies like Facebook, Google, and Twitter run on huge amounts of user data to generate meaningful business from it. The driving force behind it all is data.


But having this data stored on a large platform of hardware is worthless if we are not able to get any meaningful insights from it. To process it, Big Data frameworks like Hadoop, Spark, and Pig came in. And we didn't stop there: Artificial Intelligence and Machine Learning have taken us to the next level, where data is processed in microseconds to generate results so accurate that a human would take years to find the same solution.


Nowadays people talk a lot about Big Data, data processing, data scientists, data analytics and so on, but what actually makes data so big? Is it just the size?


No. Here come the 5 V's that decide whether the data you hold can be called Big Data.


Initially we had only 4 V's; Value was introduced more recently. Let's talk about each one of them.


  1. Velocity - the speed at which data arrives at and accumulates on a system. Some applications generate data in real time, like banking transactions; others in batches, like dumping of logs; and others via streaming. Irrespective of the source the data originates from, how fast it lands on the target system is what velocity means.

  2. Value - the accumulated data needs to hold some valuable information; for example, the dumps could be medical records, bank transactions, database logs, or user activity.

  3. Veracity - with huge amounts of data being dumped, there is a risk of losing authenticity, value, or quality. So having means to verify data and separate the trustworthy from the noisy is important.

  4. Variety - 20 years back we only had SQL transactional databases, where data is strictly structured (like database tables or CSV files), but now, as you know, we also have unstructured data (like audio and video) and semi-structured data (like logs, forms, and JSON files), which is estimated to grow faster than the structured kind; see the sketch after this list.

  5. Volume - this one is pretty straightforward, as everyone knows: only when data is huge in quantity can it be called Big Data.
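
To make Variety concrete, here is a minimal Python sketch showing the same user-activity event as structured CSV and as semi-structured JSON. The field names and values are made up purely for illustration.

```python
import csv
import io
import json

# Structured: fixed columns, every row follows the same schema.
# (Hypothetical record, for illustration only.)
csv_text = "user_id,action,timestamp\n42,login,2021-01-16T10:00:00\n"
for row in csv.DictReader(io.StringIO(csv_text)):
    print("structured:", row["user_id"], row["action"])

# Semi-structured: self-describing keys; nested and optional fields are allowed.
json_text = '{"user_id": 42, "action": "login", "device": {"type": "mobile", "os": "Android"}}'
event = json.loads(json_text)
print("semi-structured:", event["user_id"], event.get("device", {}).get("os"))

# Unstructured data (audio, video, free text) has no such schema at all
# and needs specialized processing before it yields fields like these.
```

Notice how the CSV reader depends on a fixed header, while the JSON event carries its own structure and tolerates missing fields; that flexibility is a big part of why semi-structured data is growing so fast.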



