What is Big Data

Blake Tolman
3 min read · Apr 16, 2021

One popular interpretation of big data refers to extremely large data sets. A National Institute of Standards and Technology (NIST) report defined big data as consisting of “extensive datasets — primarily in the characteristics of volume, velocity, and/or variability — that require a scalable architecture for efficient storage, manipulation, and analysis.” Some have defined big data as an amount of data that exceeds a petabyte, or one million gigabytes.
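To make the petabyte figure concrete, here is a minimal sketch of the unit arithmetic. It assumes decimal (SI) units, where each step up is a factor of 1,000; the `UNITS` table and the `convert` helper are illustrative names, not part of any library.

```python
# Decimal (SI) storage units, expressed in bytes.
# Assumption: powers of 1,000, not the binary 1,024-based units.
UNITS = {
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
    "zettabyte": 10**21,
}


def convert(amount, from_unit, to_unit):
    """Convert an amount from one storage unit to another."""
    return amount * UNITS[from_unit] / UNITS[to_unit]


gigabytes_per_petabyte = convert(1, "petabyte", "gigabyte")
print(f"{gigabytes_per_petabyte:,.0f}")  # 1,000,000
```

The same helper works for any pair of units in the table, so it can be reused to compare figures quoted in different units.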

The “Three Vs” of Big Data

In 2001, industry analyst Doug Laney defined the “Three Vs” of big data:

  1. Volume
    Data is growing at an unprecedented rate: the digital universe is projected to reach 180 zettabytes (180 followed by 21 zeros) by 2025. Today, the challenge with data volume is less about storage than about identifying the relevant data within gigantic data sets and making good use of it.
  2. Velocity
    Data is generated at an ever-accelerating pace. Every minute, Google receives 3.8 million search queries. Email users send 156 million messages. Facebook users upload 243,000 photos. The challenge for data scientists is to find ways to collect, process, and make use of huge amounts of data as it comes in.
  3. Variety
    Data comes in different forms. Structured data can be organized neatly within the columns of a database. This type of data is relatively easy to enter, store, query, and analyze. Unstructured data is more difficult to sort and extract value from. Examples of unstructured data include emails, social media posts, word-processing documents, audio, video, and photo files, web pages, and more.

Types of Big Data

Big data falls into three types:

  1. Structured
  2. Unstructured
  3. Semi-structured

Structured

Any data that can be stored, accessed, and processed in a fixed format is termed structured data. Over time, computer scientists have become very successful at developing techniques for working with this kind of data, where the format is well known in advance, and at deriving value from it. Today, however, problems arise when such data grows to a huge size, with typical sizes in the range of multiple zettabytes.
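As an illustration of what a fixed format means in practice, the following sketch uses Python's built-in `sqlite3` module with an in-memory database. The `employees` table, its columns, and the sample rows are invented for this example; the point is only that the schema is declared up front, so every row has the same shape and can be queried directly.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The format is fixed in advance: every row has exactly these columns.
conn.execute(
    "CREATE TABLE employees ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, salary INTEGER NOT NULL)"
)

# Hypothetical sample rows, for illustration only.
conn.executemany(
    "INSERT INTO employees (id, name, salary) VALUES (?, ?, ?)",
    [(1, "Ada", 65000), (2, "Grace", 50000), (3, "Alan", 55000)],
)

# Because the structure is known, a query can filter and sort by column.
rows = conn.execute(
    "SELECT name FROM employees WHERE salary > ? ORDER BY name", (52000,)
).fetchall()
high_earners = [name for (name,) in rows]
print(high_earners)  # ['Ada', 'Alan']

conn.close()
```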

Unstructured

Any data whose form or structure is unknown is classified as unstructured data. In addition to being huge, unstructured data poses multiple challenges when it comes to processing it to derive value. A typical example of unstructured data is a heterogeneous data source containing a combination of simple text files, images, videos, and so on. Nowadays organizations have a wealth of data available to them, but unfortunately they often don’t know how to derive value from it because the data is in its raw, unstructured form.

Semi-structured

Semi-structured data can contain both forms of data. It looks structured, but its structure is not actually defined by, for example, a table definition in a relational DBMS. Data represented in an XML file is an example of semi-structured data.
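The XML case can be sketched with Python's standard `xml.etree.ElementTree` module. The document below and its `person`, `name`, `age`, and `email` tags are made up for illustration. Each record is tagged, so it has some structure, but no table definition forces every record to carry the same fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML data: the two records do not share the same fields.
XML_DOC = """
<people>
  <person><name>Ada</name><age>36</age></person>
  <person><name>Grace</name><email>grace@example.com</email></person>
</people>
"""

root = ET.fromstring(XML_DOC)

# Turn each <person> element into a dict of whatever child tags it contains.
records = [
    {child.tag: child.text for child in person}
    for person in root.findall("person")
]

for record in records:
    print(record)
```

The first record ends up with `name` and `age` keys and the second with `name` and `email`, which is the kind of variation a fixed relational schema would not allow without nullable columns.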

Summary

  • Big Data definition: Big data is a term used to describe a collection of data that is huge in size and yet growing exponentially with time.
  • Big data analytics examples include stock exchanges, social media sites, jet engines, etc.
  • Big data can be 1) structured, 2) unstructured, or 3) semi-structured.
  • Volume, variety, velocity, and variability are a few characteristics of big data.
  • Improved customer service, better operational efficiency, and better decision making are a few advantages of big data.
