Before we look at what big data is, let’s start with why big data came into the picture.
Data is generated in multi-petabyte quantities every day. It changes fast and comes in different formats, e.g. audio, video, pictures, and text, both structured and unstructured, which are difficult to manage and process using an RDBMS or other traditional technologies. In the early 2000s, tech companies like Google and Yahoo! found it challenging to handle such a huge volume and variety of data with existing technologies, so they started looking for alternative solutions, and that’s how big data became what it is today. You will find more about the history of big data at the end of this post.
Let’s start with the question: what is big data?
Is big data a tool? A language? A solution? Or something else?
Well, it’s a platform that comprises many tools, and fortunately most of them are open source. However, since so many tools are available to solve big data challenges, the next question is which tool to use when; I will write about this in my next post.
Let’s focus on the concept of big data. People think big data is always about huge amounts of data, but that’s not the case. To be a candidate for a big data solution, a workload should meet at least one of the three elements known as the 3 Vs:
1) Volume
2) Velocity
3) Variety
Fig 1: Elements to meet the big data challenge
High volume: Social media platforms like Facebook have billions of users, huge amounts of content are uploaded to YouTube every hour, an organization like NASA was generating 1.73 gigabytes of data every few seconds as of the end of 2014, and Maersk vessels send huge volumes of data over the network every minute.
High velocity: The speed of data matters. You may need to capture real-time data from IoT devices, and your mobile devices produce tons of data every day. Some businesses can’t wait, so you may have to capture data in near real time and process it immediately. Some industries, such as retail, require real-time data.
High variety: Different types of data are mixed on the same platform; e.g. Wikipedia, Twitter, and Facebook hold a mix of text, audio, video, images, etc. Regular businesses also receive data in different formats that needs to be transformed into useful output.
So when your organization deals with the 3 Vs above, it’s time to consider moving to a big data platform. As Forbes research has shown [1], of the companies that said in 2015 they had no plans to use big data, 11% had already started using it by 2017. And while 17% of companies said in 2015 that they were using big data, that number had increased to 53% in 2017. The research also found that, among all industries, finance and telecom are ahead in adopting big data.
History of big data (literally, how Hadoop was invented):
Data started growing exponentially, arriving in many types and at great velocity, which existing transactional databases could not handle. Many say Google was the first to face this challenge: to gain an advantage in search, Google wanted to create a catalog of the entire Internet. To be successful, they had to meet the challenges presented by the 3 Vs (as mentioned above) in an innovative way. Google tackled the big data problem by having a group of interconnected, inexpensive computers work together. This was revolutionary. Over a span of a couple of years, Google published papers describing parts of its big data solution. Building on these, Doug Cutting and Mike Cafarella developed the open source project that, with major backing from Yahoo!, became the Apache Software Foundation project called Hadoop, named after Mr. Cutting’s son’s toy elephant.
When people talk about big data, the first name that comes up is ‘Hadoop’. Hadoop is a highly available, distributed platform used for maintaining, scaling, error handling, self-healing, and securing large-scale data, which can be structured or unstructured. As mentioned earlier, if data is huge, varied, and needs to be processed quickly, traditional systems are unable to handle it. That is where Hadoop comes into the picture.
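The idea of many inexpensive computers working together is usually explained through the MapReduce programming model, which Google described in one of those papers and which Hadoop implements. As a rough sketch only (plain Python in a single process, not Hadoop’s actual API; the function names map_phase, shuffle, reduce_phase, and word_count are hypothetical), the classic word-count example looks like this:

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(split: str) -> Iterator[Tuple[str, int]]:
    # Map step: emit a (word, 1) pair for every word in one input split.
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle step: group all intermediate values by key,
    # which the framework does between the map and reduce phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce step: combine all values for one key into a single result.
    return key, sum(values)


def word_count(splits: List[str]) -> Dict[str, int]:
    # Hypothetical driver: in a real cluster each split would be mapped
    # on a different machine; here we simply loop over them in one process.
    intermediate = (pair for split in splits for pair in map_phase(split))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is not only Hadoop", "Hadoop is big"]))
```

In a real cluster the framework runs many map and reduce tasks in parallel on different machines and reruns tasks when a node fails; this sketch only shows how the data flows between the phases.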
But please remember, big data is not only Hadoop. Many other tools work within the Hadoop ecosystem, and you will need them to solve big data challenges; I am going to write about them in my next post.