Research Article | Open Access
A Systematic Literature Review of Big Data and the Hadoop frameworks
Devishree Naidu, Adi Thakur
Pages: 2969-2973
Abstract
Big data is a term for the huge volumes of data gathered largely from newer data sources such as Twitter,
Instagram, and Facebook. This data is important because its analysis is changing how major businesses work and can
provide the knowledge needed to cut business costs. Many firms now use big data analysis
to identify trends and predict future events in their industries. The challenge lies in finding the best
way to process, analyze, and draw useful insights from this data. This data cannot be handled efficiently by
traditional data management tools and therefore requires more advanced data technologies. This is mainly
because of its unstructured nature and the five V’s (Volume, Variety, Velocity, Value, and Veracity) that are
commonly used to characterize big data. Because this data is growing
at an exponential rate, it became necessary to develop technologies to address it. Hadoop, MapReduce, and NoSQL
are the three major technologies that were developed to handle the complexities of big data and manage it reliably.
This paper discusses the technologies built on Hadoop, collectively called the Hadoop ecosystem,
and their uses in analyzing big data.
Keywords
Big data, Flume, MapReduce, Hadoop Ecosystem, Hadoop frameworks