Information Science for Competition & Data Analysis via HADOOP

Abstract:

Structured data analysis has been highly successful in the past; however, the analysis of large-scale unstructured data remains a difficult area. YOUTUBE has more than a billion users, and every day these users watch hundreds of millions of hours of video and generate billions of views [1]. Another statistic adds that more than 1.2 million videos are downloaded from YOUTUBE worldwide every day. To analyze and understand activity on such a massive scale, a relational database is no longer sufficient; a distributed and parallel system such as HADOOP is needed. The main objective of this paper is to examine how data is generated on YOUTUBE and how different companies can use it to target more users and increase purchases. Our project uses the YOUTUBE Data API. Once an API key has been generated, a Python script calls the API to retrieve video information matching the search criteria. We then run SQL-like queries in HIVE to extract the significant output that can be used for the analysis.
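
The retrieval step described in the abstract can be sketched as follows. This is only a minimal illustration, not the script used in the project: it assumes the public search endpoint of the YOUTUBE Data API v3, and the function names, the selected fields, and the tab-delimited output format are hypothetical choices made for the example. No network call is made; the sketch only builds the request URL and flattens a response of the documented shape into rows that a HIVE table could load.

```python
import csv
import io
from urllib.parse import urlencode

# Search endpoint of the YouTube Data API v3.
SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(api_key, query, max_results=25):
    """Build a search request URL (hypothetical helper, not from the paper)."""
    params = {
        "part": "snippet",       # ask for title, channel, publish date
        "type": "video",         # restrict results to videos
        "q": query,              # the search criteria
        "maxResults": max_results,
        "key": api_key,          # the generated API key
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def flatten_items(response):
    """Turn the 'items' list of a search response into flat dictionaries."""
    rows = []
    for item in response.get("items", []):
        snippet = item.get("snippet", {})
        rows.append({
            "video_id": item.get("id", {}).get("videoId", ""),
            "channel_title": snippet.get("channelTitle", ""),
            "title": snippet.get("title", ""),
            "published_at": snippet.get("publishedAt", ""),
        })
    return rows


def to_tsv(rows):
    """Serialize rows as tab-delimited text (an assumed format for loading into HIVE)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for r in rows:
        writer.writerow(
            [r["video_id"], r["channel_title"], r["title"], r["published_at"]]
        )
    return buf.getvalue()
```

In practice the URL would be fetched with an HTTP client and the JSON body passed to flatten_items; the resulting file could then be copied into HDFS and queried from HIVE with SQL-like statements, for example grouping rows by channel title and counting videos.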
