Performing Text Analytics – Human vs. Computer

Abstract:

Even though new data sources and types have emerged, text-based data is still considered an important foundation of analysis projects. Text documents deliver important information, but their unstructured nature makes them challenging to analyze. This paper describes methods and algorithms for extracting information from text-based data sources. More importantly, it identifies some of the issues that arise when common text analytics techniques are applied to real-world data from multiple sources. An experiment was designed to identify critical success factors in big data projects from published case studies through content analysis. The content analysis was performed both manually by humans and automatically by a Hadoop program, and the critical success factors identified by the two methods were compared and analyzed. The study shows that automated text analysis could accurately extract high-level critical success factors, but struggled with low-level critical success factors and with abstract factors such as strategy.