Abstract:
Information overload is one of the defining features of the present day, so the ability to manage information has become crucial. Most of the information used in modern businesses and institutions is stored as text. Methods from information retrieval, information extraction, text mining, and natural language processing make it possible to build and systematize a knowledge base, which in turn supports its effective management. This paper presents a theoretical analysis and an empirical verification of the usefulness of a keyword identification method based on Latent Dirichlet Allocation (LDA) for scientific texts written in Polish. The study used a collection of text documents containing abstracts of scientific articles on the use of statistical methods in science, management, and logistics. The collection was analyzed in two variants: abstracts treated as whole documents and abstracts split into sentences. The text data were analyzed using R. Performance was measured by the degree of agreement between the sets of keywords identified by humans and those detected by the algorithm, with the Jaccard index as the measure. The results obtained for the tested method are satisfactory.
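
To make the evaluation measure concrete, the following is a minimal sketch of the Jaccard index for two keyword sets, i.e. the size of their intersection divided by the size of their union. It is written in Python for illustration only; the study itself used R. The function name `jaccard_index`, the lowercasing and whitespace trimming, the zero score for two empty sets, and the sample keywords are all assumptions made here and are not taken from the paper.

```python
def jaccard_index(human_keywords, algorithm_keywords):
    """Return |A ∩ B| / |A ∪ B| for two keyword collections."""
    # Assumption: keywords are compared case-insensitively after trimming.
    a = {k.strip().lower() for k in human_keywords}
    b = {k.strip().lower() for k in algorithm_keywords}
    if not a and not b:
        # Convention chosen for this sketch: two empty sets score 0.
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical keyword sets, for illustration only (not data from the study).
human = ["regresja", "logistyka", "analiza skupień", "prognozowanie"]
detected = ["regresja", "logistyka", "model", "prognozowanie", "dane"]

score = jaccard_index(human, detected)
print(score)  # 3 shared keywords out of 6 distinct ones -> 0.5
```

A score of 1 means the two sets are identical and 0 means they share no keyword, so higher values indicate closer agreement between the human-assigned and algorithm-detected keywords.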