A Bottom-Up Generic Probabilistic Building and Enriching approach for knowledge graph using the LDA-based clustering method

Abstract:

The world's knowledge appears limitless, which makes it seem impossible to comprehend fully. Methods that autonomously infer new knowledge and enrich a Knowledge Graph (KG) are therefore of particular interest. However, most existing work on distributional graph clustering of terms enriches the KG following a “top-down principle”: it takes as input a corpus of documents belonging to a single domain, together with a predefined, personalized KG model related to that corpus, and then enriches this model semi-automatically or manually from different data sources using multiple techniques. We introduce a novel semi-supervised framework, called BUGPBE-LDA, which is an adaptation of LDA. It builds and enriches a Generic Core KG (GCKG) with its Core Concepts (CC) using advanced machine-learning techniques. The approach follows four principal workflows, with a focus on extracting the “NP is-a NP” pattern. Our textual corpora are collections of vocabulary, including terms and classes, that cover the fish hunting and ontology domains. As a result, our contribution enhances the KG with probabilistic weights and semantic enrichment. We evaluate our proposal on two textual corpora, ONTO and HUNT, and compare it to an unsupervised LDA clustering baseline. The F-measure results show that our model outperforms this baseline.
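
To make the “NP is-a NP” pattern concrete, the following is a minimal sketch of how such hyponym/hypernym pairs might be pulled from raw sentences. It is not the BUGPBE-LDA implementation: the regular expression, the function name `extract_is_a`, and the sample sentences are all assumptions made for illustration, and the noun phrases are approximated by plain text spans rather than by a part-of-speech tagger or chunker.

```python
import re

# Hypothetical, simplified "NP is-a NP" matcher (illustration only).
# A noun phrase is approximated as any text span; a real pipeline would
# rely on POS tagging or chunking to delimit noun phrases properly.
IS_A = re.compile(
    r"^(?:(?:a|an|the)\s+)?(?P<hypo>.+?)\s+is\s+(?:a|an)\s+(?P<hyper>.+?)\.?$",
    re.IGNORECASE,
)


def extract_is_a(text):
    """Return (hyponym, hypernym) pairs for sentences of the form 'X is a Y'."""
    pairs = []
    # Naive sentence split on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        match = IS_A.match(sentence.strip())
        if match:
            pairs.append((match.group("hypo").lower(), match.group("hyper").lower()))
    return pairs


sample = (
    "A trawler is a fishing vessel. "
    "An ontology is a formal specification. "
    "Fishing requires patience."
)
print(extract_is_a(sample))
```

Each extracted pair could then serve as a candidate is-a edge between two concepts in a graph; how such edges are weighted or clustered is outside the scope of this sketch.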