Normalization-Driven Optimization of Knowledge Graph Creation for Semantically Grounded Information Fusion

The growing demand for reliable decision-making systems has made Knowledge Graphs (KGs) a cornerstone of information fusion. As data structures that can unify heterogeneous information sources through semantically enriched representations, KGs facilitate interoperability and contextual integration across domains. However, constructing such KGs remains challenging due to the high resource demands in terms of memory, execution time, and data-mapping complexity. This work addresses the challenge of creating scalable KGs by introducing optimization techniques that minimize memory usage and execution time through data partitioning and dependency-aware integration planning. The proposed approach leverages functional dependencies (FDs) to streamline input mappings and data sources, ensuring that only essential attributes are included in each transformation step while independently processing non-dependent attributes. To evaluate the effectiveness of our approach, we conducted an extensive empirical study using state-of-the-art KG creation engines and benchmarking frameworks. A total of 552 experimental configurations were executed. Our results demonstrate memory consumption reductions of up to a factor of 1221.71 and execution time improvements by a factor of 1112.97. These findings underscore the importance of data source characteristics, such as functional dependencies, for optimizing KG creation performance and highlight the potential of FD-based planning to enhance existing KG engines.

Keywords: Data Integration Systems, RDF Mapping Languages, Functional Dependencies, Normalization Theory
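
As an illustration of the FD-based idea summarized in the abstract, the following minimal Python sketch checks whether a functional dependency holds in a tabular source and, if so, splits the source into a deduplicated projection over the dependent attributes and a remainder without them. It is not the implementation evaluated in this work. The helper names (fd_holds, project, split_by_fd) and the sample data are hypothetical, and the sketch assumes the source fits in memory as a list of dictionaries.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def fd_holds(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if lhs -> rhs holds in rows (hypothetical helper)."""
    seen: Dict[Tuple[str, ...], Tuple[str, ...]] = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # The FD is violated if one lhs value maps to two different rhs values.
        if seen.setdefault(key, val) != val:
            return False
    return True


def project(rows: List[Row], attrs: Sequence[str]) -> List[Row]:
    """Project rows onto attrs and drop duplicate tuples, keeping first-seen order."""
    seen = set()
    out: List[Row] = []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append({a: row[a] for a in attrs})
    return out


def split_by_fd(
    rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]
) -> Tuple[List[Row], List[Row]]:
    """Split a source along lhs -> rhs into (dependent part, remainder)."""
    if not fd_holds(rows, lhs, rhs):
        raise ValueError("functional dependency does not hold in this source")
    all_attrs = list(rows[0].keys()) if rows else []
    # Dependent part: only lhs and rhs attributes, one tuple per distinct lhs value.
    dependent = project(rows, list(lhs) + list(rhs))
    # Remainder: every attribute except rhs, which is recoverable through lhs.
    remainder = project(rows, [a for a in all_attrs if a not in rhs])
    return dependent, remainder


# Illustrative data only; the attribute names are assumptions.
rows = [
    {"sample": "s1", "gene": "BRCA1", "chromosome": "17"},
    {"sample": "s2", "gene": "BRCA1", "chromosome": "17"},
    {"sample": "s3", "gene": "TP53", "chromosome": "17"},
    {"sample": "s4", "gene": "TP53", "chromosome": "17"},
]
dependent, remainder = split_by_fd(rows, ["gene"], ["chromosome"])
```

In this toy source, a mapping rule that only relates genes to chromosomes would read the smaller dependent projection instead of the full source, which is the kind of reduction the FD-based planning aims for.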