Declarative generation of RDF-star graphs from heterogeneous data
RDF-star has been proposed as an extension of RDF to make statements about statements. Libraries and graph stores have started adopting RDF-star, but the generation of RDF-star data remains largely unexplored. To allow generating RDF-star from heterogeneous data, RML-star was proposed as an extension of RML. However, no system has so far been developed that implements the RML-star specification. In this work, we present Morph-KGC^star, which extends the Morph-KGC materialization engine to generate RDF-star datasets. We validate Morph-KGC^star by running test cases derived from the N-Triples-star syntax tests, and we apply it to two real-world use cases from the biomedical and open science domains. We compare the performance of our approach against another RDF-star generation method (SPARQL-Anything), showing that Morph-KGC^star scales better for large input datasets but is slower when processing multiple smaller files.
Keywords: Knowledge Graphs, RDF-star, RML-star, Data Integration
Publication: Article
February 13, 2024
Authors: Julián Arenas-Guerrero, Ana Iglesias-Molina, David Chaves-Fraga, Daniel Garijo, Oscar Corcho and Anastasia Dimou
DOI: 10.3233/SW-243602