Declarative Generation of RDF Collections and Containers from Heterogeneous Data
Purpose: This paper addresses the lack of practical support for generating RDF Collections and Containers from heterogeneous sources in existing mapping tools for the RDF Mapping Language (RML). While the RML Collections and Containers (RML-CC) module defines their generation, RML-CC implementations remain limited to a reference implementation called BURP, which was not conceived for efficiency or scalability. We aim to close this gap by extending a tool for efficient RML generation with support for RML-CC.
Methodology: We extended Morph-KGC to support RML-CC and developed YARRRML-CC for user-friendly mapping definitions. We also updated Yatter to enable translation from YARRRML-CC to RML-CC. We validated these tools using 35 RML-CC test cases and 22 additional YARRRML-CC cases and conducted performance evaluations using synthetic datasets.
Findings: BURP passes all RML-CC test cases and scales up to 1M records, whereas Morph-KGC passes only 51% due to architectural constraints and struggles with larger datasets. Morph-KGC’s reliance on Pandas limits support for complex constructs such as nested collections. Yatter fully supports YARRRML-CC. Morph-KGC performs well in standard RDF generation but struggles with RML-CC at scale. This highlights the importance of selecting tools that align with the structural complexity and performance demands of specific use cases.
Value: Our work enhances the practical applicability of RML-CC in knowledge graph construction by providing complementary tooling support (BURP for the Java ecosystem and Morph-KGC for complete Python pipelines), interoperability through YARRRML-CC, and validated performance insights.
Keywords: RML, Knowledge Graph Construction, Collections and Containers
Publication: Conference
July 28, 2025
Authors: Christophe Debruyne, Souail Jaadari, David Chaves-Fraga