Integrating grid and cloud resources to semantically annotate large collections of learning objects

From a computational point of view, the semantic annotation of large-scale data collections is an extremely expensive task. Many existing approaches have been applied only to small data collections, and their validity has not been demonstrated for larger and more complex ones. In this paper, we show how we tackled the problem of semantically annotating a large-scale collection of learning objects. For an annotation algorithm previously developed by the authors, an initial study estimated that more than 1600 CPU-years would be needed to annotate the nearly 16 million resources that make up the target repository of this work. By combining parallel programming techniques with distributed and heterogeneous computing infrastructures (grid, cluster, cloud, etc.) to execute the annotation process, we completed the annotation in 178 days. This result shows the usefulness of this kind of infrastructure, and the advantages of its computation models, for addressing open problems in the fields of Linked Data and semantics.

Keywords: Semantic annotation, Grid and cloud computing, Computing resources integration, DBpedia, Linked Data
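
To put the figures quoted in the abstract in perspective, the short sketch below derives two rough quantities from them: the average CPU time per resource and the number of CPUs that would have to run continuously to fit the workload into the reported wall-clock time. It is only a back-of-the-envelope illustration. The function names and the 365-day year are assumptions introduced here, and because the abstract reports "more than" 1600 CPU-years and "nearly" 16 million resources, the results are approximate bounds rather than measurements.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the abstract.
# Assumptions (not from the paper): a 365-day year and perfectly even load.

SECONDS_PER_DAY = 24 * 3600
DAYS_PER_YEAR = 365  # assumed calendar year


def cpu_seconds_per_resource(cpu_years: float, resources: int) -> float:
    """Average sequential CPU time, in seconds, spent on one resource."""
    return cpu_years * DAYS_PER_YEAR * SECONDS_PER_DAY / resources


def equivalent_cpus(cpu_years: float, wall_days: float) -> float:
    """CPUs that must run continuously to finish the work in wall_days."""
    return cpu_years * DAYS_PER_YEAR / wall_days


if __name__ == "__main__":
    per_resource = cpu_seconds_per_resource(1600, 16_000_000)
    cpus = equivalent_cpus(1600, 178)
    print(f"about {per_resource / 60:.1f} CPU-minutes per resource")
    print(f"about {cpus:.0f} CPUs busy around the clock")
```

Under these assumptions the estimate works out to roughly 53 CPU-minutes per resource and to the equivalent of about 3,300 CPUs kept busy for the whole 178 days.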