Integration of Grid, cluster and cloud resources to semantically annotate a large-sized repository of learning objects

The Universia repository contains more than 15 million educational resources. The lack of metadata describing these resources complicates their classification, search and retrieval. To overcome this drawback, we decided to semantically annotate the available educational resources using the ADEGA algorithm. For this purpose, we selected DBpedia, a cross-domain linked dataset composed of more than 3.77 million ‘things’ with 400 million ‘facts’, to ensure that the ontology covers the wide range of Universia topics. However, this process is extremely expensive computationally: we estimated that it would require more than 1600 years of CPU time. In this paper, parallel programming techniques and distributed computing paradigms are combined to complete this semantic annotation in a reasonable time. The cornerstone of this proposal is a resource management and execution framework able to integrate the heterogeneous computing resources at our disposal (grid, cluster and cloud resources). As a result, the problem was solved in less than 180 days, demonstrating that it is feasible to exploit the advantages of these computing models in the field of linked data.

keywords: