Automatic design of a Proper Noun Ontology for a Question-Answering System

This project consists of building an ontology of proper nouns from unrestricted news text in order to improve question-answering performance on Portuguese and Spanish newspaper text. The training corpus consists of two years of news from La Voz de Galicia and El País (Spanish newspapers), on the one hand, and O Público (a Portuguese newspaper), on the other. The Named Entity ontology is made up of proper nouns referring to places, public figures, and organizations.
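To make the intended structure concrete, here is a minimal sketch, using only the Python standard library, of how the three categories and individual proper nouns might be written out as an OWL ontology in Turtle syntax. The namespace `http://example.org/pno#`, the class names `Place`, `Person` and `Organization`, and the function names are hypothetical choices for illustration; the project does not specify them.

```python
# Hypothetical namespace; the project does not specify one.
NS = "http://example.org/pno#"

# The three top-level categories named in the project description
# (class names are an illustrative assumption).
CATEGORIES = ("Place", "Person", "Organization")


def to_iri_fragment(name: str) -> str:
    """Turn a proper noun into a simple local name (spaces -> underscores).

    Assumes names contain only letters, digits and spaces; punctuation
    such as '.' or "'" would need escaping to form a valid Turtle name.
    """
    return name.strip().replace(" ", "_")


def ontology_turtle(entities: dict[str, str]) -> str:
    """Serialize a {proper noun: category} mapping as an OWL ontology in Turtle."""
    lines = [
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        f"@prefix pno: <{NS}> .",
        "",
        f"<{NS.rstrip('#')}> a owl:Ontology .",
    ]
    # Declare one OWL class per category.
    for cat in CATEGORIES:
        lines.append(f"pno:{cat} a owl:Class .")
    # Declare each proper noun as an individual of its category,
    # keeping the original surface form as its label.
    for name, cat in sorted(entities.items()):
        if cat not in CATEGORIES:
            raise ValueError(f"unknown category: {cat}")
        label = name.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(
            f"pno:{to_iri_fragment(name)} a owl:NamedIndividual, pno:{cat} ; "
            f'rdfs:label "{label}" .'
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(ontology_turtle({"Lisboa": "Place", "El País": "Organization"}))
```

Keeping the category set closed (and rejecting unknown labels) mirrors the three-way classification described above; a fuller ontology would add subclasses and relations between individuals.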

Objectives

  • Building Spanish and Portuguese text corpora from news articles.
  • PoS tagging and syntactic analysis.
  • Named Entity Recognition and Classification.
  • Clustering of Named Entities.
  • Design of an OWL ontology of Named Entities.
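The clustering step is not specified further here. One simple possibility, sketched below as an assumption rather than the project's method, is to group name variants whose token sets are nested (for example "González" and "Felipe González"), using union-find so that the grouping is transitive. The function name `cluster_mentions` is hypothetical.

```python
from collections import defaultdict


def cluster_mentions(mentions: list[str]) -> list[set[str]]:
    """Group name variants whose (lowercased) token sets are nested.

    Hypothetical heuristic: 'González' joins 'Felipe González' because
    {'gonzález'} is a subset of {'felipe', 'gonzález'}.
    """
    # Deduplicate and drop empty strings (an empty token set would be a
    # subset of every name and merge everything into one cluster).
    names = sorted({m.strip() for m in mentions if m.strip()})
    parent = {n: n for n in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    tokens = {n: frozenset(n.lower().split()) for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if tokens[a] <= tokens[b] or tokens[b] <= tokens[a]:
                union(a, b)

    # Collect members under their union-find root.
    groups: dict[str, set[str]] = defaultdict(set)
    for n in names:
        groups[find(n)].add(n)
    return sorted(groups.values(), key=min)


if __name__ == "__main__":
    print(cluster_mentions(["Felipe González", "González", "El País", "Lisboa"]))
```

This heuristic ignores entity class and context, so a bare surname shared by two different people would chain them into one cluster; restricting merges to mentions with the same Named Entity class, or adding contextual features, would reduce such errors.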