Thesis 1401
  • Jose Manuel González Chenlo
  • David Enrique Losada Carril

Exploiting multiple sources of evidence for opinion search in the Web

In this thesis we study Opinion Mining and Sentiment Analysis and propose a fine-grained analysis of the opinions conveyed in texts. Concretely, the aim of this research is to gain an understanding of how to combine different types of evidence to effectively identify on-topic opinions in texts. To meet this aim, we consider content-match evidence, obtained at document and passage level, as well as different structural aspects of the text. Opinion Mining technology is not yet mature. In practice, people often rely on regular search engines, which lack advanced opinion search capabilities, to find opinions on topics that interest them. As a result, the effort of identifying the key relevant opinions falls on the user. The lack of widely accepted Opinion Mining technology is due to the limitations of current models, which are simplistic and perform poorly. In this thesis we study a specific set of factors that are indicative of subjectivity and relevance, and we try to understand how to combine them effectively to detect opinionated documents, to extract relevant opinions and to estimate their polarity. We propose innovative methods and models able to incorporate different types of evidence, and we intend to contribute to several areas, including i) search for opinionated documents, ii) detection of subjectivity at document and passage level, and iii) estimation of polarity. An important concern that guides this research is efficiency. Some types of evidence, such as discourse structure, have only been tested with small collections from narrow domains (e.g., movie reviews). We demonstrate here that advanced linguistic features based on discourse analysis can potentially lead to a better understanding of how subjectivity flows in texts. We also show that features of this type can be efficiently injected into general-purpose opinion retrieval solutions that operate at large scale.