Retrieval of relevant and novel sentences using information retrieval models and techniques

The purpose of this project is to improve the effectiveness of systems that retrieve relevant, non-redundant sentences. This task in the field of Information Retrieval (IR) goes beyond basic document retrieval: once a user issues a query and a ranked list of documents is retrieved, that ranking must be processed to identify the sentences (or phrases) relevant to the query while avoiding repetition. In addition, the international research results obtained so far clearly demonstrate the need for further research effort, both in detecting relevant sentences and in detecting novel ones.

This research project will address approaches based on Language Models, fuzzy quantification, and dimensionality reduction to solve the problem of sentence retrieval and redundancy detection. The variety of approaches considered is expected to improve the effectiveness of this task and to foster cross-fertilization among the different lines of research involved.

Objectives

The main objective of the project is to study in depth the task of sentence retrieval and redundancy detection, advancing the current state of knowledge and identifying ways to improve results, both in retrieving relevant sentences and in subsequently filtering out redundancy. This objective can be broken down into several sub-goals:

  • Determine whether Statistical Language Models (SLM) can bring benefits to the retrieval of relevant sentences. The successful application of language models to other IR tasks is a strong basis for expecting results of interest to the field in the novelty task.
  • Determine whether Language Models can bring benefits to the retrieval of novel sentences. This sub-goal is motivated by the scarcity of results on using Statistical Language Models to determine redundancy.
  • Determine whether fuzzy quantification can improve the accuracy in retrieving relevant sentences.
  • Determine whether fuzzy quantification can be useful in detecting redundant sentences. Both fuzzy quantification objectives (O3 and O4) are novel because no research has yet applied fuzzy quantification to the problem of sentence retrieval and redundancy detection. They are worthwhile targets because fuzzy quantification has recently proven promising as a matching mechanism for IR.
  • Determine the efficacy of methods based on LSI (Latent Semantic Indexing) for the detection of relevant and non-redundant sentences. This objective is innovative, since LSI has not been applied to this task, and relevant in light of the results obtained by LSI in the construction of summaries.
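
As a rough illustration of the first two sub-goals, the sketch below ranks sentences with a query-likelihood language model and then discards sentences that overlap heavily with ones already kept. It is only a minimal sketch under stated assumptions: the Dirichlet smoothing, the parameter mu, the whitespace tokenizer, the word-overlap redundancy measure, the thresholds, and all function names are illustrative choices, not methods specified by the project.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def query_likelihood(query, sentence, coll_counts, coll_len, mu=10.0):
    """Log P(query | sentence) under a Dirichlet-smoothed unigram model.

    mu is an assumed smoothing parameter, not a value from the project.
    """
    counts = Counter(tokenize(sentence))
    length = sum(counts.values())
    score = 0.0
    for term in tokenize(query):
        p_coll = coll_counts.get(term, 0) / coll_len
        if p_coll == 0.0:
            continue  # assumption: skip query terms unseen in the collection
        score += math.log((counts.get(term, 0) + mu * p_coll) / (length + mu))
    return score


def overlap(candidate, previous):
    """Fraction of the candidate's distinct words found in a previous sentence."""
    cand = set(tokenize(candidate))
    if not cand:
        return 0.0
    return len(cand & set(tokenize(previous))) / len(cand)


def relevant_and_novel(query, sentences, top_k=3, max_overlap=0.8):
    """Keep the top_k sentences by relevance, then drop redundant ones."""
    # The sentence set itself serves as the background collection here.
    coll_counts = Counter(t for s in sentences for t in tokenize(s))
    coll_len = sum(coll_counts.values())
    ranked = sorted(
        sentences,
        key=lambda s: query_likelihood(query, s, coll_counts, coll_len),
        reverse=True,
    )
    selected = []
    for sentence in ranked[:top_k]:
        # A sentence is novel if it does not overlap too much with any kept one.
        if all(overlap(sentence, kept) <= max_overlap for kept in selected):
            selected.append(sentence)
    return selected
```

Calling relevant_and_novel(query, sentences) returns the retained sentences in ranked order. The fuzzy quantification and LSI objectives would replace the scoring and overlap functions with their own measures; those are not sketched here.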