Where to Start Filtering Redundancy? A Cluster-Based Approach
Novelty detection is a difficult task, particularly at the sentence level. Most approaches proposed in the past re-order all sentences according to their novelty scores. However, this re-ordering usually has little value. In fact, a naive baseline with no novelty detection capabilities often yields better performance than any state-of-the-art novelty detection mechanism. We argue here that this is because current methods initiate the novelty detection process too early. When few sentences have been seen, it is unlikely that the user is negatively affected by redundancy. Therefore, re-ordering the first sentences may harm performance. We propose here a query-dependent method based on cluster analysis to determine where we must start filtering redundancy.
Publication: Congress
June 18, 2021
Authors: Ronald T. Fernández; Javier Parapar; David E. Losada; Álvaro Barreiro
DOI: 10.1145/1835449.1835590