Evaluating Information Retrieval systems is crucial to making progress in search technologies.
Evaluation is often based on assembling reference collections consisting of documents,
queries, and relevance judgments made by human assessors.
In large-scale environments, exhaustively judging
relevance becomes infeasible.
Instead, only a pool of documents is judged for relevance.
By selectively choosing documents from the pool, we can reduce the number of judgments required to identify a given number of relevant documents.
We argue that this iterative selection process can be naturally modeled as a reinforcement learning problem, and
we propose novel, formally grounded adjudication methods based on multi-armed bandits. Casting document judging as a multi-armed bandit problem
is not only theoretically appealing, but also leads to highly effective adjudication methods.
Under this bandit allocation framework, we consider stationary and non-stationary models and propose seven new document adjudication methods (five stationary and two non-stationary variants). We also report a series of experiments that thoroughly compare our new methods against current adjudication methods. This comparative study includes existing methods
designed for pooling-based evaluation and existing methods designed for metasearch. Our experiments show that our theoretically grounded adjudication methods substantially reduce the assessment effort.
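To make the bandit framing concrete, the following sketch illustrates one simple (non-stationary-agnostic) instantiation: an epsilon-greedy allocation where each retrieval run is an arm, pulling an arm judges that run's highest-ranked unjudged document, and the reward is 1 when the document is relevant. This is an illustrative example under assumed data structures (`runs`, `qrels`), not any of the specific methods proposed in the paper.

```python
import random

def epsilon_greedy_adjudication(runs, qrels, budget, epsilon=0.1, seed=0):
    """Illustrative epsilon-greedy document adjudication.

    runs:   dict mapping run name -> ranked list of doc ids (the arms).
    qrels:  dict mapping doc id -> 1/0 relevance (stands in for the human assessor).
    budget: total number of judgments allowed.
    Returns a dict of judged doc ids and their judgments.
    """
    rng = random.Random(seed)
    pulls = {r: 0 for r in runs}    # times each run was selected
    rewards = {r: 0 for r in runs}  # relevant documents contributed by each run
    judged = {}                     # doc id -> judgment
    for _ in range(budget):
        # Only runs that still have unjudged documents can be pulled.
        candidates = [r for r in runs
                      if any(d not in judged for d in runs[r])]
        if not candidates:
            break
        if rng.random() < epsilon:  # explore: pick a random run
            run = rng.choice(candidates)
        else:                       # exploit: pick the run with the best empirical mean
            run = max(candidates,
                      key=lambda r: rewards[r] / pulls[r] if pulls[r] else float('inf'))
        # Judge the highest-ranked document of the chosen run not yet judged.
        doc = next(d for d in runs[run] if d not in judged)
        judged[doc] = qrels.get(doc, 0)
        pulls[run] += 1
        rewards[run] += judged[doc]
    return judged
```

Runs that have surfaced many relevant documents are pulled more often, so the judgment budget concentrates on the most promising pools, which is the intuition behind the adjudication methods compared in the paper.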
Keywords: Information Retrieval, Evaluation, Pooling, Reinforcement Learning, Multi-armed Bandits