In this paper we describe our recent research on the effective construction
of Information Retrieval test collections. Relevance assessments
are a core component of test collections, but they are expensive to
produce. For each test query, only a sample of documents in the
corpus can be assessed for relevance. Here we discuss a class of
document adjudication methods that iteratively select documents
for judgment using reinforcement learning. Given a pool of candidate documents
supplied by multiple retrieval systems, the production of
relevance assessments is modeled as a multi-armed bandit problem.
These bandit-based algorithms identify relevant documents with
minimal assessment effort. One of these models was adopted by
NIST to build the test collection of the TREC 2017 Common Core
track.
Keywords: Information Retrieval evaluation, relevance assessments, pooling, multi-armed bandits
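To make the bandit formulation concrete, the sketch below treats each retrieval run as an arm and each "pull" as judging that run's next unassessed document, updating a Beta-Bernoulli posterior with the relevance outcome (Thompson sampling). This is an illustrative assumption of ours, not the specific algorithm adopted by NIST; the function and variable names are hypothetical, and the assessor judgment is simulated from a qrels dictionary.

```python
import random

def bandit_adjudication(runs, qrels, budget, seed=0):
    """Thompson-sampling document adjudication (illustrative sketch).

    runs:   dict mapping run name -> ranked list of doc ids
    qrels:  dict mapping doc id -> relevance (0/1); stands in for an assessor
    budget: maximum number of adjudication steps
    """
    rng = random.Random(seed)
    # Beta(alpha, beta) posterior per run over P(its next document is relevant)
    alpha = {r: 1.0 for r in runs}
    beta = {r: 1.0 for r in runs}
    pos = {r: 0 for r in runs}   # next rank to inspect in each run
    judged = {}                  # doc id -> assessed relevance
    for _ in range(budget):
        candidates = [r for r in runs if pos[r] < len(runs[r])]
        if not candidates:
            break
        # sample a plausible relevance rate per run; pull the best arm
        arm = max(candidates, key=lambda r: rng.betavariate(alpha[r], beta[r]))
        # skip documents already judged via another run
        while pos[arm] < len(runs[arm]) and runs[arm][pos[arm]] in judged:
            pos[arm] += 1
        if pos[arm] >= len(runs[arm]):
            continue
        doc = runs[arm][pos[arm]]
        pos[arm] += 1
        rel = qrels.get(doc, 0)  # simulated assessor judgment
        judged[doc] = rel
        alpha[arm] += rel        # posterior update: success
        beta[arm] += 1 - rel     # posterior update: failure
    return judged
```

Because arms that keep surfacing relevant documents acquire higher sampled rates, the loop concentrates the assessment budget on the most productive runs while still occasionally exploring the others.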