Seeding Simulated Queries with User-study Data for Personal Search Evaluation

In this paper we perform a lab-based user study (n=21) of email re-finding behaviour, examining how the characteristics of submitted queries change in different situations. A number of logistic regression models are developed on the query data to explore the relationship between user (and contextual) variables and query characteristics, including query length, the field the query was submitted to, and the use of named entities. We reveal several interesting trends and use the findings to seed a simulated evaluation of various retrieval models. Not only is this an enhancement of existing evaluation methods for Personal Search, but the results also show that different models are more effective in different situations, which has implications both for the design of email search tools and for the way algorithms for Personal Search are evaluated.

keywords: