Several studies have shown that the words
people use are indicative of their psychological states. In particular, depression
has been found to be associated with distinctive linguistic patterns.
However, there is a lack of publicly available data for research
on the interaction between language and depression. In this paper, we
describe our first steps to fill this gap. We outline the methodology we
have adopted to build and make publicly available a test collection on
depression and language use. The resulting corpus includes a series of
textual interactions written by different subjects. The new collection
encourages research not only on differences in language between depressed
and non-depressed individuals, but also on the evolution of the language
use of depressed individuals. Further, we propose a novel early detection
task and define a novel effectiveness measure to systematically compare
early detection algorithms. This new measure takes into account both the
accuracy of the decisions taken by the algorithm and the delay in detecting
positive cases. We also present baseline results with novel detection
methods that process users’ interactions in different ways.
Keywords: early risk, depression, evaluation, test collection