Early risk prediction on the Internet: an evaluation corpus

Citizens worldwide are currently exposed to a wide range of risks and threats, and
many of these hazards are reflected on the Internet. Some of these threats stem from
criminals such as stalkers, mass killers or other offenders with sexual, racial, religious
or culturally related motivations. Other worrying threats might even come from the
individuals themselves. For instance, depression may lead to an eating disorder such
as anorexia or even to suicide. In some of these cases, appropriate action or
intervention at an earlier stage could have reduced or prevented these problems.
The main purpose of this project is to start the work that will lead to
evaluation testbeds for early risk prediction. It seeks to foster research on new
types of technologies that are potentially applicable to a wide range of worrying social


1) To study the adequacy of different types of Internet repositories as data sources for creating test collections for research on early risk prediction.
2) To identify prediction scenarios and use cases where constructing a testbed is feasible.
3) To design and implement crawling methods (and other types of computer programs for compiling data from the Internet) that gather Internet content and build centralised repositories.
4) To define sound evaluation methodologies that can assess the relative effectiveness of different algorithms at making predictions.