Combining Psycho-linguistic, Content-based and Chat-based Features to Detect Predation in Chatrooms

The Digital Age has brought great benefits for the human race but also some drawbacks. Nowadays, people from opposite corners of the World can communicate online via instant messaging services. Unfortunately, this has introduced new kinds of crime. Sexual predators have adapted their predatory strategies to these platforms and, usually, the target victims are kids. The authorities cannot manually track all threats because massive amounts of online conversations take place in a daily basis. Automatic methods for alerting about these crimes need to be designed. This is the main motivation of this paper, where we present a Machine Learning approach to identify suspicious subjects in chat-rooms. We propose novel types of features for representing the chatters and we evaluate different classifiers against the largest benchmark available. This empirical validation shows that our approach is promising for the identification of predatory behaviour. Furthermore, we carefully analyse the characteristics of the learnt classifiers. This preliminary analysis is a first step towards profiling the behaviour of the sexual predators when chatting on the Internet.

Palabras clave: Sexual predation, Cybercrime, Text Mining, Machine Learning, Support Vector Machines, Psycho-linguistic analysis