Semi-Supervised Learning in the Field of Conversational Agents and Motivational Interviewing
The exploitation of Motivational Interviewing concepts for text analysis contributes to gaining valuable insights into individuals’ perspectives and attitudes towards behaviour change. The scarcity of labelled user data poses a persistent challenge and impedes technical advances in research under non-English language scenarios. To address the limitations of manual data labelling, we propose a semi-supervised learning method as a means to augment an existing training corpus. Our approach leverages machine-translated user-generated data sourced from social media communities and employs self-training techniques for annotation. To that end, we consider various source contexts and conduct an evaluation of multiple classifiers trained on various augmented datasets. The results indicate that this weak labelling approach does not yield improvements in the overall classification capabilities of the models. However, notable enhancements were observed for the minority classes. We conclude that several factors, including the quality of machine translation, can potentially bias the pseudo-labelling models and that the imbalanced nature of the data and the impact of a strict pre-filtering threshold need to be taken into account as inhibiting factors.
keywords: Semi-supervised learning, Motivational Interviewing, Conversational Agents
Publication: Article
1727864341074
October 2, 2024
/research/publications/semi-supervised-learning-in-the-field-of-conversational-agents-and-motivational-interviewing
The exploitation of Motivational Interviewing concepts for text analysis contributes to gaining valuable insights into individuals’ perspectives and attitudes towards behaviour change. The scarcity of labelled user data poses a persistent challenge and impedes technical advances in research under non-English language scenarios. To address the limitations of manual data labelling, we propose a semi-supervised learning method as a means to augment an existing training corpus. Our approach leverages machine-translated user-generated data sourced from social media communities and employs self-training techniques for annotation. To that end, we consider various source contexts and conduct an evaluation of multiple classifiers trained on various augmented datasets. The results indicate that this weak labelling approach does not yield improvements in the overall classification capabilities of the models. However, notable enhancements were observed for the minority classes. We conclude that several factors, including the quality of machine translation, can potentially bias the pseudo-labelling models and that the imbalanced nature of the data and the impact of a strict pre-filtering threshold need to be taken into account as inhibiting factors. - Gergana Rosenova, Marcos Fernández-Pichel, Selina Meyer, David E. Losada
publications_en