Comparing Traditional and Neural Approaches for Detecting Health-Related Misinformation

Detecting health-related misinformation is a research challenge that has recently received increasing attention. Helping people find credible and accurate health information on the Web remains an open research issue, as the COVID-19 pandemic has highlighted. In such scenarios, it is often critical to detect misinformation quickly, which implies working with little data, at least when the misinformation first begins to spread. In this work, we compare different automatic approaches for identifying misinformation and examine how they behave across different tasks and with limited training data. We experiment with traditional algorithms, such as SVMs and KNNs, as well as newer BERT-based models. We use the CLEF 2018 Consumer Health Search task dataset to run experiments on detecting untrustworthy content and content that is difficult to read. Our results suggest that traditional models remain a strong baseline for these challenging tasks: in the absence of substantial training data, classical approaches tend to outperform BERT-based models.

keywords: Health-related content, Misinformation, Language, Neural approaches