C3HS: Content curation for consumer health search - Search and misinformation detection

One of the most time-critical challenges in Information Access is combating the spread of misinformation. Existing approaches to misinformation detection rely on neural network models, statistical methods, linguistic traits, or fact-checking strategies. However, the threat of false information appears to be growing with the advent of unusually creative language models.

Misinformation on the web and social media is a pressing problem with profound social, economic, and political impact, resulting in unwanted consequences such as election interference, polarization, and violence. The problem is more acute during a global health crisis, as misinformation about the COVID-19 pandemic could result in an unprecedented health disaster. We have already seen several myths surface on social media regarding medicines for COVID-19 and the virality of the infection's spread, along with inflammatory articles targeting marginalized communities. The problem is especially severe in developing countries, where literacy levels are low and understanding of and exposure to technology for fake news detection are limited, while increasing access to cheap internet makes the population at large more susceptible to believing and acting upon misinformation.

Web search is widely used to find online advice and, in particular, medical advice. This area of web search is now commonly referred to as Consumer Health Search. Health-related information access requires retrieval algorithms capable of promoting reliable documents and filtering out unreliable ones. To that end, several components need to be combined, such as query-document matching features, passage relevance estimation, reliability estimators, and appropriate recommendation models. In this project, we aim to build a complete pipeline for misinformation detection, based on the fusion of multiple features and complementary tools.
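
As a rough illustration of what fusing relevance and reliability signals can look like, the following minimal Python sketch filters out documents below a reliability threshold and ranks the rest by a weighted sum of the two scores. It is not the project's actual pipeline: the `Doc` fields, the `fuse` and `curate` functions, the linear weighting, and the threshold value are all hypothetical choices made for this example, and the scores are assumed to be already normalized to [0, 1].

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Doc:
    doc_id: str
    relevance: float    # hypothetical query-document matching score in [0, 1]
    reliability: float  # hypothetical reliability estimate in [0, 1]


def fuse(doc: Doc, weight: float = 0.5) -> float:
    """Linear fusion: `weight` controls how much reliability counts."""
    return (1.0 - weight) * doc.relevance + weight * doc.reliability


def curate(docs: List[Doc], weight: float = 0.5,
           min_reliability: float = 0.3) -> List[Doc]:
    """Drop documents below the reliability threshold, then rank the rest
    by fused score, highest first."""
    kept = [d for d in docs if d.reliability >= min_reliability]
    return sorted(kept, key=lambda d: fuse(d, weight), reverse=True)


# Toy input: "a" matches the query well but is judged unreliable.
docs = [
    Doc("a", relevance=0.9, reliability=0.2),
    Doc("b", relevance=0.7, reliability=0.8),
    Doc("c", relevance=0.5, reliability=0.9),
]
ranked = curate(docs)
print([d.doc_id for d in ranked])
```

In this toy input, document "a" is removed by the threshold despite its high relevance, and "b" ranks above "c" because its fused score is higher. A real system would more likely learn the combination from data than fix the weight by hand.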

Objectives

We pursue an intelligent combination of advanced techniques from multiple fields (Information Retrieval, Text Classification, Recommendation, Natural Language Processing) to design effective content curation strategies for consumer health search tasks. The project requires work in several specific areas of Information Technology and Psychology. The research team has experience in search, massive data processing, and computational linguistics. The project's objectives reflect the challenges and opportunities of content curation to support health-related information needs. We will contribute to developing evaluation strategies for information access systems oriented to content curation, to search, filtering, and topic analysis for consumer health search, to developing advanced misinformation detection and credibility analysis models, and to designing solutions for massive data processing.

PID2022-137061OB-C22 - David Enrique Losada Carril, Juan Carlos Pichel Campos - Manuel Couto Pintos, Marcos Fernández Pichel, Mario Ezra Aragón Saenzpardo, Tomás Fernández Pena, José Ramón Pichel Campos