Health Information on the Internet: Relevant Search Results about Radon
The World Health Organization (WHO) has classified radon as a Group 1 carcinogen, and scientific evidence shows that indoor radon exposure is the leading cause of lung cancer in non-smokers and the second leading cause in smokers. Several studies rank the Internet as one of the most important sources of health-related information and highlight its ability to motivate behavioral changes that reduce the potential health effects associated with a risk. The objective of this research is to analyze the relevance of radon-related results when users submit certain information needs to search engines. The study is conducted by a multidisciplinary team (journalism, communication, and computer science), which allows us to approach this research challenge from different perspectives and with different methodologies.
To carry out this analysis, we ran several radon-related searches against a large web corpus, C4 (the Colossal Clean Crawled Corpus), a cleaned version of Common Crawl's web crawl corpus. We indexed this web collection and then searched for webpages relevant to 51 radon-related information needs. From the retrieved webpages, we employed deep linguistic technologies to extract the passages most closely related to each information need. A set of relevance assessment guidelines was then defined, and three different assessors tagged each passage as non-relevant, relevant, or highly relevant.
The results highlight the difficulty of finding information on the Internet that is either relevant or highly relevant to users' information needs about radon gas. This is the first study of its kind on radon information on the Internet, and it opens the way to further in-depth research in this field.
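The passage extraction and relevance tagging steps described above can be illustrated with a small sketch. It is not the authors' pipeline: the TF-IDF style term overlap stands in for the deep linguistic technologies used in the study, the majority vote is only one possible way to combine the three assessors' labels (the abstract does not say how they were combined), and all function names and the sample text are hypothetical.

```python
import math
import re
from collections import Counter

# Label scale taken from the abstract, ordered from least to most relevant.
LABELS = ("non-relevant", "relevant", "highly relevant")


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_passages(text, size=2):
    """Group consecutive sentences into passages of `size` sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [" ".join(sentences[i:i + size]) for i in range(0, len(sentences), size)]


def rank_passages(query, passages):
    """Rank passages by a TF-IDF weighted overlap with the query terms.

    Assumption: a simple lexical score replaces the study's actual
    passage extraction method, which the abstract does not detail.
    """
    tokenized = [tokenize(p) for p in passages]
    n = len(passages)
    # Document frequency: in how many passages each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    q_terms = set(tokenize(query))
    scored = []
    for passage, toks in zip(passages, tokenized):
        tf = Counter(toks)
        score = sum(tf[t] * math.log(1 + n / df[t]) for t in q_terms if t in tf)
        scored.append((score, passage))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def aggregate_labels(votes):
    """Combine assessor labels by majority vote (an assumed scheme).

    When no label has a strict majority, fall back to the median label.
    """
    label, freq = Counter(votes).most_common(1)[0]
    if freq > len(votes) / 2:
        return label
    ordered = sorted(votes, key=LABELS.index)
    return ordered[len(ordered) // 2]


if __name__ == "__main__":
    page = (
        "Radon is a radioactive gas. It can accumulate indoors. "
        "Testing kits measure radon levels at home. "
        "Mitigation systems vent the gas outside."
    )
    ranked = rank_passages("how to measure radon levels at home",
                           split_passages(page, size=1))
    print(ranked[0][1])
    print(aggregate_labels(["relevant", "relevant", "non-relevant"]))
```

In a real setting the candidate passages would come from the pages retrieved for each of the 51 information needs, and the ranking function would be replaced by whatever passage scoring model is actually in use.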
keywords: Health Misinformation
Publication: Congress
November 28, 2023
Authors: Noel Pascual Presa, Lucía Ortigueira Piñeiro, Noemí Fernández Folgueiro, David E. Losada, Berta García-Orosa, Marcos Fernández-Pichel