TweetNorm_es Corpus: an Annotated Corpus for Spanish Microtext Normalization
In this paper we introduce TweetNorm_es, an annotated corpus of Spanish-language tweets, which we make publicly available under
the terms of the CC-BY license. The corpus is intended for the development and testing of microtext normalization systems. It was created
for Tweet-Norm, a tweet normalization workshop and shared task, and is the result of a joint annotation effort by several research
groups. We describe the methodology used to build the corpus and the guidelines followed in the annotation
process. We also give a brief overview of the Tweet-Norm shared task, the first evaluation setting in which the corpus was used.
Keywords: Microtext normalization, Twitter, phonology
Publication: Congress
June 18, 2021
Authors: Iñaki Alegria, Nora Aranberri, Pere Comas, Víctor Fresno, Pablo Gamallo, Lluis Padró, Iñaki San Vicente, Jordi Turmo and Arkaitz Zubiaga