TweetNorm es Corpus: an Annotated Corpus for Spanish Microtext Normalization
In this paper we introduce TweetNorm es, an annotated corpus of Spanish-language tweets, which we make publicly available under
the terms of the CC-BY license. The corpus is intended for the development and testing of microtext normalization systems. It was created
for Tweet-Norm, a tweet normalization workshop and shared task, and is the result of a joint annotation effort by several research
groups. We describe the methodology used to build the corpus as well as the guidelines followed in the annotation
process. We also give a brief overview of the Tweet-Norm shared task, the first evaluation environment in which the corpus was used.
Keywords: Microtext normalization, Twitter, phonology
Publication type: Conference
18 June 2021
Authors: Alegria, Iñaki, Nora Aranberri, Pere Comas, Víctor Fresno, Pablo Gamallo, Lluis Padró, Iñaki San Vicente, Jordi Turmo and Arkaitz Zubiaga