Evaluating Galician language models for sentiment analysis on challenging linguistic phenomena

Sentiment analysis remains one of the most relevant tasks in NLP. However, low-resource languages lack sufficient datasets and models for this task. In this paper, we present a study on sentiment analysis in Galician that examines how linguistic phenomena can influence this task. For this purpose, we developed Senti-Gal, a dataset of 998 sentences covering adversative, concessive and conditional sentences, diglossic phenomena, negation and irony. We evaluated seven models on Senti-Gal: a multilingual machine learning model, a multilingual decoder-only (generative) model, and five encoder-only models (three multilingual and two monolingual), all of them fine-tuned on a training dataset we also developed. The results indicate that the best fine-tuned encoder-only models outperform the decoder-only model, that syntactic and pragmatic phenomena remain a challenge, and that monolingual and multilingual models perform similarly. We release Senti-Gal, the fine-tuned models and the first freely available Galician training corpus for sentiment analysis.
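Because Senti-Gal groups sentences by linguistic phenomenon, results can be broken down per phenomenon rather than reported only as a single overall score. The following is a minimal sketch of such a breakdown, assuming each prediction is stored as a hypothetical `(phenomenon, gold_label, predicted_label)` tuple and that accuracy is the metric. The record format, the label names, the function name `accuracy_by_phenomenon` and the toy values are all illustrative assumptions, not the paper's actual data, schema or evaluation metric.

```python
from collections import defaultdict

# Hypothetical records: (phenomenon, gold_label, predicted_label).
# These toy values are illustrative only; they are NOT Senti-Gal data.
records = [
    ("negation", "negative", "negative"),
    ("negation", "positive", "negative"),
    ("irony", "negative", "positive"),
    ("irony", "negative", "negative"),
    ("concessive", "positive", "positive"),
]


def accuracy_by_phenomenon(rows):
    """Return {phenomenon: accuracy} over (phenomenon, gold, pred) rows.

    Accuracy is an assumed metric chosen for illustration.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for phenomenon, gold, pred in rows:
        total[phenomenon] += 1
        # Count a hit when the predicted polarity matches the gold label.
        correct[phenomenon] += int(gold == pred)
    return {p: correct[p] / total[p] for p in total}


if __name__ == "__main__":
    # Print one line per phenomenon, sorted alphabetically.
    for phenomenon, acc in sorted(accuracy_by_phenomenon(records).items()):
        print(f"{phenomenon}: {acc:.2f}")
```

On these toy records the sketch prints `concessive: 1.00`, `irony: 0.50` and `negation: 0.50`. A per-phenomenon breakdown like this is what allows statements such as "syntactic and pragmatic phenomena remain a challenge" to be checked category by category.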

keywords: evaluation, fine-tuning, galician, sentiment analysis