Machine Translation for Low-Resource Languages: Performance Trade-offs Between Seq2Seq and Generative Approaches

This study evaluates two machine translation paradigms, sequence-to-sequence (seq2seq) models and generative large language models (LLMs), for translating Spanish-Galician (a closely related pair) and English-Galician (a distant pair). The seq2seq models include bilingual and multilingual models trained from scratch, as well as NLLB-200 used both off-the-shelf and after fine-tuning. The generative approach covers both pre-trained and fine-tuned LLMs. The evaluation combines quantitative metrics (BLEU and COMET) with qualitative analysis, including an ad hoc test suite designed to assess linguistic accuracy. Results show that fine-tuned generative models outperform seq2seq models for the distant language pair (English-Galician), whereas bilingual seq2seq models remain competitive for the closely related pair (Spanish-Galician). The study highlights the trade-offs between the two approaches and provides insights into optimizing translation strategies for low-resource languages such as Galician.

Keywords: Machine translation, Low-resource languages, Natural language processing
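
To make the quantitative evaluation concrete, the following is a minimal sketch of how hypothesis translations can be scored with BLEU (via the sacrebleu package) and COMET (via Unbabel's comet package). The example sentences, the checkpoint name, and the parameter values are illustrative assumptions, not details taken from this study.

    import sacrebleu
    from comet import download_model, load_from_checkpoint

    # Illustrative data: source sentences, system outputs, and references.
    sources = ["The house is small.", "She reads every day."]
    hypotheses = ["A casa é pequena.", "Ela le todos os días."]
    references = ["A casa é pequena.", "Ela le cada día."]

    # Corpus-level BLEU; sacrebleu expects a list of reference streams.
    bleu = sacrebleu.corpus_bleu(hypotheses, [references])
    print(f"BLEU: {bleu.score:.2f}")

    # COMET: a neural metric that also conditions on the source sentence.
    # The checkpoint name is an assumption (a common public COMET model).
    model_path = download_model("Unbabel/wmt22-comet-da")
    comet_model = load_from_checkpoint(model_path)
    data = [
        {"src": s, "mt": h, "ref": r}
        for s, h, r in zip(sources, hypotheses, references)
    ]
    output = comet_model.predict(data, batch_size=8, gpus=0)
    print(f"COMET: {output.system_score:.4f}")

Note that BLEU rewards surface n-gram overlap with the references, while COMET uses a learned model conditioned on the source, which is why the two metrics can rank systems differently, particularly for a low-resource target language such as Galician.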