A Targeted Assessment of the Syntactic Abilities of Transformer Models for Galician-Portuguese

This paper presents a targeted syntactic evaluation of Transformer models for Galician-Portuguese. We defined three experiments that allow us to explore how these models, trained with a masked language modeling objective, encode syntactic knowledge. To do so, we created a new dataset that includes test instances of number (subject-verb), gender (subject-predicative adjective), and person (subject-inflected infinitive) agreement. We used this dataset to evaluate monolingual and multilingual BERT models, controlling for factors such as the presence of attractors and the distance between the dependent elements. The results show that Transformer models perform competently in many cases, but they are generally confounded by the presence of attractors in long-distance dependencies. Moreover, the different behavior of monolingual models trained on the same corpora reinforces the need for a deep exploration of network architectures and their learning process.

keywords: language models, syntax, targeted syntactic evaluation
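
As a rough illustration of what a targeted agreement evaluation can look like, the sketch below assumes the common setup in which a masked language model scores a grammatical and an ungrammatical candidate for the same masked position, and an item counts as correct when the grammatical form receives the higher score. The item format, the function names (`agreement_accuracy`, `toy_score`), the two Galician example sentences, and the scores are all hypothetical and invented for illustration; they are not taken from the dataset or the results described above. In practice the toy scorer would be replaced by the probabilities a BERT model assigns at the masked position.

```python
from collections import defaultdict


def agreement_accuracy(items, score_fn):
    """Return accuracy per condition.

    An item is a hit when score_fn prefers the grammatical candidate
    over the ungrammatical one for the same masked sentence.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for item in items:
        good = score_fn(item["sentence"], item["correct"])
        bad = score_fn(item["sentence"], item["wrong"])
        condition = "attractor" if item["has_attractor"] else "no_attractor"
        totals[condition] += 1
        if good > bad:
            hits[condition] += 1
    return {cond: hits[cond] / totals[cond] for cond in totals}


# Hypothetical number-agreement items (invented, not from the dataset).
ITEMS = [
    {"sentence": "O neno [MASK] na casa.",
     "correct": "está", "wrong": "están", "has_attractor": False},
    # "veciños" is a plural attractor between the singular subject and the verb.
    {"sentence": "O neno dos veciños [MASK] na casa.",
     "correct": "está", "wrong": "están", "has_attractor": True},
]

# Made-up scores standing in for a model's masked-token probabilities.
TOY_SCORES = {
    ("O neno [MASK] na casa.", "está"): 0.9,
    ("O neno [MASK] na casa.", "están"): 0.1,
    ("O neno dos veciños [MASK] na casa.", "está"): 0.4,
    ("O neno dos veciños [MASK] na casa.", "están"): 0.6,
}


def toy_score(sentence, candidate):
    """Hypothetical scorer: looks up a fixed, invented probability."""
    return TOY_SCORES[(sentence, candidate)]


RESULT = agreement_accuracy(ITEMS, toy_score)
print(RESULT)
```

Grouping accuracy by condition, as done here for the presence of an attractor, is one simple way to separate the effect of a control variable; the same grouping could be keyed on the distance between the dependent elements.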