The international congress EPIA2024 honors 'Carvalho_pt-gl', an innovative bilingual generative language model for Galician and Portuguese designed at CiTIUS
A generative model capable of processing and generating content in Galician and Portuguese, developed by CiTIUS within the framework of the Nós Project, has just been awarded the 'Best Application Paper Award' at the international congress EPIA2024, marking a milestone for linguistic diversity in artificial intelligence.
The international congress on artificial intelligence EPIA2024, held from September 3 to 6 in the Portuguese town of Viana do Castelo, has just given its Best Application Paper Award to the article 'A Galician-Portuguese Generative Model', a work led from the University of Santiago de Compostela by the researcher Pablo Gamallo within the framework of the Nós Project, which is financed by the Ministry of Culture, Language and Youth of the Xunta de Galicia and developed by CiTIUS and the Institute of the Galician Language of the USC (ILG). In it, the research team presents an innovative generative language model for the Galician and Portuguese varieties, which represents a significant advance in the integration of these languages into artificial intelligence models.
The model, known as Carvalho_pt-gl, is available for free download from the web and has been specifically designed to process and generate content in Galician and European Portuguese, two closely related linguistic varieties that are underrepresented in current multilingual models. The research team, made up of experts from CiTIUS (University of Santiago de Compostela), the University of Évora, and the Universitat Pompeu Fabra, used a GPT architecture with 1.3 billion parameters and a corpus of more than 6 billion words balanced between both languages. The work is also framed within the ILENIA project (Promotion of Languages in Artificial Intelligence), part of the PERTE 'New Economy of Language' financed by the Ministry for Digital Transformation and Public Service of the Government of Spain.
Pablo Gamallo explains that "the model was trained on the _Finisterrae III_ supercomputer of CESGA" (_Galician Supercomputing Center_), "using a continual pre-training strategy that allowed a pre-existing multilingual model to be adapted, which helped a lot to overcome the data limitations that would have arisen if the training had started from scratch". The head of Carvalho_pt-gl also highlights that "after evaluating the results obtained with standardized benchmarks" (sets of reference tests used to evaluate and compare the performance of language models), "we see that they show promising performance, while reinforcing the importance of promoting linguistic diversity in generative models."
The Best Application Paper Award, granted at a conference of the stature of EPIA2024, underlines the impact and relevance of this work in the artificial intelligence landscape. The article highlights the need to develop inclusive and multicultural technologies that respect linguistic diversity, providing innovative solutions for minority or underrepresented languages such as Galician and Portuguese.
Along with Pablo Gamallo, the Carvalho_pt-gl team also included Pablo Rodríguez, Susana Sotelo, Silvia Paniagua, Daniel Bardanca, José Ramom Pichel and Senén Barro (CiTIUS, Nós Project); Daniel Santos, Nuno Miquelina, Daniela Schmidt, Víctor Nogueira and Paulo Quaresma (University of Évora); and Iria de-Dios-Flores (Universitat Pompeu Fabra).
About EPIA
The EPIA meeting (Portuguese Meeting of Artificial Intelligence) is an international scientific congress, held annually, that focuses on the latest advances and applications of artificial intelligence. Organized by the Portuguese Association for Artificial Intelligence (APPIA), the event brings together researchers and experts from around the world to share knowledge, discuss innovative research, and promote collaborations in various areas of AI. The 2024 edition took place from September 3 to 6, consolidating the event as one of the most important meetings in the field in southern Europe.