Galician is at the forefront of smart technologies through the Nós-ILENIA Project

The results of the ILENIA project to develop language technologies for the languages of Spain, in which the Nós project participated very actively, represented a very significant step forward in positioning Galician within the digital economy and society.

  • Through Nós, Galician now ranks tenth in the world by number of hours collected on the public Common Voice platform, having multiplied its presence by 20, from 18 to 367 hours of available voice recordings.
  • Among the resources presented at the end of the ILENIA-Nós project, which are already available to the public and to the country’s companies and public administrations, are the first neural translator in Galician, and systems that convert text into natural speech and vice versa.
  • The Nós project will now continue its participation in ALIA, the public AI infrastructure that promotes the development of language technologies for Spanish and the other official languages of the State.

The Centro de Estudos Avanzados (CEA) in Santiago de Compostela this morning hosted the event “Technology in Galician: results of ILENIA – Nós Project”. The meeting showcased the results of the last three years of work by the Nós Project within the framework of ILENIA, an initiative of the Government of Spain aimed at advancing resources and capabilities in language technologies, particularly for the languages of the State. Data, use cases and open tools for machine translation, speech synthesis and language models in Galician were presented this Thursday the 18th in Santiago.

The day began with the institutional opening by Aleida Alcaide, Director General of Artificial Intelligence of the Government of Spain, who highlighted the fact that the project began to take shape “before the emergence of generative AI”, which shows that in research centers “they were already thinking about the potential that future artificial intelligence could have”. Alcaide explained that the Ministry’s objective was “to create a synergy” between similar initiatives in different co‑official languages and stressed that, along with data and infrastructure, “the third essential ingredient is talent, of which you have so much in Galicia”. Her speech also marked the beginning of a new stage: she announced that “ILENIA is closing but the ALIA stage is opening” and guaranteed the Galician nodes “all our support in this process”.

For her part, the director of the ILG (Instituto da Lingua Galega), Elisa Fernández Rei, recalled that ILENIA was created to “promote in Spain a new digital economy based on natural language”, while the director of CiTIUS, Senén Barro, defended the central role of language technologies in artificial intelligence, warning that “what will be extraordinarily expensive is not to do it” and advocating public‑private collaboration as key to generating wealth and employment around this strategic field.

One of the central sections of the event was dedicated to the presentation of data and use cases, with the participation of representatives of the Nós Project itself and collaborating entities such as the Mozilla Foundation, Imaxin Software, AMTEGA and CREAGAL. The work carried out in compiling linguistic data stands out here: over the last three years, for example, the hours of speech available for automatic speech recognition went from 10 to 3,227. The collaborative voice bank on Common Voice currently holds 385 hours, up from an initial 18, multiplying its presence by about 20 in this period and making Galician the tenth language among the 163 present on this public voice‑data platform. Common Voice is an open platform designed for the community creation of speech and text datasets, in which anyone can preserve, revitalize and promote their language by sharing, creating and organizing text and voice datasets.

Moreover, this collection of audio fragments is being carried out with the collaboration of the Xunta de Galicia and seeks to represent Galician dialectal varieties, in order to generate a public corpus that is freely usable and independent of technological changes. The meeting also presented the work carried out on RAG (Retrieval‑Augmented Generation), one of the techniques used to limit hallucinations in artificial intelligence models, which generally improves the quality of user interaction with large language models. This section also hosted the presentation of the first neural translator in Galician, which offers better objective performance than existing alternatives and has several additional advantages: it is a public resource, it translates both plain text and files, and it is integrated into the Automatic Translation Platform of the State Secretariat for Digital Administration (PLaTa).

Other use cases presented throughout the morning were text‑to‑speech (TTS) systems, specifically examples of synthetic voices in Galician and the AhoMyTTS initiative, a speech synthesis tool that converts written text into natural speech. AhoMyTTS is based on artificial intelligence models developed by the University of the Basque Country and has been adapted to different languages, including Galician, through the collaboration of the Nós project.

These demonstrators, the translator and the TTS system, are intended to facilitate the transfer of the knowledge generated by the Nós Project to companies, public administrations and society in general, in order to promote the real use of the Galician language in advanced digital environments.

A sustainable and competitive technological ecosystem in Galician

The session “Access and use of resources: data, models and tools” delved into the open philosophy of the project: the data, the language models and the tools developed are all available for free use, in order to foster a sustainable and competitive technological ecosystem in Galician. Another of the points addressed was the imminent integration of the Nós Project into ALIA, a pioneering initiative in the European Union that seeks to create a public infrastructure of AI resources to promote Spanish and the co‑official languages in the development and implementation of artificial intelligence worldwide.

The day also included a presentation of the two AI factories in Spain. Specifically, the 1HealthAI factory was presented by Lois Orosa, director of CESGA, the center that will host the core infrastructure and operations of this factory.

The event also featured a round table on the importance of Galician in the digital world, with very prominent participation from Galician companies committed to the technological development of Galician and of language technologies in general. This round table highlighted the collaboration needed between research, business and administration to ensure the presence of the Galician language in new technological developments and to build a Galician industry in AI and language technologies.

The meeting closed with contributions from Valentín García, Secretary General for Language Policy of the Xunta de Galicia; Pedro Blanco Lobeiras, Government Delegate in Galicia; and Pilar Bermejo, Vice‑Rector for Scientific Policy of the University of Santiago de Compostela. In their speeches, all of them identified the Nós Project as a key piece in guaranteeing the future of Galician in the age of artificial intelligence.

Nós Project

The Nós Project is an initiative to place Galician alongside the most developed languages in the field of language technology and Artificial Intelligence. Its main objective is to generate the resources needed to facilitate the development of services and products based on language technology, such as voice assistants, machine translators or conversational agents.

In parallel, the project also promotes the digital presence of Galician through the creation of a wide variety of high‑quality, freely usable tools and resources. Some of them (a multilingual neural translator, a speech recognizer that converts speech into written text, and a speech synthesis application that reads in Galician) are freely accessible through the project website and available to any person, institution, organization or company that wants to develop a technological product, application or service incorporating the Galician language. In this way, in addition to guaranteeing the linguistic rights of the Galician‑speaking community in the digital world, the project contributes to the modernization and digitalization of the Galician business ecosystem and to the creation of value with new products that use Galician.

The Nós Project is an initiative of the Xunta de Galicia, which entrusted its implementation to the University of Santiago de Compostela (USC) through two leading research entities in language technologies and Artificial Intelligence: the Instituto da Lingua Galega (ILG) and the Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS). Over this three‑year period it has been funded by the Ministerio para la Transformación Digital y de la Función Pública with funds from the European Union‑NextGenerationEU, within the framework of the ILENIA project.