DOMINO: Neural Machine Translation, In DOMain, NO supervised

Machine translation (MT) has been one of the most prominent applications of artificial intelligence since the very beginning of the field. Beyond its intrinsic interest, given the difficulty and completeness of the problem, machine translation is of great practical importance in our increasingly global world, as it promises to break the language barrier while safeguarding the cultural heritage and diversity of all the languages spoken in the world.

Although in 2018 high-quality machine translation remains a challenge for most language pairs, the development of this field in recent years has been impressive. The neural machine translation (NMT) paradigm, which combines Deep Learning (including embeddings) with neural techniques that integrate translation and language models, has achieved results that seemed unthinkable three to four years ago. The appearance of the European DeepL system has marked a milestone in the development of this technology, as it has improved the state of the art in the field and competes with the Internet giants (mainly Google).

Meanwhile, companies and private users have become familiar with the advantages and limitations of this technology. Companies focus on increasing productivity by combining translation memories, MT tools and post-editing environments. Private users rely on MT intensively even though, in many cases, and especially for languages with limited resources, the quality it offers is not comparable to professional translation. The demand for MT, from both professionals and society at large, is increasing (MT is included in the digital agenda).

This project, coordinated by the IXA research group of the UPV/EHU with the participation of the Fundación Elhuyar and the CiTIUS, aims to improve the state of the art of MT systems based on Deep Learning and neural techniques.

Objectives

More specifically, the objectives of the project are the following:

  • Improving the quality of NMT translation and obtaining reliable evaluations. NMT systems currently display several shortcomings that must be studied and solved, especially regarding the fidelity of the generated text: untranslated segments and problems with terminology, named entities, quantities and adjectives. It is also important to reduce the training and execution times of these systems, and to test new neural architectures.
  • New contributions to unsupervised machine translation (especially useful for languages with few resources). Among the results of the TADEEP project, we can underline the high impact this line of research has achieved, with publications in the most important forums in the area (ACL, EMNLP, AAAI, ICLR). Research in this line is one of the key objectives of this project, which will lead to high-impact publications.
  • Adapting MT to specific domains and transferring it to the business environment, as well as applying the NMT paradigm to other seq2seq problems (grammatical correction, normalization of historical or informal texts...). This is the most applied part of the project, which tries to meet the real needs of local businesses and social contexts.