Exploring semantic knowledge in vector models: homonymy, polysemy, synonymy and idiomaticity

In this project we design methodologies for the systematic interpretation and analysis of the semantic knowledge encoded in vector models in several languages. In particular, we focus on evaluating how these models represent the meaning of (i) homonymous words, (ii) polysemous words, (iii) synonymous words, and (iv) multiword expressions (MWEs) with different degrees of semantic compositionality (i.e., more or less idiomatic expressions).

Objectives

The aim of the project is to explore the semantic knowledge encoded by the most recent vector models and to evaluate new methods for improving the aspects in which these systems do not perform satisfactorily. We also aim to provide new results on how human evaluators interpret the four semantic phenomena above in various controlled contexts. To improve the modelling, we will explore, among other alternatives, compositional learning strategies, fine-tuning, and the injection of individual vectors for MWEs. Experiments and analyses will be conducted in Galician, Portuguese, Spanish, and English.