In this article we propose to to define and develop an automatic strategy to search for lexical specificities within sets of texts using simple lexical units and multiword expressions (MWE).
We propose a methodology for calculating the divergence of lemma and MWE distributions that will automatically find differences and similarities between unlabeled texts. This methodology can be used to subsequently identify groups of texts to which quantitative and qualitative analyzes will be applied (semiautomatically and/or with human intervention).
In a first test, we used two specialized texts (from the area of Paediatrics) and a literary text, assuming that the texts of specialty should present greater divergences with respect to the literary text than among themselves. As the tests that were done showed the expected trend, we decided to apply the same methodology to a second set of texts (three sets of interviews done to visitors in the city of Santiago de Compostela).
Keywords: divergencia de Kullback-Leibler, divergˆencia lexical, lexicometria