Distributional Semantics for Diachronic Search

This article describes a system for querying word similarity across different time periods. The strategy relies on distributional models built from a chronologically structured language resource, namely Google Books Syntactic Ngrams. The models were created using dependency-based contexts together with a strategy for reducing the vector space, which consists of selecting the most informative and relevant word contexts. The distributional models were evaluated quantitatively. The linguistic data are stored in a NoSQL database with a web interface that allows linguists to analyze how the meaning of Spanish words in written texts changes over time.

Keywords: natural language processing, diachronic semantics, distributional semantics, language change