LinguaKit: a Big Data-based multilingual tool for linguistic analysis and information extraction

This paper presents LinguaKit, a multilingual suite of tools for analysis, extraction, annotation, and linguistic correction, and describes its integration into a Big Data infrastructure. LinguaKit lets the user perform a range of tasks, including PoS tagging, syntactic parsing, and coreference resolution. It also provides applications for relation extraction, sentiment analysis, summarization, multiword expression extraction, and entity linking to DBpedia. Most modules work in four languages: Portuguese, Spanish, English, and Galician. The system is programmed in Perl and is freely available under a GPLv3 license.

keywords: