Congress 970
  • Marcos García, Pablo Gamallo
  • Third Symposium on Languages, Applications and Technologies. Madrid, Spain. 2015

Yet another suite of multilingual NLP tools

This paper presents the current development of a multilingual suite for Natural Language Processing. It consists of a sentence chunker, a tokenizer, a PoS-tagger, a dictionary-based lemmatizer and a Named Entity Recognizer (for both enamex expressions, i.e. entity names, and numex expressions, i.e. numeric expressions). The architecture of the pipeline and the main resources used to develop it are described. In addition, the PoS-tagger and the Named Entity Recognizer are evaluated against several state-of-the-art systems. Experiments on Portuguese and English show that, despite its simplicity, our system is competitive with some well-known NLP tools. The suite is written entirely in Perl and distributed under a GPL license.
Keywords: natural language processing, PoS-tagging, named entity recognition, Portuguese, English
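The abstract describes a linear pipeline in which each module consumes the output of the previous one: sentence chunking, tokenization, PoS-tagging, dictionary-based lemmatization and entity recognition. The Python sketch below illustrates only that data flow; it is not the authors' implementation, which is written in Perl and relies on trained models and large lexicons. The toy `LEXICON`, the tag names, and the functions `split_sentences`, `tokenize`, `pos_tag`, `lemmatize`, `recognize_entities` and `run_pipeline` are hypothetical, and the tagger and recognizer use naive rules in place of real components.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy dictionary: lowercased form -> (PoS tag, lemma).
# A real suite would use large language-specific lexicons instead.
LEXICON: Dict[str, Tuple[str, str]] = {
    "the": ("DET", "the"),
    "tools": ("NOUN", "tool"),
    "were": ("VERB", "be"),
    "presented": ("VERB", "present"),
    "in": ("ADP", "in"),
}


def split_sentences(text: str) -> List[str]:
    # Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence: str) -> List[str]:
    # Words and numbers form one token each; every punctuation mark is its own token.
    return re.findall(r"\w+|[^\w\s]", sentence)


def pos_tag(tokens: List[str]) -> List[Tuple[str, str]]:
    # Dictionary lookup first, then simple shape-based fallbacks (hypothetical rules).
    tagged = []
    for tok in tokens:
        entry = LEXICON.get(tok.lower())
        if entry:
            tag = entry[0]
        elif tok.isdigit():
            tag = "NUM"
        elif tok[0].isupper():
            tag = "PROPN"
        elif re.fullmatch(r"[^\w\s]", tok):
            tag = "PUNCT"
        else:
            tag = "NOUN"
        tagged.append((tok, tag))
    return tagged


def lemmatize(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    # Dictionary-based: use the listed lemma when the form is known;
    # otherwise keep proper nouns unchanged and lowercase everything else.
    out = []
    for tok, tag in tagged:
        entry = LEXICON.get(tok.lower())
        if entry:
            lemma = entry[1]
        elif tag == "PROPN":
            lemma = tok
        else:
            lemma = tok.lower()
        out.append((tok, tag, lemma))
    return out


def recognize_entities(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    # ENAMEX: maximal runs of PROPN tokens; NUMEX: numeric tokens.
    entities: List[Tuple[str, str]] = []
    run: List[str] = []
    for tok, tag in tagged + [("", "END")]:  # sentinel flushes a trailing run
        if tag == "PROPN":
            run.append(tok)
            continue
        if run:
            entities.append(("ENAMEX", " ".join(run)))
            run = []
        if tag == "NUM":
            entities.append(("NUMEX", tok))
    return entities


def run_pipeline(text: str):
    # Chain the modules sentence by sentence: lemmas and entities both use the PoS tags.
    results = []
    for sentence in split_sentences(text):
        tagged = pos_tag(tokenize(sentence))
        results.append((lemmatize(tagged), recognize_entities(tagged)))
    return results


if __name__ == "__main__":
    text = "The tools were presented in Madrid in 2015. Marcos García presented the tools."
    for lemmas, entities in run_pipeline(text):
        print(lemmas)
        print(entities)
```

Feeding the tagged tokens to both the lemmatizer and the recognizer mirrors the module order listed in the abstract. A capitalization fallback like the one in `pos_tag` would mistag sentence-initial words missing from the lexicon, which is one reason practical taggers rely on trained models rather than shape rules alone.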