Publications

Pablo Gamallo Otero

2024
Open Generative Large Language Models for Galician
CorpusNÓS: A massive Galician corpus for training large language models
An Unsupervised Perplexity-Based Method for Boilerplate Removal
2022
La Llei Paz Andrade i l’ús del portuguès per a la traducció automàtica al gallec
Evaluating Contextualized Vectors from both Large Language Models and Compositional Strategies
Recent Advances in Digital Humanities: Romance Language Applications
An exploration of the semantic knowledge in vector models: polysemy, synonymy and idiomaticity
Proxecto Nós: Artificial Intelligence at the Service of the Galician Language
A neural machine translation system for Galician from transliterated Portuguese text
The Nós Project: Opening routes for the Galician language in the field of language technologies
SemantiGal: An online visualizer of vector representations for Galician
2021
Uso de tecnologias linguísticas para estudar a evolução dos sufixos -ÇOM e -VEL no galego-português medieval a partir de corpora históricos
CiTIUS at the TREC 2021 Health Misinformation Track
Una metodología semiautomática de anotación de entidades nombradas para la creación de un gold standard
Hybrid Intelligence Strategies for Identifying, Classifying and Analyzing Political Bots
CiTIUS at FakeDeS 2021: A Hybrid Strategy for Fake News Detection
Compositional Distributional Semantics with Syntactic Dependencies and Selectional Preferences
Using Dependency-Based Contextualization for transferring Passive Constructions from English to Spanish
Comparing Dependency-based Compositional Models with Contextualized Word Embedding
2020
CitiusNLP at SemEval-2020 Task 3: Comparing Two Approaches for Word Vector Contextualization
The Impact of Linguistic Knowledge in Different Strategies to Learn Cross-Lingual Distributional Models
Distância diacrónica automática entre variantes diatópicas do português e do espanhol
Measuring diachronic language distance using perplexity: Application to English, Portuguese, and Spanish
A Methodology to Measure the Diachronic Language Distance between Three Languages Based on Perplexity
2019
Supervised Classifiers to Identify Hate Speech on English and Spanish Tweets
Contextualized Translations of Phrasal Verbs with Distributional Compositional Semantics and Monolingual Corpora
Cross-lingual Diachronic Distance: Application to Portuguese and Spanish
NER and Open Information Extraction for Portuguese: Notebook for IberLEF 2019 Portuguese Named Entity Recognition and Relation Extraction Tasks
Naive-Bayesian Classification for Bot Detection in Twitter: Notebook for PAN at CLEF 2019
Unsupervised Compositional Translation of Multiword Expressions
Uma utilidade para o reconhecimento de topónimos em documentos medievais
A dependency-based approach to word contextualization using compositional distributional semantics
Identifying Causal Relations in Legal Documents with Dependency Syntactic Analysis
CiTIUS-COLE at SemEval-2019 Task 5: Combining Linguistic Features to Identify Hate Speech Against Immigrants and Women on Multilingual Tweets
Strategies for building high quality bilingual lexicons from comparable corpora
Explorando métodos non-supervisados para calcular a similitude semántica textual
Comparing Supervised Machine Learning Strategies and Linguistic Features to Search for Very Negative Opinions
2018
Using the Outlier Detection Task to Evaluate Distributional Semantic Models
Linguistic Features to Identify Extreme Opinions: An Empirical Study
GeoHbbTV: A framework for the development and evaluation of geographic interactive TV contents
Dependency parsing with finite state transducers and compression rules
LinguaKit: a Big Data-based multilingual tool for linguistic analysis and information extraction
A Comparative Study of Polarity Lexicons to Identify Extreme Opinions
Exploring Unsupervised Methods to Textual Similarity
Computational Processing of the Portuguese Language
Task-Oriented Evaluation of Dependency Parsing with Open Information Extraction
Measuring language distance among historical varieties using perplexity. Application to European Portuguese
Estratégias Lexicométricas para Detetar Especificidades Textuais
Evaluation of Distributional Models with the Outlier Detection Task
CitiusNLP at SemEval-2018 Task 10: The Use of Transparent Distributional Models and Salient Contexts to Discriminate Word Attributes
A lexicon based method to search for extreme opinions
Distributional Semantics for Diachronic Search
Evaluating and improving lexical resources for detecting signs of depression in text
2017
Searching for the Most Negative Opinions
From language identification to language distance
The role of syntactic dependencies in compositional distributional semantics
Comparing explicit and predictive distributional semantic models endowed with syntactic contexts
Sense Contextualization in a Dependency-Based Compositional Distributional Model
A rule-based system for cross-lingual parsing of Romance languages with Universal Dependencies
Citius at SemEval-2017 Task 2: Cross-Lingual Similarity from Comparable Corpora and Dependency-Based Contexts
Automatic Construction of Domain-Specific Sentiment Lexicons for Polarity Classification
Compositional Semantics using Feature-Based Models from WordNet
A Perplexity-Based Method for Similar Languages Discrimination
A Web Interface for Diachronic Semantic Search in Spanish
2015
Overview of TweetMT: A Shared Task on Machine Translation of Tweets at SEPLN 2015
Multilingual Open Information Extraction
TweetNorm: a benchmark for lexical normalization of Spanish tweets
Exploring the Effectiveness of Linguistic Knowledge for Biographical Relation Extraction
Dependency Parsing with Compression Rules
Yet another suite of multilingual NLP tools
Avalingua: Natural Language Processing for Automatic Error Detection
2014
Perldoop: Efficient Execution of Perl Scripts on Hadoop Clusters
Comparing Ranking-based and Naive Bayes Approaches to Language Detection on Tweets
Overview of TweetLID: Tweet Language Identification at SEPLN 2014
Uso de corpora comparáveis para filtrar dicionários bilíngues gerados por transitividade
Entity-Centric Coreference Resolution of Person Entities for Open Information Extraction
Análisis morfosintáctico y clasificación de entidades nombradas en un entorno Big Data
PoS-tagging the Web in Portuguese. National varieties, text typologies and spelling systems
An Entity-Centric Coreference Resolution System for Person Entities with Rich Linguistic Information
Citius: A Naive-Bayes Strategy for Sentiment Analysis on English Tweets
An Overview of Open Information Extraction
Multilingual corpora with coreferential annotation of person entities
TweetNorm_es Corpus: an Annotated Corpus for Spanish Microtext Normalization
2013
A Method to Lexical Normalisation of Tweets
Introducción a la tarea compartida Tweet-Norm 2013: Normalización léxica de tuits en español
TASS: A Naive-Bayes strategy for sentiment analysis on Spanish tweets
Analyzing the Sense Distribution of Concordances Obtained by Web As Corpus Approach
Lexical Inheritance with Meronymic Relationships
Learning verb inflection using Cilenis conjugators
2012
Técnicas de procesamiento del lenguaje natural en la Recuperación de Información
Dependency-Based Open Information Extraction
DepPattern: A Multilingual Dependency Parser
Extraction of Bilingual Cognates from Wikipedia
Propuesta para una semántica de las dependencias sintácticas
2011
A Weakly-Supervised Rule-Based Approach for Relation Extraction
Evaluating Various Linguistic Features on Semantic Relation Extraction
Dependency-Based Text Compression for Semantic Relation Extraction
Measuring Comparability of Multilingual Corpora Extracted from Wikipedia
Resolución de Correferencia de Nombres de Persona para Extracción de Información Biográfica
An Exploration of the Linguistic Knowledge for Semantic Relation Extraction in Spanish
Is Singular Value Decomposition Useful for Word Similarity Extraction?
A Grammatical Formalism Based on Patterns of Part-of-Speech Tags