
Lecture: 'Authorship Identification with Rejection Mechanisms under Data-Scarce Conditions'
Attributing authorship to text is a complex problem for both specialists and AI systems. The difficulty stems from capturing subtle stylistic cues, dealing with texts from similar eras or languages, distinguishing heteronyms of a single writer, and even inferring the author’s gender. Traditional approaches require extracting numerous attributes—often via specialized linguistic tools—and rely on large training corpora, a dependence further amplified by modern transformer-based Deep Learning. Classic classifiers also tend to assign every document to a known class, even when it is unlike anything seen in training, though such out-of-scope texts should be reliably rejected.
We propose a language-independent method for authorship attribution that can reject anomalous samples in challenging settings, achieving high accuracy across all tested datasets. Evaluating each feature’s discriminative power allows the final attribute set to be sharply reduced.
About the speaker
Joaquim Ferreira da Silva is Assistant Professor of Computer Science in the Department of Computer Science, NOVA School of Science and Technology, Universidade Nova de Lisboa. He holds a PhD in Computer Science from the Universidade Nova de Lisboa (2004). He is an Integrated Member of the NOVA LINCS research center, and his research interests focus on Information Extraction, Text Mining and Machine Learning. He publishes regularly at scientific events in those areas, such as ICDM, PAKDD, IEEE eScience, IEEE BigData, ECML PKDD, ICCS, PROPOR and EPIA, and has been involved in several projects, such as ISTRION, VIP-ACCESS, PATRAS and WE-LEARN. Since 2005, he has been co-chair of TeMA, a track of the EPIA conference. He has served on the Program Committees of conferences and workshops such as EPIA, ICCS, ECML PKDD, PAKDD and the MWE ACL workshop. He has supervised PhD and MSc theses, and he has taught courses such as Databases, Text Mining, Computational Logic and Machine Learning, among others.
On-site event