Enhancing Training Data Quality through Influence Scores for Generalizable Classification: A Case Study on Sexism Detection
The quality of training data is crucial for the performance of supervised machine learning models. In particular, poor annotation quality and spurious correlations between labels and features in text datasets can significantly degrade model generalization. This problem is especially pronounced in harmful language detection, where prior studies have revealed major deficiencies in existing datasets. In this work, we design and test data selection methods based on learnability measures to improve dataset quality. Using a sexism dataset with counterfactuals designed to avoid spurious correlations, we show that pruning with EL2N and PVI scores can lead to significant performance increases and outperforms submodular and random selection. Our analysis reveals that, in the presence of label imbalance, models rely on dataset shortcuts; in particular, easy-to-classify sexist instances and hard-to-classify non-sexist instances contain shortcuts. Pruning these instances leads to performance increases. Pruning hard-to-classify instances is, in general, also a promising strategy when shortcuts are not present.
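As a rough illustration of score-based pruning, the sketch below computes an EL2N-style score (the L2 norm of the difference between the predicted class probabilities and the one-hot label) and drops the highest-scoring, i.e. hardest, fraction of examples. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names `el2n_score` and `prune_hardest`, the toy probabilities, and the 25% pruning fraction are all hypothetical, and a single set of predictions is used where EL2N is usually averaged over several early-training runs.

```python
import math

def el2n_score(probs, label):
    """L2 norm of (predicted probabilities - one-hot label) for one example."""
    return math.sqrt(
        sum((p - (1.0 if k == label else 0.0)) ** 2 for k, p in enumerate(probs))
    )

def prune_hardest(scores, fraction):
    """Return sorted indices kept after dropping the highest-scoring fraction."""
    n_drop = int(len(scores) * fraction)
    # Sort example indices from easiest (low score) to hardest (high score).
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(order[: len(scores) - n_drop])

# Hypothetical toy data: class 0 = non-sexist, class 1 = sexist.
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
labels = [0, 1, 1, 0]

scores = [el2n_score(p, y) for p, y in zip(probs, labels)]
kept = prune_hardest(scores, fraction=0.25)  # assumed pruning budget
print(kept)
```

Pruning the easiest instances of one class instead, as the abstract describes for sexist examples, would amount to dropping from the low-score end within that class.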
Publication: Congress
January 21, 2026
/research/publications/enhancing-training-data-quality-through-influence-scores-for-generalizable-classification-a-case-study-on-sexism-detection
Authors: Rabiraj Bandyopadhyay, Dennis Assenmacher, Jose M. Alonso-Moral, Claudia Wagner