Related papers: DISCERN: Decoding Systematic Errors in Natural Lan…

DEXTER: Diffusion-Guided EXplanations with TExtual Reasoning for Vision Models

Understanding and explaining the behavior of machine learning models is essential for building transparent and trustworthy AI systems. We introduce DEXTER, a data-free framework that employs diffusion models and large language models to…

Computer Vision and Pattern Recognition · Computer Science 2025-11-18 Simone Carnemolla, Matteo Pennisi, Sarinda Samarasinghe, Giovanni Bellitto, Simone Palazzo, Daniela Giordano, Mubarak Shah, Concetto Spampinato

FIND: Human-in-the-Loop Debugging Deep Text Classifiers

Since obtaining a perfect training dataset (i.e., a dataset which is considerably large, unbiased, and well-representative of unseen cases) is hardly possible, many real-world text classifiers are trained on the available, yet imperfect,…

Computation and Language · Computer Science 2020-10-13 Piyawat Lertvittayakumjorn, Lucia Specia, Francesca Toni

Investigating the Working of Text Classifiers

Text classification is one of the most widely studied tasks in natural language processing. Motivated by the principle of compositionality, large multilayer neural network models have been employed for this task in an attempt to effectively…

Computation and Language · Computer Science 2018-08-07 Devendra Singh Sachan, Manzil Zaheer, Ruslan Salakhutdinov

Classifying text using machine learning models and determining conversation drift

Text classification helps analyse texts for semantic meaning and relevance, by mapping the words against this hierarchy. An analysis of various types of texts is invaluable to understanding both their semantic meaning, as well as their…

Machine Learning · Computer Science 2022-11-16 Chaitanya Chadha, Vandit Gupta, Deepak Gupta, Ashish Khanna

The Authors Matter: Understanding and Mitigating Implicit Bias in Deep Text Classification

It is evident that deep text classification models trained on human data could be biased. In particular, they produce biased outcomes for texts that explicitly include identity terms of certain demographic groups. We refer to this type of…

Computation and Language · Computer Science 2021-05-07 Haochen Liu, Wei Jin, Hamid Karimi, Zitao Liu, Jiliang Tang

ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning

Recent powerful pre-trained language models have achieved remarkable performance on most of the popular datasets for reading comprehension. It is time to introduce more challenging datasets to push the development of this field towards more…

Computation and Language · Computer Science 2020-08-25 Weihao Yu, Zihang Jiang, Yanfei Dong, Jiashi Feng

Detect and Perturb: Neutral Rewriting of Biased and Sensitive Text via Gradient-based Decoding

Written language carries explicit and implicit biases that can distract from meaningful signals. For example, letters of reference may describe male and female candidates differently, or their writing style may indirectly reveal demographic…

Computation and Language · Computer Science 2021-09-27 Zexue He, Bodhisattwa Prasad Majumder, Julian McAuley

GSCLIP: A Framework for Explaining Distribution Shifts in Natural Language

Helping end users comprehend the abstract distribution shifts can greatly facilitate AI deployment. Motivated by this, we propose a novel task, dataset explanation. Given two image data sets, dataset explanation aims to automatically point…

Computation and Language · Computer Science 2022-07-01 Zhiying Zhu, Weixin Liang, James Zou

FairFlow: Mitigating Dataset Biases through Undecided Learning

Language models are prone to dataset biases, known as shortcuts and spurious correlations in data, which often result in performance drop on new data. We present a new debiasing framework called "FairFlow" that mitigates dataset biases by…

Machine Learning · Computer Science 2025-03-25 Jiali Cheng, Hadi Amiri

BoostClean: Automated Error Detection and Repair for Machine Learning

Predictive models based on machine learning can be highly sensitive to data error. Training data are often combined with a variety of different sources, each susceptible to different types of inconsistencies, and new data streams during…

Databases · Computer Science 2017-11-07 Sanjay Krishnan, Michael J. Franklin, Ken Goldberg, Eugene Wu

SelfExplain: A Self-Explaining Architecture for Neural Text Classifiers

We introduce SelfExplain, a novel self-explaining model that explains a text classifier's predictions using phrase-based concepts. SelfExplain augments existing neural classifiers by adding (1) a globally interpretable layer that identifies…

Computation and Language · Computer Science 2021-09-09 Dheeraj Rajagopal, Vidhisha Balachandran, Eduard Hovy, Yulia Tsvetkov

LOGAN: Local Group Bias Detection by Clustering

Machine learning techniques have been widely used in natural language processing (NLP). However, as revealed by many recent studies, machine learning models often inherit and amplify the societal biases in data. Various metrics have been…

Computation and Language · Computer Science 2020-10-07 Jieyu Zhao, Kai-Wei Chang

Span Classification with Structured Information for Disfluency Detection in Spoken Utterances

Existing approaches in disfluency detection focus on solving a token-level classification task for identifying and removing disfluencies in text. Moreover, most works focus on leveraging only contextual information captured by the linear…

Computation and Language · Computer Science 2022-04-19 Sreyan Ghosh, Sonal Kumar, Yaman Kumar Singla, Rajiv Ratn Shah, S. Umesh

DiffSampling: Enhancing Diversity and Accuracy in Neural Text Generation

Despite their growing capabilities, language models still frequently reproduce content from their training data, generate repetitive text, and favor common grammatical patterns and vocabulary. A possible cause is the decoding strategy: the…

Computation and Language · Computer Science 2026-01-15 Giorgio Franceschelli, Mirco Musolesi

Context Biasing for Pronunciation-Orthography Mismatch in Automatic Speech Recognition

Neural sequence-to-sequence systems deliver state-of-the-art performance for automatic speech recognition. When using appropriate modeling units, e.g., byte-pair encoding, these systems are in principle open vocabulary systems. In practice,…

Computation and Language · Computer Science 2026-03-05 Christian Huber, Alexander Waibel

Deep Learning for Bias Detection: From Inception to Deployment

To create a more inclusive workplace, enterprises are actively investing in identifying and eliminating unconscious bias (e.g., gender, race, age, disability, elitism and religion) across their various functions. We propose a deep learning…

Computation and Language · Computer Science 2021-11-01 Md Abul Bashar, Richi Nayak, Anjor Kothare, Vishal Sharma, Kesavan Kandadai

Search-based Structured Prediction

We present Searn, an algorithm for integrating search and learning to solve complex structured prediction problems such as those that occur in natural language, speech, computational biology, and vision. Searn is a meta-algorithm that…

Machine Learning · Computer Science 2009-07-07 Hal Daumé, John Langford, Daniel Marcu

Towards the Unseen: Iterative Text Recognition by Distilling from Errors

Visual text recognition is undoubtedly one of the most extensively researched topics in computer vision. Great progress has been made to date, with the latest models starting to focus on the more practical "in-the-wild" setting. However, a…

Computer Vision and Pattern Recognition · Computer Science 2021-07-27 Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Aneeshan Sain, Yi-Zhe Song

Comparative Experiments on Disambiguating Word Senses: An Illustration of the Role of Bias in Machine Learning

This paper describes an experimental comparison of seven different learning algorithms on the problem of learning to disambiguate the meaning of a word from context. The algorithms tested include statistical, neural-network, decision-tree,…

cmp-lg · Computer Science 2008-02-03 Raymond J. Mooney

Exploring Machine Learning and Transformer-based Approaches for Deceptive Text Classification: A Comparative Analysis

Deceptive text classification is a critical task in natural language processing that aims to identify deceptive or fraudulent content. This study presents a comparative analysis of machine learning and transformer-based approaches for…

Computation and Language · Computer Science 2023-08-14 Anusuya Krishnan