Related papers: Text Characterization Toolkit

Towards Human-Centred Explainability Benchmarks For Text Classification

Progress on many Natural Language Processing (NLP) tasks, such as text classification, is driven by objective, reproducible and scalable evaluation via publicly available benchmarks. However, these are not always representative of…

Computation and Language · Computer Science 2022-11-11 Viktor Schlegel, Erick Mendez-Guzman, Riza Batista-Navarro

autoNLP: NLP Feature Recommendations for Text Analytics Applications

While designing machine learning based text analytics applications, often, NLP data scientists manually determine which NLP features to use based upon their knowledge and experience with related problems. This results in increased efforts…

Computation and Language · Computer Science 2020-02-11 Janardan Misra

A Survey of Parameters Associated with the Quality of Benchmarks in NLP

Several benchmarks have been built with heavy investment in resources to track our progress in NLP. Thousands of papers published in response to those benchmarks have competed to top leaderboards, with models often surpassing human…

Computation and Language · Computer Science 2022-10-17 Swaroop Mishra, Anjana Arunkumar, Chris Bryan, Chitta Baral

Towards Robustness to Label Noise in Text Classification via Noise Modeling

Large datasets in NLP suffer from noisy labels, due to erroneous automatic and human annotation procedures. We study the problem of text classification with label noise, and aim to capture this noise through an auxiliary noise model over…

Computation and Language · Computer Science 2022-06-22 Siddhant Garg, Goutham Ramakrishnan, Varun Thumbe

TextAttack: Lessons learned in designing Python frameworks for NLP

TextAttack is an open-source Python toolkit for adversarial attacks, adversarial training, and data augmentation in NLP. TextAttack unites 15+ papers from the NLP adversarial attack literature into a single framework, with many components…

Software Engineering · Computer Science 2020-10-06 John X. Morris, Jin Yong Yoo, Yanjun Qi

Fighting Bias with Bias: Promoting Model Robustness by Amplifying Dataset Biases

NLP models often rely on superficial cues known as dataset biases to achieve impressive performance, and can fail on examples where these biases do not hold. Recent work sought to develop robust, unbiased models by filtering biased examples…

Computation and Language · Computer Science 2023-05-31 Yuval Reif, Roy Schwartz

An Interactive Tool for Natural Language Processing on Clinical Text

Natural Language Processing (NLP) systems often make use of machine learning techniques that are unfamiliar to end-users who are interested in analyzing clinical records. Although NLP has been widely used in extracting information from…

Human-Computer Interaction · Computer Science 2017-07-10 Gaurav Trivedi, Phuong Pham, Wendy Chapman, Rebecca Hwa, Janyce Wiebe, Harry Hochheiser

Text Quality-Based Pruning for Efficient Training of Language Models

In recent times, training Language Models (LMs) has relied on computationally heavy training over massive datasets, which makes this training process extremely laborious. In this paper we propose a novel method for numerically evaluating…

Computation and Language · Computer Science 2024-05-14 Vasu Sharma, Karthik Padthe, Newsha Ardalani, Kushal Tirumala, Russell Howes, Hu Xu, Po-Yao Huang, Shang-Wen Li, Armen Aghajanyan, Gargi Ghosh, Luke Zettlemoyer

Description Based Text Classification with Reinforcement Learning

The task of text classification is usually divided into two stages: text feature extraction and classification. In this standard formalization categories are merely represented as indexes in the label vocabulary, and the model…

Computation and Language · Computer Science 2020-06-05 Duo Chai, Wei Wu, Qinghong Han, Fei Wu, Jiwei Li

Evaluating Transformer-Based Multilingual Text Classification

As NLP tools become ubiquitous in today's technological landscape, they are increasingly applied to languages with a variety of typological structures. However, NLP research does not focus primarily on typological differences in its…

Computation and Language · Computer Science 2020-05-04 Sophie Groenwold, Samhita Honnavalli, Lily Ou, Aesha Parekh, Sharon Levy, Diba Mirza, William Yang Wang

Multi-Dimensional Gender Bias Classification

Machine learning models are trained to find patterns in data. NLP models can inadvertently learn socially undesirable patterns when training on gender biased text. In this work, we propose a general framework that decomposes gender bias in…

Computation and Language · Computer Science 2020-05-05 Emily Dinan, Angela Fan, Ledell Wu, Jason Weston, Douwe Kiela, Adina Williams

Beyond Accuracy: Behavioral Testing of NLP models with CheckList

Although measuring held-out accuracy has been the primary approach to evaluate generalization, it often overestimates the performance of NLP models, while alternative approaches for evaluating models either focus on individual tasks or on…

Computation and Language · Computer Science 2020-05-11 Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin, Sameer Singh

Towards a Diagnostic and Predictive Evaluation Methodology for Sequence Labeling Tasks

Standard evaluation in NLP typically indicates that system A is better on average than system B, but it provides little info on how to improve performance and, what is worse, it should not come as a surprise if B ends up being better than A…

Computation and Language · Computer Science 2026-03-17 Elena Alvarez-Mellado, Julio Gonzalo

Classifying text using machine learning models and determining conversation drift

Text classification helps analyse texts for semantic meaning and relevance, by mapping the words against this hierarchy. An analysis of various types of texts is invaluable to understanding both their semantic meaning, as well as their…

Machine Learning · Computer Science 2022-11-16 Chaitanya Chadha, Vandit Gupta, Deepak Gupta, Ashish Khanna

Adapting Sequence to Sequence models for Text Normalization in Social Media

Social media offer an abundant source of valuable raw data; however, informal writing can quickly become a bottleneck for many natural language processing (NLP) tasks. Off-the-shelf tools are usually trained on formal text and cannot…

Computation and Language · Computer Science 2019-04-15 Ismini Lourentzou, Kabir Manghnani, ChengXiang Zhai

Noisy Text Data: Achilles' Heel of popular transformer based NLP models

In the last few years, the ML community has created a number of new NLP models based on transformer architecture. These models have shown great performance for various NLP tasks on benchmark datasets, often surpassing SOTA results. Buoyed…

Computation and Language · Computer Science 2021-10-08 Kartikay Bagla, Ankit Kumar, Shivam Gupta, Anuj Gupta

Quantifying Uncertainties in Natural Language Processing Tasks

Reliable uncertainty quantification is a first step towards building explainable, transparent, and accountable artificial intelligent systems. Recent progress in Bayesian deep learning has made such quantification realizable. In this paper,…

Computation and Language · Computer Science 2018-11-20 Yijun Xiao, William Yang Wang

The Text Classification Pipeline: Starting Shallow going Deeper

Text classification stands as a cornerstone within the realm of Natural Language Processing (NLP), particularly when viewed through computer science and engineering. The past decade has seen deep learning revolutionize text classification,…

Computation and Language · Computer Science 2025-04-23 Marco Siino, Ilenia Tinnirello, Marco La Cascia

Evaluating the fairness of task-adaptive pretraining on unlabeled test data before few-shot text classification

Few-shot learning benchmarks are critical for evaluating modern NLP techniques. It is possible, however, that benchmarks favor methods which easily make use of unlabeled text, because researchers can use unlabeled text from the test set to…

Computation and Language · Computer Science 2024-10-03 Kush Dubey

Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets

Crowdsourcing has been the prevalent paradigm for creating natural language understanding datasets in recent years. A common crowdsourcing practice is to recruit a small number of high-quality workers, and have them massively generate…

Computation and Language · Computer Science 2019-08-29 Mor Geva, Yoav Goldberg, Jonathan Berant