Related papers: Quantifying the Task-Specific Information in Text-…

Towards Model-Based Data Acquisition for Subjective Multi-Task NLP Problems

Data annotated by humans is a source of knowledge by describing the peculiarities of the problem and therefore fueling the decision process of the trained model. Unfortunately, the annotation process for subjective natural language…

Computation and Language · Computer Science 2023-12-14 Kamil Kanclerz, Julita Bielaniewicz, Marcin Gruza, Jan Kocon, Stanisław Woźniak, Przemysław Kazienko

Task-Specificity Score: Measuring How Much Instructions Really Matter for Supervision

Instruction tuning is now the default way to train and adapt large language models, but many instruction-input-output pairs are only weakly specified: for a given input, the same output can remain plausible under several alternative…

Computation and Language · Computer Science 2026-02-04 Pritam Kadasi, Abhishek Upperwal, Mayank Singh

An Information-Theoretic Approach to Analyze NLP Classification Tasks

Understanding the importance of the inputs on the output is useful across many tasks. This work provides an information-theoretic framework to analyse the influence of inputs for text classification tasks. Natural language processing (NLP)…

Computation and Language · Computer Science 2024-02-05 Luran Wang, Mark Gales, Vatsal Raina

TaxiNLI: Taking a Ride up the NLU Hill

Pre-trained Transformer-based neural architectures have consistently achieved state-of-the-art performance in the Natural Language Inference (NLI) task. Since NLI examples encompass a variety of linguistic, logical, and reasoning phenomena,…

Artificial Intelligence · Computer Science 2020-10-12 Pratik Joshi, Somak Aditya, Aalok Sathe, Monojit Choudhury

Navigating the Shortcut Maze: A Comprehensive Analysis of Shortcut Learning in Text Classification by Language Models

Language models (LMs), despite their advances, often depend on spurious correlations, undermining their accuracy and generalizability. This study addresses the overlooked impact of subtler, more complex shortcuts that compromise model…

Computation and Language · Computer Science 2024-11-13 Yuqing Zhou, Ruixiang Tang, Ziyu Yao, Ziwei Zhu

SkillSpan: Hard and Soft Skill Extraction from English Job Postings

Skill Extraction (SE) is an important and widely-studied task useful to gain insights into labor market dynamics. However, there is a lacuna of datasets and annotation guidelines; available datasets are few and contain crowd-sourced labels…

Computation and Language · Computer Science 2022-04-28 Mike Zhang, Kristian Nørgaard Jensen, Sif Dam Sonniks, Barbara Plank

On Guiding Visual Attention with Language Specification

While real world challenges typically define visual categories with language words or phrases, most visual classification methods define categories with numerical indices. However, the language specification of the classes provides an…

Computer Vision and Pattern Recognition · Computer Science 2022-02-21 Suzanne Petryk, Lisa Dunlap, Keyan Nasseri, Joseph Gonzalez, Trevor Darrell, Anna Rohrbach

TaskComplexity: A Dataset for Task Complexity Classification with In-Context Learning, FLAN-T5 and GPT-4o Benchmarks

This paper addresses the challenge of classifying and assigning programming tasks to experts, a process that typically requires significant effort, time, and cost. To tackle this issue, a novel dataset containing a total of 4,112…

Computation and Language · Computer Science 2024-10-01 Areeg Fahad Rasheed, M. Zarkoosh, Safa F. Abbas, Sana Sabah Al-Azzawi

Learning from Sufficient Rationales: Analysing the Relationship Between Explanation Faithfulness and Token-level Regularisation Strategies

Human explanations of natural language, rationales, form a tool to assess whether models learn a label for the right reasons or rely on dataset-specific shortcuts. Sufficiency is a common metric for estimating the informativeness of…

Computation and Language · Computer Science 2025-11-21 Jonathan Kamp, Lisa Beinborn, Antske Fokkens

Task-Adaptive Pre-Training for Boosting Learning With Noisy Labels: A Study on Text Classification for African Languages

For high-resource languages like English, text classification is a well-studied task. The performance of modern NLP models easily achieves an accuracy of more than 90% in many standard datasets for text classification in English (Xie et…

Computation and Language · Computer Science 2022-06-06 Dawei Zhu, Michael A. Hedderich, Fangzhou Zhai, David Ifeoluwa Adelani, Dietrich Klakow

Investigating Biases in Textual Entailment Datasets

The ability to understand logical relationships between sentences is an important task in language understanding. To aid in progress for this task, researchers have collected datasets for machine learning and evaluation of current systems.…

Computation and Language · Computer Science 2019-06-25 Shawn Tan, Yikang Shen, Chin-wei Huang, Aaron Courville

Task-specific Word Identification from Short Texts Using a Convolutional Neural Network

Task-specific word identification aims to choose the task-related words that best describe a short text. Existing approaches require well-defined seed words or lexical dictionaries (e.g., WordNet), which are often unavailable for many…

Computation and Language · Computer Science 2017-06-06 Shuhan Yuan, Xintao Wu, Yang Xiang

Determinants of Training Corpus Size for Clinical Text Classification

Introduction: Clinical text classification using natural language processing (NLP) models requires adequate training data to achieve optimal performance. For that, 200-500 documents are typically annotated. The number is constrained by time…

Computation and Language · Computer Science 2026-01-23 Jaya Chaturvedi, Saniya Deshpande, Chenkai Ma, Robert Cobb, Angus Roberts, Robert Stewart, Daniel Stahl, Diana Shamsutdinova

Instruction Tuned Models are Quick Learners

Instruction tuning of language models has demonstrated the ability to enhance model generalization to unseen tasks via in-context learning using a few examples. However, typical supervised learning still requires a plethora of downstream…

Computation and Language · Computer Science 2023-06-12 Himanshu Gupta, Saurabh Arjun Sawant, Swaroop Mishra, Mutsumi Nakamura, Arindam Mitra, Santosh Mashetty, Chitta Baral

Comparing Specialised Small and General Large Language Models on Text Classification: 100 Labelled Samples to Achieve Break-Even Performance

When solving NLP tasks with limited labelled data, researchers typically either use a general large language model without further update, or use a small number of labelled samples to tune a specialised smaller model. In this work, we…

Computation and Language · Computer Science 2026-01-26 Branislav Pecher, Ivan Srba, Maria Bielikova

MatSci-NLP: Evaluating Scientific Language Models on Materials Science Language Tasks Using Text-to-Schema Modeling

We present MatSci-NLP, a natural language benchmark for evaluating the performance of natural language processing (NLP) models on materials science text. We construct the benchmark from publicly available materials science text data to…

Computation and Language · Computer Science 2023-05-16 Yu Song, Santiago Miret, Bang Liu

Inroads to a Structured Data Natural Language Bijection and the role of LLM annotation

This work finds limited evidence supporting the theory that using multiple tasks with sequence-to-sequence transformer language models can improve performance on some metrics. In particular, the multi-task generalist t5-small outperforms…

Computation and Language · Computer Science 2024-01-17 Blake Vente

Multitask Learning for Class-Imbalanced Discourse Classification

Small class-imbalanced datasets, common in many high-level semantic tasks like discourse analysis, present a particular challenge to current deep-learning architectures. In this work, we perform an extensive analysis on sentence-level…

Computation and Language · Computer Science 2021-01-05 Alexander Spangher, Jonathan May, Sz-rung Shiang, Lingjia Deng

Description Based Text Classification with Reinforcement Learning

The task of text classification is usually divided into two stages: text feature extraction and classification. In this standard formalization categories are merely represented as indexes in the label vocabulary, and the model…

Computation and Language · Computer Science 2020-06-05 Duo Chai, Wei Wu, Qinghong Han, Fei Wu, Jiwei Li

How Hard is this Test Set? NLI Characterization by Exploiting Training Dynamics

Natural Language Inference (NLI) evaluation is crucial for assessing language understanding models; however, popular datasets suffer from systematic spurious correlations that artificially inflate actual model performance. To address this,…

Computation and Language · Computer Science 2024-10-07 Adrian Cosma, Stefan Ruseti, Mihai Dascalu, Cornelia Caragea