Related papers: Quantifying the Task-Specific Information in Text-…

200 papers

Human-annotated data is a source of knowledge: it describes the peculiarities of the problem and thereby fuels the decision process of the trained model. Unfortunately, the annotation process for subjective natural language…

Computation and Language · Computer Science 2023-12-14 Kamil Kanclerz, Julita Bielaniewicz, Marcin Gruza, Jan Kocon, Stanisław Woźniak, Przemysław Kazienko

Instruction tuning is now the default way to train and adapt large language models, but many instruction-input-output pairs are only weakly specified: for a given input, the same output can remain plausible under several alternative…

Computation and Language · Computer Science 2026-02-04 Pritam Kadasi, Abhishek Upperwal, Mayank Singh

Understanding the importance of the inputs on the output is useful across many tasks. This work provides an information-theoretic framework to analyse the influence of inputs for text classification tasks. Natural language processing (NLP)…

Computation and Language · Computer Science 2024-02-05 Luran Wang, Mark Gales, Vatsal Raina

Pre-trained Transformer-based neural architectures have consistently achieved state-of-the-art performance in the Natural Language Inference (NLI) task. Since NLI examples encompass a variety of linguistic, logical, and reasoning phenomena,…

Artificial Intelligence · Computer Science 2020-10-12 Pratik Joshi, Somak Aditya, Aalok Sathe, Monojit Choudhury

Language models (LMs), despite their advances, often depend on spurious correlations, undermining their accuracy and generalizability. This study addresses the overlooked impact of subtler, more complex shortcuts that compromise model…

Computation and Language · Computer Science 2024-11-13 Yuqing Zhou, Ruixiang Tang, Ziyu Yao, Ziwei Zhu

Skill Extraction (SE) is an important and widely-studied task useful to gain insights into labor market dynamics. However, there is a lacuna of datasets and annotation guidelines; available datasets are few and contain crowd-sourced labels…

Computation and Language · Computer Science 2022-04-28 Mike Zhang, Kristian Nørgaard Jensen, Sif Dam Sonniks, Barbara Plank

While real world challenges typically define visual categories with language words or phrases, most visual classification methods define categories with numerical indices. However, the language specification of the classes provides an…

Computer Vision and Pattern Recognition · Computer Science 2022-02-21 Suzanne Petryk, Lisa Dunlap, Keyan Nasseri, Joseph Gonzalez, Trevor Darrell, Anna Rohrbach

This paper addresses the challenge of classifying and assigning programming tasks to experts, a process that typically requires significant effort, time, and cost. To tackle this issue, a novel dataset containing a total of 4,112…

Computation and Language · Computer Science 2024-10-01 Areeg Fahad Rasheed, M. Zarkoosh, Safa F. Abbas, Sana Sabah Al-Azzawi

Human explanations of natural language, rationales, form a tool to assess whether models learn a label for the right reasons or rely on dataset-specific shortcuts. Sufficiency is a common metric for estimating the informativeness of…

Computation and Language · Computer Science 2025-11-21 Jonathan Kamp, Lisa Beinborn, Antske Fokkens

For high-resource languages like English, text classification is a well-studied task. The performance of modern NLP models easily achieves an accuracy of more than 90% in many standard datasets for text classification in English (Xie et…

Computation and Language · Computer Science 2022-06-06 Dawei Zhu, Michael A. Hedderich, Fangzhou Zhai, David Ifeoluwa Adelani, Dietrich Klakow

The ability to understand logical relationships between sentences is an important task in language understanding. To aid in progress for this task, researchers have collected datasets for machine learning and evaluation of current systems.…

Computation and Language · Computer Science 2019-06-25 Shawn Tan, Yikang Shen, Chin-wei Huang, Aaron Courville

Task-specific word identification aims to choose the task-related words that best describe a short text. Existing approaches require well-defined seed words or lexical dictionaries (e.g., WordNet), which are often unavailable for many…

Computation and Language · Computer Science 2017-06-06 Shuhan Yuan, Xintao Wu, Yang Xiang

Introduction: Clinical text classification using natural language processing (NLP) models requires adequate training data to achieve optimal performance. For that, 200-500 documents are typically annotated. The number is constrained by time…

Computation and Language · Computer Science 2026-01-23 Jaya Chaturvedi, Saniya Deshpande, Chenkai Ma, Robert Cobb, Angus Roberts, Robert Stewart, Daniel Stahl, Diana Shamsutdinova

Instruction tuning of language models has demonstrated the ability to enhance model generalization to unseen tasks via in-context learning using a few examples. However, typical supervised learning still requires a plethora of downstream…

Computation and Language · Computer Science 2023-06-12 Himanshu Gupta, Saurabh Arjun Sawant, Swaroop Mishra, Mutsumi Nakamura, Arindam Mitra, Santosh Mashetty, Chitta Baral

When solving NLP tasks with limited labelled data, researchers typically either use a general large language model without further update, or use a small number of labelled samples to tune a specialised smaller model. In this work, we…

Computation and Language · Computer Science 2026-01-26 Branislav Pecher, Ivan Srba, Maria Bielikova

We present MatSci-NLP, a natural language benchmark for evaluating the performance of natural language processing (NLP) models on materials science text. We construct the benchmark from publicly available materials science text data to…

Computation and Language · Computer Science 2023-05-16 Yu Song, Santiago Miret, Bang Liu

This work finds limited evidence supporting the theory that using multiple tasks with sequence-to-sequence transformer language models can improve performance on some metrics. In particular, the multi-task generalist t5-small outperforms…

Computation and Language · Computer Science 2024-01-17 Blake Vente

Small class-imbalanced datasets, common in many high-level semantic tasks like discourse analysis, present a particular challenge to current deep-learning architectures. In this work, we perform an extensive analysis on sentence-level…

Computation and Language · Computer Science 2021-01-05 Alexander Spangher, Jonathan May, Sz-rung Shiang, Lingjia Deng

The task of text classification is usually divided into two stages: text feature extraction and classification. In this standard formalization categories are merely represented as indexes in the label vocabulary, and the model…

Computation and Language · Computer Science 2020-06-05 Duo Chai, Wei Wu, Qinghong Han, Fei Wu, Jiwei Li

Natural Language Inference (NLI) evaluation is crucial for assessing language understanding models; however, popular datasets suffer from systematic spurious correlations that artificially inflate actual model performance. To address this,…

Computation and Language · Computer Science 2024-10-07 Adrian Cosma, Stefan Ruseti, Mihai Dascalu, Cornelia Caragea