Related papers: Text Characterization Toolkit

200 papers

Progress on many Natural Language Processing (NLP) tasks, such as text classification, is driven by objective, reproducible and scalable evaluation via publicly available benchmarks. However, these are not always representative of…

Computation and Language · Computer Science 2022-11-11 Viktor Schlegel, Erick Mendez-Guzman, Riza Batista-Navarro

While designing machine learning based text analytics applications, often, NLP data scientists manually determine which NLP features to use based upon their knowledge and experience with related problems. This results in increased efforts…

Computation and Language · Computer Science 2020-02-11 Janardan Misra

Several benchmarks have been built with heavy investment in resources to track our progress in NLP. Thousands of papers published in response to those benchmarks have competed to top leaderboards, with models often surpassing human…

Computation and Language · Computer Science 2022-10-17 Swaroop Mishra, Anjana Arunkumar, Chris Bryan, Chitta Baral

Large datasets in NLP suffer from noisy labels, due to erroneous automatic and human annotation procedures. We study the problem of text classification with label noise, and aim to capture this noise through an auxiliary noise model over…

Computation and Language · Computer Science 2022-06-22 Siddhant Garg, Goutham Ramakrishnan, Varun Thumbe

TextAttack is an open-source Python toolkit for adversarial attacks, adversarial training, and data augmentation in NLP. TextAttack unites 15+ papers from the NLP adversarial attack literature into a single framework, with many components…

Software Engineering · Computer Science 2020-10-06 John X. Morris, Jin Yong Yoo, Yanjun Qi

NLP models often rely on superficial cues known as dataset biases to achieve impressive performance, and can fail on examples where these biases do not hold. Recent work sought to develop robust, unbiased models by filtering biased examples…

Computation and Language · Computer Science 2023-05-31 Yuval Reif, Roy Schwartz

Natural Language Processing (NLP) systems often make use of machine learning techniques that are unfamiliar to end-users who are interested in analyzing clinical records. Although NLP has been widely used in extracting information from…

Human-Computer Interaction · Computer Science 2017-07-10 Gaurav Trivedi, Phuong Pham, Wendy Chapman, Rebecca Hwa, Janyce Wiebe, Harry Hochheiser

In recent times, training Language Models (LMs) has relied on computationally heavy training over massive datasets, which makes this training process extremely laborious. In this paper we propose a novel method for numerically evaluating…

The task of text classification is usually divided into two stages: text feature extraction and classification. In this standard formalization categories are merely represented as indexes in the label vocabulary, and the model…

Computation and Language · Computer Science 2020-06-05 Duo Chai, Wei Wu, Qinghong Han, Fei Wu, Jiwei Li

As NLP tools become ubiquitous in today's technological landscape, they are increasingly applied to languages with a variety of typological structures. However, NLP research does not focus primarily on typological differences in its…

Computation and Language · Computer Science 2020-05-04 Sophie Groenwold, Samhita Honnavalli, Lily Ou, Aesha Parekh, Sharon Levy, Diba Mirza, William Yang Wang

Machine learning models are trained to find patterns in data. NLP models can inadvertently learn socially undesirable patterns when training on gender biased text. In this work, we propose a general framework that decomposes gender bias in…

Computation and Language · Computer Science 2020-05-05 Emily Dinan, Angela Fan, Ledell Wu, Jason Weston, Douwe Kiela, Adina Williams

Although measuring held-out accuracy has been the primary approach to evaluate generalization, it often overestimates the performance of NLP models, while alternative approaches for evaluating models either focus on individual tasks or on…

Computation and Language · Computer Science 2020-05-11 Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin, Sameer Singh

Standard evaluation in NLP typically indicates that system A is better on average than system B, but it provides little info on how to improve performance and, what is worse, it should not come as a surprise if B ends up being better than A…

Computation and Language · Computer Science 2026-03-17 Elena Alvarez-Mellado, Julio Gonzalo

Text classification helps analyse texts for semantic meaning and relevance, by mapping the words against this hierarchy. An analysis of various types of texts is invaluable to understanding both their semantic meaning, as well as their…

Machine Learning · Computer Science 2022-11-16 Chaitanya Chadha, Vandit Gupta, Deepak Gupta, Ashish Khanna

Social media offer an abundant source of valuable raw data, however informal writing can quickly become a bottleneck for many natural language processing (NLP) tasks. Off-the-shelf tools are usually trained on formal text and cannot…

Computation and Language · Computer Science 2019-04-15 Ismini Lourentzou, Kabir Manghnani, ChengXiang Zhai

In the last few years, the ML community has created a number of new NLP models based on transformer architecture. These models have shown great performance for various NLP tasks on benchmark datasets, often surpassing SOTA results. Buoyed…

Computation and Language · Computer Science 2021-10-08 Kartikay Bagla, Ankit Kumar, Shivam Gupta, Anuj Gupta

Reliable uncertainty quantification is a first step towards building explainable, transparent, and accountable artificial intelligent systems. Recent progress in Bayesian deep learning has made such quantification realizable. In this paper,…

Computation and Language · Computer Science 2018-11-20 Yijun Xiao, William Yang Wang

Text classification stands as a cornerstone within the realm of Natural Language Processing (NLP), particularly when viewed through computer science and engineering. The past decade has seen deep learning revolutionize text classification,…

Computation and Language · Computer Science 2025-04-23 Marco Siino, Ilenia Tinnirello, Marco La Cascia

Few-shot learning benchmarks are critical for evaluating modern NLP techniques. It is possible, however, that benchmarks favor methods which easily make use of unlabeled text, because researchers can use unlabeled text from the test set to…

Computation and Language · Computer Science 2024-10-03 Kush Dubey

Crowdsourcing has been the prevalent paradigm for creating natural language understanding datasets in recent years. A common crowdsourcing practice is to recruit a small number of high-quality workers, and have them massively generate…

Computation and Language · Computer Science 2019-08-29 Mor Geva, Yoav Goldberg, Jonathan Berant