Related papers: Analyzing Text Representations by Measuring Task A…

Analyzing Text Representations under Tight Annotation Budgets: Measuring Structural Alignment

Annotating large collections of textual data can be time consuming and expensive. That is why the ability to train models with limited annotation budgets is of great importance. In this context, it has been shown that under tight annotation…

Computation and Language · Computer Science 2022-10-13 César González-Gutiérrez , Audi Primadhanty , Francesco Cazzaro , Ariadna Quattoni

Mimicking Human Process: Text Representation via Latent Semantic Clustering for Classification

Considering that words with different characteristic in the text have different importance for classification, grouping them together separately can strengthen the semantic expression of each part. Thus we propose a new text representation…

Computation and Language · Computer Science 2019-06-19 Xiaoye Tan , Rui Yan , Chongyang Tao , Mingrui Wu

A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks

Autoregressive language models, pretrained using large text corpora to do well on next word prediction, have been successful at solving many downstream tasks, even with zero-shot usage. However, there is little theoretical understanding of…

Computation and Language · Computer Science 2021-04-15 Nikunj Saunshi , Sadhika Malladi , Sanjeev Arora

Neural Text Classification by Jointly Learning to Cluster and Align

Distributional text clustering delivers semantically informative representations and captures the relevance between each word and semantic clustering centroids. We extend the neural text clustering approach to text classification tasks by…

Computation and Language · Computer Science 2020-11-25 Yekun Chai , Haidong Zhang , Shuo Jin

Classifying text using machine learning models and determining conversation drift

Text classification helps analyse texts for semantic meaning and relevance, by mapping the words against this hierarchy. An analysis of various types of texts is invaluable to understanding both their semantic meaning, as well as their…

Machine Learning · Computer Science 2022-11-16 Chaitanya Chadha , Vandit Gupta , Deepak Gupta , Ashish Khanna

Multilevel Text Alignment with Cross-Document Attention

Text alignment finds application in tasks such as citation recommendation and plagiarism detection. Existing alignment methods operate at a single, predefined level and cannot learn to align texts at, for example, sentence and document…

Computation and Language · Computer Science 2020-10-06 Xuhui Zhou , Nikolaos Pappas , Noah A. Smith

Investigating the Working of Text Classifiers

Text classification is one of the most widely studied tasks in natural language processing. Motivated by the principle of compositionality, large multilayer neural network models have been employed for this task in an attempt to effectively…

Computation and Language · Computer Science 2018-08-07 Devendra Singh Sachan , Manzil Zaheer , Ruslan Salakhutdinov

Towards a Learning Theory of Representation Alignment

It has recently been argued that AI models' representations are becoming aligned as their scale and performance increase. Empirical analyses have been designed to support this idea and conjecture the possible alignment of different…

Machine Learning · Computer Science 2025-02-21 Francesco Insulla , Shuo Huang , Lorenzo Rosasco

A Cross-Task Analysis of Text Span Representations

Many natural language processing (NLP) tasks involve reasoning with textual spans, including question answering, entity recognition, and coreference resolution. While extensive research has focused on functional architectures for…

Computation and Language · Computer Science 2020-06-09 Shubham Toshniwal , Haoyue Shi , Bowen Shi , Lingyu Gao , Karen Livescu , Kevin Gimpel

Text Classification Algorithms: A Survey

In recent years, there has been an exponential growth in the number of complex documents and texts that require a deeper understanding of machine learning methods to be able to accurately classify texts in many applications. Many machine…

Machine Learning · Computer Science 2020-05-21 Kamran Kowsari , Kiana Jafari Meimandi , Mojtaba Heidarysafa , Sanjana Mendu , Laura E. Barnes , Donald E. Brown

Text Classification with Few Examples using Controlled Generalization

Training data for text classification is often limited in practice, especially for applications with many output classes or involving many related classification problems. This means classifiers must generalize from limited evidence, but…

Computation and Language · Computer Science 2020-05-19 Abhijit Mahabal , Jason Baldridge , Burcu Karagol Ayan , Vincent Perot , Dan Roth

Conditional Supervised Contrastive Learning for Fair Text Classification

Contrastive representation learning has gained much attention due to its superior performance in learning representations from both image and sequential data. However, the learned representations could potentially lead to performance…

Computation and Language · Computer Science 2022-11-01 Jianfeng Chi , William Shand , Yaodong Yu , Kai-Wei Chang , Han Zhao , Yuan Tian

Efficient strategies for hierarchical text classification: External knowledge and auxiliary tasks

In hierarchical text classification, we perform a sequence of inference steps to predict the category of a document from top to bottom of a given class taxonomy. Most of the studies have focused on developing novels neural network…

Computation and Language · Computer Science 2020-05-25 Kervy Rivas Rojas , Gina Bustamante , Arturo Oncevay , Marco A. Sobrevilla Cabezudo

Cluster Based Symbolic Representation for Skewed Text Categorization

In this work, a problem associated with imbalanced text corpora is addressed. A method of converting an imbalanced text corpus into a balanced one is presented. The presented method employs a clustering algorithm for conversion. Initially…

Information Retrieval · Computer Science 2017-06-27 Lavanya Narayana Raju , Mahamad Suhil , D S Guru , Harsha S Gowda

Dynamic Reflections: Probing Video Representations with Text Alignment

The alignment of representations from different modalities has recently been shown to provide insights on the structural similarities and downstream capabilities of different encoders across diverse data types. While significant progress…

Computer Vision and Pattern Recognition · Computer Science 2026-02-02 Tyler Zhu , Tengda Han , Leonidas Guibas , Viorica Pătrăucean , Maks Ovsjanikov

Many-Class Text Classification with Matching

In this work, we formulate \textbf{T}ext \textbf{C}lassification as a \textbf{M}atching problem between the text and the labels, and propose a simple yet effective framework named TCM. Compared with previous text classification approaches,…

Computation and Language · Computer Science 2022-05-24 Yi Song , Yuxian Gu , Minlie Huang

Language Modeling by Clustering with Word Embeddings for Text Readability Assessment

We present a clustering-based language model using word embeddings for text readability prediction. Presumably, an Euclidean semantic space hypothesis holds true for word embeddings whose training is done by observing word co-occurrences.…

Computation and Language · Computer Science 2017-09-07 Miriam Cha , Youngjune Gwon , H. T. Kung

Quantifying the effect of representations on task complexity

We examine the influence of input data representations on learning complexity. For learning, we posit that each model implicitly uses a candidate model distribution for unexplained variations in the data, its noise model. If the model…

Machine Learning · Computer Science 2019-12-21 Julian Zilly , Lorenz Hetzel , Andrea Censi , Emilio Frazzoli

Text Classification and Clustering with Annealing Soft Nearest Neighbor Loss

We define disentanglement as how far class-different data points from each other are, relative to the distances among class-similar data points. When maximizing disentanglement during representation learning, we obtain a transformed feature…

Machine Learning · Computer Science 2021-08-02 Abien Fred Agarap

Concept Matching for Low-Resource Classification

We propose a model to tackle classification tasks in the presence of very little training data. To this aim, we approximate the notion of exact match with a theoretically sound mechanism that computes a probability of matching in the input…

Machine Learning · Computer Science 2020-06-02 Federico Errica , Ludovic Denoyer , Bora Edizel , Fabio Petroni , Vassilis Plachouras , Fabrizio Silvestri , Sebastian Riedel