Related papers: Modeling Text Complexity using a Multi-Scale Probi…

Complexity Metric for Code-Mixed Social Media Text

An evaluation metric is an absolute necessity for measuring the performance of any system and complexity of any data. In this paper, we have discussed how to determine the level of complexity of code-mixed social media texts that are…

Computation and Language · Computer Science 2017-07-06 Souvick Ghosh, Satanu Ghosh, Dipankar Das

On Bi-gram Graph Attributes

We propose a new approach to text semantic analysis and general corpus analysis using, as termed in this article, a "bi-gram graph" representation of a corpus. The different attributes derived from graph theory are measured and analyzed as…

Machine Learning · Computer Science 2021-07-30 Thomas Konstantinovsky, Matan Mizrachi

Retrieval-based Text Selection for Addressing Class-Imbalanced Data in Classification

This paper addresses the problem of selecting a set of texts for annotation in text classification using retrieval methods when there are limits on the number of annotations due to constraints on human resources. An additional challenge…

Computation and Language · Computer Science 2023-11-13 Sareh Ahmadi, Aditya Shah, Edward Fox

Text Level Graph Neural Network for Text Classification

Recently, researchers have explored graph neural network (GNN) techniques for text classification, since GNNs do well at handling complex structures and preserving global information. However, previous methods based on GNN are mainly…

Computation and Language · Computer Science 2019-10-09 Lianzhe Huang, Dehong Ma, Sujian Li, Xiaodong Zhang, Houfeng WANG

Towards Robustness to Label Noise in Text Classification via Noise Modeling

Large datasets in NLP suffer from noisy labels, due to erroneous automatic and human annotation procedures. We study the problem of text classification with label noise, and aim to capture this noise through an auxiliary noise model over…

Computation and Language · Computer Science 2022-06-22 Siddhant Garg, Goutham Ramakrishnan, Varun Thumbe

Text-Based Approaches to Item Difficulty Modeling in Large-Scale Assessments: A Systematic Review

Item difficulty plays a crucial role in test performance, interpretability of scores, and equity for all test-takers, especially in large-scale assessments. Traditional approaches to item difficulty modeling rely on field testing and…

Computation and Language · Computer Science 2025-09-30 Sydney Peters, Nan Zhang, Hong Jiao, Ming Li, Tianyi Zhou, Robert Lissitz

Probabilistic FastText for Multi-Sense Word Embeddings

We introduce Probabilistic FastText, a new model for word embeddings that can capture multiple word senses, sub-word structure, and uncertainty information. In particular, we represent each word with a Gaussian mixture density, where the…

Computation and Language · Computer Science 2018-06-11 Ben Athiwaratkun, Andrew Gordon Wilson, Anima Anandkumar

An Enhanced Model-based Approach for Short Text Clustering

Short text clustering has become increasingly important with the popularity of social media like Twitter, Google+, and Facebook. Existing methods can be broadly categorized into two paradigms: topic model-based approaches and deep…

Computation and Language · Computer Science 2025-07-21 Enhao Cheng, Shoujia Zhang, Jianhua Yin, Xuemeng Song, Tian Gan, Liqiang Nie

Multi-Label Learning with Provable Guarantee

Here we study the problem of learning labels for large text corpora where each text can be assigned a variable number of labels. The problem might seem trivial when the label dimensionality is small and can be easily solved using a series…

Machine Learning · Computer Science 2016-11-02 Sayantan Dasgupta

A Bayesian Approach to Joint Estimation of Multiple Graphical Models

The problem of joint estimation of multiple graphical models from high dimensional data has been studied in the statistics and machine learning literature, due to its importance in diverse fields including molecular biology, neuroscience…

Methodology · Statistics 2019-07-04 Peyman Jalali, Kshitij Khare, George Michailidis

Extrapolated Markov Chain Oversampling Method for Imbalanced Text Classification

Text classification is the task of automatically assigning text documents correct labels from a predefined set of categories. In real-life (text) classification tasks, observations and misclassification costs are often unevenly distributed…

Machine Learning · Computer Science 2025-09-03 Aleksi Avela, Pauliina Ilmonen

Text Complexity Classification Based on Linguistic Information: Application to Intelligent Tutoring of ESL

The goal of this work is to build a classifier that can identify text complexity within the context of teaching reading to English as a Second Language (ESL) learners. To present language learners with texts that are suitable to their level…

Computation and Language · Computer Science 2023-06-22 M. Zakaria Kurdi

Multi-Step Inference for Reasoning Over Paragraphs

Complex reasoning over text requires understanding and chaining together free-form predicates and logical connectives. Prior work has largely tried to do this either symbolically or with black-box transformers. We present a middle ground…

Computation and Language · Computer Science 2021-06-08 Jiangming Liu, Matt Gardner, Shay B. Cohen, Mirella Lapata

On a Class of Shrinkage Priors for Covariance Matrix Estimation

We propose a flexible class of models based on scale mixture of uniform distributions to construct shrinkage priors for covariance matrix estimation. This new class of priors enjoys a number of advantages over the traditional scale mixture…

Methodology · Statistics 2011-10-07 Hao Wang, Natesh S. Pillai

In real-world applications, as data availability increases, obtaining labeled data for machine learning (ML) projects remains challenging due to the high costs and intensive efforts required for data annotation. Many ML projects,…

Machine Learning · Computer Science 2024-12-24 Ismail Hakki Karaman, Gulser Koksal, Levent Eriskin, Salih Salihoglu

Corpus Considerations for Annotator Modeling and Scaling

Recent trends in natural language processing research and annotation tasks affirm a paradigm shift from the traditional reliance on a single ground truth to a focus on individual perspectives, particularly in subjective tasks. In scenarios…

Computation and Language · Computer Science 2024-04-18 Olufunke O. Sarumi, Béla Neuendorf, Joan Plepi, Lucie Flek, Jörg Schlötterer, Charles Welch

A Case Study in Complexity Estimation: Towards Parallel Branch-and-Bound over Graphical Models

We study the problem of complexity estimation in the context of parallelizing an advanced Branch and Bound-type algorithm over graphical models. The algorithm's pruning power makes load balancing, one crucial element of every distributed…

Artificial Intelligence · Computer Science 2012-10-19 Lars Otten, Rina Dechter

Text Classification Algorithms: A Survey

In recent years, there has been an exponential growth in the number of complex documents and texts that require a deeper understanding of machine learning methods to be able to accurately classify texts in many applications. Many machine…

Machine Learning · Computer Science 2020-05-21 Kamran Kowsari, Kiana Jafari Meimandi, Mojtaba Heidarysafa, Sanjana Mendu, Laura E. Barnes, Donald E. Brown

Ordered Semantically Diverse Sampling for Textual Data

The goal of diversity sampling is to select a representative subset of data in a way that maximizes information contained in the subset while keeping its cardinality small. We introduce the ordered diverse sampling problem based on a new…

Computation and Language · Computer Science 2025-03-17 Ashish Tiwari, Mukul Singh, Ananya Singha, Arjun Radhakrishna

Multinomial probit model based on joint quantile regression

The multinomial probit model is a typical statistical model for multiple-choice data applied in many research areas. When we are interested in some quantiles of relative utilities for understanding the distribution of these utilities, the…

Methodology · Statistics 2025-08-20 Masaaki Okabe, Koki Matsuoka, Jun Tsuchida, Hiroshi Yadohisa