Related papers: Modeling Text Complexity using a Multi-Scale Probi…

Semantic Sort: A Supervised Approach to Personalized Semantic Relatedness

We propose and study a novel supervised approach to learning statistical semantic relatedness models from subjectively annotated training examples. The proposed semantic model consists of parameterized co-occurrence statistics associated…

Computation and Language · Computer Science 2013-11-12 Ran El-Yaniv, David Yanay

Poisson Multi-Bernoulli Mapping Using Gibbs Sampling

This paper addresses the mapping problem. Using a conjugate prior form, we derive the exact theoretical batch multi-object posterior density of the map given a set of measurements. The landmarks in the map are modeled as extended objects,…

Machine Learning · Statistics 2018-11-09 Maryam Fatemi, Karl Granström, Lennart Svensson, Francisco J. R. Ruiz, Lars Hammarstrand

Disordered-DABS: A Benchmark for Dynamic Aspect-Based Summarization in Disordered Texts

Aspect-based summarization has seen significant advancements, especially in structured text. Yet, summarizing disordered, large-scale texts, like those found in social media and customer feedback, remains a significant challenge. Current…

Computation and Language · Computer Science 2024-06-19 Xiaobo Guo, Soroush Vosoughi

A Model for Fine-Grained Alignment of Multilingual Texts

While alignment of texts on the sentential level is often seen as being too coarse, and word alignment as being too fine-grained, bi- or multilingual texts which are aligned on a level in-between are a useful resource for many purposes.…

Computation and Language · Computer Science 2007-05-23 Lea Cyrus, Hendrik Feddes

Corpora Evaluation and System Bias Detection in Multi-document Summarization

Multi-document summarization (MDS) is the task of reflecting key points from any set of documents into a concise text paragraph. In the past, it has been used to aggregate news, tweets, product reviews, etc. from various sources. Owing to…

Computation and Language · Computer Science 2020-10-06 Alvin Dey, Tanya Chowdhury, Yash Kumar Atri, Tanmoy Chakraborty

Explainable Prediction of Text Complexity: The Missing Preliminaries for Text Simplification

Text simplification reduces the language complexity of professional content for accessibility purposes. End-to-end neural network models have been widely adopted to directly generate the simplified version of input text, usually functioning…

Computation and Language · Computer Science 2021-07-08 Cristina Garbacea, Mengtian Guo, Samuel Carton, Qiaozhu Mei

Multi-Object Posterior Computation via Gibbs Sampling

This work presents a tractable approach to multi-object posterior computation under a generic measurement likelihood function. While filtering is a popular solution, valuable historical information is discarded. Posterior inference, which…

Computation · Statistics 2026-04-15 Ba Tuong Vo, Ba-Ngu Vo

Improving Probabilistic Models in Text Classification via Active Learning

Social scientists often classify text documents to use the resulting labels as an outcome or a predictor in empirical research. Automated text classification has become a standard tool, since it requires less human coding. However, scholars…

Computation and Language · Computer Science 2025-05-14 Mitchell Bosley, Saki Kuzushima, Ted Enamorado, Yuki Shiraito

Efficient Classification of Long Documents Using Transformers

Several methods have been proposed for classifying long textual documents using Transformers. However, there is a lack of consensus on a benchmark to enable a fair comparison among different approaches. In this paper, we provide a…

Computation and Language · Computer Science 2022-03-23 Hyunji Hayley Park, Yogarshi Vyas, Kashif Shah

A corpus of precise natural textual entailment problems

In this paper, we present a new corpus of entailment problems. This corpus combines the following characteristics: 1. it is precise (does not leave out implicit hypotheses) 2. it is based on "real-world" texts (i.e. most of the premises…

Computation and Language · Computer Science 2018-12-17 Jean-Philippe Bernardy, Stergios Chatzikyriakidis

A Gibbs posterior sampler for inverse problem based on prior diffusion model

This paper addresses the issue of inversion in cases where (1) the observation system is modeled by a linear transformation and additive noise, (2) the problem is ill-posed and regularization is introduced in a Bayesian framework by an a…

Machine Learning · Statistics 2026-02-12 Jean-François Giovannelli

Corpus Statistics in Text Classification of Online Data

Transformation of Machine Learning (ML) from a boutique science to a generally accepted technology has increased importance of reproduction and transportability of ML studies. In the current work, we investigate how corpus characteristics…

Computation and Language · Computer Science 2018-03-20 Marina Sokolova, Victoria Bobicev

MetaLDA: a Topic Model that Efficiently Incorporates Meta information

Besides the text content, documents and their associated words usually come with rich sets of meta information, such as categories of documents and semantic/syntactic features of words, like those encoded in word embeddings. Incorporating…

Computation and Language · Computer Science 2017-09-20 He Zhao, Lan Du, Wray Buntine, Gang Liu

Investigating the Working of Text Classifiers

Text classification is one of the most widely studied tasks in natural language processing. Motivated by the principle of compositionality, large multilayer neural network models have been employed for this task in an attempt to effectively…

Computation and Language · Computer Science 2018-08-07 Devendra Singh Sachan, Manzil Zaheer, Ruslan Salakhutdinov

Sentence Ambiguity, Grammaticality and Complexity Probes

It is unclear whether, how and where large pre-trained language models capture subtle linguistic traits like ambiguity, grammaticality and sentence complexity. We present results of automatic classification of these traits and compare their…

Computation and Language · Computer Science 2022-10-18 Sunit Bhattacharya, Vilém Zouhar, Ondřej Bojar

Robustness to Spurious Correlations in Text Classification via Automatically Generated Counterfactuals

Spurious correlations threaten the validity of statistical classifiers. While model accuracy may appear high when the test data is from the same distribution as the training data, it can quickly degrade when the test distribution changes.…

Machine Learning · Computer Science 2020-12-21 Zhao Wang, Aron Culotta

Identifying Semantically Difficult Samples to Improve Text Classification

In this paper, we investigate the effect of addressing difficult samples from a given text dataset on the downstream text classification task. We define difficult samples as being non-obvious cases for text classification by analysing them…

Computation and Language · Computer Science 2023-02-14 Shashank Mujumdar, Stuti Mehta, Hima Patel, Suman Mitra

Inference and Evaluation of the Multinomial Mixture Model for Text Clustering

In this article, we investigate the use of a probabilistic model for unsupervised clustering in text collections. Unsupervised clustering has become a basic module for many intelligent text processing applications, such as information…

Information Retrieval · Computer Science 2016-08-16 Loïs Rigouste, Olivier Cappé, François Yvon

Controlling Text Complexity in Neural Machine Translation

This work introduces a machine translation task where the output is aimed at audiences of different levels of target language proficiency. We collect a high quality dataset of news articles available in English and Spanish, written for…

Computation and Language · Computer Science 2019-11-05 Sweta Agrawal, Marine Carpuat

Text2Struct: A Machine Learning Pipeline for Mining Structured Data from Text

Many analysis and prediction tasks require the extraction of structured data from unstructured texts. However, an annotation scheme and a training dataset have not been available for training machine learning models to mine structured data…

Information Retrieval · Computer Science 2025-06-24 Chaochao Zhou, Bo Yang