Related papers: Modeling Text Complexity using a Multi-Scale Probi…

200 papers

We propose and study a novel supervised approach to learning statistical semantic relatedness models from subjectively annotated training examples. The proposed semantic model consists of parameterized co-occurrence statistics associated…

Computation and Language · Computer Science 2013-11-12 Ran El-Yaniv, David Yanay

This paper addresses the mapping problem. Using a conjugate prior form, we derive the exact theoretical batch multi-object posterior density of the map given a set of measurements. The landmarks in the map are modeled as extended objects,…

Machine Learning · Statistics 2018-11-09 Maryam Fatemi, Karl Granström, Lennart Svensson, Francisco J. R. Ruiz, Lars Hammarstrand

Aspect-based summarization has seen significant advancements, especially in structured text. Yet, summarizing disordered, large-scale texts, like those found in social media and customer feedback, remains a significant challenge. Current…

Computation and Language · Computer Science 2024-06-19 Xiaobo Guo, Soroush Vosoughi

While alignment of texts on the sentential level is often seen as being too coarse, and word alignment as being too fine-grained, bi- or multilingual texts which are aligned on a level in-between are a useful resource for many purposes.…

Computation and Language · Computer Science 2007-05-23 Lea Cyrus, Hendrik Feddes

Multi-document summarization (MDS) is the task of reflecting key points from any set of documents into a concise text paragraph. In the past, it has been used to aggregate news, tweets, product reviews, etc. from various sources. Owing to…

Computation and Language · Computer Science 2020-10-06 Alvin Dey, Tanya Chowdhury, Yash Kumar Atri, Tanmoy Chakraborty

Text simplification reduces the language complexity of professional content for accessibility purposes. End-to-end neural network models have been widely adopted to directly generate the simplified version of input text, usually functioning…

Computation and Language · Computer Science 2021-07-08 Cristina Garbacea, Mengtian Guo, Samuel Carton, Qiaozhu Mei

This work presents a tractable approach to multi-object posterior computation under a generic measurement likelihood function. While filtering is a popular solution, valuable historical information is discarded. Posterior inference, which…

Computation · Statistics 2026-04-15 Ba Tuong Vo, Ba-Ngu Vo

Social scientists often classify text documents to use the resulting labels as an outcome or a predictor in empirical research. Automated text classification has become a standard tool, since it requires less human coding. However, scholars…

Computation and Language · Computer Science 2025-05-14 Mitchell Bosley, Saki Kuzushima, Ted Enamorado, Yuki Shiraito

Several methods have been proposed for classifying long textual documents using Transformers. However, there is a lack of consensus on a benchmark to enable a fair comparison among different approaches. In this paper, we provide a…

Computation and Language · Computer Science 2022-03-23 Hyunji Hayley Park, Yogarshi Vyas, Kashif Shah

In this paper, we present a new corpus of entailment problems. This corpus combines the following characteristics: 1. it is precise (does not leave out implicit hypotheses) 2. it is based on "real-world" texts (i.e. most of the premises…

Computation and Language · Computer Science 2018-12-17 Jean-Philippe Bernardy, Stergios Chatzikyriakidis

This paper addresses the issue of inversion in cases where (1) the observation system is modeled by a linear transformation and additive noise, (2) the problem is ill-posed and regularization is introduced in a Bayesian framework by an a…

Machine Learning · Statistics 2026-02-12 Jean-François Giovannelli

Transformation of Machine Learning (ML) from a boutique science to a generally accepted technology has increased importance of reproduction and transportability of ML studies. In the current work, we investigate how corpus characteristics…

Computation and Language · Computer Science 2018-03-20 Marina Sokolova, Victoria Bobicev

Besides the text content, documents and their associated words usually come with rich sets of meta information, such as categories of documents and semantic/syntactic features of words, like those encoded in word embeddings. Incorporating…

Computation and Language · Computer Science 2017-09-20 He Zhao, Lan Du, Wray Buntine, Gang Liu

Text classification is one of the most widely studied tasks in natural language processing. Motivated by the principle of compositionality, large multilayer neural network models have been employed for this task in an attempt to effectively…

Computation and Language · Computer Science 2018-08-07 Devendra Singh Sachan, Manzil Zaheer, Ruslan Salakhutdinov

It is unclear whether, how and where large pre-trained language models capture subtle linguistic traits like ambiguity, grammaticality and sentence complexity. We present results of automatic classification of these traits and compare their…

Computation and Language · Computer Science 2022-10-18 Sunit Bhattacharya, Vilém Zouhar, Ondřej Bojar

Spurious correlations threaten the validity of statistical classifiers. While model accuracy may appear high when the test data is from the same distribution as the training data, it can quickly degrade when the test distribution changes.…

Machine Learning · Computer Science 2020-12-21 Zhao Wang, Aron Culotta

In this paper, we investigate the effect of addressing difficult samples from a given text dataset on the downstream text classification task. We define difficult samples as being non-obvious cases for text classification by analysing them…

Computation and Language · Computer Science 2023-02-14 Shashank Mujumdar, Stuti Mehta, Hima Patel, Suman Mitra

In this article, we investigate the use of a probabilistic model for unsupervised clustering in text collections. Unsupervised clustering has become a basic module for many intelligent text processing applications, such as information…

Information Retrieval · Computer Science 2016-08-16 Loïs Rigouste, Olivier Cappé, François Yvon

This work introduces a machine translation task where the output is aimed at audiences of different levels of target language proficiency. We collect a high quality dataset of news articles available in English and Spanish, written for…

Computation and Language · Computer Science 2019-11-05 Sweta Agrawal, Marine Carpuat

Many analysis and prediction tasks require the extraction of structured data from unstructured texts. However, an annotation scheme and a training dataset have not been available for training machine learning models to mine structured data…

Information Retrieval · Computer Science 2025-06-24 Chaochao Zhou, Bo Yang