
Related papers: Modeling Text Complexity using a Multi-Scale Probi…

200 papers

Text simplification plays a crucial role in improving the accessibility and comprehensibility of written information for diverse audiences, including language learners and readers with limited literacy. Despite its importance, large-scale,…

Computation and Language · Computer Science 2026-05-12 Kenji Hilasaca, Nouran Khallaf, Serge Sharoff

Measuring text complexity is an essential task in several fields and applications (such as NLP, semantic web, smart education, etc.). The semantic layer of text is more tacit than its syntactic structure and, as a result, calculation of…

Computation and Language · Computer Science 2019-12-03 MohammadReza Besharati, Mohammad Izadi

Automatic text summarization methods generate a shorter version of the input text to assist the reader in gaining a quick yet informative gist. Existing text summarization methods generally focus on a single aspect of text when selecting…

Information Retrieval · Computer Science 2021-02-22 Ensieh Davoodijam, Nasser Ghadiri, Maryam Lotfi Shahreza, Fabio Rinaldi

Real-world text classification tasks often require many labeled training examples that are expensive to obtain. Recent advancements in machine teaching, specifically the data programming paradigm, facilitate the creation of training data…

Machine Learning · Computer Science 2020-02-05 Neil Mallinar, Abhishek Shah, Tin Kam Ho, Rajendra Ugrani, Ayush Gupta

Inferring concerted changes among biological traits along an evolutionary history remains an important yet challenging problem. Besides adjusting for spurious correlation induced from the shared history, the task also requires sufficient…

Language students are most engaged while reading texts at an appropriate difficulty level. However, existing methods of evaluating text difficulty focus mainly on vocabulary and do not prioritize grammatical features, hence they do not work…

Computation and Language · Computer Science 2017-02-17 Shuhan Wang, Erik Andersen

This paper describes serial and parallel compositional models of multiple objects with part sharing. Objects are built by part-subpart compositions and expressed in terms of a hierarchical dictionary of object parts. These parts are…

Computer Vision and Pattern Recognition · Computer Science 2013-01-17 Alan L. Yuille, Roozbeh Mottaghi

Monitoring microbiological behaviors in water is crucial to manage public health risk from waterborne pathogens, although quantifying the concentrations of microbiological organisms in water is still challenging because concentrations of…

Artificial Intelligence · Computer Science 2023-02-22 Yuya Takada, Tsuyoshi Kato

Most topic models are constructed under the assumption that documents follow a multinomial distribution. The Poisson distribution is an alternative distribution to describe the probability of count data. For topic modelling, the Poisson…

Computation and Language · Computer Science 2020-04-27 Jocelyn Mazarura, Alta de Waal, Pieter de Villiers

Many data sets contain rich information about objects, as well as pairwise relations between them. For instance, in networks of websites, scientific papers, and other documents, each node has content consisting of a collection of words, as…

Machine Learning · Computer Science 2014-10-30 Yaojia Zhu, Xiaoran Yan, Lise Getoor, Cristopher Moore

Existing methods for complexity estimation are typically developed for entire documents. This limitation in scope makes them inapplicable for shorter pieces of text, such as health assessment tools. These typically consist of lists of…

Computation and Language · Computer Science 2024-04-02 Sondre Wold, Petter Mæhlum, Oddbjørn Hove

This paper introduces the Multi-Genre Natural Language Inference (MultiNLI) corpus, a dataset designed for use in the development and evaluation of machine learning models for sentence understanding. In addition to being one of the largest…

Computation and Language · Computer Science 2018-02-21 Adina Williams, Nikita Nangia, Samuel R. Bowman

We analyze the complexity of Gibbs samplers for inference in crossed random effect models used in modern analysis of variance. We demonstrate that for certain designs the plain vanilla Gibbs sampler is not scalable, in the sense that its…

Computation · Statistics 2018-03-28 Omiros Papaspiliopoulos, Gareth O. Roberts, Giacomo Zanella

We consider multi-class classification where the predictor has a hierarchical structure that allows for a very large number of labels both at train and test time. The predictive power of such models can heavily depend on the structure of…

Machine Learning · Statistics 2017-03-06 Yacine Jernite, Anna Choromanska, David Sontag

Modeling thematic fit (a verb–argument compositional semantics task) currently requires a very large burden of labeled data. We take a linguistically machine-annotated large corpus and replace corpus layers with output from higher-quality,…

Computation and Language · Computer Science 2022-05-05 Yuval Marton, Asad Sayeed

This paper presents an approach based on supervised machine learning methods to build a classifier that can identify text complexity in order to present Arabic language learners with texts suitable to their levels. The approach is based on…

Computation and Language · Computer Science 2021-09-20 Sadik Bessou, Ghozlane Chenni

Most Reading Comprehension methods limit themselves to queries which can be answered using a single sentence, paragraph, or document. Enabling models to combine disjoint pieces of textual evidence would extend the scope of machine…

Computation and Language · Computer Science 2018-06-12 Johannes Welbl, Pontus Stenetorp, Sebastian Riedel

Classification tasks are usually analysed and improved through new model architectures or hyperparameter optimisation but the underlying properties of datasets are discovered on an ad-hoc basis as errors occur. However, understanding the…

Computation and Language · Computer Science 2018-12-10 Edward Collins, Nikolai Rozanov, Bingbing Zhang

Text data, including speeches, stories, and other document forms, are often connected to sentiment variables that are of interest for research in marketing, economics, and elsewhere. It is also very high dimensional and difficult to…

Methodology · Statistics 2015-03-17 Matt Taddy

Intertextuality is a central tenet in literary studies. It refers to the intricate links between literary texts that are created by various types of references. This paper proposes a new quantitative model of intertextuality to enable…

Computation and Language · Computer Science 2025-09-10 Yi Xing