Related papers: Modeling Text Complexity using a Multi-Scale Probi…

Align and Shine: Building High-Quality Sentence-Aligned Corpora for Multilingual Text Simplification

Text simplification plays a crucial role in improving the accessibility and comprehensibility of written information for diverse audiences, including language learners and readers with limited literacy. Despite its importance, large-scale,…

Computation and Language · Computer Science 2026-05-12 Kenji Hilasaca, Nouran Khallaf, Serge Sharoff

DAST Model: Deciding About Semantic Complexity of a Text

Measuring text complexity is an essential task in several fields and applications (such as NLP, semantic web, smart education, etc.). The semantic layer of text is more tacit than its syntactic structure and, as a result, calculation of…

Computation and Language · Computer Science 2019-12-03 MohammadReza Besharati, Mohammad Izadi

MultiGBS: A multi-layer graph approach to biomedical summarization

Automatic text summarization methods generate a shorter version of the input text to assist the reader in gaining a quick yet informative gist. Existing text summarization methods generally focus on a single aspect of text when selecting…

Information Retrieval · Computer Science 2021-02-22 Ensieh Davoodijam, Nasser Ghadiri, Maryam Lotfi Shahreza, Fabio Rinaldi

Iterative Data Programming for Expanding Text Classification Corpora

Real-world text classification tasks often require many labeled training examples that are expensive to obtain. Recent advancements in machine teaching, specifically the data programming paradigm, facilitate the creation of training data…

Machine Learning · Computer Science 2020-02-05 Neil Mallinar, Abhishek Shah, Tin Kam Ho, Rajendra Ugrani, Ayush Gupta

Large-scale inference of correlation among mixed-type biological traits with phylogenetic multivariate probit models

Inferring concerted changes among biological traits along an evolutionary history remains an important yet challenging problem. Besides adjusting for spurious correlation induced from the shared history, the task also requires sufficient…

Methodology · Statistics 2020-09-25 Zhenyu Zhang, Akihiko Nishimura, Paul Bastide, Xiang Ji, Rebecca P. Payne, Philip Goulder, Philippe Lemey, Marc A. Suchard

Grammatical Templates: Improving Text Difficulty Evaluation for Language Learners

Language students are most engaged while reading texts at an appropriate difficulty level. However, existing methods of evaluating text difficulty focus mainly on vocabulary and do not prioritize grammatical features, hence they do not work…

Computation and Language · Computer Science 2017-02-17 Shuhan Wang, Erik Andersen

Complexity of Representation and Inference in Compositional Models with Part Sharing

This paper describes serial and parallel compositional models of multiple objects with part sharing. Objects are built by part-subpart compositions and expressed in terms of a hierarchical dictionary of object parts. These parts are…

Computer Vision and Pattern Recognition · Computer Science 2013-01-17 Alan L. Yuille, Roozbeh Mottaghi

Multi-Target Tobit Models for Completing Water Quality Data

Monitoring microbiological behaviors in water is crucial to manage public health risk from waterborne pathogens, although quantifying the concentrations of microbiological organisms in water is still challenging because concentrations of…

Artificial Intelligence · Computer Science 2023-02-22 Yuya Takada, Tsuyoshi Kato

A Gamma-Poisson Mixture Topic Model for Short Text

Most topic models are constructed under the assumption that documents follow a multinomial distribution. The Poisson distribution is an alternative distribution to describe the probability of count data. For topic modelling, the Poisson…

Computation and Language · Computer Science 2020-04-27 Jocelyn Mazarura, Alta de Waal, Pieter de Villiers

Scalable Text and Link Analysis with Mixed-Topic Link Models

Many data sets contain rich information about objects, as well as pairwise relations between them. For instance, in networks of websites, scientific papers, and other documents, each node has content consisting of a collection of words, as…

Machine Learning · Computer Science 2014-10-30 Yaojia Zhu, Xiaoran Yan, Lise Getoor, Cristopher Moore

Estimating Lexical Complexity from Document-Level Distributions

Existing methods for complexity estimation are typically developed for entire documents. This limitation in scope makes them inapplicable for shorter pieces of text, such as health assessment tools. These typically consist of lists of…

Computation and Language · Computer Science 2024-04-02 Sondre Wold, Petter Mæhlum, Oddbjørn Hove

A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference

This paper introduces the Multi-Genre Natural Language Inference (MultiNLI) corpus, a dataset designed for use in the development and evaluation of machine learning models for sentence understanding. In addition to being one of the largest…

Computation and Language · Computer Science 2018-02-21 Adina Williams, Nikita Nangia, Samuel R. Bowman

Scalable inference for crossed random effects models

We analyze the complexity of Gibbs samplers for inference in crossed random effect models used in modern analysis of variance. We demonstrate that for certain designs the plain vanilla Gibbs sampler is not scalable, in the sense that its…

Computation · Statistics 2018-03-28 Omiros Papaspiliopoulos, Gareth O. Roberts, Giacomo Zanella

Simultaneous Learning of Trees and Representations for Extreme Classification and Density Estimation

We consider multi-class classification where the predictor has a hierarchical structure that allows for a very large number of labels both at train and test time. The predictive power of such models can heavily depend on the structure of…

Machine Learning · Statistics 2017-03-06 Yacine Jernite, Anna Choromanska, David Sontag

Thematic Fit Bits: Annotation Quality and Quantity Interplay for Event Participant Representation

Modeling thematic fit (a verb-argument compositional semantics task) currently requires a very large burden of labeled data. We take a linguistically machine-annotated large corpus and replace corpus layers with output from higher-quality,…

Computation and Language · Computer Science 2022-05-05 Yuval Marton, Asad Sayeed

Efficient Measuring of Readability to Improve Documents Accessibility for Arabic Language Learners

This paper presents an approach based on supervised machine learning methods to build a classifier that can identify text complexity in order to present Arabic language learners with texts suitable to their levels. The approach is based on…

Computation and Language · Computer Science 2021-09-20 Sadik Bessou, Ghozlane Chenni

Constructing Datasets for Multi-hop Reading Comprehension Across Documents

Most Reading Comprehension methods limit themselves to queries which can be answered using a single sentence, paragraph, or document. Enabling models to combine disjoint pieces of textual evidence would extend the scope of machine…

Computation and Language · Computer Science 2018-06-12 Johannes Welbl, Pontus Stenetorp, Sebastian Riedel

Evolutionary Data Measures: Understanding the Difficulty of Text Classification Tasks

Classification tasks are usually analysed and improved through new model architectures or hyperparameter optimisation but the underlying properties of datasets are discovered on an ad-hoc basis as errors occur. However, understanding the…

Computation and Language · Computer Science 2018-12-10 Edward Collins, Nikolai Rozanov, Bingbing Zhang

Multinomial Inverse Regression for Text Analysis

Text data, including speeches, stories, and other document forms, are often connected to sentiment variables that are of interest for research in marketing, economics, and elsewhere. It is also very high dimensional and difficult to…

Methodology · Statistics 2015-03-17 Matt Taddy

Modelling Intertextuality with N-gram Embeddings

Intertextuality is a central tenet in literary studies. It refers to the intricate links between literary texts that are created by various types of references. This paper proposes a new quantitative model of intertextuality to enable…

Computation and Language · Computer Science 2025-09-10 Yi Xing