Related papers: Modeling Text Complexity using a Multi-Scale Probi…

200 papers

This paper introduces a new methodology for the complexity analysis of higher-order functional programs, which is based on three ingredients: a powerful type system for size analysis and a sound type inference procedure for it, a ticking…

Logic in Computer Science · Computer Science 2017-06-29 Martin Avanzini, Ugo Dal Lago

Although Bayesian density estimation using discrete mixtures has good performance in modest dimensions, there is a lack of statistical and computational scalability to high-dimensional multivariate cases. To combat the curse of…

Methodology · Statistics 2014-10-29 Ye Wang, Antonio Canale, David Dunson

Over the years, topic models have provided an efficient way of extracting insights from text. However, while many models have been proposed, none are able to model topic temporality and hierarchy jointly. Modelling time provides more precise…

Information Retrieval · Computer Science 2023-01-25 Judicael Poumay, Ashwin Ittoo

Curriculum learning provides a systematic approach to training. It refines training progressively, tailors training to task requirements, and improves generalization through exposure to diverse examples. We present a curriculum learning…

Computation and Language · Computer Science 2023-11-23 Nidhi Vakil, Hadi Amiri

Mixture models are flexible tools in density estimation and classification problems. Bayesian estimation of such models typically relies on sampling from the posterior distribution using Markov chain Monte Carlo. Label switching arises…

Applications · Statistics 2014-03-11 Wanchuang Zhu, Yanan Fan

Recent text-based causal methods attempt to mitigate confounding bias by estimating proxies of confounding variables that are partially or imperfectly measured from unstructured text data. These approaches, however, assume analysts have…

Computation and Language · Computer Science 2024-10-30 Jacob M. Chen, Rohit Bhattacharya, Katherine A. Keith

Topic modeling of textual corpora is an important and challenging problem. Most previous work makes the "bag-of-words" assumption, which ignores the ordering of words. This assumption simplifies the computation, but it…

Machine Learning · Computer Science 2015-02-13 Min Yang, Tianyi Cui, Wenting Tu

As the amount of textual data has been rapidly increasing over the past decade, efficient similarity search methods have become a crucial component of large-scale information retrieval systems. A popular strategy is to represent original…

Information Retrieval · Computer Science 2017-08-14 Suthee Chaidaroon, Yi Fang

Modern text simplification (TS) heavily relies on the availability of gold standard data to build machine learning models. However, existing studies show that parallel TS corpora contain inaccurate simplifications and incorrect alignments.…

Computation and Language · Computer Science 2021-07-30 Laura Vásquez-Rodríguez, Matthew Shardlow, Piotr Przybyła, Sophia Ananiadou

In the realm of statistical learning, the increasing volume of accessible data and increasing model complexity necessitate robust methodologies. This paper explores two branches of robust Bayesian methods in response to this trend. The…

Methodology · Statistics 2024-12-02 Masahiro Tanaka

Many applications of computational social science aim to infer causal conclusions from non-experimental data. Such observational data often contains confounders, variables that influence both potential causes and potential effects.…

Computation and Language · Computer Science 2020-05-05 Katherine A. Keith, David Jensen, Brendan O'Connor

The histogram method is a powerful non-parametric approach for estimating the probability density function of a continuous variable. But the construction of a histogram, compared to the parametric approaches, demands a large number of…

Machine Learning · Statistics 2015-12-29 Hideaki Kim , Hiroshi Sawada

Fitted probabilities from widely used Bayesian multinomial probit models can depend strongly on the choice of a base category, which is used to uniquely identify the parameters of the model. This paper proposes a novel identification…

Methodology · Statistics 2020-05-19 Lane F. Burgette, David Puelz, P. Richard Hahn

We present a new scientific document similarity model based on matching fine-grained aspects of texts. To train our model, we exploit a naturally-occurring source of supervision: sentences in the full-text of papers that cite multiple…

Computation and Language · Computer Science 2022-05-05 Sheshera Mysore, Arman Cohan, Tom Hope

Topic models are popular statistical tools for detecting latent semantic topics in a text corpus. They have been utilized in various applications across different fields. However, traditional topic models have some limitations, including…

Computation and Language · Computer Science 2023-10-10 Pritom Saha Akash, Trisha Das, Kevin Chen-Chuan Chang

The multivariate probit is popular for modeling correlated binary data, with an attractive balance of flexibility and simplicity. However, considerable challenges remain in computation and in devising a clear statistical framework. Interest…

Methodology · Statistics 2020-04-22 Bryan W. Ting, Fred A. Wright, Yi-Hui Zhou

We present a methodology combining surface NLP and Machine Learning techniques for ranking abstracts and generating summaries based on annotated corpora. The corpora were annotated with meta-semantic tags indicating the category of…

Information Retrieval · Computer Science 2011-10-27 Fidelia Ibekwe-Sanjuan, Silvia Fernandez, Eric Sanjuan, Eric Charton

The chapter starts with introductory information about quantitative linguistics notions, like rank–frequency dependence, Zipf's law, frequency spectra, etc. Similarities in distributions of words in texts with level occupation in quantum…

Data Analysis, Statistics and Probability · Physics 2024-01-04 Andrij Rovenchak

Deep learning models often learn and exploit spurious correlations in training data, using these non-target features to inform their predictions. Such reliance leads to performance degradation and poor generalization on unseen data. To…

Computation and Language · Computer Science 2025-11-21 Kyohoon Jin, Juhwan Choi, Jungmin Yun, Junho Lee, Soojin Jang, Youngbin Kim

Text readability assessment has a wide range of applications for different target audiences, from language learners to people with disabilities. The fast pace of textual content production on the web makes it impossible to measure text…

Computation and Language · Computer Science 2022-09-07 Salar Mohtaj, Babak Naderi, Sebastian Möller, Faraz Maschhur, Chuyang Wu, Max Reinhard