Related papers: Modeling Text Complexity using a Multi-Scale Probi…

Automating Sized Type Inference for Complexity Analysis (Technical Report)

This paper introduces a new methodology for the complexity analysis of higher-order functional programs, which is based on three ingredients: a powerful type system for size analysis and a sound type inference procedure for it, a ticking…

Logic in Computer Science · Computer Science 2017-06-29 Martin Avanzini, Ugo Dal Lago

Scalable multiscale density estimation

Although Bayesian density estimation using discrete mixtures has good performance in modest dimensions, there is a lack of statistical and computational scalability to high-dimensional multivariate cases. To combat the curse of…

Methodology · Statistics 2014-10-29 Ye Wang, Antonio Canale, David Dunson

HTMOT : Hierarchical Topic Modelling Over Time

Over the years, topic models have provided an efficient way of extracting insights from text. However, while many models have been proposed, none are able to model topic temporality and hierarchy jointly. Modelling time provides more precise…

Information Retrieval · Computer Science 2023-01-25 Judicael Poumay, Ashwin Ittoo

Complexity-Guided Curriculum Learning for Text Graphs

Curriculum learning provides a systematic approach to training. It refines training progressively, tailors training to task requirements, and improves generalization through exposure to diverse examples. We present a curriculum learning…

Computation and Language · Computer Science 2023-11-23 Nidhi Vakil, Hadi Amiri

Relabelling Algorithms for Large Dataset Mixture Models

Mixture models are flexible tools in density estimation and classification problems. Bayesian estimation of such models typically relies on sampling from the posterior distribution using Markov chain Monte Carlo. Label switching arises…

Applications · Statistics 2014-03-11 Wanchuang Zhu, Yanan Fan

Proximal Causal Inference With Text Data

Recent text-based causal methods attempt to mitigate confounding bias by estimating proxies of confounding variables that are partially or imperfectly measured from unstructured text data. These approaches, however, assume analysts have…

Computation and Language · Computer Science 2024-10-30 Jacob M. Chen, Rohit Bhattacharya, Katherine A. Keith

Ordering-sensitive and Semantic-aware Topic Modeling

Topic modeling of textual corpora is an important and challenging problem. In most previous work, the "bag-of-words" assumption is usually made which ignores the ordering of words. This assumption simplifies the computation, but it…

Machine Learning · Computer Science 2015-02-13 Min Yang, Tianyi Cui, Wenting Tu

Variational Deep Semantic Hashing for Text Documents

As the amount of textual data has been rapidly increasing over the past decade, efficient similarity search methods have become a crucial component of large-scale information retrieval systems. A popular strategy is to represent original…

Information Retrieval · Computer Science 2017-08-14 Suthee Chaidaroon, Yi Fang

Investigating Text Simplification Evaluation

Modern text simplification (TS) heavily relies on the availability of gold standard data to build machine learning models. However, existing studies show that parallel TS corpora contain inaccurate simplifications and incorrect alignments.…

Computation and Language · Computer Science 2021-07-30 Laura Vásquez-Rodríguez, Matthew Shardlow, Piotr Przybyła, Sophia Ananiadou

Weighted Particle-Based Optimization for Efficient Generalized Posterior Calibration

In the realm of statistical learning, the increasing volume of accessible data and increasing model complexity necessitate robust methodologies. This paper explores two branches of robust Bayesian methods in response to this trend. The…

Methodology · Statistics 2024-12-02 Masahiro Tanaka

Text and Causal Inference: A Review of Using Text to Remove Confounding from Causal Estimates

Many applications of computational social science aim to infer causal conclusions from non-experimental data. Such observational data often contains confounders, variables that influence both potential causes and potential effects.…

Computation and Language · Computer Science 2020-05-05 Katherine A. Keith, David Jensen, Brendan O'Connor

Histogram Meets Topic Model: Density Estimation by Mixture of Histograms

The histogram method is a powerful non-parametric approach for estimating the probability density function of a continuous variable. But the construction of a histogram, compared to the parametric approaches, demands a large number of…

Machine Learning · Statistics 2015-12-29 Hideaki Kim, Hiroshi Sawada

A Symmetric Prior for Multinomial Probit Models

Fitted probabilities from widely used Bayesian multinomial probit models can depend strongly on the choice of a base category, which is used to uniquely identify the parameters of the model. This paper proposes a novel identification…

Methodology · Statistics 2020-05-19 Lane F. Burgette, David Puelz, P. Richard Hahn

Multi-Vector Models with Textual Guidance for Fine-Grained Scientific Document Similarity

We present a new scientific document similarity model based on matching fine-grained aspects of texts. To train our model, we exploit a naturally-occurring source of supervision: sentences in the full-text of papers that cite multiple…

Computation and Language · Computer Science 2022-05-05 Sheshera Mysore, Arman Cohan, Tom Hope

TopicAdapt- An Inter-Corpora Topics Adaptation Approach

Topic models are popular statistical tools for detecting latent semantic topics in a text corpus. They have been utilized in various applications across different fields. However, traditional topic models have some limitations, including…

Computation and Language · Computer Science 2023-10-10 Pritom Saha Akash, Trisha Das, Kevin Chen-Chuan Chang

Fast Multivariate Probit Estimation via a Two-Stage Composite Likelihood

The multivariate probit is popular for modeling correlated binary data, with an attractive balance of flexibility and simplicity. However, considerable challenges remain in computation and in devising a clear statistical framework. Interest…

Methodology · Statistics 2020-04-22 Bryan W. Ting, Fred A. Wright, Yi-Hui Zhou

Annotation of Scientific Summaries for Information Retrieval

We present a methodology combining surface NLP and Machine Learning techniques for ranking abstracts and generating summaries based on annotated corpora. The corpora were annotated with meta-semantic tags indicating the category of…

Information Retrieval · Computer Science 2011-10-27 Fidelia Ibekwe-Sanjuan, Fernandez Silvia, Sanjuan Eric, Charton Eric

Approaches to the classification of complex systems: Words, texts, and more

The Chapter starts with introductory information about quantitative linguistics notions, like rank--frequency dependence, Zipf's law, frequency spectra, etc. Similarities in distributions of words in texts with level occupation in quantum…

Data Analysis, Statistics and Probability · Physics 2024-01-04 Andrij Rovenchak

CoBA: Counterbias Text Augmentation for Mitigating Various Spurious Correlations via Semantic Triples

Deep learning models often learn and exploit spurious correlations in training data, using these non-target features to inform their predictions. Such reliance leads to performance degradation and poor generalization on unseen data. To…

Computation and Language · Computer Science 2025-11-21 Kyohoon Jin, Juhwan Choi, Jungmin Yun, Junho Lee, Soojin Jang, Youngbin Kim

A Transfer Learning Based Model for Text Readability Assessment in German

Text readability assessment has a wide range of applications for different target people, from language learners to people with disabilities. The fast pace of textual content production on the web makes it impossible to measure text…

Computation and Language · Computer Science 2022-09-07 Salar Mohtaj, Babak Naderi, Sebastian Möller, Faraz Maschhur, Chuyang Wu, Max Reinhard