Related papers: Modeling Text Complexity using a Multi-Scale Probi…

200 papers

We investigate the integration of word embeddings as classification features in the setting of large-scale text classification. Such representations have been used in a plethora of tasks; however, their application in classification…

Computation and Language · Computer Science 2016-06-22 Georgios Balikas, Massih-Reza Amini

Measures of textual similarity and divergence are increasingly used to study cultural change. But which measures align, in practice, with social evidence about change? We apply three different representations of text (topic models, document…

Computation and Language · Computer Science 2024-11-25 Sarah Griebel, Becca Cohen, Lucian Li, Jaihyun Park, Jiayu Liu, Jana Perkins, Ted Underwood

Combining the representations of the words that make up a sentence into a cohesive whole is difficult, since it needs to account for the order of words, and to establish how the words present relate to each other. The solution we propose…

Computation and Language · Computer Science 2021-03-04 Diego Maupomé, Marie-Jean Meurs

Models for text generation have become focal for many research tasks and especially for the generation of sentence corpora. However, understanding the properties of an automatically generated text corpus remains challenging. We propose a…

Spreadsheets are widely used in industry, even for critical business processes. This implies the need for proper risk assessment in spreadsheets to evaluate the reliability and validity of the spreadsheet's outcome. As related research has…

Software Engineering · Computer Science 2017-04-06 Thomas Reschenhofer, Bernhard Waltl, Klym Shumaiev, Florian Matthes

We study the convergence properties of the Gibbs Sampler in the context of posterior distributions arising from Bayesian analysis of conditionally Gaussian hierarchical models. We develop a multigrid approach to derive analytic expressions…

Computation · Statistics 2019-06-27 Giacomo Zanella, Gareth Roberts

This paper aims to provide an unsupervised modelling approach that allows for a more flexible representation of text embeddings. It jointly encodes the words and the paragraphs as individual matrices of arbitrary column dimension with unit…

Computation and Language · Computer Science 2022-12-01 Souvik Banerjee, Bamdev Mishra, Pratik Jawanpuria, Manish Shrivastava

We study the problem of weakly supervised text classification, which aims to classify text documents into a set of pre-defined categories with category surface names only and without any annotated training document provided. Most existing…

Computation and Language · Computer Science 2023-10-24 Yu Zhang, Shweta Garg, Yu Meng, Xiusi Chen, Jiawei Han

Many text corpora exhibit socially problematic biases, which can be propagated or amplified in the models trained on such data. For example, "doctor" co-occurs more frequently with male pronouns than with female pronouns. In this study we (i)…

Computation and Language · Computer Science 2019-04-08 Shikha Bordia, Samuel R. Bowman

Pre-trained language models have achieved noticeable performance on the intent detection task. However, because they assign an identical weight to each sample, they suffer from overfitting on simple samples and the failure to learn complex…

Computation and Language · Computer Science 2021-08-25 Yantao Gong, Cao Liu, Jiazhen Yuan, Fan Yang, Xunliang Cai, Guanglu Wan, Jiansong Chen, Ruiyao Niu, Houfeng Wang

The cross-modal text-molecule retrieval task bridges molecule structures and natural language descriptions. Existing methods predominantly focus on aligning the text modality and the molecule modality, yet they overlook adaptively adjusting the…

Computation and Language · Computer Science 2025-02-18 Hongyan Wu, Peijian Zeng, Weixiong Zheng, Lianxi Wang, Nankai Lin, Shengyi Jiang, Aimin Yang

The conventional success of textual classification relies on annotated data, and the new paradigm of pre-trained language models (PLMs) still requires some labeled data for downstream tasks. However, in real-world applications, label noise…

Computation and Language · Computer Science 2022-10-14 Dan Qiao, Chenchen Dai, Yuyang Ding, Juntao Li, Qiang Chen, Wenliang Chen, Min Zhang

Summarizing data samples by quantitative measures has a long history, with descriptive statistics being a case in point. However, as natural language processing methods flourish, there are still insufficient characteristic metrics to…

Computation and Language · Computer Science 2020-03-20 Yi-An Lai, Xuan Zhu, Yi Zhang, Mona Diab

We propose a new approach to multi-factor classification of natural language texts based on weighted structured patterns such as N-grams, taking into account the heterarchical relationships between them, applied to solve such a socially…

Computation and Language · Computer Science 2025-11-11 Anton Kolonin, Anna Arinicheva

Bayesian shrinkage methods have generated a lot of recent interest as tools for high-dimensional regression and model selection. These methods naturally facilitate tractable uncertainty quantification and incorporation of prior information.…

Computation · Statistics 2017-04-17 Bala Rajaratnam, Doug Sparks, Kshitij Khare, Liyuan Zhang

This paper introduces a new methodology for the complexity analysis of higher-order functional programs, which is based on three components: a powerful type system for size analysis and a sound type inference procedure for it, a ticking…

Logic in Computer Science · Computer Science 2017-04-20 Martin Avanzini, Ugo Dal Lago

We present a novel Bayesian topic model for learning discourse-level document structure. Our model leverages insights from discourse theory to constrain latent topic assignments in a way that reflects the underlying organization of document…

Information Retrieval · Computer Science 2014-01-16 Harr Chen, S. R. K. Branavan, Regina Barzilay, David R. Karger

Many methods exist today for measuring document similarity, such as cosine similarity. More complex methods based on the semantic analysis of textual information are also available, but they are computationally expensive and rarely used in…

Information Retrieval · Computer Science 2015-05-18 Giancarlo Crocetti

Parallel sentence simplification (SS) corpora are scarce for neural SS modeling. We propose an unsupervised method to build SS corpora from large-scale bilingual translation corpora, alleviating the need for SS supervised…

Computation and Language · Computer Science 2021-09-02 Xinyu Lu, Jipeng Qiang, Yun Li, Yunhao Yuan, Yi Zhu

Machine learning-based multi-label medical text classification can be used to enhance the understanding of the human body and to support patient care. We present a broad study on clinical natural language processing techniques to…

Information Retrieval · Computer Science 2020-04-02 Vithya Yogarajan, Jacob Montiel, Tony Smith, Bernhard Pfahringer