Related papers: Modeling Text Complexity using a Multi-Scale Probi…

200 papers

In this work, our objective is to address the problems of generalization and flexibility for text recognition in documents. We introduce a new model that exploits the repetitive nature of characters in languages, and decouples the visual…

Computer Vision and Pattern Recognition · Computer Science · 2020-09-15 · Chuhan Zhang, Ankush Gupta, Andrew Zisserman

We present the results of a study of definite descriptions use in written texts aimed at assessing the feasibility of annotating corpora with information about definite description interpretation. We ran two experiments, in which subjects…

cmp-lg · Computer Science · 2007-05-23 · Massimo Poesio, Renata Vieira

We address the challenge of incorporating document-level metadata into topic modeling to improve topic mixture estimation. To overcome the computational complexity and lack of theoretical guarantees in existing Bayesian methods, we extend…

Machine Learning · Computer Science · 2025-03-18 · Yeo Jin Jung, Claire Donnat

Distributional text clustering delivers semantically informative representations and captures the relevance between each word and semantic clustering centroids. We extend the neural text clustering approach to text classification tasks by…

Computation and Language · Computer Science · 2020-11-25 · Yekun Chai, Haidong Zhang, Shuo Jin

Topic modeling refers to the task of discovering the underlying thematic structure in a text corpus, where the output is commonly presented as a report of the top terms appearing in each topic. Despite the diversity of topic modeling…

Machine Learning · Computer Science · 2014-06-20 · Derek Greene, Derek O'Callaghan, Pádraig Cunningham

Topic models are a class of unsupervised learning algorithms for detecting the semantic structure within a text corpus. Together with a subsequent dimensionality reduction algorithm, topic models can be used for deriving spatializations for…

Computation and Language · Computer Science · 2023-10-26 · Daniel Atzberger, Tim Cech, Willy Scheibel, Matthias Trapp, Rico Richter, Jürgen Döllner, Tobias Schreck

We consider the problem of modeling the content structure of texts within a specific domain, in terms of the topics the texts address and the order in which these topics appear. We first present an effective knowledge-lean method for…

Computation and Language · Computer Science · 2007-05-23 · Regina Barzilay, Lillian Lee

NLP models that compare or consolidate information across multiple documents often struggle when challenged with recognizing substantial information redundancies across the texts. For example, in multi-document summarization it is crucial…

Computation and Language · Computer Science · 2021-10-12 · Daniela Brook Weiss, Paul Roit, Ori Ernst, Ido Dagan

Several complex systems are characterized by presenting intricate characteristics taking place at several scales of time and space. These multiscale characterizations are used in various applications, including better understanding…

Computation and Language · Computer Science · 2023-05-12 · Bárbara C. e Souza, Filipi N. Silva, Henrique F. de Arruda, Giovana D. da Silva, Luciano da F. Costa, Diego R. Amancio

Text simplification (TS) systems rewrite text to make it more readable while preserving its content. However, what makes a text easy to read depends on the intended readers. Recent work has shown that pre-trained language models can…

Computation and Language · Computer Science · 2023-12-01 · Sweta Agrawal, Marine Carpuat

Text corpora are widely used resources for measuring societal biases and stereotypes. The common approach to measuring such biases using a corpus is by calculating the similarities between the embedding vector of a word (like nurse) and the…

Computation and Language · Computer Science · 2021-04-28 · Navid Rekabsaz, Robert West, James Henderson, Allan Hanbury

Latent Dirichlet analysis, or topic modeling, is a flexible latent variable framework for modeling high-dimensional sparse count data. Various learning algorithms have been developed in recent years, including collapsed Gibbs sampling,…

Machine Learning · Computer Science · 2012-05-14 · Arthur Asuncion, Max Welling, Padhraic Smyth, Yee Whye Teh

Topic models are a family of statistical-based algorithms to summarize, explore and index large collections of text documents. After a decade of research led by computer scientists, topic models have spread to social science as a new…

Computation and Language · Computer Science · 2018-04-04 · Ryan Wesslen

As the probability (and thus perplexity) of a text is calculated based on the product of the probabilities of individual tokens, it may happen that one unlikely token significantly reduces the probability (i.e., increases the perplexity) of…

Computation and Language · Computer Science · 2023-07-19 · Mihailo Škorić

Finite mixture models are frequently used to uncover latent structures in high-dimensional datasets (e.g., identifying clusters of patients in electronic health records). The inference of such structures can be performed in a Bayesian…

Compound nouns such as "example noun compound" are becoming more common in natural language and pose a number of difficult problems for NLP systems, notably increasing the complexity of parsing. In this paper we develop a probabilistic model…

cmp-lg · Computer Science · 2008-02-03 · Mark Lauer, Mark Dras

Multilabel classification is an emergent data mining task with a broad range of real world applications. Learning from imbalanced multilabel data is being deeply studied latterly, and several resampling methods have been proposed in the…

Machine Learning · Computer Science · 2018-02-15 · Francisco Charte, Antonio J. Rivera, María J. del Jesus, Francisco Herrera

This paper studies a text classification algorithm based on an improved Transformer to improve the performance and efficiency of the model in text classification tasks. Aiming at the shortcomings of the traditional Transformer model in…

Computation and Language · Computer Science · 2025-01-24 · Jia Gao, Guiran Liu, Binrong Zhu, Shicheng Zhou, Hongye Zheng, Xiaoxuan Liao

Recently, there has been considerable progress on designing algorithms with provable guarantees -- typically using linear algebraic methods -- for parameter learning in latent variable models. But designing provable algorithms for inference…

Machine Learning · Computer Science · 2016-05-30 · Sanjeev Arora, Rong Ge, Frederic Koehler, Tengyu Ma, Ankur Moitra

This paper describes a method for providing feedback about the degree of complexity that is present in particular texts. Both the method and the software tool called TexComp are designed for use during the assessment of student compositions…

Computers and Society · Computer Science · 2012-06-29 · T. Kakkonen