
Related papers: MoNoise: Modeling Noise Using a Modular Normalizat…

200 papers

Social media offer an abundant source of valuable raw data; however, informal writing can quickly become a bottleneck for many natural language processing (NLP) tasks. Off-the-shelf tools are usually trained on formal text and cannot…

Computation and Language · Computer Science 2019-04-15 Ismini Lourentzou , Kabir Manghnani , ChengXiang Zhai

We propose an extended framework for marginalized domain adaptation, aimed at addressing unsupervised, supervised and semi-supervised scenarios. We argue that the denoising principle should be extended to explicitly promote domain-invariant…

Computer Vision and Pattern Recognition · Computer Science 2017-02-21 Gabriela Csurka , Boris Chidlovskii , Stephane Clinchant , Sophia Michel

We propose a novel memory-modular learner for image classification that separates knowledge memorization from reasoning. Our model enables effective generalization to new classes by simply replacing the memory contents, without the need for…

Computer Vision and Pattern Recognition · Computer Science 2025-04-09 Dahyun Kang , Ahmet Iscen , Eunchan Jo , Sua Choi , Minsu Cho , Cordelia Schmid

We address claim normalization for multilingual misinformation detection - transforming noisy social media posts into clear, verifiable statements across 20 languages. The key contribution demonstrates how systematic decomposition of posts…

The normalizing layer has become one of the basic configurations of deep learning models, but it still suffers from computational inefficiency, interpretability difficulties, and low generality. After gaining a deeper understanding of the…

Machine Learning · Computer Science 2022-10-14 Chang Liu , Yuwen Yang , Yue Ding , Hongtao Lu

A large fraction of textual data available today contains various types of 'noise', such as OCR noise in digitized documents, noise due to the informal writing style of users on microblogging sites, and so on. To enable tasks such as…

Information Retrieval · Computer Science 2021-01-12 Anurag Roy , Shalmoli Ghosh , Kripabandhu Ghosh , Saptarshi Ghosh

Text classification is one of the most widely studied tasks in natural language processing. Motivated by the principle of compositionality, large multilayer neural network models have been employed for this task in an attempt to effectively…

Computation and Language · Computer Science 2018-08-07 Devendra Singh Sachan , Manzil Zaheer , Ruslan Salakhutdinov

Data noising is an effective technique for regularizing neural network models. While noising is widely adopted in application domains such as vision and speech, commonly used noising primitives have not been developed for discrete…

Machine Learning · Computer Science 2017-03-09 Ziang Xie , Sida I. Wang , Jiwei Li , Daniel Lévy , Aiming Nie , Dan Jurafsky , Andrew Y. Ng

Pre-trained language models have been successful on text classification tasks, but are prone to learning spurious correlations from biased datasets, and are thus vulnerable when making inferences in a new domain. Prior work reveals such…

Computation and Language · Computer Science 2022-01-03 Huihan Yao , Ying Chen , Qinyuan Ye , Xisen Jin , Xiang Ren

Text normalization is an essential task in the processing and analysis of social media, which is dominated by informal writing. It aims to map informal words to their intended standard forms. Previously proposed text normalization…

Computation and Language · Computer Science 2017-12-29 Salman Ahmad Ansari , Usman Zafar , Asim Karim

Most biological systems are formed by component parts that to some degree are inter-related. Groups of parts that are more associated among themselves and are relatively autonomous from others are called modules. One of the consequences of…

Populations and Evolution · Quantitative Biology 2013-08-12 Gabriel Marroig , Diogo Melo , Guilherme Garcia

Recurrent neural networks (RNNs) are powerful models of sequential data. They have been successfully used in domains such as text and speech. However, RNNs are susceptible to overfitting; regularization is important. In this paper we…

Machine Learning · Statistics 2018-07-16 Adji B. Dieng , Rajesh Ranganath , Jaan Altosaar , David M. Blei

Social media networks and chatting platforms often use an informal version of natural text. Adversarial spelling attacks also tend to alter the input text by modifying the characters in the text. Normalizing these texts is an essential step…

Computation and Language · Computer Science 2020-06-26 Fenil Doshi , Jimit Gandhi , Deep Gosalia , Sudhir Bagul

Real-noise denoising is a challenging task because the statistics of real-noise do not follow the normal distribution, and they are also spatially and temporally changing. In order to cope with various and complex real-noise, we propose a…

Computer Vision and Pattern Recognition · Computer Science 2020-03-17 Yoonsik Kim , Jae Woong Soh , Gu Yong Park , Nam Ik Cho

User-generated content published on microblogging social networks constitutes a priceless source of information. However, microtexts usually deviate from the standard lexical and grammatical rules of the language, thus making their processing…

Computation and Language · Computer Science 2024-02-06 Yerai Doval , Manuel Vilares , Jesús Vilares

Current benchmark tasks for natural language processing contain text that is qualitatively different from the text used in informal day-to-day digital communication. This discrepancy has led to severe performance degradation of…

Computation and Language · Computer Science 2021-10-13 Ana-Maria Bucur , Adrian Cosma , Liviu P. Dinu

Density estimation is one of the core problems in statistics. Despite this, existing techniques such as maximum likelihood estimation are computationally inefficient due to the intractability of the normalizing constant. For this reason an…

Machine Learning · Computer Science 2021-01-14 Tsimboy Olga , Yermek Kapushev , Evgeny Burnaev , Ivan Oseledets

This paper presents a challenge to the community: given a large corpus of written text aligned to its normalized spoken form, train an RNN to learn the correct normalization function. We present a data set of general text where the…

Computation and Language · Computer Science 2017-01-26 Richard Sproat , Navdeep Jaitly

Speech recognition system performance degrades in noisy environments. If the acoustic models are built using features of clean utterances, the features of a noisy test utterance would be acoustically mismatched with the trained model. This…

Computation and Language · Computer Science 2015-07-16 D. S. Pavan Kumar

Natural language is compositional; the meaning of a sentence is a function of the meaning of its parts. This property allows humans to create and interpret novel sentences, generalizing robustly outside their prior experience. Neural…

Computation and Language · Computer Science 2021-06-30 Henry Conklin , Bailin Wang , Kenny Smith , Ivan Titov