Related papers: Learning to Read through Machine Teaching

Approximate Distribution Matching for Sequence-to-Sequence Learning

Sequence-to-Sequence models were introduced to tackle many real-life problems like machine translation, summarization, image captioning, etc. The standard optimization algorithms are mainly based on example-to-example matching like maximum…

Computation and Language · Computer Science 2018-09-05 Wenhu Chen, Guanlin Li, Shujie Liu, Zhirui Zhang, Mu Li, Ming Zhou

Stochastic dynamics of lexicon learning in an uncertain and nonuniform world

We study the time taken by a language learner to correctly identify the meaning of all words in a lexicon under conditions where many plausible meanings can be inferred whenever a word is uttered. We show that the most basic form of…

Physics and Society · Physics 2015-05-26 Rainer Reisenauer, Kenny Smith, Richard A. Blythe

Learning-to-Learn Stochastic Gradient Descent with Biased Regularization

We study the problem of learning-to-learn: inferring a learning algorithm that works well on tasks sampled from an unknown distribution. As class of algorithms we consider Stochastic Gradient Descent on the true risk regularized by the…

Machine Learning · Computer Science 2019-03-26 Giulia Denevi, Carlo Ciliberto, Riccardo Grazzi, Massimiliano Pontil

Sequence-to-Sequence ASR Optimization via Reinforcement Learning

Despite the success of sequence-to-sequence approaches in automatic speech recognition (ASR) systems, the models still suffer from several problems, mainly due to the mismatch between the training and inference conditions. In the…

Computation and Language · Computer Science 2018-03-01 Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

Optimization Methods for Large-Scale Machine Learning

This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural…

Machine Learning · Statistics 2018-02-12 Léon Bottou, Frank E. Curtis, Jorge Nocedal

Sequence-to-Sequence Learning with Latent Neural Grammars

Sequence-to-sequence learning with neural networks has become the de facto standard for sequence prediction tasks. This approach typically models the local distribution over the next word with a powerful neural network that can condition on…

Computation and Language · Computer Science 2021-11-17 Yoon Kim

Sentence Level Curriculum Learning for Improved Neural Conversational Models

Designing machine intelligence to converse with a human user necessarily requires an understanding of how humans participate in conversation, and thus conversation modeling is an important task in natural language processing. New…

Computation and Language · Computer Science 2023-05-16 Sean Paulsen

Advances in Optimizing Recurrent Networks

After a more than decade-long period of relatively little research activity in the area of recurrent neural networks, several new developments will be reviewed here that have allowed substantial progress both in understanding and in…

Machine Learning · Computer Science 2012-12-17 Yoshua Bengio, Nicolas Boulanger-Lewandowski, Razvan Pascanu

Order matters: Distributional properties of speech to young children bootstraps learning of semantic representations

Some researchers claim that language acquisition is critically dependent on experiencing linguistic input in order of increasing complexity. We set out to test this hypothesis using a simple recurrent neural network (SRN) trained to predict…

Computation and Language · Computer Science 2018-02-05 Philip A Huebner, Jon A Willits

Learning Longer Memory in Recurrent Neural Networks

Recurrent neural network is a powerful model that learns temporal patterns in sequential data. For a long time, it was believed that recurrent networks are difficult to train using simple optimizers, such as stochastic gradient descent, due…

Neural and Evolutionary Computing · Computer Science 2015-04-20 Tomas Mikolov, Armand Joulin, Sumit Chopra, Michael Mathieu, Marc'Aurelio Ranzato

A Distributional Perspective on Word Learning in Neural Language Models

Language models (LMs) are increasingly being studied as models of human language learners. Due to the nascency of the field, it is not well-established whether LMs exhibit similar learning dynamics to humans, and there are few direct…

Computation and Language · Computer Science 2025-02-11 Filippo Ficarra, Ryan Cotterell, Alex Warstadt

An Empirical Exploration of Curriculum Learning for Neural Machine Translation

Machine translation systems based on deep neural networks are expensive to train. Curriculum learning aims to address this issue by choosing the order in which samples are presented during training to help train better models faster. We…

Computation and Language · Computer Science 2018-11-05 Xuan Zhang, Gaurav Kumar, Huda Khayrallah, Kenton Murray, Jeremy Gwinnup, Marianna J Martindale, Paul McNamee, Kevin Duh, Marine Carpuat

Scalable Bayesian Learning of Recurrent Neural Networks for Language Modeling

Recurrent neural networks (RNNs) have shown promising performance for language modeling. However, traditional training of RNNs using back-propagation through time often suffers from overfitting. One reason for this is that stochastic…

Computation and Language · Computer Science 2017-04-25 Zhe Gan, Chunyuan Li, Changyou Chen, Yunchen Pu, Qinliang Su, Lawrence Carin

Curriculum optimization for low-resource speech recognition

Modern end-to-end speech recognition models show astonishing results in transcribing audio signals into written text. However, conventional data feeding pipelines may be sub-optimal for low-resource speech recognition, which still remains a…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-21 Anastasia Kuznetsova, Anurag Kumar, Jennifer Drexler Fox, Francis Tyers

A Sequential Self Teaching Approach for Improving Generalization in Sound Event Recognition

An important problem in machine auditory perception is to recognize and detect sound events. In this paper, we propose a sequential self-teaching approach to learning sounds. Our main proposition is that it is harder to learn sounds in…

Sound · Computer Science 2020-07-02 Anurag Kumar, Vamsi Krishna Ithapu

Successes and critical failures of neural networks in capturing human-like speech recognition

Natural and artificial audition can in principle acquire different solutions to a given problem. The constraints of the task, however, can nudge the cognitive science and engineering of audition to qualitatively converge, suggesting that a…

Sound · Computer Science 2023-04-20 Federico Adolfi, Jeffrey S. Bowers, David Poeppel

Training variance and performance evaluation of neural networks in speech

In this work we study variance in the results of neural network training on a wide variety of configurations in automatic speech recognition. Although this variance itself is well known, this is, to the best of our knowledge, the first…

Machine Learning · Computer Science 2016-06-15 Ewout van den Berg, Bhuvana Ramabhadran, Michael Picheny

Comparison and Analysis of New Curriculum Criteria for End-to-End ASR

It is common knowledge that the quantity and quality of the training data play a significant role in the creation of a good machine learning model. In this paper, we take it one step further and demonstrate that the way the training…

Audio and Speech Processing · Electrical Eng. & Systems 2022-08-12 Georgios Karakasidis, Tamás Grósz, Mikko Kurimo

Long Short-Term Memory-Networks for Machine Reading

In this paper we address the question of how to render sequence-level networks better at handling structured input. We propose a machine reading simulator which processes text incrementally from left to right and performs shallow reasoning…

Computation and Language · Computer Science 2016-09-22 Jianpeng Cheng, Li Dong, Mirella Lapata

Learning Optimal Classification Trees Robust to Distribution Shifts

We consider the problem of learning classification trees that are robust to distribution shifts between training and testing/deployment data. This problem arises frequently in high stakes settings such as public health and social work where…

Machine Learning · Computer Science 2025-08-27 Nathan Justin, Sina Aghaei, Andrés Gómez, Phebe Vayanos