Related papers: A model for learning to segment temporal sequences…

Gradual Learning of Recurrent Neural Networks

Recurrent Neural Networks (RNNs) achieve state-of-the-art results in many sequence-to-sequence modeling tasks. However, RNNs are difficult to train and tend to suffer from overfitting. Motivated by the Data Processing Inequality (DPI), we…

Machine Learning · Statistics 2018-05-24 Ziv Aharoni, Gal Rattner, Haim Permuter

Adaptive Recurrent Neural Network Based on Mixture Layer

Although Recurrent Neural Network (RNN) has been a powerful tool for modeling sequential data, its performance is inadequate when processing sequences with multiple patterns. In this paper, we address this challenge by introducing a novel…

Machine Learning · Computer Science 2019-02-28 Kui Zhao, Yuechuan Li, Chi Zhang, Cheng Yang, Huan Xu

Imitation Learning from Observations: An Autoregressive Mixture of Experts Approach

This paper presents a novel approach to imitation learning from observations, where an autoregressive mixture of experts model is deployed to fit the underlying policy. The parameters of the model are learned via a two-stage framework. By…

Machine Learning · Computer Science 2024-11-14 Renzi Wang, Flavia Sofia Acerbo, Tong Duy Son, Panagiotis Patrinos

On Markov Chain Gradient Descent

Stochastic gradient methods are the workhorse (algorithms) of large-scale optimization problems in machine learning, signal processing, and other computational sciences and engineering. This paper studies Markov chain gradient descent, a…

Optimization and Control · Mathematics 2018-09-13 Tao Sun, Yuejiao Sun, Wotao Yin

Scalable Bayesian Learning of Recurrent Neural Networks for Language Modeling

Recurrent neural networks (RNNs) have shown promising performance for language modeling. However, traditional training of RNNs using back-propagation through time often suffers from overfitting. One reason for this is that stochastic…

Computation and Language · Computer Science 2017-04-25 Zhe Gan, Chunyuan Li, Changyou Chen, Yunchen Pu, Qinliang Su, Lawrence Carin

An Adaptive Stochastic Nesterov Accelerated Quasi Newton Method for Training RNNs

A common problem in training neural networks is the vanishing and/or exploding gradient problem which is more prominently seen in training of Recurrent Neural Networks (RNNs). Thus several algorithms have been proposed for training RNNs.…

Machine Learning · Computer Science 2019-09-10 S. Indrapriyadarsini, Shahrzad Mahboubi, Hiroshi Ninomiya, Hideki Asai

Mixture of ELM based experts with trainable gating network

Mixture of experts method is a neural network based ensemble learning that has great ability to improve the overall classification accuracy. This method is based on the divide and conquer principle, in which the problem space is divided…

Machine Learning · Computer Science 2021-05-26 Laleh Armi, Elham Abbasi, Jamal Zarepour-Ahmadabadi

Structural-RNN: Deep Learning on Spatio-Temporal Graphs

Deep Recurrent Neural Network architectures, though remarkably capable at modeling sequences, lack an intuitive high-level spatio-temporal structure. That is while many problems in computer vision inherently have an underlying high-level…

Computer Vision and Pattern Recognition · Computer Science 2016-04-12 Ashesh Jain, Amir R. Zamir, Silvio Savarese, Ashutosh Saxena

A Probabilistic Model for Skill Acquisition with Switching Latent Feedback Controllers

Manipulation tasks often consist of subtasks, each representing a distinct skill. Mastering these skills is essential for robots, as it enhances their autonomy, efficiency, adaptability, and ability to work in their environment. Learning…

Robotics · Computer Science 2025-05-21 Juyan Zhang, Dana Kulic, Michael Burke

Learning Point Processes using Recurrent Graph Network

We present a novel Recurrent Graph Network (RGN) approach for predicting discrete marked event sequences by learning the underlying complex stochastic process. Using the framework of Point Processes, we interpret a marked discrete event…

Machine Learning · Computer Science 2022-08-12 Saurabh Dash, Xueyuan She, Saibal Mukhopadhyay

Learning Longer Memory in Recurrent Neural Networks

Recurrent neural network is a powerful model that learns temporal patterns in sequential data. For a long time, it was believed that recurrent networks are difficult to train using simple optimizers, such as stochastic gradient descent, due…

Neural and Evolutionary Computing · Computer Science 2015-04-20 Tomas Mikolov, Armand Joulin, Sumit Chopra, Michael Mathieu, Marc'Aurelio Ranzato

An Ensemble of Knowledge Sharing Models for Dynamic Hand Gesture Recognition

The focus of this paper is dynamic gesture recognition in the context of the interaction between humans and machines. We propose a model consisting of two sub-networks, a transformer and an ordered-neuron long-short-term-memory (ON-LSTM)…

Computer Vision and Pattern Recognition · Computer Science 2023-11-03 Kenneth Lai, Svetlana Yanushkevich

Dynamic Mixture-of-Experts for Incremental Graph Learning

Graph incremental learning is a learning paradigm that aims to adapt trained models to continuously incremented graphs and data over time without the need for retraining on the full dataset. However, regular graph machine learning methods…

Machine Learning · Computer Science 2025-08-14 Lecheng Kong, Theodore Vasiloudis, Seongjun Yun, Han Xie, Xiang Song

Mixed Membership Recurrent Neural Networks

Models for sequential data such as the recurrent neural network (RNN) often implicitly model a sequence as having a fixed time interval between observations and do not account for group-level effects when multiple sequences are observed. We…

Machine Learning · Computer Science 2018-12-27 Ghazal Fazelnia, Mark Ibrahim, Ceena Modarres, Kevin Wu, John Paisley

Mixture of Experts Distributional Regression: Implementation Using Robust Estimation with Adaptive First-order Methods

In this work, we propose an efficient implementation of mixtures of experts distributional regression models which exploits robust estimation by using stochastic first-order optimization techniques with adaptive learning rate schedulers. We…

Computation · Statistics 2026-03-23 David Rügamer, Florian Pfisterer, Bernd Bischl, Bettina Grün

Mixture of Raytraced Experts

We introduce a Mixture of Raytraced Experts, a stacked Mixture of Experts (MoE) architecture which can dynamically select sequences of experts, producing computational graphs of variable width and depth. Existing MoE architectures generally…

Machine Learning · Computer Science 2025-07-17 Andrea Perin, Giacomo Lagomarsini, Claudio Gallicchio, Giuseppe Nuti

supervised adptive threshold network for instance segmentation

Currently, instance segmentation is attracting more and more attention in machine learning region. However, there exists some defects on the information propagation in previous Mask R-CNN and other network models. In this paper, we propose…

Computer Vision and Pattern Recognition · Computer Science 2021-06-08 Kuikun Liu, Jie Yang, Cai Sun, Haoyuan Chi

ULTRA-MC: A Unified Approach to Learning Mixtures of Markov Chains via Hitting Times

This study introduces a novel approach for learning mixtures of Markov chains, a critical process applicable to various fields, including healthcare and the analysis of web users. Existing research has identified a clear divide in…

Machine Learning · Computer Science 2024-05-27 Fabian Spaeh, Konstantinos Sotiropoulos, Charalampos E. Tsourakakis

Expeditious Saliency-guided Mix-up through Random Gradient Thresholding

Mix-up training approaches have proven to be effective in improving the generalization ability of Deep Neural Networks. Over the years, the research community expands mix-up methods into two directions, with extensive efforts to improve…

Computer Vision and Pattern Recognition · Computer Science 2023-08-14 Minh-Long Luu, Zeyi Huang, Eric P. Xing, Yong Jae Lee, Haohan Wang

The switch Markov chain for sampling irregular graphs

The problem of efficiently sampling from a set of (undirected) graphs with a given degree sequence has many applications. One approach to this problem uses a simple Markov chain, which we call the switch chain, to perform the sampling. The…

Data Structures and Algorithms · Computer Science 2014-12-18 Catherine Greenhill