English
Related papers

Related papers: Reparameterization through Spatial Gradient Scalin…

200 papers

We present weight normalization: a reparameterization of the weight vectors in a neural network that decouples the length of those weight vectors from their direction. By reparameterizing the weights in this way we improve the conditioning…

Machine Learning · Computer Science 2016-06-07 Tim Salimans , Diederik P. Kingma

The computational overhead of Vision Transformers in practice stems fundamentally from their deep architectures, yet existing acceleration strategies have primarily targeted algorithmic-level optimizations such as token pruning and…

Computer Vision and Pattern Recognition · Computer Science 2025-11-26 Chengwei Zhou , Vipin Chaudhary , Gourav Datta

We present a new algorithm for stochastic variational inference that targets at models with non-differentiable densities. One of the key challenges in stochastic variational inference is to come up with a low-variance estimator of the…

Machine Learning · Computer Science 2018-10-26 Wonyeol Lee , Hangyeol Yu , Hongseok Yang

Multi-task networks are commonly utilized to alleviate the need for a large number of highly specialized single-task networks. However, two common challenges in developing multi-task models are often overlooked in literature. First,…

Computer Vision and Pattern Recognition · Computer Science 2020-07-27 Menelaos Kanakis , David Bruggemann , Suman Saha , Stamatios Georgoulis , Anton Obukhov , Luc Van Gool

Low-variance gradient estimation is crucial for learning directed graphical models parameterized by neural networks, where the reparameterization trick is widely used for those with continuous variables. While this technique gives…

Machine Learning · Statistics 2016-11-07 Seiya Tokui , Issei sato

Recent results suggest that reinitializing a subset of the parameters of a neural network during training can improve generalization, particularly for small training sets. We study the impact of different reinitialization methods in several…

Machine Learning · Computer Science 2021-09-02 Ibrahim Alabdulmohsin , Hartmut Maennel , Daniel Keysers

We introduce a novel stochastic regularization technique for deep neural networks, which decomposes a layer into multiple branches with different parameters and merges stochastically sampled combinations of the outputs from the branches…

Machine Learning · Computer Science 2019-10-04 Wonpyo Park , Paul Hongsuck Seo , Bohyung Han , Minsu Cho

Gradient dynamics play a central role in determining the stability and generalization of deep neural networks. In this work, we provide an empirical analysis of how variance and standard deviation of gradients evolve during training,…

Machine Learning · Computer Science 2025-09-09 Vincent-Daniel Yun

Most of the recent successful applications of neural networks have been based on training with gradient descent updates. However, for some small networks, other mirror descent updates learn provably more efficiently when the target is…

Machine Learning · Computer Science 2020-06-24 Ehsan Amid , Manfred K. Warmuth

We study the implicit regularization of gradient descent towards structured sparsity via a novel neural reparameterization, which we call a diagonally grouped linear neural network. We show the following intriguing property of our…

Machine Learning · Statistics 2023-01-31 Jiangyuan Li , Thanh V. Nguyen , Chinmay Hegde , Raymond K. W. Wong

In this paper we propose an approach to avoiding catastrophic forgetting in sequential task learning scenarios. Our technique is based on a network reparameterization that approximately diagonalizes the Fisher Information Matrix of the…

Computer Vision and Pattern Recognition · Computer Science 2018-12-13 Xialei Liu , Marc Masana , Luis Herranz , Joost Van de Weijer , Antonio M. Lopez , Andrew D. Bagdanov

Modern deep neural networks are typically highly overparameterized. Pruning techniques are able to remove a significant fraction of network parameters with little loss in accuracy. Recently, techniques based on dynamic reallocation of…

Machine Learning · Computer Science 2019-05-14 Hesham Mostafa , Xin Wang

The training of sparse neural networks is becoming an increasingly important tool for reducing the computational footprint of models at training and evaluation, as well enabling the effective scaling up of models. Whereas much work over the…

Machine Learning · Statistics 2021-10-07 Jonathan Schwarz , Siddhant M. Jayakumar , Razvan Pascanu , Peter E. Latham , Yee Whye Teh

Recent breakthroughs in computer vision make use of large deep neural networks, utilizing the substantial speedup offered by GPUs. For applications running on limited hardware, however, high precision real-time processing can still be a…

Machine Learning · Computer Science 2018-02-05 Oran Shayer , Dan Levi , Ethan Fetaya

Large kernel convolutions offer a scalable alternative to vision transformers for high-resolution 3D volumetric analysis, yet naively increasing kernel size often leads to optimization instability. Motivated by the spatial bias inherent in…

Computer Vision and Pattern Recognition · Computer Science 2026-02-02 Ho Hin Lee , Quan Liu , Shunxing Bao , Yuankai Huo , Bennett A. Landman

In computational neuroscience, fixed points of recurrent neural networks are commonly used to model neural responses to static or slowly changing stimuli. These applications raise the question of how to train the weights in a recurrent…

Neurons and Cognition · Quantitative Biology 2023-07-28 Vicky Zhu , Robert Rosenbaum

This paper proposes a novel, efficient transfer learning method, called Scalable Weight Reparametrization (SWR) that is efficient and effective for multiple downstream tasks. Efficient transfer learning involves utilizing a pre-trained…

Machine Learning · Computer Science 2023-02-28 Byeonggeun Kim , Jun-Tae Lee , Seunghan yang , Simyung Chang

Sequential learning paradigms pose challenges for gradient-based deep learning due to difficulties incorporating new data and retaining prior knowledge. While Gaussian processes elegantly tackle these problems, they struggle with…

Machine Learning · Statistics 2024-03-19 Aidan Scannell , Riccardo Mereu , Paul Chang , Ella Tamir , Joni Pajarinen , Arno Solin

As a structured prediction task, scene graph generation, given an input image, aims to explicitly model objects and their relationships by constructing a visually-grounded scene graph. In the current literature, such task is universally…

Computer Vision and Pattern Recognition · Computer Science 2022-06-24 Daqi Liu , Miroslaw Bober , Josef Kittler

Regularization is a set of techniques that are used to improve the generalization ability of deep neural networks. In this paper, we introduce weight compander (WC), a novel effective method to improve generalization by reparameterizing…

Machine Learning · Computer Science 2023-06-30 Rinor Cakaj , Jens Mehnert , Bin Yang
‹ Prev 1 2 3 10 Next ›