Related papers: Recurrent Diffusion for Large-Scale Parameter Gene…

ORAL: Prompting Your Large-Scale LoRAs via Conditional Recurrent Diffusion

Parameter generation has emerged as a novel paradigm for neural network development, offering an alternative to traditional neural network training by synthesizing high-quality model weights directly. In the context of Low-Rank Adaptation…

Machine Learning · Computer Science 2025-04-10 Rana Muhammad Shahroz Khan, Dongwen Tang, Pingzhi Li, Kai Wang, Tianlong Chen

Compact and Optimal Deep Learning with Recurrent Parameter Generators

Deep learning has achieved tremendous success by training increasingly large models, which are then compressed for practical deployment. We propose a drastically different approach to compact and optimal deep learning: We decouple the…

Computer Vision and Pattern Recognition · Computer Science 2022-10-28 Jiayun Wang, Yubei Chen, Stella X. Yu, Brian Cheung, Yann LeCun

NeuroGen: Neural Network Parameter Generation via Large Language Models

Acquiring the parameters of neural networks (NNs) has been one of the most important problems in machine learning since the inception of NNs. Traditional approaches, such as backpropagation and forward-only optimization, acquire parameters…

Artificial Intelligence · Computer Science 2025-05-26 Jiaqi Wang, Yusen Zhang, Xi Li

Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs

Diffusion models have exhibited exceptional performance in text-to-image generation and editing. However, existing methods often face challenges when handling complex text prompts that involve multiple objects with multiple attributes and…

Computer Vision and Pattern Recognition · Computer Science 2024-06-05 Ling Yang, Zhaochen Yu, Chenlin Meng, Minkai Xu, Stefano Ermon, Bin Cui

Neural Network Diffusion

Diffusion models have achieved remarkable success in image and video generation. In this work, we demonstrate that diffusion models can also generate high-performing neural network parameters. Our approach is simple, utilizing an…

Machine Learning · Computer Science 2025-01-03 Kai Wang, Dongwen Tang, Boya Zeng, Yida Yin, Zhaopan Xu, Yukun Zhou, Zelin Zang, Trevor Darrell, Zhuang Liu, Yang You

RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance

Diffusion-based models demonstrate impressive generation capabilities. However, they also have a massive number of parameters, resulting in enormous model sizes, thus making them unsuitable for deployment on resource-constrained devices.…

Computer Vision and Pattern Recognition · Computer Science 2024-09-04 Avideep Mukherjee, Soumya Banerjee, Piyush Rai, Vinay P. Namboodiri

Instruction-Guided Autoregressive Neural Network Parameter Generation

Learning to generate neural network parameters conditioned on task descriptions and architecture specifications is pivotal for advancing model adaptability and transfer learning. Existing methods, especially those based on diffusion models…

Machine Learning · Computer Science 2025-04-04 Soro Bedionita, Bruno Andreis, Song Chong, Sung Ju Hwang

RDPM: Solve Diffusion Probabilistic Models via Recurrent Token Prediction

Diffusion Probabilistic Models (DPMs) have emerged as the de facto approach for high-fidelity image synthesis, operating diffusion processes on continuous VAE latents, which differ significantly from the text generation methods employed by…

Computer Vision and Pattern Recognition · Computer Science 2024-12-30 Xiaoping Wu, Jie Hu, Xiaoming Wei

Neural Graph Generator: Feature-Conditioned Graph Generation using Latent Diffusion Models

Graph generation has emerged as a crucial task in machine learning, with significant challenges in generating graphs that accurately reflect specific properties. Existing methods often fall short in efficiently addressing this need as they…

Machine Learning · Computer Science 2024-09-19 Iakovos Evdaimon, Giannis Nikolentzos, Christos Xypolopoulos, Ahmed Kammoun, Michail Chatzianastasis, Hadi Abdine, Michalis Vazirgiannis

Reimagining Parameter Space Exploration with Diffusion Models

Adapting neural networks to new tasks typically requires task-specific fine-tuning, which is time-consuming and reliant on labeled data. We explore a generative alternative that produces task-specific parameters directly from task identity,…

Machine Learning · Computer Science 2025-06-24 Lijun Zhang, Xiao Liu, Hui Guan

Optimizing Distributed Training on Frontier for Large Language Models

Large language models (LLMs) have demonstrated remarkable success as foundational models, benefiting various downstream applications through fine-tuning. Recent studies on loss scaling have demonstrated the superior performance of larger…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-12-25 Sajal Dash, Isaac Lyngaas, Junqi Yin, Xiao Wang, Romain Egele, Guojing Cong, Feiyi Wang, Prasanna Balaprakash

Recurrent Graph Tensor Networks: A Low-Complexity Framework for Modelling High-Dimensional Multi-Way Sequence

Recurrent Neural Networks (RNNs) are among the most successful machine learning models for sequence modelling, but tend to suffer from an exponential increase in the number of parameters when dealing with large multidimensional data. To…

Machine Learning · Computer Science 2021-05-12 Yao Lei Xu, Danilo P. Mandic

ReLoRA: High-Rank Training Through Low-Rank Updates

Despite the dominance and effectiveness of scaling, resulting in large networks with hundreds of billions of parameters, the necessity to train overparameterized models remains poorly understood, while training costs grow exponentially. In…

Computation and Language · Computer Science 2023-12-12 Vladislav Lialin, Namrata Shivagunde, Sherin Muckatira, Anna Rumshisky

Conditional LoRA Parameter Generation

Generative models have achieved remarkable success in image, video, and text domains. Inspired by this, researchers have explored utilizing generative models to generate neural network parameters. However, these efforts have been limited by…

Artificial Intelligence · Computer Science 2024-08-05 Xiaolong Jin, Kai Wang, Dongwen Tang, Wangbo Zhao, Yukun Zhou, Junshu Tang, Yang You

Scaling Recurrent Neural Network Language Models

This paper investigates the scaling properties of Recurrent Neural Network Language Models (RNNLMs). We discuss how to train very large RNNs on GPUs and address the questions of how RNNLMs scale with respect to model size, training-set…

Computation and Language · Computer Science 2015-02-03 Will Williams, Niranjani Prasad, David Mrva, Tom Ash, Tony Robinson

Parametric Retrieval Augmented Generation

Retrieval-augmented generation (RAG) techniques have emerged as a promising solution to enhance the reliability of large language models (LLMs) by addressing issues like hallucinations, outdated knowledge, and domain adaptation. In…

Computation and Language · Computer Science 2025-01-28 Weihang Su, Yichen Tang, Qingyao Ai, Junxi Yan, Changyue Wang, Hongning Wang, Ziyi Ye, Yujia Zhou, Yiqun Liu

RDM: Recurrent Diffusion Model for Human Motion Generation

Human motion generation is a challenging task due to its high dimensionality and the difficulty of generating fine-grained motions. Diffusion methods have been proposed due to their high sample quality and expressiveness. Early approaches…

Computer Vision and Pattern Recognition · Computer Science 2026-03-10 Mirgahney Mohamed, Harry Jake Cunningham, Marc P. Deisenroth, Lourdes Agapito

Decentralized Diffusion Models

Large-scale AI model training divides work across thousands of GPUs, then synchronizes gradients across them at each step. This incurs a significant network burden that only centralized, monolithic clusters can support, driving up…

Computer Vision and Pattern Recognition · Computer Science 2025-01-13 David McAllister, Matthew Tancik, Jiaming Song, Angjoo Kanazawa

EnergonAI: An Inference System for 10-100 Billion Parameter Transformer Models

Large transformer models display promising performance on a wide range of natural language processing (NLP) tasks. Although the AI community has expanded the model scale to the trillion parameter level, the practical deployment of 10-100…

Machine Learning · Computer Science 2022-09-07 Jiangsu Du, Ziming Liu, Jiarui Fang, Shenggui Li, Yongbin Li, Yutong Lu, Yang You

Learning Implicitly Recurrent CNNs Through Parameter Sharing

We introduce a parameter sharing scheme, in which different layers of a convolutional neural network (CNN) are defined by a learned linear combination of parameter tensors from a global bank of templates. Restricting the number of templates…

Machine Learning · Computer Science 2019-03-15 Pedro Savarese, Michael Maire