Related papers: Dynamic Gradient Alignment for Online Data Mixing

Online Learning-guided Learning Rate Adaptation via Gradient Alignment

The performance of an optimizer on large-scale deep learning models depends critically on fine-tuning the learning rate, often requiring an extensive grid search over base learning rates, schedules, and other hyperparameters. In this paper,…

Machine Learning · Computer Science 2025-06-11 Ruichen Jiang, Ali Kavis, Aryan Mokhtari

AutoMixAlign: Adaptive Data Mixing for Multi-Task Preference Optimization in LLMs

When aligning large language models (LLMs), their performance on various tasks (such as being helpful, harmless, and honest) depends heavily on the composition of their training data. However, selecting a data mixture that achieves strong…

Machine Learning · Computer Science 2025-06-03 Nicholas E. Corrado, Julian Katz-Samuels, Adithya Devraj, Hyokun Yun, Chao Zhang, Yi Xu, Yi Pan, Bing Yin, Trishul Chilimbi

DMA: Online RAG Alignment with Human Feedback

Retrieval-augmented generation (RAG) systems often rely on static retrieval, limiting adaptation to evolving intent and content drift. We introduce Dynamic Memory Alignment (DMA), an online learning framework that systematically…

Artificial Intelligence · Computer Science 2025-11-10 Yu Bai, Yukai Miao, Dawei Wang, Li Chen, Fei Long, Rundi Zhai, Dan Li, Yanyu Ren, Tianfeng Liu, Hongtao Xie, Ce Yang, Xuhui Cai

Data Mixing for Large Language Models Pretraining: A Survey and Outlook

Large language models (LLMs) rely on pretraining on massive and heterogeneous corpora, where training data composition has a decisive impact on training efficiency and downstream generalization under realistic compute and data budget…

Computation and Language · Computer Science 2026-04-21 Zhuo Chen, Yuxuan Miao, Supryadi, Deyi Xiong

Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining

Pretraining large language models (LLMs) on vast and heterogeneous datasets is crucial for achieving state-of-the-art performance across diverse downstream tasks. However, current training paradigms treat all samples equally, overlooking…

Machine Learning · Computer Science 2025-02-11 Daouda Sow, Herbert Woisetschläger, Saikiran Bulusu, Shiqiang Wang, Hans-Arno Jacobsen, Yingbin Liang

GradAlign: Gradient-Aligned Data Selection for LLM Reinforcement Learning

Reinforcement learning (RL) has become a central post-training paradigm for large language models (LLMs), but its performance is highly sensitive to the quality of training problems. This sensitivity stems from the non-stationarity of RL:…

Machine Learning · Computer Science 2026-02-26 Ningyuan Yang, Weihua Du, Weiwei Sun, Sean Welleck, Yiming Yang

Gradient-Guided Annealing for Domain Generalization

Domain Generalization (DG) research has gained considerable traction as of late, since the ability to generalize to unseen data distributions is a requirement that eludes even state-of-the-art training algorithms. In this paper we observe…

Machine Learning · Computer Science 2025-07-22 Aristotelis Ballas, Christos Diou

Revisiting Replay and Gradient Alignment for Continual Pre-Training of Large Language Models

Training large language models (LLMs) typically involves pre-training on massive corpora, only to restart the process entirely when new data becomes available. A more efficient and resource-conserving approach would be continual…

Machine Learning · Computer Science 2025-08-05 Istabrak Abbes, Gopeshh Subbaraj, Matthew Riemer, Nizar Islah, Benjamin Therien, Tsuguchika Tabaru, Hiroaki Kingetsu, Sarath Chandar, Irina Rish

Dynamic Skill Adaptation for Large Language Models

We present Dynamic Skill Adaptation (DSA), an adaptive and dynamic framework to adapt novel and complex skills to Large Language Models (LLMs). Compared with previous work which learns from human-curated and static data in random orders, we…

Computation and Language · Computer Science 2024-12-30 Jiaao Chen, Diyi Yang

Gradient Alignment Improves Test-Time Adaptation for Medical Image Segmentation

Although recent years have witnessed significant advancements in medical image segmentation, the pervasive issue of domain shift among medical images from diverse centres hinders the effective deployment of pre-trained models. Many…

Computer Vision and Pattern Recognition · Computer Science 2024-12-30 Ziyang Chen, Yiwen Ye, Yongsheng Pan, Yong Xia

Data Selection for LLM Alignment Using Fine-Grained Preferences

Large language model (LLM) alignment aims to ensure that the behavior of LLMs meets human preferences. While collecting data from multiple fine-grained, aspect-specific preferences becomes more and more feasible, existing alignment…

Machine Learning · Computer Science 2026-03-03 Jia Zhang, Yao Liu, Chen-Xi Zhang, Yi Liu, Yi-Xuan Jin, Lan-Zhe Guo, Yu-Feng Li

Beyond First-Order: Training LLMs with Stochastic Conjugate Subgradients and AdamW

Stochastic gradient-based descent (SGD) methods have long been central to training large language models (LLMs). However, their effectiveness is increasingly being questioned, particularly in large-scale applications where empirical evidence…

Machine Learning · Computer Science 2025-07-03 Di Zhang, Yihang Zhang

AdaRankGrad: Adaptive Gradient-Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning

Training and fine-tuning large language models (LLMs) come with challenges related to memory and computational requirements due to the increasing size of the model weights and the optimizer states. Various techniques have been developed to…

Machine Learning · Computer Science 2025-12-09 Yehonathan Refael, Jonathan Svirsky, Boris Shustin, Wasim Huleihel, Ofir Lindenbaum

Efficient Online Data Mixing For Language Model Pre-Training

The data used to pretrain large language models has a decisive impact on a model's downstream performance, which has led to a large body of work on data selection methods that aim to automatically determine the most suitable data to use for…

Computation and Language · Computer Science 2023-12-12 Alon Albalak, Liangming Pan, Colin Raffel, William Yang Wang

Dynamic Batch Adaptation

Current deep learning adaptive optimizer methods adjust the step magnitude of parameter updates by altering the effective learning rate used by each parameter. Motivated by the known inverse relation between batch size and learning rate on…

Machine Learning · Computer Science 2022-08-02 Cristian Simionescu, George Stoica, Robert Herscovici

Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models

Data is a crucial element in large language model (LLM) alignment. Recent studies have explored using LLMs for efficient data collection. However, LLM-generated data often suffers from quality issues, with underrepresented or absent aspects…

Computation and Language · Computer Science 2024-10-08 Fei Wang, Ninareh Mehrabi, Palash Goyal, Rahul Gupta, Kai-Wei Chang, Aram Galstyan

GRAPE: Optimize Data Mixture for Group Robust Multi-target Adaptive Pretraining

The performance of large language models (LLMs) across diverse downstream applications is fundamentally governed by the quality and composition of their pretraining corpora. Existing domain reweighting algorithms primarily optimize data…

Machine Learning · Computer Science 2025-05-28 Simin Fan, Maria Ios Glarou, Martin Jaggi

Distribution Shift Alignment Helps LLMs Simulate Survey Response Distributions

Large language models (LLMs) offer a promising way to simulate human survey responses, potentially reducing the cost of large-scale data collection. However, existing zero-shot methods suffer from prompt sensitivity and low accuracy, while…

Artificial Intelligence · Computer Science 2026-04-20 Ji Huang, Mengfei Li, Shuai Shao

Efficient Alignment of Large Language Models via Data Sampling

LLM alignment ensures that large language models behave safely and effectively by aligning their outputs with human values, goals, and intentions. Aligning LLMs employs huge amounts of data, computation, and time. Moreover, curating data…

Machine Learning · Computer Science 2025-02-19 Amrit Khera, Rajat Ghosh, Debojyoti Dutta

G-DIG: Towards Gradient-based Diverse and High-quality Instruction Data Selection for Machine Translation

Large Language Models (LLMs) have demonstrated remarkable abilities in general scenarios. Instruction finetuning empowers them to align with humans in various tasks. Nevertheless, the Diversity and Quality of the instruction data remain two…

Computation and Language · Computer Science 2024-07-09 Xingyuan Pan, Luyang Huang, Liyan Kang, Zhicheng Liu, Yu Lu, Shanbo Cheng