Related papers: Alignment-Aware Model Adaptation via Feedback-Guid…

200 papers

This paper presents a gradient-informed fine-tuning method for large language models under few-shot conditions. The goal is to enhance task adaptability and training stability when data is limited. The method builds on a base loss function…

Computation and Language · Computer Science 2025-06-03 Hongye Zheng, Yichen Wang, Ray Pan, Guiran Liu, Binrong Zhu, Hanlu Zhang

Generative foundation models are susceptible to implicit biases that can arise from extensive unsupervised training data. Such biases can produce suboptimal samples, skewed outcomes, and unfairness, with potentially serious consequences.…

Machine Learning · Computer Science 2023-12-04 Hanze Dong, Wei Xiong, Deepanshu Goyal, Yihan Zhang, Winnie Chow, Rui Pan, Shizhe Diao, Jipeng Zhang, Kashun Shum, Tong Zhang

Feedback Alignment (FA) methods are biologically inspired local learning rules for training neural networks with reduced communication between layers. While FA has potential applications in distributed and privacy-aware ML, limitations in…

Machine Learning · Computer Science 2024-06-05 Zachary Robertson, Oluwasanmi Koyejo

Fine-tuning aligned language models on benign tasks unpredictably degrades safety guardrails, even when training data contains no harmful content and developers have no adversarial intent. We show that the prevailing explanation, that…

Fine-tuning safety-aligned language models for downstream tasks often leads to substantial degradation of refusal behavior, making models vulnerable to adversarial misuse. While prior work has shown that safety-relevant features are encoded…

Machine Learning · Computer Science 2026-05-05 Sadia Asif, Mohammad Mohammadi Amiri

Instruction-following language models are trained to be helpful and safe, yet their safety behavior can deteriorate under benign fine-tuning and worsen under adversarial updates. Existing defenses often offer limited protection or force a…

Computation and Language · Computer Science 2026-05-12 Jyotin Goel, Souvik Maji, Pratik Mazumder

Reward-model-based fine-tuning is a central paradigm in aligning Large Language Models with human preferences. However, such approaches critically rely on the assumption that proxy reward models accurately reflect intended supervision, a…

Computation and Language · Computer Science 2026-01-21 Zixuan Liu, Siavash H. Khajavi, Guangkai Jiang, Xinru Liu

Safety alignment is a key requirement for building reliable Artificial General Intelligence. Despite significant advances in safety alignment, we observe that minor latent shifts can still trigger unsafe responses in aligned models. We…

Machine Learning · Computer Science 2025-06-23 Tianle Gu, Kexin Huang, Zongqi Wang, Yixu Wang, Jie Li, Yuanqi Yao, Yang Yao, Yujiu Yang, Yan Teng, Yingchun Wang

Alignment of large language models remains a central challenge in natural language processing. Preference optimization has emerged as a popular and effective method for improving alignment, typically through training-time or prompt-based…

Machine Learning · Computer Science 2025-10-01 Frédéric Berdoz, Luca A. Lanzendörfer, René Caky, Roger Wattenhofer

Feedback alignment algorithms are an alternative to backpropagation to train neural networks, whereby some of the partial derivatives that are required to compute the gradient are replaced by random terms. This essentially transforms the…

Machine Learning · Computer Science 2023-06-06 Dominique Chu, Florian Bacho

The emergence of foundational models has greatly improved performance across various downstream tasks, with fine-tuning often yielding even better results. However, existing fine-tuning approaches typically require access to model weights…

Computer Vision and Pattern Recognition · Computer Science 2025-02-04 Matan Levy, Rami Ben-Ari, Dvir Samuel, Nir Darshan, Dani Lischinski

Fine-tuning Large Language Models (LLMs) for downstream tasks often compromises safety alignment, even when using parameter-efficient methods like LoRA. In this work, we uncover a notable property: fine-tuned models preserve the geometric…

Machine Learning · Computer Science 2025-11-25 Thong Bach, Thanh Nguyen-Tang, Dung Nguyen, Thao Minh Le, Truyen Tran

Asynchronous execution is essential for scaling reinforcement learning (RL) to modern large model workloads, including large language models and AI agents, but it can fundamentally alter RL optimization behavior. While prior work on…

Machine Learning · Computer Science 2026-03-03 Haofeng Xu, Junwei Su, Yukun Tian, Lansong Diao, Zhengping Qian, Chuan Wu

This paper presents a novel optimization method for maximizing generalization over tasks in meta-learning. The goal of meta-learning is to learn a model for an agent adapting rapidly when presented with previously unseen tasks. Tasks are…

Machine Learning · Computer Science 2018-10-19 Amir Erfan Eshratifar, David Eigen, Massoud Pedram

Understanding the vulnerability of large-scale pre-trained vision-language models like CLIP against adversarial attacks is key to ensuring zero-shot generalization capacity on various downstream tasks. State-of-the-art defense mechanisms…

Computer Vision and Pattern Recognition · Computer Science 2024-05-21 Fan Yang, Mingxuan Xia, Sangzhou Xia, Chicheng Ma, Hui Hui

In real-world vision systems, haze removal is required not only to enhance image visibility but also to meet the specific needs of diverse downstream tasks. To address this challenge, we propose a novel adaptive dynamic dehazing framework that…

Computer Vision and Pattern Recognition · Computer Science 2026-03-09 Yafei Zhang, Shuaitian Song, Huafeng Li, Shujuan Wang, Yu Liu

Deep generative models provide state-of-the-art performance across a wide array of applications, with recent studies showing increasing applicability for science and engineering. Despite a growing corpus of literature focused on the…

Machine Learning · Computer Science 2026-05-14 Jacob K. Christopher, James E. Warner, Ferdinando Fioretto

In this work, we introduce Adapt & Align, a method for continual learning of neural networks by aligning latent representations in generative models. Neural Networks suffer from abrupt loss in performance when retrained with additional…

Machine Learning · Computer Science 2023-12-22 Kamil Deja, Bartosz Cywiński, Jan Rybarczyk, Tomasz Trzciński

Parameter-Efficient Fine-Tuning (PEFT) effectively adapts pre-trained transformers to downstream tasks. However, the optimization of task performance often comes at the cost of generalizability in fine-tuned models. To address this issue,…

Machine Learning · Computer Science 2026-03-09 Yao Ni, Shan Zhang, Piotr Koniusz

A promising paradigm for adapting instruction-tuned language models is to learn task-specific updates on a pretrained base model and subsequently merge them into the instruction-tuned model. However, existing approaches typically treat the…

Computation and Language · Computer Science 2026-05-05 Zhiwen Ruan, Yichao Du, Jianjie Zheng, Longyue Wang, Yun Chen, Peng Li, Jinsong Su, Yang Liu, Guanhua Chen