Related papers: Sequential Reptile: Inter-Task Gradient Alignment …

Improving In-context Learning of Multilingual Generative Language Models with Cross-lingual Alignment

Multilingual generative models obtain remarkable cross-lingual in-context learning capabilities through pre-training on large-scale corpora. However, they still exhibit a performance bias toward high-resource languages and learn isolated…

Computation and Language · Computer Science 2024-06-13 Chong Li, Shaonan Wang, Jiajun Zhang, Chengqing Zong

Multitask Learning with Single Gradient Step Update for Task Balancing

Multitask learning is a methodology to boost generalization performance and also reduce computational intensity and memory usage. However, learning multiple tasks simultaneously can be more difficult than learning a single task because it…

Machine Learning · Computer Science 2020-06-03 Sungjae Lee, Youngdoo Son

Gradient Agreement as an Optimization Objective for Meta-Learning

This paper presents a novel optimization method for maximizing generalization over tasks in meta-learning. The goal of meta-learning is to learn a model for an agent adapting rapidly when presented with previously unseen tasks. Tasks are…

Machine Learning · Computer Science 2018-10-19 Amir Erfan Eshratifar, David Eigen, Massoud Pedram

Middle-Layer Representation Alignment for Cross-Lingual Transfer in Fine-Tuned LLMs

While large language models demonstrate remarkable capabilities at task-specific applications through fine-tuning, extending these benefits across diverse languages is essential for broad accessibility. However, effective cross-lingual…

Computation and Language · Computer Science 2025-06-03 Danni Liu, Jan Niehues

Imbalanced Gradients in RL Post-Training of Multi-Task LLMs

Multi-task post-training of large language models (LLMs) is typically performed by mixing datasets from different tasks and optimizing them jointly. This approach implicitly assumes that all tasks contribute gradients of similar magnitudes;…

Machine Learning · Computer Science 2025-10-28 Runzhe Wu, Ankur Samanta, Ayush Jain, Scott Fujimoto, Jeongyeol Kwon, Ben Kretzu, Youliang Yu, Kaveh Hassani, Boris Vidolov, Yonathan Efroni

Layerwise Optimization by Gradient Decomposition for Continual Learning

Deep neural networks achieve state-of-the-art and sometimes super-human performance across various domains. However, when learning tasks sequentially, the networks easily forget the knowledge of previous tasks, known as "catastrophic…

Computer Vision and Pattern Recognition · Computer Science 2021-05-18 Shixiang Tang, Dapeng Chen, Jinguo Zhu, Shijie Yu, Wanli Ouyang

Multimodal Continual Instruction Tuning with Dynamic Gradient Guidance

Multimodal continual instruction tuning enables multimodal large language models to sequentially adapt to new tasks while building upon previously acquired knowledge. However, this continual learning paradigm faces the significant challenge…

Computer Vision and Pattern Recognition · Computer Science 2026-03-23 Songze Li, Mingyu Gao, Tonghua Su, Xu-Yao Zhang, Zhongjie Wang

MultiBalance: Multi-Objective Gradient Balancing in Industrial-Scale Multi-Task Recommendation System

In industrial recommendation systems, multi-task learning (learning multiple tasks simultaneously on a single model) is a predominant approach to save training/serving resources and improve recommendation performance via knowledge transfer…

Information Retrieval · Computer Science 2024-11-20 Yun He, Xuxing Chen, Jiayi Xu, Renqin Cai, Yiling You, Jennifer Cao, Minhui Huang, Liu Yang, Yiqun Liu, Xiaoyi Liu, Rong Jin, Sem Park, Bo Long, Xue Feng

Semi-supervised Multi-task Learning for Semantics and Depth

Multi-Task Learning (MTL) aims to enhance the model generalization by sharing representations between related tasks for better performance. Typical MTL methods are jointly trained with the complete multitude of ground-truths for all tasks…

Computer Vision and Pattern Recognition · Computer Science 2021-10-15 Yufeng Wang, Yi-Hsuan Tsai, Wei-Chih Hung, Wenrui Ding, Shuo Liu, Ming-Hsuan Yang

Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning

Recently, fine-tuning pre-trained language models (e.g., multilingual BERT) to downstream cross-lingual tasks has shown promising results. However, the fine-tuning process inevitably changes the parameters of the pre-trained model and…

Computation and Language · Computer Science 2020-10-06 Zihan Liu, Genta Indra Winata, Andrea Madotto, Pascale Fung

Gradient Coordination for Quantifying and Maximizing Knowledge Transference in Multi-Task Learning

Multi-task learning (MTL) has been widely applied in online advertising and recommender systems. To address the negative transfer issue, recent studies have proposed optimization methods that thoroughly focus on the gradient alignment of…

Information Retrieval · Computer Science 2023-03-13 Xuanhua Yang, Jianxin Zhao, Shaoguo Liu, Liang Wang, Bo Zheng

Leveraging convergence behavior to balance conflicting tasks in multi-task learning

Multi-Task Learning is a learning paradigm that uses correlated tasks to improve performance generalization. A common way to learn multiple tasks is through the hard parameter sharing approach, in which a single architecture is used to…

Machine Learning · Computer Science 2022-04-15 Angelica Tiemi Mizuno Nakamura, Denis Fernando Wolf, Valdir Grassi

Cross-lingual Alignment Methods for Multilingual BERT: A Comparative Study

Multilingual BERT (mBERT) has shown reasonable capability for zero-shot cross-lingual transfer when fine-tuned on downstream tasks. Since mBERT is not pre-trained with explicit cross-lingual supervision, transfer performance can further be…

Computation and Language · Computer Science 2020-10-01 Saurabh Kulshreshtha, José Luis Redondo-García, Ching-Yun Chang

Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning

Fine-tuning pre-trained generative language models to down-stream language generation tasks has shown promising results. However, this comes with the cost of having a single, large model for each task, which is not ideal in low-memory/power…

Computation and Language · Computer Science 2020-09-22 Zhaojiang Lin, Andrea Madotto, Pascale Fung

Asynchronous Multi-Task Learning

Many real-world machine learning applications involve several learning tasks which are inter-related. For example, in healthcare domain, we need to learn a predictive model of a certain disease for many hospitals. The models for each…

Machine Learning · Computer Science 2016-10-03 Inci M. Baytas, Ming Yan, Anil K. Jain, Jiayu Zhou

Exploring the Relationship between Alignment and Cross-lingual Transfer in Multilingual Transformers

Without any explicit cross-lingual training data, multilingual language models can achieve cross-lingual transfer. One common way to improve this transfer is to perform realignment steps before fine-tuning, i.e., to train the model to build…

Computation and Language · Computer Science 2023-06-06 Félix Gaschi, Patricio Cerda, Parisa Rastin, Yannick Toussaint

Meta-learning the Learning Trends Shared Across Tasks

Meta-learning stands for 'learning to learn' such that generalization to new tasks is achieved. Among these methods, Gradient-based meta-learning algorithms are a specific sub-class that excel at quick adaptation to new tasks with limited…

Machine Learning · Computer Science 2020-10-20 Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah

Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Training and Effective Adaptation

Multi-task learning (MTL) aims to improve the generalization of several related tasks by learning them jointly. As a comparison, in addition to the joint training scheme, modern meta-learning allows unseen tasks with limited labels during…

Machine Learning · Computer Science 2021-06-17 Haoxiang Wang, Han Zhao, Bo Li

PreAlign: Boosting Cross-Lingual Transfer by Early Establishment of Multilingual Alignment

Large language models demonstrate reasonable multilingual abilities, despite predominantly English-centric pretraining. However, the spontaneous multilingual alignment in these models is shown to be weak, leading to unsatisfactory…

Computation and Language · Computer Science 2024-11-19 Jiahuan Li, Shujian Huang, Aarron Ching, Xinyu Dai, Jiajun Chen

Revisiting Replay and Gradient Alignment for Continual Pre-Training of Large Language Models

Training large language models (LLMs) typically involves pre-training on massive corpora, only to restart the process entirely when new data becomes available. A more efficient and resource-conserving approach would be continual…

Machine Learning · Computer Science 2025-08-05 Istabrak Abbes, Gopeshh Subbaraj, Matthew Riemer, Nizar Islah, Benjamin Therien, Tsuguchika Tabaru, Hiroaki Kingetsu, Sarath Chandar, Irina Rish