Related papers: Knowledge Distillation for Multi-task Learning

Cross-Task Knowledge Distillation in Multi-Task Recommendation

Multi-task learning (MTL) has been widely used in recommender systems, wherein predicting each type of user feedback on items (e.g., click, purchase) is treated as an individual task and jointly trained with a unified model. Our key…

Information Retrieval · Computer Science 2022-03-29 Chenxiao Yang, Junwei Pan, Xiaofeng Gao, Tingyu Jiang, Dapeng Liu, Guihai Chen

A Survey on Multi-Task Learning

Multi-Task Learning (MTL) is a learning paradigm in machine learning and its aim is to leverage useful information contained in multiple related tasks to help improve the generalization performance of all the tasks. In this paper, we give a…

Machine Learning · Computer Science 2021-03-30 Yu Zhang, Qiang Yang

Multi-Task Learning with Deep Neural Networks: A Survey

Multi-task learning (MTL) is a subfield of machine learning in which multiple tasks are simultaneously learned by a shared model. Such approaches offer advantages like improved data efficiency, reduced overfitting through shared…

Machine Learning · Computer Science 2020-09-22 Michael Crawshaw

Multi-Task Learning Regression via Convex Clustering

Multi-task learning (MTL) is a methodology that aims to improve the general performance of estimation and prediction by sharing common information among related tasks. In MTL, there are several assumptions for the relationships and…

Methodology · Statistics 2023-04-27 Akira Okazaki, Shuichi Kawano

MKD: a Multi-Task Knowledge Distillation Approach for Pretrained Language Models

Pretrained language models have led to significant performance gains in many NLP tasks. However, the intensive computing resources to train such models remain an issue. Knowledge distillation alleviates this problem by learning a…

Computation and Language · Computer Science 2020-05-04 Linqing Liu, Huan Wang, Jimmy Lin, Richard Socher, Caiming Xiong

Asynchronous Multi-Task Learning

Many real-world machine learning applications involve several learning tasks which are inter-related. For example, in the healthcare domain, we need to learn a predictive model of a certain disease for many hospitals. The models for each…

Machine Learning · Computer Science 2016-10-03 Inci M. Baytas, Ming Yan, Anil K. Jain, Jiayu Zhou

Multi-Task Learning for Visual Scene Understanding

Despite the recent progress in deep learning, most approaches still go for a silo-like solution, focusing on learning each task in isolation: training a separate neural network for each individual task. Many real-world problems, however,…

Computer Vision and Pattern Recognition · Computer Science 2022-03-29 Simon Vandenhende

Revisit the Imbalance Optimization in Multi-task Learning: An Experimental Analysis

Multi-task learning (MTL) aims to build general-purpose vision systems by training a single network to perform multiple tasks jointly. While promising, its potential is often hindered by "unbalanced optimization", where task interference…

Computer Vision and Pattern Recognition · Computer Science 2025-09-30 Yihang Guo, Tianyuan Yu, Liang Bai, Yanming Guo, Yirun Ruan, William Li, Weishi Zheng

LDC-MTL: Balancing Multi-Task Learning through Scalable Loss Discrepancy Control

Multi-task learning (MTL) has been widely adopted for its ability to simultaneously learn multiple tasks. While existing gradient manipulation methods often yield more balanced solutions than simple scalarization-based approaches, they…

Machine Learning · Computer Science 2025-09-29 Peiyao Xiao, Chaosheng Dong, Shaofeng Zou, Kaiyi Ji

Multi-Task Learning with Group-Specific Feature Space Sharing

When faced with learning a set of inter-related tasks from a limited amount of usable data, learning each task independently may lead to poor generalization performance. Multi-Task Learning (MTL) exploits the latent relations between tasks…

Machine Learning · Computer Science 2015-08-14 Niloofar Yousefi, Michael Georgiopoulos, Georgios C. Anagnostopoulos

Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation

The size and the computational load of fine-tuning large-scale pre-trained neural networks are becoming two major obstacles in adopting machine learning in many applications. Continual learning (CL) can serve as a remedy through enabling…

Machine Learning · Computer Science 2023-03-28 Yuliang Cai, Jesse Thomason, Mohammad Rostami

Distral: Robust Multitask Reinforcement Learning

Most deep reinforcement learning algorithms are data inefficient in complex and rich environments, limiting their applicability to many scenarios. One direction for improving data efficiency is multitask learning with shared neural network…

Machine Learning · Computer Science 2017-07-14 Yee Whye Teh, Victor Bapst, Wojciech Marian Czarnecki, John Quan, James Kirkpatrick, Raia Hadsell, Nicolas Heess, Razvan Pascanu

Multi-Task Multi-Scale Contrastive Knowledge Distillation for Efficient Medical Image Segmentation

This thesis aims to investigate the feasibility of knowledge transfer between neural networks for medical image segmentation tasks, specifically focusing on the transfer from a larger multi-task "Teacher" network to a smaller "Student"…

Image and Video Processing · Electrical Eng. & Systems 2024-06-06 Risab Biswas

Preventing Catastrophic Forgetting in Continual Learning of New Natural Language Tasks

Multi-Task Learning (MTL) is widely accepted in Natural Language Processing as a standard technique for learning multiple related tasks in one model. Training an MTL model requires having the training data for all tasks available at the…

Computation and Language · Computer Science 2023-02-23 Sudipta Kar, Giuseppe Castellucci, Simone Filice, Shervin Malmasi, Oleg Rokhlenko

Knowledge Distillation and Training Balance for Heterogeneous Decentralized Multi-Modal Learning over Wireless Networks

Decentralized learning is widely employed for collaboratively training models using distributed data over wireless networks. Existing decentralized learning methods primarily focus on training single-modal networks. For the decentralized…

Information Theory · Computer Science 2023-11-14 Benshun Yin, Zhiyong Chen, Meixia Tao

MTL-KD: Multi-Task Learning Via Knowledge Distillation for Generalizable Neural Vehicle Routing Solver

Multi-Task Learning (MTL) in Neural Combinatorial Optimization (NCO) is a promising approach to train a unified model capable of solving multiple Vehicle Routing Problem (VRP) variants. However, existing Reinforcement Learning (RL)-based…

Machine Learning · Computer Science 2025-11-05 Yuepeng Zheng, Fu Luo, Zhenkun Wang, Yaoxin Wu, Yu Zhou

On effects of Knowledge Distillation on Transfer Learning

Knowledge distillation is a popular machine learning technique that aims to transfer knowledge from a large 'teacher' network to a smaller 'student' network and improve the student's performance by training it to emulate the teacher. In…

Machine Learning · Computer Science 2022-10-19 Sushil Thapa

Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding

This paper explores the use of knowledge distillation to improve a Multi-Task Deep Neural Network (MT-DNN) (Liu et al., 2019) for learning text representations across multiple natural language understanding tasks. Although ensemble learning…

Computation and Language · Computer Science 2019-04-23 Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao

Task Integration Distillation for Object Detectors

Knowledge distillation is a widely adopted technique for model lightening. However, the performance of most knowledge distillation methods in the domain of object detection is not satisfactory. Typically, knowledge distillation approaches…

Computer Vision and Pattern Recognition · Computer Science 2024-04-03 Hai Su, ZhenWen Jian, Songsen Yu

Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection

Knowledge distillation addresses the problem of transferring knowledge from a teacher model to a student model. In this process, we typically have multiple types of knowledge extracted from the teacher model. The problem is to make full use…

Computation and Language · Computer Science 2023-02-02 Chenglong Wang, Yi Lu, Yongyu Mu, Yimin Hu, Tong Xiao, Jingbo Zhu