Related papers: TADS: Task-Aware Data Selection for Multi-Task Mul…

Data-Efficient and Robust Task Selection for Meta-Learning

Meta-learning methods typically learn tasks under the assumption that all tasks are equally important. However, this assumption is often not valid. In real-world applications, tasks can vary both in their importance during different…

Machine Learning · Computer Science 2024-05-14 Donglin Zhan, James Anderson

A CLIP-Powered Framework for Robust and Generalizable Data Selection

Large-scale datasets have been pivotal to the advancements of deep learning models in recent years, but training on such large datasets invariably incurs substantial storage and computational overhead. Meanwhile, real-world datasets often…

Computer Vision and Pattern Recognition · Computer Science 2025-06-23 Suorong Yang, Peng Ye, Wanli Ouyang, Dongzhan Zhou, Furao Shen

Learning What Helps: Task-Aligned Context Selection for Vision Tasks

Humans often resolve visual uncertainty by comparing an image with relevant examples, but ViTs lack the ability to identify which examples would improve their predictions. We present Task-Aligned Context Selection (TACS), a framework that…

Computer Vision and Pattern Recognition · Computer Science 2025-12-02 Jingyu Guo, Emir Konuk, Fredrik Strand, Christos Matsoukas, Kevin Smith

Improving Influence-based Instruction Tuning Data Selection for Balanced Learning of Diverse Capabilities

Selecting appropriate training data is crucial for effective instruction fine-tuning of large language models (LLMs), which aims to (1) elicit strong capabilities, and (2) achieve balanced performance across a diverse range of tasks.…

Computation and Language · Computer Science 2025-01-22 Qirun Dai, Dylan Zhang, Jiaqi W. Ma, Hao Peng

TSDS: Data Selection for Task-Specific Model Finetuning

Finetuning foundation models for specific tasks is an emerging paradigm in modern machine learning. The efficacy of task-specific finetuning largely depends on the selection of appropriate training data. We present TSDS (Task-Specific Data…

Machine Learning · Computer Science 2024-12-30 Zifan Liu, Amin Karbasi, Theodoros Rekatsinas

Meta-learning with an Adaptive Task Scheduler

To benefit the learning of a new task, meta-learning has been proposed to transfer a well-generalized meta-model learned from various meta-training tasks. Existing meta-learning algorithms randomly sample meta-training tasks with a uniform…

Machine Learning · Computer Science 2021-10-28 Huaxiu Yao, Yu Wang, Ying Wei, Peilin Zhao, Mehrdad Mahdavi, Defu Lian, Chelsea Finn

Data Shapley Valuation for Efficient Batch Active Learning

Annotating the right set of data amongst all available data points is a key challenge in many machine learning applications. Batch active learning is a popular approach to address this, in which batches of unlabeled data points are selected…

Machine Learning · Statistics 2021-04-20 Amirata Ghorbani, James Zou, Andre Esteva

Training Aware Sigmoidal Optimizer

Proper optimization of deep neural networks is an open research question since an optimal procedure to change the learning rate throughout training is still unknown. Manually defining a learning rate schedule involves troublesome…

Machine Learning · Computer Science 2021-02-18 David Macêdo, Pedro Dreyer, Teresa Ludermir, Cleber Zanchettin

Towards High-Quality Temporal Action Detection with Sparse Proposals

Temporal Action Detection (TAD) is an essential and challenging topic in video understanding, aiming to localize the temporal segments containing human action instances and predict the action categories. The previous works greatly rely upon…

Computer Vision and Pattern Recognition · Computer Science 2021-09-21 Jiannan Wu, Peize Sun, Shoufa Chen, Jiewen Yang, Zihao Qi, Lan Ma, Ping Luo

MASS: Mathematical Data Selection via Skill Graphs for Pretraining Large Language Models

High-quality data plays a critical role in the pretraining and fine-tuning of large language models (LLMs), even determining their performance ceiling to some degree. Consequently, numerous data selection methods have been proposed to…

Computation and Language · Computer Science 2025-07-08 Jiazheng Li, Lu Yu, Qing Cui, Zhiqiang Zhang, Jun Zhou, Yanfang Ye, Chuxu Zhang

LAMDAS: LLM as an Implicit Classifier for Domain-specific Data Selection

Adapting large language models (LLMs) to specific domains often faces a critical bottleneck: the scarcity of high-quality, human-curated data. While large volumes of unchecked data are readily available, indiscriminately using them for…

Computation and Language · Computer Science 2025-09-09 Jian Wu, Hang Yu, Bingchang Liu, Wenjie Yang, Peng Di, Jianguo Li, Yue Zhang

MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models

Pretraining data selection has the potential to improve language model pretraining efficiency by utilizing higher-quality data from massive web data corpora. Current data selection methods, which rely on either hand-crafted rules or larger…

Computation and Language · Computer Science 2024-11-19 Zichun Yu, Spandan Das, Chenyan Xiong

MIDAS: Misalignment-based Data Augmentation Strategy for Imbalanced Multimodal Learning

Multimodal models often over-rely on dominant modalities, failing to achieve optimal performance. While prior work focuses on modifying training objectives or optimization procedures, data-centric solutions remain underexplored. We propose…

Machine Learning · Computer Science 2025-10-01 Seong-Hyeon Hwang, Soyoung Choi, Steven Euijong Whang

TACOS: Open Tagging and Comparative Scoring for Instruction Fine-Tuning Data Selection

Instruction Fine-Tuning (IFT) is crucial for aligning large language models (LLMs) with human preferences, and selecting a small yet representative subset from massive data significantly facilitates IFT in terms of both efficiency and…

Computation and Language · Computer Science 2026-03-25 Xixiang He, Hao Yu, Qiyao Sun, Ao Cheng, Tailai Zhang, Cong Liu, Shuxuan Guo

Towards Balanced Active Learning for Multimodal Classification

Training multimodal networks requires a vast amount of data due to their larger parameter space compared to unimodal networks. Active learning is a widely used technique for reducing data annotation costs by selecting only those samples…

Multimedia · Computer Science 2023-08-22 Meng Shen, Yizheng Huang, Jianxiong Yin, Heqing Zou, Deepu Rajan, Simon See

Reinforcement Learning meets Masked Video Modeling: Trajectory-Guided Adaptive Token Selection

Masked video modeling (MVM) has emerged as a highly effective pre-training strategy for visual foundation models, whereby the model reconstructs masked spatiotemporal tokens using information from visible tokens. However, a key challenge in…

Computer Vision and Pattern Recognition · Computer Science 2025-08-15 Ayush K. Rai, Kyle Min, Tarun Krishna, Feiyan Hu, Alan F. Smeaton, Noel E. O'Connor

SwiftTS: A Swift Selection Framework for Time Series Pre-trained Models via Multi-task Meta-Learning

Pre-trained models exhibit strong generalization to various downstream tasks. However, given the numerous models available in the model hub, identifying the most suitable one by individually fine-tuning is time-consuming. In this paper, we…

Machine Learning · Computer Science 2026-03-10 Tengxue Zhang, Biao Ouyang, Yang Shu, Xinyang Chen, Chenjuan Guo, Bin Yang

Faster-TAD: Towards Temporal Action Detection with Proposal Generation and Classification in a Unified Network

Temporal action detection (TAD) aims to detect the semantic labels and boundaries of action instances in untrimmed videos. Current mainstream approaches are multi-step solutions, which fall short in efficiency and flexibility. In this…

Computer Vision and Pattern Recognition · Computer Science 2022-04-07 Shimin Chen, Chen Chen, Wei Li, Xunqiang Tao, Yandong Guo

Temporal Action Selection for Action Chunking

Action chunking is a widely adopted approach in Learning from Demonstration (LfD). By modeling multi-step action chunks rather than single-step actions, action chunking significantly enhances modeling capabilities for human expert policies.…

Robotics · Computer Science 2025-11-07 Yueyang Weng, Xiaopeng Zhang, Yongjin Mu, Yingcong Zhu, Yanjie Li, Qi Liu

OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration

As high-quality public text approaches exhaustion, a phenomenon known as the Data Wall, pre-training is shifting from more tokens to better tokens. However, existing methods either rely on heuristic static filters that ignore training…

Computation and Language · Computer Science 2026-02-10 Shaobo Wang, Xuan Ouyang, Tianyi Xu, Yuzheng Hu, Jialin Liu, Guo Chen, Tianyu Zhang, Junhao Zheng, Kexin Yang, Xingzhang Ren, Dayiheng Liu, Linfeng Zhang