Related papers: Reinforced Data Sampling for Model Diversification

200 papers

To acquire a new skill, humans learn better and faster if a tutor, based on their current knowledge level, informs them of how much attention they should pay to particular content or practice problems. Similarly, a machine learning model…

Machine Learning · Computer Science 2021-06-18 Xinyi Wang, Hieu Pham, Paul Michel, Antonios Anastasopoulos, Jaime Carbonell, Graham Neubig

A common strategy in transfer learning is few shot fine-tuning, but its success is highly dependent on the quality of samples selected as training examples. Active learning methods such as uncertainty sampling and diversity sampling can…

Computation and Language · Computer Science 2026-04-23 Wei Han, David Martinez, Anna Khanina, Lawrence Cavedon, Karin Verspoor

Respondent-driven sampling (RDS) is widely used to study hidden or hard-to-reach populations by incentivizing study participants to recruit their social connections. The success and efficiency of RDS can depend critically on the nature of…

Methodology · Statistics 2025-01-06 Justin Weltz, Angela Yoon, Yichi Zhang, Alexander Volfovsky, Eric Laber

The boom in deep learning (DL) technology has led to a massive number of DL models being built and shared, which facilitates the acquisition and reuse of DL models. For a given task, we encounter multiple DL models available with the same functionality, which are considered…

Software Engineering · Computer Science 2021-03-10 Linghan Meng, Yanhui Li, Lin Chen, Zhi Wang, Di Wu, Yuming Zhou, Baowen Xu

Semi-Supervised Learning (SSL) has become a preferred paradigm in many deep learning tasks, which reduces the need for human labor. Previous studies primarily focus on effectively utilising the labelled and unlabeled data to improve…

Machine Learning · Computer Science 2024-10-29 Qian Shao, Jiangrui Kang, Qiyuan Chen, Zepeng Li, Hongxia Xu, Yiwen Cao, Jiajuan Liang, Jian Wu

Modern deep architectures often rely on large-scale datasets, but training on these datasets incurs high computational and storage overhead. Real-world datasets often contain substantial redundancies, prompting the need for more…

Machine Learning · Computer Science 2025-06-27 Suorong Yang, Peijia Li, Furao Shen, Jian Zhao

As a part of the Data-Centric AI Competition, we propose a data-centric approach to improve the diversity of the training samples by iterative sampling. The method itself relies strongly on the fidelity of augmented samples and the…

Machine Learning · Computer Science 2021-11-09 Devrim Cavusoglu, Ogulcan Eryuksel, Sinan Altinuc

Diversity in demonstration selection is critical for enhancing model generalization by enabling broader coverage of structures and concepts. Constructing appropriate demonstration sets remains a key research challenge. This paper introduces…

Artificial Intelligence · Computer Science 2025-05-27 Xubin Wang, Jianfei Wu, Yichen Yuan, Deyu Cai, Mingzhe Li, Weijia Jia

Distributed multi-party learning provides an effective approach for training a joint model with scattered data under legal and practical constraints. However, due to the quagmire of a skewed distribution of data labels across participants…

Machine Learning · Computer Science 2021-11-01 Maoguo Gong, Yuan Gao, Yue Wu, A. K. Qin

In (\cite{zhang2014nonlinear,zhang2014nonlinear2}), we have viewed machine learning as a coding and dimensionality reduction problem, and further proposed a simple unsupervised dimensionality reduction method, entitled deep distributed…

Machine Learning · Computer Science 2015-01-29 Xiao-Lei Zhang

Finetuning large language models on instruction data is crucial for enhancing pre-trained knowledge and improving instruction-following capabilities. As instruction datasets proliferate, selecting optimal data for effective training becomes…

Computation and Language · Computer Science 2024-09-18 Simon Yu, Liangyu Chen, Sara Ahmadian, Marzieh Fadaee

Deep learning requires regularization mechanisms to reduce overfitting and improve generalization. We address this problem by a new regularization method based on distributional robust optimization. The key idea is to modify the…

Machine Learning · Computer Science 2020-06-08 Aurora Cobo Aguilera, Antonio Artés-Rodríguez, Fernando Pérez-Cruz, Pablo Martínez Olmos

Sampling is ubiquitous in machine learning methodologies. Due to the growth of large datasets and model complexity, we want to learn and adapt the sampling process while training a representation. Towards achieving this grand goal, a…

Machine Learning · Computer Science 2022-12-14 Jason Xiaotian Dou, Alvin Qingkai Pan, Runxue Bao, Haiyi Harry Mao, Lei Luo, Zhi-Hong Mao

Conventional deep network training generally optimizes all samples under a largely uniform learning paradigm, without explicitly modeling the heterogeneous competition among them. Such an oversimplified treatment can lead to several…

Computer Vision and Pattern Recognition · Computer Science 2026-04-15 Ying Zheng, Yiyi Zhang, Yi Wang, Lap-Pui Chau

General-purpose open-domain dense retrieval systems are usually trained with a large, eclectic mix of corpora and search tasks. How should these diverse corpora and tasks be sampled for training? Conventional approaches sample them…

Information Retrieval · Computer Science 2026-01-30 Meet Doshi, Vishwajeet Kumar, Yulong Li, Jaydeep Sen

Reinforcement learning exhibits potential in enhancing the reasoning abilities of large language models, yet it is hard to scale for the low sample efficiency during the rollout phase. Existing methods attempt to improve efficiency by…

Machine Learning · Computer Science 2026-02-02 Deyang Kong, Qi Guo, Xiangyu Xi, Wei Wang, Jingang Wang, Xunliang Cai, Shikun Zhang, Wei Ye

Data quality or data evaluation is sometimes a task as important as collecting a large volume of data when it comes to generating accurate artificial intelligence models. In fact, being able to evaluate the data can lead to a larger…

Machine Learning · Computer Science 2023-05-24 Eloy Anguiano Batanero, Ángela Fernández Pascual, Álvaro Barbero Jiménez

Limiting failures of machine learning systems is of paramount importance for safety-critical applications. In order to improve the robustness of machine learning systems, Distributionally Robust Optimization (DRO) has been proposed as a…

Subsampling from a large data set is useful in many supervised learning contexts to provide a global view of the data based on only a fraction of the observations. Diverse (or space-filling) subsampling is an appealing subsampling approach…

Methodology · Statistics 2023-11-27 Boyang Shang, Daniel W. Apley, Sanjay Mehrotra

A big challenge in branch and bound lies in identifying the optimal node within the search tree from which to proceed. Current state-of-the-art selectors utilize either hand-crafted ensembles that automatically switch between naive sub-node…

Machine Learning · Computer Science 2024-06-06 Alexander Mattick, Christopher Mutschler