Related papers: Finding High-Value Training Data Subset through Di…

200 papers

Improving LLM performance on downstream tasks sometimes requires leveraging auxiliary datasets during post-training. In practice, however, developers face constraints on compute, labeling, and licensing costs that preclude using all…

Machine Learning · Computer Science 2026-05-19 Siqi Zeng, Christopher Jung, Rui Li, Zhe Kang, Ming Li, Nima Noorshams, Zhigang Wang, Fuchun Peng, Han Zhao, Xue Feng

Data selection is one of the fundamental problems in neural network training, particularly for multi-layer perceptrons (MLPs) where identifying the most valuable training samples from massive, multi-source, and heterogeneous data sources…

Machine Learning · Computer Science 2025-10-27 Xiyang Zhang, Chen Liang, Haoxuan Qiu, Hongzhi Wang

Large-scale supervised classification algorithms, especially those based on deep convolutional neural networks (DCNNs), require vast amounts of training data to achieve state-of-the-art performance. Decreasing this data requirement would…

Computer Vision and Pattern Recognition · Computer Science 2016-06-15 Maya Kabkab, Azadeh Alavi, Rama Chellappa

Training vision-based Urban Autonomous driving models is a challenging problem, which is highly researched in recent times. Training such models is a data-intensive task requiring the storage and processing of vast volumes of (possibly…

Machine learning techniques based on neural networks are achieving remarkable results in a wide variety of domains. Often, the training of models requires large, representative datasets, which may be crowdsourced and contain sensitive…

Machine Learning · Statistics 2018-12-21 Martín Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, Li Zhang

When selecting data for training large-scale models, standard practice is to filter for examples that match human notions of data quality. Such filtering yields qualitatively clean datapoints that intuitively should improve model behavior.…

Machine Learning · Computer Science 2024-01-24 Logan Engstrom, Axel Feldmann, Aleksander Madry

Data valuation and subset selection have emerged as valuable tools for application-specific selection of important training data. However, the efficiency-accuracy tradeoffs of state-of-the-art methods hinder their widespread application to…

Machine Learning · Computer Science 2022-03-15 Soumi Das, Manasvi Sagarkar, Suparna Bhattacharya, Sourangshu Bhattacharya

Modern datasets span billions of samples, making training on all available data infeasible. Selecting a high quality subset helps in reducing training costs and enhancing model quality. Submodularity, a discrete analogue of convexity, is…

Machine Learning · Computer Science 2025-04-04 Maximilian Böther, Abraham Sebastian, Pranjal Awasthi, Ana Klimovic, Srikumar Ramalingam

Deep learning models often require large amounts of data for training, leading to increased costs. It is particularly challenging in medical imaging, i.e., gathering distributed data for centralized training, and meanwhile, obtaining…

Computer Vision and Pattern Recognition · Computer Science 2023-06-27 Zhenyu Tang, Shaoting Zhang, Xiaosong Wang

Modern pattern recognition tasks use complex algorithms that take advantage of large datasets to make more accurate predictions than traditional algorithms such as decision trees or k-nearest-neighbor better suited to describe simple…

Machine Learning · Statistics 2021-10-14 AGaurav Arwade, Sigurdur Olafsson

The goal of Feature Selection - comprising filter, wrapper, and embedded approaches - is to find the optimal feature subset for designated downstream tasks. Nevertheless, current feature selection methods are limited by: 1) the selection…

Machine Learning · Computer Science 2023-09-18 Meng Xiao, Dongjie Wang, Min Wu, Pengfei Wang, Yuanchun Zhou, Yanjie Fu

As the state-of-the-art machine learning methods in many fields rely on larger datasets, storing datasets and training models on them become significantly more expensive. This paper proposes a training set synthesis technique for…

Computer Vision and Pattern Recognition · Computer Science 2021-03-09 Bo Zhao, Konda Reddy Mopuri, Hakan Bilen

Continual learning aims to enable models to adapt to new datasets without losing performance on previously learned data, often assuming that prior data is no longer available. However, in many practical scenarios, both old and new data are…

Machine Learning · Computer Science 2025-03-03 Eli Verwimp, Guy Hacohen, Tinne Tuytelaars

The customizable nature of deep learning models has allowed them to be successful predictors in various disciplines. These models are often trained with respect to thousands or millions of instances for complicated problems, but the…

Machine Learning · Computer Science 2019-12-24 Drimik Roy Chowdhury, Muhammad Firmansyah Kasim

Deep learning models learn to fit training data while they are highly expected to generalize well to testing data. Most works aim at finding such models by creatively designing architectures and fine-tuning parameters. To adapt to…

Computer Vision and Pattern Recognition · Computer Science 2018-09-10 Tianyang Wang, Jun Huan, Bo Li

Sampling biases in training data are a major source of algorithmic biases in machine learning systems. Although there are many methods that attempt to mitigate such algorithmic biases during training, the most direct and obvious way is…

Machine Learning · Statistics 2022-04-15 Laura Niss, Yuekai Sun, Ambuj Tewari

The exponential growth of volume, variety and velocity of data is raising the need for investigations of automated or semi-automated ways to extract useful patterns from the data. It requires deep expert knowledge and extensive…

Machine Learning · Computer Science 2020-07-22 Abbas Raza Ali, Marcin Budka, Bogdan Gabrys

To acquire a new skill, humans learn better and faster if a tutor, based on their current knowledge level, informs them of how much attention they should pay to particular content or practice problems. Similarly, a machine learning model…

Machine Learning · Computer Science 2021-06-18 Xinyi Wang, Hieu Pham, Paul Michel, Antonios Anastasopoulos, Jaime Carbonell, Graham Neubig

Modern computer vision algorithms often rely on very large training datasets. However, it is conceivable that a carefully selected subsample of the dataset is sufficient for training. In this paper, we propose a gradient-based importance…

Machine Learning · Computer Science 2018-12-03 Kailas Vodrahalli, Ke Li, Jitendra Malik

Supervised machine learning based state-of-the-art computer vision techniques are in general data hungry. Their data curation poses the challenges of expensive human labeling, inadequate computing resources and larger experiment turn around…

Computer Vision and Pattern Recognition · Computer Science 2019-01-07 Vishal Kaushal, Rishabh Iyer, Suraj Kothawade, Rohan Mahadev, Khoshrav Doctor, Ganesh Ramakrishnan