Related papers: Training Data Subset Selection for Regression with…

Subset Selection for Multiple Linear Regression via Optimization

Subset selection in multiple linear regression aims to choose a subset of candidate explanatory variables that trades off fitting error (explanatory power) against model complexity (number of variables selected). We build mathematical programming…

Machine Learning · Statistics 2020-09-04 Young Woong Park, Diego Klabjan

Efficient Data Subset Selection to Generalize Training Across Models: Transductive and Inductive Networks

Existing subset selection methods for efficient learning predominantly employ discrete combinatorial and model-specific approaches which lack generalizability. For an unseen architecture, one cannot use the subset chosen for a different…

Machine Learning · Computer Science 2024-09-20 Eeshaan Jain, Tushar Nandy, Gaurav Aggarwal, Ashish Tendulkar, Rishabh Iyer, Abir De

A Study in Dataset Pruning for Image Super-Resolution

In image Super-Resolution (SR), relying on large datasets for training is a double-edged sword. While offering rich training material, they also demand substantial computational and storage resources. In this work, we analyze dataset…

Image and Video Processing · Electrical Eng. & Systems 2024-06-11 Brian B. Moser, Federico Raue, Andreas Dengel

COMBSS: Best Subset Selection via Continuous Optimization

The problem of best subset selection in linear regression is considered with the aim to find a fixed size subset of features that best fits the response. This is particularly challenging when the total available number of features is very…

Methodology · Statistics 2023-11-28 Sarat Moka, Benoit Liquet, Houying Zhu, Samuel Muller

Subbagging Variable Selection for Big Data

This article introduces a subbagging (subsample aggregating) approach for variable selection in regression within the context of big data. The proposed subbagging approach not only ensures that variable selection is scalable given the…

Methodology · Statistics 2025-03-10 Xian Li, Xuan Liang, Tao Zou

Small Data, Big Decisions: Model Selection in the Small-Data Regime

Highly overparametrized neural networks can display curiously strong generalization performance - a phenomenon that has recently garnered a wealth of theoretical and empirical research in order to better understand it. In contrast to most…

Machine Learning · Computer Science 2020-09-29 Jorg Bornschein, Francesco Visin, Simon Osindero

Optimization for Supervised Machine Learning: Randomized Algorithms for Data and Parameters

Many key problems in machine learning and data science are routinely modeled as optimization problems and solved via optimization algorithms. With the increase of the volume of data and the size and complexity of the statistical models used…

Optimization and Control · Mathematics 2020-08-28 Filip Hanzely

Dataset Pruning: Reducing Training Data by Examining Generalization Influence

The great success of deep learning heavily relies on increasingly larger training data, which comes at the price of huge computational and infrastructural costs. This raises a crucial question: do all training data contribute to the model's…

Machine Learning · Computer Science 2023-02-28 Shuo Yang, Zeke Xie, Hanyu Peng, Min Xu, Mingming Sun, Ping Li

Statistical and Algorithmic Insights for Semi-supervised Learning with Self-training

Self-training is a classical approach in semi-supervised learning that has been successfully applied to a variety of machine learning problems. The self-training algorithm generates pseudo-labels for the unlabeled examples and progressively refines…

Machine Learning · Computer Science 2020-06-22 Samet Oymak, Talha Cihad Gulcu

Towards Accelerated Model Training via Bayesian Data Selection

Mislabeled, duplicated, or biased data in real-world scenarios can lead to prolonged training and even hinder model convergence. Traditional solutions prioritizing easy or hard samples lack the flexibility to handle such a variety…

Machine Learning · Computer Science 2023-11-08 Zhijie Deng, Peng Cui, Jun Zhu

Sub-Setting Algorithm for Training Data Selection in Pattern Recognition

Modern pattern recognition tasks use complex algorithms that take advantage of large datasets to make more accurate predictions than traditional algorithms such as decision trees or k-nearest-neighbor better suited to describe simple…

Machine Learning · Statistics 2021-10-14 AGaurav Arwade, Sigurdur Olafsson

A Mathematical Programming Approach for Integrated Multiple Linear Regression Subset Selection and Validation

Subset selection for multiple linear regression aims to construct a regression model that minimizes errors by selecting a small number of explanatory variables. Once a model is built, various statistical tests and diagnostics are conducted…

Machine Learning · Statistics 2020-09-04 Seokhyun Chung, Young Woong Park, Taesu Cheong

DCNNs on a Diet: Sampling Strategies for Reducing the Training Set Size

Large-scale supervised classification algorithms, especially those based on deep convolutional neural networks (DCNNs), require vast amounts of training data to achieve state-of-the-art performance. Decreasing this data requirement would…

Computer Vision and Pattern Recognition · Computer Science 2016-06-15 Maya Kabkab, Azadeh Alavi, Rama Chellappa

Subdata selection for big data regression: an improved approach

In the big data era researchers face a series of problems. Even standard approaches/methodologies, like linear regression, can be difficult or problematic with huge volumes of data. Traditional approaches for regression in big datasets may…

Methodology · Statistics 2024-11-13 Vasilis Chasiotis, Dimitris Karlis

Optimal subsampling for large scale Elastic-net regression

Datasets with sheer volume have been generated from fields including computer vision, medical imageology, and astronomy whose large-scale and high-dimensional properties hamper the implementation of classical statistical models. To tackle…

Statistics Theory · Mathematics 2023-05-30 Hang Yu, Zhenxing Dou, Zhiwei Chen, Xiaomeng Yan

An analysis of training and generalization errors in shallow and deep networks

This paper is motivated by an open problem around deep networks, namely, the apparent absence of over-fitting despite large over-parametrization which allows perfect fitting of the training data. In this paper, we analyze this phenomenon in…

Machine Learning · Computer Science 2019-08-28 Hrushikesh Mhaskar, Tomaso Poggio

Efficient Neural Network Training via Subset Pretraining

In training neural networks, it is common practice to use partial gradients computed over batches, mostly very small subsets of the training set. This approach is motivated by the argument that such a partial gradient is close to the true…

Machine Learning · Computer Science 2024-11-25 Jan Spörer, Bernhard Bermeitinger, Tomas Hrycej, Niklas Limacher, Siegfried Handschuh

GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning

Large scale machine learning and deep models are extremely data-hungry. Unfortunately, obtaining large amounts of labeled data is expensive, and training state-of-the-art models (with hyperparameter tuning) requires significant computing…

Machine Learning · Computer Science 2021-06-15 Krishnateja Killamsetty, Durga Sivasubramanian, Ganesh Ramakrishnan, Rishabh Iyer

Model selection of polynomial kernel regression

Polynomial kernel regression is one of the standard and state-of-the-art learning strategies. However, as is well known, the choices of the degree of polynomial kernel and the regularization parameter are still open in the realm of model…

Machine Learning · Computer Science 2023-06-14 Shaobo Lin, Xingping Sun, Zongben Xu, Jinshan Zeng

Faster Learning by Reduction of Data Access Time

Nowadays, the major challenge in machine learning is the Big Data challenge. In big data problems, because of a large number of data points, a large number of features in each data point, or both, the training of models has become very slow. The…

Machine Learning · Computer Science 2018-07-26 Vinod Kumar Chauhan, Anuj Sharma, Kalpana Dahiya