Related papers: Learning with Subset Stacking

Error Reduction from Stacked Regressions

Stacking regressions is an ensemble technique that forms linear combinations of different regression estimators to enhance predictive accuracy. The conventional approach uses cross-validation data to generate predictions from the…

Machine Learning · Statistics 2024-10-10 Xin Chen , Jason M. Klusowski , Yan Shuo Tan

Systematic Ensemble Learning for Regression

The motivation of this work is to improve the performance of standard stacking approaches or ensembles, which are composed of simple, heterogeneous base models, through the integration of the generation and selection stages for regression…

Machine Learning · Statistics 2014-03-31 Roberto Aldave , Jean-Pierre Dussault

Super learning in the SAS system

Background and objective: Stacking is an ensemble machine learning method that averages predictions from multiple other algorithms, such as generalized linear models and regression trees. An implementation of stacking, called super…

Machine Learning · Statistics 2019-08-01 Alexander P. Keil , Daniel Westreich , Jessie K Edwards , Stephen R Cole

LESS: Selecting Influential Data for Targeted Instruction Tuning

Instruction tuning has unlocked powerful capabilities in large language models (LLMs), effectively using combined datasets to develop generalpurpose chatbots. However, real-world applications often require a specialized suite of skills…

Computation and Language · Computer Science 2024-06-14 Mengzhou Xia , Sadhika Malladi , Suchin Gururangan , Sanjeev Arora , Danqi Chen

Adaptive Random SubSpace Learning (RSSL) Algorithm for Prediction

We present a novel adaptive random subspace learning algorithm (RSSL) for prediction purpose. This new framework is flexible where it can be adapted with any learning technique. In this paper, we tested the algorithm for regression and…

Machine Learning · Computer Science 2015-02-10 Mohamed Elshrif , Ernest Fokoue

A Human-Centered Approach for Improving Supervised Learning

Supervised Learning is a way of developing Artificial Intelligence systems in which a computer algorithm is trained on labeled data inputs. Effectiveness of a Supervised Learning algorithm is determined by its performance on a given dataset…

Computers and Society · Computer Science 2024-10-29 Shubhi Bansal , Atharva Tendulkar , Nagendra Kumar

Less Data, Faster Training: repeating smaller datasets speeds up learning via sampling biases

This work investigates the ``small-vs-large gap'', where repeating on fewer samples can lead to compute saving during training compared to using a larger dataset. This is observed across algorithmic tasks, architectures and optimizers and…

Machine Learning · Computer Science 2026-05-21 Jingwen Liu , Ezra Edelman , Surbhi Goel , Bingbin Liu

A Sparsity-Aware Adaptive Algorithm for Distributed Learning

In this paper, a sparsity-aware adaptive algorithm for distributed learning in diffusion networks is developed. The algorithm follows the set-theoretic estimation rationale. At each time instance and at each node of the network, a closed…

Information Theory · Computer Science 2015-06-03 Symeon Chouvardas , Konstantinos Slavakis , Yannis Kopsinis , Sergios Theodoridis

MetaBags: Bagged Meta-Decision Trees for Regression

Ensembles are popular methods for solving practical supervised learning problems. They reduce the risk of having underperforming models in production-grade software. Although critical, methods for learning heterogeneous regression ensembles…

Machine Learning · Computer Science 2018-04-18 Jihed Khiari , Luis Moreira-Matias , Ammar Shaker , Bernard Zenko , Saso Dzeroski

Set Prediction without Imposing Structure as Conditional Density Estimation

Set prediction is about learning to predict a collection of unordered variables with unknown interrelations. Training such models with set losses imposes the structure of a metric space over sets. We focus on stochastic and underdefined…

Machine Learning · Computer Science 2021-02-23 David W. Zhang , Gertjan J. Burghouts , Cees G. M. Snoek

Linear Regression using Heterogeneous Data Batches

In many learning applications, data are collected from multiple sources, each providing a \emph{batch} of samples that by itself is insufficient to learn its input-output relationship. A common approach assumes that the sources fall in one…

Machine Learning · Computer Science 2023-09-06 Ayush Jain , Rajat Sen , Weihao Kong , Abhimanyu Das , Alon Orlitsky

Learning to Split for Automatic Bias Detection

Classifiers are biased when trained on biased datasets. As a remedy, we propose Learning to Split (ls), an algorithm for automatic bias detection. Given a dataset with input-label pairs, ls learns to split this dataset so that predictors…

Machine Learning · Computer Science 2022-07-22 Yujia Bao , Regina Barzilay

Meta-learning of semi-supervised learning from tasks with heterogeneous attribute spaces

We propose a meta-learning method for semi-supervised learning that learns from multiple tasks with heterogeneous attribute spaces. The existing semi-supervised meta-learning methods assume that all tasks share the same attribute space,…

Machine Learning · Computer Science 2023-11-10 Tomoharu Iwata , Atsutoshi Kumagai

Learning where to learn: Gradient sparsity in meta and continual learning

Finding neural network weights that generalize well from small datasets is difficult. A promising approach is to learn a weight initialization such that a small number of weight changes results in low generalization error. We show that this…

Machine Learning · Computer Science 2021-10-28 Johannes von Oswald , Dominic Zhao , Seijin Kobayashi , Simon Schug , Massimo Caccia , Nicolas Zucchet , João Sacramento

Least Angle Regression

The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a…

Statistics Theory · Mathematics 2007-06-13 Bradley Efron , Trevor Hastie , Iain Johnstone , Robert Tibshirani

Sub-Setting Algorithm for Training Data Selection in Pattern Recognition

Modern pattern recognition tasks use complex algorithms that take advantage of large datasets to make more accurate predictions than traditional algorithms such as decision trees or k-nearest-neighbor better suited to describe simple…

Machine Learning · Statistics 2021-10-14 AGaurav Arwade , Sigurdur Olafsson

Rep the Set: Neural Networks for Learning Set Representations

In several domains, data objects can be decomposed into sets of simpler objects. It is then natural to represent each object as the set of its components or parts. Many conventional machine learning algorithms are unable to process this…

Machine Learning · Computer Science 2020-03-03 Konstantinos Skianis , Giannis Nikolentzos , Stratis Limnios , Michalis Vazirgiannis

Optimizing Ensemble Weights and Hyperparameters of Machine Learning Models for Regression Problems

Aggregating multiple learners through an ensemble of models aim to make better predictions by capturing the underlying distribution of the data more accurately. Different ensembling methods, such as bagging, boosting, and stacking/blending,…

Machine Learning · Statistics 2020-11-03 Mohsen Shahhosseini , Guiping Hu , Hieu Pham

Distributed Parameter Estimation via Pseudo-likelihood

Estimating statistical models within sensor networks requires distributed algorithms, in which both data and computation are distributed across the nodes of the network. We propose a general approach for distributed learning based on…

Machine Learning · Computer Science 2012-07-03 Qiang Liu , Alexander Ihler

On Sampling Strategies for Neural Network-based Collaborative Filtering

Recent advances in neural networks have inspired people to design hybrid recommendation algorithms that can incorporate both (1) user-item interaction information and (2) content information including image, audio, and text. Despite their…

Machine Learning · Computer Science 2017-06-27 Ting Chen , Yizhou Sun , Yue Shi , Liangjie Hong