Related papers: Sub-Setting Algorithm for Training Data Selection …

Efficient Data Subset Selection to Generalize Training Across Models: Transductive and Inductive Networks

Existing subset selection methods for efficient learning predominantly employ discrete combinatorial and model-specific approaches which lack generalizability. For an unseen architecture, one cannot use the subset chosen for a different…

Machine Learning · Computer Science 2024-09-20 Eeshaan Jain, Tushar Nandy, Gaurav Aggarwal, Ashish Tendulkar, Rishabh Iyer, Abir De

Learning Ensembles of Interpretable Simple Structure

Decision-making in complex systems often relies on machine learning models, yet highly accurate models such as XGBoost and neural networks can obscure the reasoning behind their predictions. In operations research applications,…

Machine Learning · Computer Science 2025-02-28 Gaurav Arwade, Sigurdur Olafsson

Uncovering Coresets for Classification With Multi-Objective Evolutionary Algorithms

A coreset is a subset of the training set, using which a machine learning algorithm obtains performances similar to what it would deliver if trained over the whole original data. Coreset discovery is an active and open line of research as…

Machine Learning · Computer Science 2020-02-21 Pietro Barbiero, Giovanni Squillero, Alberto Tonda

Reinforced Decision Trees

In order to speed up classification models when facing a large number of categories, one usual approach consists in organizing the categories in a particular structure, this structure being then used as a way to speed up the prediction…

Machine Learning · Computer Science 2015-11-26 Aurélia Léon, Ludovic Denoyer

Finding High-Value Training Data Subset through Differentiable Convex Programming

Finding valuable training data points for deep neural networks has been a core research challenge with many applications. In recent years, various techniques for calculating the "value" of individual training datapoints have been proposed…

Machine Learning · Computer Science 2021-04-29 Soumi Das, Arshdeep Singh, Saptarshi Chatterjee, Suparna Bhattacharya, Sourangshu Bhattacharya

An Analysis of Reduced Error Pruning

Top-down induction of decision trees has been observed to suffer from the inadequate functioning of the pruning phase. In particular, it is known that the size of the resulting tree grows linearly with the sample size, even though the…

Artificial Intelligence · Computer Science 2011-06-06 T. Elomaa, M. Kaariainen

Complex Networks for Pattern-Based Data Classification

Data classification techniques partition the data or feature space into smaller sub-spaces, each corresponding to a specific class. To classify into subspaces, physical features, e.g., distance and distributions, are utilized. This approach…

Machine Learning · Computer Science 2025-03-11 Josimar Chire, Khalid Mahmood, Zhao Liang

Learning a Decision Tree Algorithm with Transformers

Decision trees are renowned for their ability to achieve high predictive performance while remaining interpretable, especially on tabular data. Traditionally, they are constructed through recursive algorithms, where they partition the data…

Machine Learning · Computer Science 2024-08-27 Yufan Zhuang, Liyuan Liu, Chandan Singh, Jingbo Shang, Jianfeng Gao

The Offset Tree for Learning with Partial Labels

We present an algorithm, called the Offset Tree, for learning to make decisions in situations where the payoff of only one choice is observed, rather than all choices. The algorithm reduces this setting to binary classification, allowing…

Machine Learning · Computer Science 2016-04-05 Alina Beygelzimer, John Langford

Utilizing Data Fingerprints for Privacy-Preserving Algorithm Selection in Time Series Classification: Performance and Uncertainty Estimation on Unseen Datasets

The selection of algorithms is a crucial step in designing AI services for real-world time series classification use cases. Traditional methods such as neural architecture search, automated machine learning, combined algorithm selection,…

Machine Learning · Computer Science 2024-10-02 Lars Böcking, Leopold Müller, Niklas Kühl

Selective Embedding for Deep Learning

Deep learning has revolutionized many industries by enabling models to automatically learn complex patterns from raw data, reducing dependence on manual feature engineering. However, deep learning algorithms are sensitive to input data, and…

Machine Learning · Computer Science 2025-07-21 Mert Sehri, Zehui Hua, Francisco de Assis Boldt, Patrick Dumond

On Distributed Larger-Than-Memory Subset Selection With Pairwise Submodular Functions

Modern datasets span billions of samples, making training on all available data infeasible. Selecting a high quality subset helps in reducing training costs and enhancing model quality. Submodularity, a discrete analogue of convexity, is…

Machine Learning · Computer Science 2025-04-04 Maximilian Böther, Abraham Sebastian, Pranjal Awasthi, Ana Klimovic, Srikumar Ramalingam

One-step learning algorithm selection for classification via convolutional neural networks

As with any task, the process of building machine learning models can benefit from prior experience. Meta-learning for classifier selection leverages knowledge about the characteristics of different datasets and/or the past performance of…

Machine Learning · Computer Science 2025-08-26 Sebastian Maldonado, Carla Vairetti, Ignacio Figueroa

Sample Complexity of Algorithm Selection Using Neural Networks and Its Applications to Branch-and-Cut

Data-driven algorithm design is a paradigm that uses statistical and machine learning techniques to select from a class of algorithms for a computational problem an algorithm that has the best expected performance with respect to some…

Machine Learning · Computer Science 2024-06-05 Hongyu Cheng, Sammy Khalife, Barbara Fiedorowicz, Amitabh Basu

Learning From Less Data: Diversified Subset Selection and Active Learning in Image Classification Tasks

Supervised machine learning based state-of-the-art computer vision techniques are in general data hungry and pose the challenges of not having adequate computing resources and of high costs involved in human labeling efforts. Training data…

Computer Vision and Pattern Recognition · Computer Science 2018-05-30 Vishal Kaushal, Anurag Sahoo, Khoshrav Doctor, Narasimha Raju, Suyash Shetty, Pankaj Singh, Rishabh Iyer, Ganesh Ramakrishnan

Learning accurate and interpretable tree-based models

Decision trees and their ensembles are popular in machine learning as easy-to-understand models. Several techniques have been proposed in the literature for learning tree-based classifiers, with different techniques working well for data…

Machine Learning · Computer Science 2025-05-20 Maria-Florina Balcan, Dravyansh Sharma

Ensemble of Example-Dependent Cost-Sensitive Decision Trees

Several real-world classification problems are example-dependent cost-sensitive in nature, where the costs due to misclassification vary between examples and not only within classes. However, standard classification methods do not take…

Machine Learning · Computer Science 2015-05-19 Alejandro Correa Bahnsen, Djamila Aouada, Bjorn Ottersten

Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction

Current machine learning has made great progress on computer vision and many other fields attributed to the large amount of high-quality training samples, while it does not work very well on genomic data analysis, since they are notoriously…

Machine Learning · Computer Science 2020-09-04 Ziyi Yang, Jun Shu, Yong Liang, Deyu Meng, Zongben Xu

Efficient Neural Network Training via Subset Pretraining

In training neural networks, it is common practice to use partial gradients computed over batches, mostly very small subsets of the training set. This approach is motivated by the argument that such a partial gradient is close to the true…

Machine Learning · Computer Science 2024-11-25 Jan Spörer, Bernhard Bermeitinger, Tomas Hrycej, Niklas Limacher, Siegfried Handschuh

Enhancing Simple Models by Exploiting What They Already Know

There has been recent interest in improving performance of simple models for multiple reasons such as interpretability, robust learning from small data, deployment in memory constrained settings as well as environmental considerations. In…

Machine Learning · Computer Science 2020-06-23 Amit Dhurandhar, Karthikeyan Shanmugam, Ronny Luss