Related papers: Accelerating data-driven algorithm selection for c…

Sample Complexity of Algorithm Selection Using Neural Networks and Its Applications to Branch-and-Cut

Data-driven algorithm design is a paradigm that uses statistical and machine learning techniques to select from a class of algorithms for a computational problem an algorithm that has the best expected performance with respect to some…

Machine Learning · Computer Science 2024-06-05 Hongyu Cheng, Sammy Khalife, Barbara Fiedorowicz, Amitabh Basu

Data-driven Algorithm Design

Data-driven algorithm design is an important aspect of modern data science and algorithm design. Rather than using off-the-shelf algorithms that only have worst-case performance guarantees, practitioners often optimize over large families…

Data Structures and Algorithms · Computer Science 2020-11-17 Maria-Florina Balcan

A graphical heuristic for reduction and partitioning of large datasets for scalable supervised training

A scalable graphical method is presented for selecting and partitioning datasets for the training phase of a classification task. For the heuristic, a clustering algorithm is required to get its computation cost in a reasonable proportion…

Machine Learning · Computer Science 2019-07-25 Sumedh Yadav, Mathis Bode

Small Data, Big Decisions: Model Selection in the Small-Data Regime

Highly overparametrized neural networks can display curiously strong generalization performance - a phenomenon that has recently garnered a wealth of theoretical and empirical research in order to better understand it. In contrast to most…

Machine Learning · Computer Science 2020-09-29 Jorg Bornschein, Francesco Visin, Simon Osindero

Feature Selection for Data-driven Explainable Optimization

Mathematical optimization, although often leading to NP-hard models, is now capable of solving even large-scale instances within reasonable time. However, the primary focus is often placed solely on optimality. This implies that while…

Optimization and Control · Mathematics 2025-12-23 Kevin-Martin Aigner, Marc Goerigk, Michael Hartisch, Frauke Liers, Arthur Miehlich, Florian Rösel

Compute-Constrained Data Selection

Data selection can reduce the amount of training data needed to finetune LLMs; however, the efficacy of data selection scales directly with its compute. Motivated by the practical challenge of compute-constrained finetuning, we consider the…

Machine Learning · Computer Science 2025-04-09 Junjie Oscar Yin, Alexander M. Rush

Randomized Block Proximal Methods for Distributed Stochastic Big-Data Optimization

In this paper we introduce a class of novel distributed algorithms for solving stochastic big-data convex optimization problems over directed graphs. In the addressed set-up, the dimension of the decision variable can be extremely high and…

Optimization and Control · Mathematics 2020-10-06 Francesco Farina, Giuseppe Notarstefano

Subset Selection for Multiple Linear Regression via Optimization

Subset selection in multiple linear regression aims to choose a subset of candidate explanatory variables that tradeoff fitting error (explanatory power) and model complexity (number of variables selected). We build mathematical programming…

Machine Learning · Statistics 2020-09-04 Young Woong Park, Diego Klabjan

Probabilistic Partitive Partitioning (PPP)

Clustering is an NP-hard problem. Thus, no optimal algorithm exists; heuristics are applied to cluster the data. Heuristics can be very resource-intensive if not applied properly. For substantially large data sets computational efficiencies…

Databases · Computer Science 2020-03-11 Mujahid Sultan

SELECTOR: Selecting a Representative Benchmark Suite for Reproducible Statistical Comparison

Fair algorithm evaluation is conditioned on the existence of high-quality benchmark datasets that are non-redundant and are representative of typical optimization scenarios. In this paper, we evaluate three heuristics for selecting diverse…

Neural and Evolutionary Computing · Computer Science 2022-04-26 Gjorgjina Cenikj, Ryan Dieter Lang, Andries Petrus Engelbrecht, Carola Doerr, Peter Korošec, Tome Eftimov

Cheaper and Better: Selecting Good Workers for Crowdsourcing

Crowdsourcing provides a popular paradigm for data collection at scale. We study the problem of selecting subsets of workers from a given worker pool to maximize the accuracy under a budget constraint. One natural question is whether we…

Machine Learning · Statistics 2015-02-04 Hongwei Li, Qiang Liu

Hemingway: Modeling Distributed Optimization Algorithms

Distributed optimization algorithms are widely used in many industrial machine learning applications. However, choosing the appropriate algorithm and cluster size is often difficult for users as the performance and convergence rate of…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-02-21 Xinghao Pan, Shivaram Venkataraman, Zizheng Tai, Joseph Gonzalez

Scalable Clustering: Large Scale Unsupervised Learning of Gaussian Mixture Models with Outliers

Clustering is a widely used technique with a long and rich history in a variety of areas. However, most existing algorithms do not scale well to large datasets, or are missing theoretical guarantees of convergence. This paper introduces a…

Machine Learning · Statistics 2024-10-16 Yijia Zhou, Kyle A. Gallivan, Adrian Barbu

A sampling-based approach for efficient clustering in large datasets

We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high performance by evaluating distances of datapoints with a subset of the cluster centres. Our…

Machine Learning · Computer Science 2022-03-30 Georgios Exarchakis, Omar Oubari, Gregor Lenz

Automated Algorithm Selection: Survey and Perspectives

It has long been observed that for practically any computational problem that has been intensely studied, different instances are best solved using different algorithms. This is particularly pronounced for computationally hard problems,…

Machine Learning · Computer Science 2018-11-29 Pascal Kerschke, Holger H. Hoos, Frank Neumann, Heike Trautmann

How much data is sufficient to learn high-performing algorithms? Generalization guarantees for data-driven algorithm design

Algorithms often have tunable parameters that impact performance metrics such as runtime and solution quality. For many algorithms used in practice, no parameter settings admit meaningful worst-case bounds, so the parameters are made…

Machine Learning · Computer Science 2021-04-27 Maria-Florina Balcan, Dan DeBlasio, Travis Dick, Carl Kingsford, Tuomas Sandholm, Ellen Vitercik

DCNNs on a Diet: Sampling Strategies for Reducing the Training Set Size

Large-scale supervised classification algorithms, especially those based on deep convolutional neural networks (DCNNs), require vast amounts of training data to achieve state-of-the-art performance. Decreasing this data requirement would…

Computer Vision and Pattern Recognition · Computer Science 2016-06-15 Maya Kabkab, Azadeh Alavi, Rama Chellappa

Stochastic Localization Methods for Convex Discrete Optimization via Simulation

We develop and analyze a set of new sequential simulation-optimization algorithms for large-scale multi-dimensional discrete optimization via simulation problems with a convexity structure. The "large-scale" notion refers to that the…

Optimization and Control · Mathematics 2022-01-20 Haixiang Zhang, Zeyu Zheng, Javad Lavaei

Feature Selection: A Data Perspective

Feature selection, as a data preprocessing strategy, has been proven to be effective and efficient in preparing data (especially high-dimensional data) for various data mining and machine learning problems. The objectives of feature…

Machine Learning · Computer Science 2018-08-28 Jundong Li, Kewei Cheng, Suhang Wang, Fred Morstatter, Robert P. Trevino, Jiliang Tang, Huan Liu

A Data-driven Analysis of Code Optimizations

As the demand for computational power grows, optimizing code through compilers becomes increasingly crucial. In this context, we focus on fully automatic code optimization techniques that automate the process of selecting and applying code…

Programming Languages · Computer Science 2025-11-11 Yacine Hakimi, Riyadh Baghdadi