Related papers: Deducing Optimal Classification Algorithm for Hete…

Best-scored Random Forest Classification

We propose an algorithm named best-scored random forest for binary classification problems. The terminology "best-scored" means to select the one with the best empirical performance out of a certain number of purely random tree candidates…

Machine Learning · Statistics 2019-05-28 Hanyuan Hang, Xiaoyu Liu, Ingo Steinwart

MurTree: Optimal Classification Trees via Dynamic Programming and Search

Decision tree learning is a widely used approach in machine learning, favoured in applications that require concise and interpretable models. Heuristic methods are traditionally used to quickly produce models with reasonably high accuracy.…

Machine Learning · Computer Science 2022-06-30 Emir Demirović, Anna Lukina, Emmanuel Hebrard, Jeffrey Chan, James Bailey, Christopher Leckie, Kotagiri Ramamohanarao, Peter J. Stuckey

Ecological Data Analysis Based on Machine Learning Algorithms

Classification is an important supervised machine learning method, and it is a necessary and challenging issue for ecological research. It offers a way to classify a dataset into subsets that share common patterns. Notably, there are many…

Machine Learning · Statistics 2018-12-24 Md. Siraj-Ud-Doula, Md. Ashad Alam

Heterogeneous Random Forest

Random forest (RF) stands out as a highly favored machine learning approach for classification problems. The effectiveness of RF hinges on two key factors: the accuracy of individual trees and the diversity among them. In this study, we…

Machine Learning · Computer Science 2024-10-28 Ye-eun Kim, Seoung Yun Kim, Hyunjoong Kim

Density-based Clustering with Best-scored Random Forest

Single-level density-based approach has long been widely acknowledged to be a conceptually and mathematically convincing clustering method. In this paper, we propose an algorithm called "best-scored clustering forest" that can obtain the…

Machine Learning · Statistics 2019-06-25 Hanyuan Hang, Yuchao Cai, Hanfang Yang

Optimal randomized classification trees

Classification and Regression Trees (CARTs) are off-the-shelf techniques in modern Statistics and Machine Learning. CARTs are traditionally built by means of a greedy procedure, sequentially deciding the splitting predictor variable(s) and…

Machine Learning · Statistics 2021-10-25 Rafael Blanquero, Emilio Carrizosa, Cristina Molero-Río, Dolores Romero Morales

Nonparametric Feature Selection by Random Forests and Deep Neural Networks

Random forests are a widely used machine learning algorithm, but their computational efficiency is undermined when applied to large-scale datasets with numerous instances and useless features. Herein, we propose a nonparametric feature…

Machine Learning · Computer Science 2022-01-19 Xiaojun Mao, Liuhua Peng, Zhonglei Wang

Near Optimal Inference for the Best-Performing Algorithm

Consider a collection of competing machine learning algorithms. Given their performance on a benchmark of datasets, we would like to identify the best performing algorithm. Specifically, which algorithm is most likely to rank highest on a…

Machine Learning · Computer Science 2025-08-08 Amichai Painsky

The Effectiveness of Supervised Machine Learning Algorithms in Predicting Software Refactoring

Refactoring is the process of changing the internal structure of software to improve its quality without modifying its external behavior. Empirical studies have repeatedly shown that refactoring has a positive impact on the…

Software Engineering · Computer Science 2020-09-14 Maurício Aniche, Erick Maziero, Rafael Durelli, Vinicius Durelli

Random Hinge Forest for Differentiable Learning

We propose random hinge forests, a simple, efficient, and novel variant of decision forests. Importantly, random hinge forests can be readily incorporated as a general component within arbitrary computation graphs that are optimized…

Machine Learning · Statistics 2018-03-02 Nathan Lay, Adam P. Harrison, Sharon Schreiber, Gitesh Dawer, Adrian Barbu

Large Random Forests: Optimisation for Rapid Evaluation

Random Forests are one of the most popular classifiers in machine learning. The larger they are, the more precise is the outcome of their predictions. However, this comes at a cost: their running time for classification grows linearly with…

Machine Learning · Computer Science 2019-12-24 Frederik Gossen, Bernhard Steffen

Near Optimal Decision Trees in a SPLIT Second

Decision tree optimization is fundamental to interpretable machine learning. The most popular approach is to greedily search for the best feature at every decision point, which is fast but provably suboptimal. Recent approaches find the…

Machine Learning · Computer Science 2025-11-19 Varun Babbar, Hayden McTavish, Cynthia Rudin, Margo Seltzer

Optimal Subsampling for High-dimensional Ridge Regression

We investigate the feature compression of high-dimensional ridge regression using the optimal subsampling technique. Specifically, based on the basic framework of random sampling algorithm on feature for ridge regression and the A-optimal…

Computation · Statistics 2022-04-19 Hanyu Li, Chengmei Niu

Rolling Lookahead Learning for Optimal Classification Trees

Classification trees continue to be widely adopted in machine learning applications due to their inherently interpretable nature and scalability. We propose a rolling subtree lookahead algorithm that combines the relative scalability of the…

Machine Learning · Computer Science 2023-04-24 Zeynel Batuhan Organ, Enis Kayış, Taghi Khaniyev

Optimal Weighted Random Forests

The random forest (RF) algorithm has become a very popular prediction method for its great flexibility and promising accuracy. In RF, it is conventional to put equal weights on all the base learners (trees) to aggregate their predictions.…

Machine Learning · Statistics 2023-05-18 Xinyu Chen, Dalei Yu, Xinyu Zhang

A reinforced learning approach to optimal design under model uncertainty

Optimal designs are usually model-dependent and likely to be sub-optimal if the postulated model is not correctly specified. In practice, it is common that a researcher has a list of candidate models at hand and a design has to be found…

Statistics Theory · Mathematics 2023-03-29 Mingyao Ai, Holger Dette, Zhengfu Liu, Jun Yu

Homogeneous and Non Homogeneous Algorithms

Motivated by recent best case analyses for some sorting algorithms and based on the type of complexity we partition the algorithms into two classes: homogeneous and non homogeneous algorithms. Although both classes contain algorithms with…

Data Structures and Algorithms · Computer Science 2010-08-23 Ioannis Paparrizos

Distributional Random Forests: Heterogeneity Adjustment and Multivariate Distributional Regression

Random Forest (Breiman, 2001) is a successful and widely used regression and classification algorithm. Part of its appeal and reason for its versatility is its (implicit) construction of a kernel-type weighting function on training data,…

Machine Learning · Statistics 2022-10-13 Domagoj Ćevid, Loris Michel, Jeffrey Näf, Nicolai Meinshausen, Peter Bühlmann

Optimal Policies for the Homogeneous Selective Labels Problem

Selective labels are a common feature of consequential decision-making applications, referring to the lack of observed outcomes under one of the possible decisions. This paper reports work in progress on learning decision policies in the…

Machine Learning · Computer Science 2020-11-04 Dennis Wei

Conditionally Optimal Parallel Coloring of Forests

We show the first conditionally optimal deterministic algorithm for $3$-coloring forests in the low-space massively parallel computation (MPC) model. Our algorithm runs in $O(\log \log n)$ rounds and uses optimal global space. The best…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-02 Christoph Grunau, Rustam Latypov, Yannic Maus, Shreyas Pai, Jara Uitto