Related papers: Split Regression Modeling

Multi-Model Subset Selection

The two primary approaches for high-dimensional regression problems are sparse methods (e.g., best subset selection, which uses the L0-norm in the penalty) and ensemble methods (e.g., random forests). Although sparse methods typically yield…

Methodology · Statistics 2024-10-31 Anthony-Alexander Christidis , Stefan Van Aelst , Ruben Zamar

Robust subset selection

The best subset selection (or "best subsets") estimator is a classic tool for sparse regression, and developments in mathematical optimization over the past decade have made it more computationally tractable than ever. Notwithstanding its…

Methodology · Statistics 2022-01-11 Ryan Thompson

A Gradient-Based Split Criterion for Highly Accurate and Transparent Model Trees

Machine learning algorithms aim at minimizing the number of false decisions and increasing the accuracy of predictions. However, the high predictive power of advanced algorithms comes at the costs of transparency. State-of-the-art methods,…

Machine Learning · Computer Science 2019-08-30 Klaus Broelemann , Gjergji Kasneci

Splitting strategies for post-selection inference

We consider the problem of providing valid inference for a selected parameter in a sparse regression setting. It is well known that classical regression tools can be unreliable in this context due to the bias generated in the selection…

Methodology · Statistics 2022-12-07 Daniel G. Rasines , G. Alastair Young

Supersparse Linear Integer Models for Interpretable Classification

Scoring systems are classification models that only require users to add, subtract and multiply a few meaningful numbers to make a prediction. These models are often used because they are practical and interpretable. In this paper, we…

Machine Learning · Statistics 2014-04-14 Berk Ustun , Stefano Tracà , Cynthia Rudin

Estimation from Partially Sampled Distributed Traces

Sampling is often a necessary evil to reduce the processing and storage costs of distributed tracing. In this work, we describe a scalable and adaptive sampling approach that can preserve events of interest better than the widely used…

Data Structures and Algorithms · Computer Science 2021-07-19 Otmar Ertl

Sparse Partitioning: Nonlinear regression with binary or tertiary predictors, with application to association studies

This paper presents Sparse Partitioning, a Bayesian method for identifying predictors that either individually or in combination with others affect a response variable. The method is designed for regression problems involving binary or…

Quantitative Methods · Quantitative Biology 2011-08-31 Doug Speed , Simon Tavaré

Randomized maximum-contrast selection: subagging for large-scale regression

We introduce a very general method for sparse and large-scale variable selection. The large-scale regression settings is such that both the number of parameters and the number of samples are extremely large. The proposed method is based on…

Statistics Theory · Mathematics 2019-07-31 Jelena Bradic

Discussion: Effective and Interpretable Outcome Prediction by Training Sparse Mixtures of Linear Experts

Process Outcome Prediction entails predicting a discrete property of an unfinished process instance from its partial trace. High-capacity outcome predictors discovered with ensemble and deep learning methods have been shown to achieve top…

Machine Learning · Computer Science 2024-07-19 Francesco Folino , Luigi Pontieri , Pietro Sabatino

Model Agnostic Explainable Selective Regression via Uncertainty Estimation

With the wide adoption of machine learning techniques, requirements have evolved beyond sheer high performance, often requiring models to be trustworthy. A common approach to increase the trustworthiness of such systems is to allow them to…

Machine Learning · Computer Science 2023-11-16 Andrea Pugnana , Carlos Mougan , Dan Saattrup Nielsen

Best-Subset Selection in Generalized Linear Models: A Fast and Consistent Algorithm via Splicing Technique

In high-dimensional generalized linear models, it is crucial to identify a sparse model that adequately accounts for response variation. Although the best subset section has been widely regarded as the Holy Grail of problems of this type,…

Machine Learning · Statistics 2023-08-02 Junxian Zhu , Jin Zhu , Borui Tang , Xuanyu Chen , Hongmei Lin , Xueqin Wang

Bootstrapping and Sample Splitting For High-Dimensional, Assumption-Free Inference

Several new methods have been proposed for performing valid inference after model selection. An older method is sampling splitting: use part of the data for model selection and part for inference. In this paper we revisit sample splitting…

Statistics Theory · Mathematics 2018-04-04 Alessandro Rinaldo , Larry Wasserman , Max G'Sell , Jing Lei

Obtaining Explainable Classification Models using Distributionally Robust Optimization

Model explainability is crucial for human users to be able to interpret how a proposed classifier assigns labels to data based on its feature values. We study generalized linear models constructed using sets of feature value rules, which…

Machine Learning · Statistics 2023-11-06 Sanjeeb Dash , Soumyadip Ghosh , Joao Goncalves , Mark S. Squillante

Prediction and estimation consistency of sparse multi-class penalized optimal scoring

Sparse linear discriminant analysis via penalized optimal scoring is a successful tool for classification in high-dimensional settings. While the variable selection consistency of sparse optimal scoring has been established, the…

Statistics Theory · Mathematics 2021-04-01 Irina Gaynanova

Minimax and Communication-Efficient Distributed Best Subset Selection with Oracle Property

The explosion of large-scale data in fields such as finance, e-commerce, and social media has outstripped the processing capabilities of single-machine systems, driving the need for distributed statistical inference methods. Traditional…

Machine Learning · Statistics 2024-09-02 Jingguo Lan , Hongmei Lin , Xueqin Wang

Sparse Compositional Metric Learning

We propose a new approach for metric learning by framing it as learning a sparse combination of locally discriminative metrics that are inexpensive to generate from the training data. This flexible framework allows us to naturally derive…

Machine Learning · Computer Science 2019-01-25 Yuan Shi , Aurélien Bellet , Fei Sha

Scalable Interpretable Multi-Response Regression via SEED

Sparse reduced-rank regression is an important tool to uncover meaningful dependence structure between large numbers of predictors and responses in many big data applications such as genome-wide association studies and social media…

Methodology · Statistics 2016-08-15 Mohammad Taha Bahadori , Zemin Zheng , Yan Liu , Jinchi Lv

Sparsity in Optimal Randomized Classification Trees

Decision trees are popular Classification and Regression tools and, when small-sized, easy to interpret. Traditionally, a greedy approach has been used to build the trees, yielding a very fast training process; however, controlling sparsity…

Optimization and Control · Mathematics 2020-02-24 Rafael Blanquero , Emilio Carrizosa , Cristina Molero-Río , Dolores Romero Morales

Scaled Sparse Linear Regression

Scaled sparse linear regression jointly estimates the regression coefficients and noise level in a linear model. It chooses an equilibrium with a sparse regression method by iteratively estimating the noise level via the mean residual…

Machine Learning · Statistics 2012-06-22 Tingni Sun , Cun-Hui Zhang

SMACE: A New Method for the Interpretability of Composite Decision Systems

Interpretability is a pressing issue for decision systems. Many post hoc methods have been proposed to explain the predictions of a single machine learning model. However, business processes and decision systems are rarely centered around a…

Machine Learning · Computer Science 2023-03-22 Gianluigi Lopardo , Damien Garreau , Frederic Precioso , Greger Ottosson