Related papers: Generalized Linear Rule Models
General regression and classification models are constructed as linear combinations of simple rules derived from the data. Each rule consists of a conjunction of a small number of simple statements concerning the values of individual input…
Model explainability is crucial for human users to be able to interpret how a proposed classifier assigns labels to data based on its feature values. We study generalized linear models constructed using sets of feature value rules, which…
Ensemble methods for supervised machine learning have become popular due to their ability to accurately predict class labels with groups of simple, lightweight "base learners." While ensembles offer computationally efficient models that…
Rule ensembles are designed to provide a useful trade-off between predictive accuracy and model interpretability. However, the myopic and random search components of current rule ensemble methods can compromise this goal: they often need…
This paper proposes a new framework for learning a rule ensemble model that is both accurate and interpretable. A rule ensemble is an interpretable model based on the linear combination of weighted rules. In practice, we often face the…
Decision trees are highly interpretable models for solving classification problems in machine learning (ML). The standard ML algorithms for training decision trees are fast but generate suboptimal trees in terms of accuracy. Other discrete…
The principal innovative idea in this paper is to transform the original complex nonlinear modeling problem into a combination of linear problem and very simple nonlinear problems. The key step is the generalized linearization of nonlinear…
We introduce a new rule-based optimization method for classification with constraints. The proposed method leverages column generation for linear programming, and hence, is scalable to large datasets. The resulting pricing subproblem is…
A connection between the General Linear Model (GLM) in combination with classical statistical inference and the machine learning (MLE)-based inference is described in this paper. Firstly, the estimation of the GLM parameters is expressed as…
This article introduces a novel nonparametric methodology for Generalized Linear Models which combines the strengths of the binary regression and latent variable formulations for categorical data, while overcoming their disadvantages.…
The generalised linear model (GLM) is a very important tool for analysing real data in biology, sociology, agriculture, engineering and many other application domain where the relationship between the response and explanatory variables may…
For the last two decades, high-dimensional data and methods have proliferated throughout the literature. Yet, the classical technique of linear regression has not lost its usefulness in applications. In fact, many high-dimensional…
We consider high-dimensional generalized linear models when the covariates are contaminated by measurement error. Estimates from errors-in-variables regression models are well-known to be biased in traditional low-dimensional settings if…
Multiple linear regression is a basic statistical tool, yielding a prediction formula with the input variables, slopes, and an intercept. But is it really easy to see which terms have the largest effect, or to explain why the prediction of…
We introduce a novel rule-based approach for handling regression problems. The new methodology carries elements from two frameworks: (i) it provides information about the uncertainty of the parameters of interest using Bayesian inference,…
This study proposed an exhaustive stable/reproducible rule-mining algorithm combined to a classifier to generate both accurate and interpretable models. Our method first extracts rules (i.e., a conjunction of conditions about the values of…
This paper aims to review the methodology behind the generalized linear models which are used in analyzing the actuarial situations instead of the ordinary multiple linear regression. We introduce how to assess the adequacy of the model…
We propose a randomized method for solving linear programs with a large number of columns but a relatively small number of constraints. Since enumerating all the columns is usually unrealistic, such linear programs are commonly solved by…
Subset selection in multiple linear regression aims to choose a subset of candidate explanatory variables that tradeoff fitting error (explanatory power) and model complexity (number of variables selected). We build mathematical programming…
Generalized linear models are flexible tools for the analysis of diverse datasets, but the classical formulation requires that the parametric component is correctly specified and the data contain no atypical observations. To address these…