Related papers: The Offset Tree for Learning with Partial Labels
Classification and Regression Trees (CARTs) are off-the-shelf techniques in modern Statistics and Machine Learning. CARTs are traditionally built by means of a greedy procedure, sequentially deciding the splitting predictor variable(s) and…
There is a widespread and longstanding belief that machine learning models are biased towards the majority class when learning from imbalanced binary response data, leading them to neglect or ignore the minority class. Motivated by a recent…
In many settings, a decision-maker wishes to learn a rule, or policy, that maps from observable characteristics of an individual to an action. Examples include selecting offers, prices, advertisements, or emails to send to consumers, as…
Classification of datasets into two or more distinct classes is an important machine learning task. Many methods are able to classify binary classification tasks with a very high accuracy on test data, but cannot provide any easily…
Based on decision trees, many fields have arguably made tremendous progress in recent years. In simple words, decision trees use the strategy of "divide-and-conquer" to divide the complex problem on the dependency between input features and…
Modern pattern recognition tasks use complex algorithms that take advantage of large datasets to make more accurate predictions than traditional algorithms such as decision trees or k-nearest-neighbor better suited to describe simple…
Random forests are among the most famous algorithms for solving classification problems, in particular for large-scale data sets. Considering a set of labeled points and several decision trees, the method takes the majority vote to classify…
Decision trees are one of the most popular methods for solving classification problems, mainly because of their good interpretability properties. Moreover, due to advances in recent years in mixed-integer optimization, several models have…
A binary classifier capable of abstaining from making a label prediction has two goals in tension: minimizing errors, and avoiding abstaining unnecessarily often. In this work, we exactly characterize the best achievable tradeoff between…
This paper proposes a new algorithm for learning accurate tree-based models while ensuring the existence of recourse actions. Algorithmic Recourse (AR) aims to provide a recourse action for altering the undesired prediction result given by…
The decision tree is one of the most fundamental programming abstractions. A commonly used type of decision tree is the alphabetic binary tree, which uses (without loss of generality) ``less than'' versus ''greater than or equal to'' tests…
In real-world applications, one often encounters ambiguously labeled data, where different annotators assign conflicting class labels. Partial-label learning allows training classifiers in this weakly supervised setting, where…
Using Machine Learning systems in the real world can often be problematic, with inexplicable black-box models, the assumed certainty of imperfect measurements, or providing a single classification instead of a probability distribution. This…
In recent years, automated data-driven decision-making systems have enjoyed a tremendous success in a variety of fields (e.g., to make product recommendations, or to guide the production of entertainment). More recently, these algorithms…
We propose an algorithm named best-scored random forest for binary classification problems. The terminology "best-scored" means to select the one with the best empirical performance out of a certain number of purely random tree candidates…
When using machine learning for imbalanced binary classification problems, it is common to subsample the majority class to create a (more) balanced training dataset. This biases the model's predictions because the model learns from data…
Learning under one-sided feedback (i.e., where we only observe the labels for examples we predicted positively on) is a fundamental problem in machine learning -- applications include lending and recommendation systems. Despite this, there…
Decision tree and random forest classification and regression are some of the most widely used in machine learning approaches. Binary decision tree implementations commonly use conditioning in the form 'feature $\leq$ (or $<$) threshold',…
Decision tree algorithms have been among the most popular algorithms for interpretable (transparent) machine learning since the early 1980's. The problem that has plagued decision tree algorithms since their inception is their lack of…
Decision trees are a popular technique in statistical data classification. They recursively partition the feature space into disjoint sub-regions until each sub-region becomes homogeneous with respect to a particular class. The basic…