Related papers: On Tree-based Methods for Similarity Learning

A Probabilistic Theory of Supervised Similarity Learning for Pointwise ROC Curve Optimization

The performance of many machine learning techniques depends on the choice of an appropriate similarity or distance measure on the input space. Similarity learning (or metric learning) aims at building such a measure from training data so…

Machine Learning · Statistics 2019-01-25 Robin Vogel, Aurélien Bellet, Stéphan Clémençon

Ranking Data with Continuous Labels through Oriented Recursive Partitions

We formulate a supervised learning problem, referred to as continuous ranking, where a continuous real-valued label Y is assigned to an observable r.v. X taking its values in a feature space $\mathcal{X}$ and the goal is to order all…

Machine Learning · Statistics 2018-01-18 Stephan Clémençon, Mastane Achab

Learning Fair Scoring Functions: Bipartite Ranking under ROC-based Fairness Constraints

Many applications of AI involve scoring individuals using a learned function of their attributes. These predictive risk scores are then used to take decisions based on whether the score exceeds a certain threshold, which may vary depending…

Machine Learning · Statistics 2021-02-26 Robin Vogel, Aurélien Bellet, Stephan Clémençon

An Even Faster and More Unifying Algorithm for Comparing Trees via Unbalanced Bipartite Matchings

A widely used method for determining the similarity of two labeled trees is to compute a maximum agreement subtree of the two trees. Previous work on this similarity measure is only concerned with the comparison of labeled trees of two…

Computer Vision and Pattern Recognition · Computer Science 2007-05-23 Ming-Yang Kao, Tak-Wah Lam, Wing-Kin Sung, Hing-Fung Ting

Assessing Uncertainty in Similarity Scoring: Performance & Fairness in Face Recognition

The ROC curve is the major tool for assessing not only the performance but also the fairness properties of a similarity scoring function. In order to draw reliable conclusions based on empirical ROC analysis, accurately evaluating the…

Computer Vision and Pattern Recognition · Computer Science 2024-02-22 Jean-Rémy Conti, Stéphan Clémençon

Precision-Recall Curve (PRC) Classification Trees

The classification of imbalanced data has presented a significant challenge for most well-known classification algorithms that were often designed for data with relatively balanced class distributions. Nevertheless skewed class distribution…

Machine Learning · Statistics 2023-04-21 Jiaju Miao, Wei Zhu

Partial order similarity based on mutual information

Comparing the ranking of candidates by different voters is an important topic in social and information science with a high relevance from the point of view of practical applications. In general, ties and pairs of incomparable candidates…

Applications · Statistics 2016-01-25 Gergely Tibély, Péter Pollner, Gergely Palla

Ranking Perspective for Tree-based Methods with Applications to Symbolic Feature Selection

Tree-based methods are powerful nonparametric techniques in statistics and machine learning. However, their effectiveness, particularly in finite-sample settings, is not fully understood. Recent applications have revealed their surprising…

Statistics Theory · Mathematics 2024-10-04 Hengrui Luo, Meng Li

Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction

For large, real-world inductive learning problems, the number of training examples often must be limited due to the costs associated with procuring, preparing, and storing the training examples and/or the computational costs associated with…

Artificial Intelligence · Computer Science 2011-06-24 F. Provost, G. M. Weiss

Best-scored Random Forest Classification

We propose an algorithm named best-scored random forest for binary classification problems. The terminology "best-scored" means to select the one with the best empirical performance out of a certain number of purely random tree candidates…

Machine Learning · Statistics 2019-05-28 Hanyuan Hang, Xiaoyu Liu, Ingo Steinwart

The Power of Unbiased Recursive Partitioning: A Unifying View of CTree, MOB, and GUIDE

A core step of every algorithm for learning regression trees is the selection of the best splitting variable from the available covariates and the corresponding split point. Early tree algorithms (e.g., AID, CART) employed greedy search…

Methodology · Statistics 2019-06-26 Lisa Schlosser, Torsten Hothorn, Achim Zeileis

On the Pointwise Behavior of Recursive Partitioning and Its Implications for Heterogeneous Causal Effect Estimation

Decision tree learning is increasingly being used for pointwise inference. Important applications include causal heterogeneous treatment effects and dynamic policy decisions, as well as conditional quantile regression and design of…

Machine Learning · Statistics 2024-02-08 Matias D. Cattaneo, Jason M. Klusowski, Peter M. Tian

Comparison Based Nearest Neighbor Search

We consider machine learning in a comparison-based setting where we are given a set of points in a metric space, but we have no access to the actual distances between the points. Instead, we can only ask an oracle whether the distance…

Machine Learning · Statistics 2017-04-06 Siavash Haghiri, Debarghya Ghoshdastidar, Ulrike von Luxburg

Pairwise Fairness for Ranking and Regression

We present pairwise fairness metrics for ranking models and regression models that form analogues of statistical fairness notions such as equal opportunity, equal accuracy, and statistical parity. Our pairwise formulation supports both…

Machine Learning · Computer Science 2020-01-08 Harikrishna Narasimhan, Andrew Cotter, Maya Gupta, Serena Wang

Using theoretical ROC curves for analysing machine learning binary classifiers

Most binary classifiers work by processing the input to produce a scalar response and comparing it to a threshold value. The various measures of classifier performance assume, explicitly or implicitly, probability distributions $P_s$ and…

Machine Learning · Computer Science 2019-09-24 Luma Omar, Ioannis Ivrissimtzis

Measure Inducing Classification and Regression Trees for Functional Data

We propose a tree-based algorithm for classification and regression problems in the context of functional data analysis, which allows to leverage representation learning and multiple splitting rules at the node level, reducing…

Machine Learning · Statistics 2020-11-03 Edoardo Belli, Simone Vantini

Generalized and Scalable Optimal Sparse Decision Trees

Decision tree optimization is notoriously difficult from a computational perspective but essential for the field of interpretable machine learning. Despite efforts over the past 40 years, only recently have optimization breakthroughs been…

Machine Learning · Computer Science 2022-11-24 Jimmy Lin, Chudi Zhong, Diane Hu, Cynthia Rudin, Margo Seltzer

Learning similarity measures from data

Defining similarity measures is a requirement for some machine learning methods. One such method is case-based reasoning (CBR) where the similarity measure is used to retrieve the stored case or set of cases most similar to the query case.…

Machine Learning · Computer Science 2020-01-16 Bjørn Magnus Mathisen, Agnar Aamodt, Kerstin Bach, Helge Langseth

A Taxonomy of Similarity Metrics for Markov Decision Processes

Although the notion of task similarity is potentially interesting in a wide range of areas such as curriculum learning or automated planning, it has mostly been tied to transfer learning. Transfer is based on the idea of reusing the…

Machine Learning · Computer Science 2021-03-09 Álvaro Visús, Javier García, Fernando Fernández

Adaptive Verifiable Training Using Pairwise Class Similarity

Verifiable training has shown success in creating neural networks that are provably robust to a given amount of noise. However, despite only enforcing a single robustness criterion, its performance scales poorly with dataset complexity. On…

Machine Learning · Computer Science 2020-12-16 Shiqi Wang, Kevin Eykholt, Taesung Lee, Jiyong Jang, Ian Molloy