Related papers: Distance approximation using Isolation Forests

Isolation forests: looking beyond tree depth

The isolation forest algorithm for outlier detection exploits a simple yet effective observation: if taking some multivariate data and making uniformly random cuts across the feature space recursively, it will take fewer such random cuts…

Machine Learning · Statistics 2021-11-24 David Cortes

Revisiting randomized choices in isolation forests

Isolation forest or "iForest" is an intuitive and widely used algorithm for anomaly detection that follows a simple yet effective idea: in a given data distribution, if a threshold (split point) is selected uniformly at random within the…

Machine Learning · Statistics 2021-12-07 David Cortes

Random Similarity Isolation Forests

With predictive models becoming prevalent, companies are expanding the types of data they gather. As a result, the collected datasets consist not only of simple numerical features but also more complex objects such as time series, images,…

Machine Learning · Computer Science 2025-07-01 Sebastian Chwilczyński, Dariusz Brzezinski

Explainable Unsupervised Anomaly Detection with Random Forest

We describe the use of an unsupervised Random Forest for similarity learning and improved unsupervised anomaly detection. By training a Random Forest to discriminate between real data and synthetic data sampled from a uniform distribution…

Machine Learning · Statistics 2025-04-23 Joshua S. Harvey, Joshua Rosaler, Mingshu Li, Dhruv Desai, Dhagash Mehta

Random Similarity Forests

The wealth of data being gathered about humans and their surroundings drives new machine learning applications in various fields. Consequently, more and more often, classifiers are trained using not only numerical data but also complex data…

Machine Learning · Computer Science 2022-04-13 Maciej Piernik, Dariusz Brzezinski, Pawel Zawadzki

Theoretical Investigation on Inductive Bias of Isolation Forest

Isolation Forest (iForest) stands out as a widely-used unsupervised anomaly detector, primarily owing to its remarkable runtime efficiency and superior performance in large-scale tasks. Despite its widespread adoption, a theoretical…

Machine Learning · Computer Science 2026-01-28 Qin-Cheng Zheng, Shao-Qun Zhang, Shen-Huan Lyu, Yuan Jiang, Zhi-Hua Zhou

Distances between Data Sets Based on Summary Statistics

The concepts of similarity and distance are crucial in data mining. We consider the problem of defining the distance between two data sets by comparing summary statistics computed from the data sets. The initial definition of our distance…

Data Structures and Algorithms · Computer Science 2019-02-05 Nikolaj Tatti

Tree Edit Distance with Variables. Measuring the Similarity between Mathematical Formulas

In this article, we propose tree edit distance with variables, which is an extension of the tree edit distance to handle trees with variables and has a potential application to measuring the similarity between mathematical formulas,…

Data Structures and Algorithms · Computer Science 2021-05-12 Tatsuya Akutsu, Tomoya Mori, Naotoshi Nakamura, Satoshi Kozawa, Yuhei Ueno, Thomas N. Sato

Random Forests for Metric Learning with Implicit Pairwise Position Dependence

Metric learning makes it plausible to learn distances for complex distributions of data from labeled data. However, to date, most metric learning methods are based on a single Mahalanobis metric, which cannot handle heterogeneous data well.…

Machine Learning · Statistics 2012-01-04 Caiming Xiong, David Johnson, Ran Xu, Jason J. Corso

The similarity metric

A new class of distances appropriate for measuring similarity relations between sequences, say one type of similarity per distance, is studied. We propose a new "normalized information distance", based on the noncomputable notion of…

Computational Complexity · Computer Science 2011-11-09 Ming Li, Xin Chen, Xin Li, Bin Ma, Paul Vitanyi

The Extended Edit Distance Metric

Similarity search is an important problem in information retrieval. This similarity is based on a distance. Symbolic representation of time series has attracted many researchers recently, since it reduces the dimensionality of these high…

Information Retrieval · Computer Science 2010-06-18 Muhammad Marwan Muhammad Fuad, Pierre-François Marteau

Deep Isolation Forest for Anomaly Detection

Isolation forest (iForest) has been emerging as arguably the most popular anomaly detector in recent years due to its general effectiveness across different benchmarks and strong scalability. Nevertheless, its linear axis-parallel isolation…

Machine Learning · Computer Science 2023-06-12 Hongzuo Xu, Guansong Pang, Yijie Wang, Yongjun Wang

A Graph-Matching Formulation of the Interleaving Distance between Merge Trees

In this work we study the interleaving distance between merge trees from a combinatorial point of view. We use a particular type of matching between trees to obtain a novel formulation of the distance. With such formulation, we tackle the…

Combinatorics · Mathematics 2024-11-11 Matteo Pegoraro

Distribution and volume based scoring for Isolation Forests

We make two contributions to the Isolation Forest method for anomaly and outlier detection. The first contribution is an information-theoretically motivated generalisation of the score function that is used to aggregate the scores across…

Machine Learning · Statistics 2023-09-21 Hichem Dhouib, Alissa Wilms, Paul Boes

Extended Isolation Forest with feature sensitivities

Compared to theoretical frameworks that assume equal sensitivity to deviations in all features of data, the theory of anomaly detection allowing for variable sensitivity across features is less developed. To the best of our knowledge, this…

Methodology · Statistics 2026-02-11 Illia Donhauzer

Approximating Metrics by Tree Metrics of Small Distance-Weighted Average Stretch

We study the problem of how well a tree metric is able to preserve the sum of pairwise distances of an arbitrary metric. This problem is closely related to low-stretch metric embeddings and is interesting by its own flavor from the line of…

Data Structures and Algorithms · Computer Science 2013-01-16 Mong-Jen Kao, Der-Tsai Lee, Dorothea Wagner

Learning Order Forest for Qualitative-Attribute Data Clustering

Clustering is a fundamental approach to understanding data patterns, wherein the intuitive Euclidean distance space is commonly adopted. However, this is not the case for implicit cluster distributions reflected by qualitative attribute…

Machine Learning · Statistics 2026-03-05 Mingjie Zhao, Sen Feng, Yiqun Zhang, Mengke Li, Yang Lu, Yiu-ming Cheung

Comparison-Based Random Forests

Assume we are given a set of items from a general metric space, but we neither have access to the representation of the data nor to the distances between data points. Instead, suppose that we can actively choose a triplet of items (A,B,C)…

Machine Learning · Statistics 2018-06-19 Siavash Haghiri, Damien Garreau, Ulrike von Luxburg

The SuperM-Tree: Indexing metric spaces with sized objects

A common approach to implementing similarity search applications is the usage of distance functions, where small distances indicate high similarity. In the case of metric distance functions, metric index structures can be used to accelerate…

Data Structures and Algorithms · Computer Science 2019-02-05 Jörg P. Bachmann

Distance-Based Bias in Model-Directed Optimization of Additively Decomposable Problems

For many optimization problems it is possible to define a distance metric between problem variables that correlates with the likelihood and strength of interactions between the variables. For example, one may define a metric so that the…

Neural and Evolutionary Computing · Computer Science 2012-01-12 Martin Pelikan, Mark W. Hauschild