Related papers: Linear-Complexity Exponentially-Consistent Tests f…
Outlier hypothesis testing is studied in a universal setting. Multiple sequences of observations are collected, a small subset of which are outliers. A sequence is considered an outlier if the observations in that sequence are distributed…
Universal outlier hypothesis testing is studied in a sequential setting. Multiple observation sequences are collected, a small subset of which are outliers. A sequence is considered an outlier if the observations in that sequence are…
The following detection problem is studied, in which there are $M$ sequences of samples out of which one outlier sequence needs to be detected. Each typical sequence contains $n$ independent and identically distributed (i.i.d.) continuous…
In this work, we revisit outlier hypothesis testing and propose exponentially consistent, low-complexity fixed-length tests that achieve a better tradeoff between detection performance and computational complexity than existing…
In outlier hypothesis testing, one aims to detect outlying sequences among a given set of sequences, where most sequences are generated i.i.d. from a nominal distribution while outlying sequences (outliers) are generated i.i.d. from a…
We revisit outlier hypothesis testing, propose exponentially consistent low complexity fixed-length and sequential tests and show that our tests achieve better tradeoff between detection performance and computational complexity than…
Universal outlier hypothesis testing refers to a hypothesis testing problem where one observes a large number of length-$n$ sequences -- the majority of which are distributed according to the typical distribution $\pi$ and a small number…
Outliers are ubiquitous in modern data sets. Distance-based techniques are a popular non-parametric approach to outlier detection as they require no prior assumptions on the data generating distribution and are simple to implement. Scaling…
This paper develops a flexible distribution-free method for collective outlier detection and enumeration, designed for situations in which the presence of outliers can be detected powerfully even though their precise identification may be…
In multiple classification, one aims to determine whether a testing sequence is generated from the same distribution as one of the M training sequences or not. Unlike most of existing studies that focus on discrete-valued sequences with…
Clustering has many important applications in computer science, but real-world datasets often contain outliers. Moreover, the presence of outliers can make the clustering problems to be much more challenging. To reduce the complexities,…
Outlier detection is an important problem occurring in a wide range of areas. Outliers are the outcome of fraudulent behaviour, mechanical faults, human error, or simply natural deviations. Many data mining applications perform outlier…
We study the classic $k$-means/median clustering, which are fundamental problems in unsupervised learning, in the setting where data are partitioned across multiple sites, and where we are allowed to discard a small portion of the data by…
A sensor network is considered where at each sensor a sequence of random variables is observed. At each time step, a processed version of the observations is transmitted from the sensors to a common node called the fusion center. At some…
Often the challenge associated with tasks like fraud and spam detection[1] is the lack of all likely patterns needed to train suitable supervised learning models. In order to overcome this limitation, such tasks are attempted as outlier or…
We revisit the outlier hypothesis testing framework of Li \emph{et al.} (TIT 2014) and derive fundamental limits for the optimal test under the generalized Neyman-Pearson criterion. In outlier hypothesis testing, one is given multiple…
We revisit sequential outlier hypothesis testing and derive bounds on achievable exponents when both the nominal and anomalous distributions are unknown. The task of outlier hypothesis testing is to identify the set of outliers that are…
In the binary hypothesis testing problem, it is well known that sequentiality in taking samples eradicates the trade-off between two error exponents, yet implementing the optimal test requires the knowledge of the underlying distributions,…
There is an increasing interest in algorithms to learn invariant correlations across training environments. A big share of the current proposals find theoretical support in the causality literature but, how useful are they in practice? The…
Real-world datasets often contain outliers, and the presence of outliers can make the clustering problems to be much more challenging. In this paper, we propose a simple uniform sampling framework for solving three representative…