Related papers: Efficient Matrix Profile Computation Using Differe…
The Matrix Profile (MP), a versatile tool for time series data mining, has been shown effective in time series anomaly detection (TSAD). This paper delves into the problem of anomaly detection in multidimensional time series, a common…
To improve our understanding of connected systems, different tools derived from statistics, signal processing, information theory and statistical physics have been developed in the last decade. Here, we will focus on the graph comparison…
Although recovering an Euclidean distance matrix from noisy observations is a common problem in practice, how well this could be done remains largely unknown. To fill in this void, we study a simple distance matrix estimate based upon the…
Sensor-based remote health monitoring is used in industrial, urban and healthcare settings to monitor ongoing operation of equipment and human health. An important aim is to intervene early if anomalous events or adverse health is detected.…
Clustering non-Euclidean data is difficult, and one of the most used algorithms besides hierarchical clustering is the popular algorithm Partitioning Around Medoids (PAM), also simply referred to as k-medoids. In Euclidean geometry the…
The last decade has seen a flurry of research on all-pairs-similarity-search (or, self-join) for text, DNA, and a handful of other datatypes, and these systems have been applied to many diverse data mining problems. Surprisingly, however,…
Euclidean distance matrices (EDM) are matrices of squared distances between points. The definition is deceivingly simple: thanks to their many useful properties they have found applications in psychometrics, crystallography, machine…
This paper gives simple distributed algorithms for the fundamental problem of computing graph distances in the Congested Clique model. One of the main components of our algorithms is fast matrix multiplication, for which we show an…
Matrix Profile (MP) methods are an interpretable and scalable family of distance-based methods for time-series anomaly detection, but strong benchmark performance still depends on design choices beyond a vanilla nearest-neighbor profile.…
Riemannian optimization uses local methods to solve optimization problems whose constraint set is a smooth manifold. A linear step along some descent direction usually leaves the constraints, and hence retraction maps are used to…
The paper introduces a special case of the Euclidean distance matrix completion problem (edmcp) of interest in statistical data analysis where only the minimal spanning tree distances are given and the matrix completion must preserve the…
Understanding the singular value spectrum of a matrix $A \in \mathbb{R}^{n \times n}$ is a fundamental task in countless applications. In matrix multiplication time, it is possible to perform a full SVD and directly compute the singular…
The problem of Approximate Nearest Neighbor (ANN) search is fundamental in computer science and has benefited from significant progress in the past couple of decades. However, most work has been devoted to pointsets whereas complex shapes…
Two new algorithms are described for matching two dimensional coordinate lists of point sources that are signifcantly faster than previous methods. By matching rarely occurring triangles (or more complex shapes) in the two lists, and by…
Timely and accurate detection of anomalies in power electronics is becoming increasingly critical for maintaining complex production systems. Robust and explainable strategies help decrease system downtime and preempt or mitigate…
The Fast Marching Method is a very popular algorithm to compute times-of-arrival maps (distances map measured in time units). Since their proposal in 1995, it has been applied to many different applications such as robotics, medical…
We present a new preprocessing algorithm for embedding the nodes of a given edge-weighted undirected graph into a Euclidean space. The Euclidean distance between any two nodes in this space approximates the length of the shortest path…
Non-parametric two-sample tests based on energy distance or maximum mean discrepancy are widely used statistical tests for comparing multivariate data from two populations. While these tests enjoy desirable statistical properties, their…
Approximate K Nearest Neighbor (AKNN) search in high-dimensional spaces is a critical yet challenging problem. In AKNN search, distance computation is the core task that dominates the runtime. Existing approaches typically use approximate…
Matrix multiplication is a fundamental computation in many scientific disciplines. In this paper, we show that novel fast matrix multiplication algorithms can significantly outperform vendor implementations of the classical algorithm and…