Related papers: Efficient $l_\alpha$ Distance Approximation for Hig…

Computationally Efficient Estimators for Dimension Reductions Using Stable Random Projections

The method of stable random projections is a tool for efficiently computing the $l_\alpha$ distances using low memory, where $0<\alpha \leq 2$ is a tuning parameter. The method boils down to a statistical estimation task and various…

Machine Learning · Computer Science 2008-12-18 Ping Li
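The estimation task mentioned above is easiest to see in the special case $\alpha = 1$. The sketch below is a minimal illustration that assumes the classical approach: project onto i.i.d. standard Cauchy directions (Cauchy is 1-stable), then estimate the $l_1$ distance with the sample median of the absolute projected differences. It is not the estimator proposed in this paper, and the function names `cauchy_sketch` and `l1_median_estimate` are hypothetical.

```python
import numpy as np


def cauchy_sketch(X, k, seed=0):
    """Project the rows of X (n x D) onto k i.i.d. standard Cauchy directions (1-stable)."""
    rng = np.random.default_rng(seed)
    R = rng.standard_cauchy(size=(X.shape[1], k))
    return X @ R  # n x k sketch; only this needs to be stored


def l1_median_estimate(s1, s2):
    """Estimate ||x1 - x2||_1 from two sketch rows.

    By 1-stability, each entry of s1 - s2 is distributed as ||x1 - x2||_1 * C,
    with C standard Cauchy. The median of |C| is 1, so the sample median of
    |s1 - s2| is a consistent estimate of the l1 distance.
    """
    return float(np.median(np.abs(s1 - s2)))


rng = np.random.default_rng(1)
X = rng.random((2, 1000))             # two points in D = 1000 dimensions
S = cauchy_sketch(X, k=2000)          # compressed representation
true_l1 = np.abs(X[0] - X[1]).sum()   # exact l1 distance, for comparison
est = l1_median_estimate(S[0], S[1])  # estimate from the sketch alone
print(true_l1, est)
```

The median is used here rather than the sample mean because the Cauchy distribution has no finite mean. The choice of estimator is exactly the statistical question the abstract refers to.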

Approximating Higher-Order Distances Using Random Projections

We provide a simple method and relevant theoretical analysis for efficiently estimating higher-order $l_p$ distances. While the analysis mainly focuses on $l_4$, our methodology extends naturally to $p = 6, 8, 10, \ldots$ (i.e., when $p$ is even).…

Machine Learning · Computer Science 2012-03-19 Ping Li , Michael W. Mahoney , Yiyuan She
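An elementary binomial expansion shows why the even-$p$ case is convenient. It is given here for $p = 4$ as background only and does not reproduce the paper's estimators. For vectors $x, y \in \mathbb{R}^D$:

$$
\|x - y\|_4^4 = \sum_{i=1}^{D} (x_i - y_i)^4 = \sum_{i=1}^{D} x_i^4 + \sum_{i=1}^{D} y_i^4 - 4\sum_{i=1}^{D} x_i^3 y_i + 6\sum_{i=1}^{D} x_i^2 y_i^2 - 4\sum_{i=1}^{D} x_i y_i^3
$$

The two marginal sums can be computed exactly in one pass over each row. Each cross term is an inner product between element-wise powers of $x$ and $y$ (for example, $\sum_i x_i^3 y_i$), which ordinary random projections can estimate. For odd $p$, the absolute value in $|x_i - y_i|^p$ rules out such a polynomial expansion.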

Optimal Projections in the Distance-Based Statistical Methods

This paper introduces a new way to calculate distance-based statistics, particularly when the data are multivariate. The main idea is to pre-calculate the optimal projection directions given the variable dimension, and to project…

Computation · Statistics 2019-11-11 Chuanping Yu , Xiaoming Huo

On Approximating the Lp Distances for p>2

Applications in machine learning and data mining require computing pairwise Lp distances in a data matrix A. For massive high-dimensional data, computing all pairwise distances of A can be infeasible. In fact, even storing A or all pairwise…

Machine Learning · Computer Science 2008-12-18 Ping Li

Efficient Distance Approximation for Structured High-Dimensional Distributions via Learning

We design efficient distance approximation algorithms for several classes of structured high-dimensional distributions. Specifically, we show algorithms for the following problems:
- Given sample access to two Bayesian networks $P_1$ and…

Data Structures and Algorithms · Computer Science 2020-02-17 Arnab Bhattacharyya , Sutanu Gayen , Kuldeep S. Meel , N. V. Vinodchandran

Random Projection Estimation of Discrete-Choice Models with Large Choice Sets

We introduce sparse random projection, an important dimension-reduction tool from machine learning, for the estimation of discrete-choice models with high-dimensional choice sets. Initially, high-dimensional data are compressed into a…

Machine Learning · Statistics 2016-04-21 Khai X. Chiong , Matthew Shum
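For readers unfamiliar with the compression step, here is a minimal sketch of a generic sparse random projection. It assumes the common construction with entries $\sqrt{s}\cdot\{+1, 0, -1\}$ drawn with probabilities $\{1/(2s),\, 1 - 1/s,\, 1/(2s)\}$. It illustrates only the dimension-reduction step, not the paper's estimation procedure for discrete-choice models, and the function names are hypothetical.

```python
import numpy as np


def sparse_projection_matrix(d, k, s=3.0, seed=0):
    """Random d x k matrix with entries sqrt(s) * {+1, 0, -1}.

    Probabilities are {1/(2s), 1 - 1/s, 1/(2s)}, so each entry has mean 0 and
    variance 1. s = 3 makes two thirds of the entries zero; larger s is sparser.
    """
    rng = np.random.default_rng(seed)
    values = np.array([np.sqrt(s), 0.0, -np.sqrt(s)])
    probs = [1 / (2 * s), 1 - 1 / s, 1 / (2 * s)]
    return rng.choice(values, size=(d, k), p=probs)


def project(X, R):
    """Map the rows of X (n x d) to k dimensions.

    The 1/sqrt(k) scaling makes squared Euclidean distances unbiased.
    """
    return X @ R / np.sqrt(R.shape[1])


rng = np.random.default_rng(42)
X = rng.normal(size=(2, 2000))                  # two points in d = 2000 dimensions
R = sparse_projection_matrix(d=2000, k=1000, s=3.0)
Z = project(X, R)                               # compressed to k = 1000 dimensions
orig = np.sum((X[0] - X[1]) ** 2)               # squared distance before projection
proj = np.sum((Z[0] - Z[1]) ** 2)               # squared distance after projection
print(orig, proj, np.mean(R == 0))
```

Sparsity mainly saves work when generating and applying $R$. The distance-preservation guarantee comes from each entry having mean 0 and variance 1.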

Computationally Efficient Learning of Statistical Manifolds

Analyzing high-dimensional data with manifold learning algorithms often requires searching for the nearest neighbors of all observations. This presents a computational bottleneck in statistical manifold learning when observations of…

Machine Learning · Computer Science 2022-03-11 Fan Cheng , Anastasios Panagiotelis , Rob J Hyndman

Distance Queries from Sampled Data: Accurate and Efficient

Distance queries are a basic tool in data analysis. They are used for detection and localization of change for the purpose of anomaly detection, monitoring, or planning. Distance queries are particularly useful when data sets such as…

Data Structures and Algorithms · Computer Science 2015-03-20 Edith Cohen

Dimension reduction, exact recovery, and error estimates for sparse reconstruction in phase space

An important theme in modern inverse problems is the reconstruction of time-dependent data from only finitely many measurements. To obtain satisfactory reconstruction results in this setting it is essential to strongly exploit temporal…

Numerical Analysis · Mathematics 2024-03-14 Martin Holler , Alexander Schlüter , Benedikt Wirth

Proximal Projection Method for Stable Linearly Constrained Optimization

Many applications using large datasets require efficient methods for minimizing a proximable convex function subject to satisfying a set of linear constraints within a specified tolerance. For this task, we present a proximal projection…

Optimization and Control · Mathematics 2024-12-10 Howard Heaton

Distance statistics in random media: high dimension and/or high neighborhood order cases

Consider an unlimited homogeneous medium disturbed by points generated via a Poisson process. The neighborhood of a point plays an important role in spatial statistics problems. Here, we obtain analytically the distance statistics to $k$th…

Statistical Mechanics · Physics 2015-08-11 Cristiano Roberto Fabri Granzotti , Alexandre Souto Martinez

A multi-resolution approximation via linear projection for large spatial datasets

Recent technical advances in collecting spatial data have been increasing the demand for methods to analyze large spatial datasets. The statistical analysis for these types of datasets can provide useful knowledge in various fields.…

Methodology · Statistics 2021-06-16 Toshihiro Hirano

Learned k-NN Distance Estimation

Big data mining is well known to be an important task for data science, because it can provide useful observations and new knowledge hidden in given large datasets. Proximity-based data analysis is particularly utilized in many real-life…

Databases · Computer Science 2022-11-29 Daichi Amagata , Yusuke Arai , Sumio Fujita , Takahiro Hara

A Compressed PCA Subspace Method for Anomaly Detection in High-Dimensional Data

Random projection is widely used as a method of dimension reduction. In recent years, its combination with standard techniques of regression and classification has been explored. Here we examine its use with principal component analysis…

Methodology · Statistics 2012-04-13 Qi Ding , Eric D. Kolaczyk

Random Projections For Large-Scale Regression

Fitting linear regression models can be computationally very expensive in large-scale data analysis tasks if the sample size and the number of variables are very large. Random projections are extensively used as a dimension reduction tool…

Statistics Theory · Mathematics 2017-01-20 Gian-Andrea Thanei , Christina Heinze , Nicolai Meinshausen
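The simplest version of this idea is often called sketch-and-solve. The minimal sketch below assumes a dense Gaussian sketch that compresses the $n$ samples to $m \ll n$ rows before an ordinary least-squares fit. It is a generic illustration; the estimators analyzed in the paper may compress differently (for example, along the variables rather than the samples), and `sketched_least_squares` is a hypothetical name.

```python
import numpy as np


def sketched_least_squares(X, y, m, seed=0):
    """Solve min_b ||S X b - S y||_2 with a Gaussian sketch S of shape (m, n).

    Scaling S by 1/sqrt(m) gives E[S^T S] = I, so the sketched problem
    approximates the full least-squares problem.
    """
    rng = np.random.default_rng(seed)
    S = rng.normal(size=(m, X.shape[0])) / np.sqrt(m)
    beta, *_ = np.linalg.lstsq(S @ X, S @ y, rcond=None)
    return beta


rng = np.random.default_rng(0)
n, p = 5000, 10
X = rng.normal(size=(n, p))
beta_true = np.arange(1, p + 1, dtype=float)
y = X @ beta_true + 0.1 * rng.normal(size=n)

beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)  # full-data fit, for comparison
beta_sketch = sketched_least_squares(X, y, m=500)  # fit on 500 compressed rows
print(np.max(np.abs(beta_sketch - beta_full)))
```

A dense Gaussian sketch costs $O(mnp)$ to apply, which is why structured or sparse sketches are often preferred in practice. The Gaussian version is used here because it is the easiest to verify.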

DUAL-LOCO: Distributing Statistical Estimation Using Random Projections

We present DUAL-LOCO, a communication-efficient algorithm for distributed statistical estimation. DUAL-LOCO assumes that the data is distributed according to the features rather than the samples. It requires only a single round of…

Machine Learning · Statistics 2016-08-04 Christina Heinze , Brian McWilliams , Nicolai Meinshausen

Generalization Bound and Learning Methods for Data-Driven Projections in Linear Programming

How to solve high-dimensional linear programs (LPs) efficiently is a fundamental question. Recently, there has been a surge of interest in reducing LP sizes using random projections, which can accelerate solving LPs independently of…

Machine Learning · Computer Science 2024-05-22 Shinsaku Sakaue , Taihei Oki

Compressive Mining: Fast and Optimal Data Mining in the Compressed Domain

Real-world data typically contain repeated and periodic patterns. This suggests that they can be effectively represented and compressed using only a few coefficients of an appropriate basis (e.g., Fourier, Wavelets, etc.). However, distance…

Machine Learning · Statistics 2014-05-26 Michail Vlachos , Nikolaos Freris , Anastasios Kyrillidis

Recovering the Optimal Solution by Dual Random Projection

Random projection has been widely used in data classification. It maps high-dimensional data into a low-dimensional subspace in order to reduce the computational cost in solving the related optimization problem. While previous studies are…

Machine Learning · Computer Science 2014-02-24 Lijun Zhang , Mehrdad Mahdavi , Rong Jin , Tianbao Yang , Shenghuo Zhu

Towards Making High Dimensional Distance Metric Learning Practical

In this work, we study distance metric learning (DML) for high dimensional data. A typical approach for DML with high dimensional data is to perform the dimensionality reduction first before learning the distance metric. The main…

Machine Learning · Computer Science 2015-09-16 Qi Qian , Rong Jin , Lijun Zhang , Shenghuo Zhu