Related papers: Spatial Random Sampling: A Structure-Preserving Da…

Sketched Subspace Clustering

The immense amount of daily generated and communicated data presents unique challenges in their processing. Clustering, the grouping of data without the presence of ground-truth labels, is an important tool for drawing inferences from data.…

Machine Learning · Statistics 2018-02-08 Panagiotis A. Traganitis, Georgios B. Giannakis

Fast Selection of Spatially Balanced Samples

Sampling from very large spatial populations is challenging. The solutions suggested in recent literature on this subject often require that the randomly selected units are well distributed across the study region by using complex…

Methodology · Statistics 2017-10-26 Roberto Benedetti, Federica Piersimoni

Visualization of Big Spatial Data using Coresets for Kernel Density Estimates

The size of large, geo-located datasets has reached scales where visualization of all data points is inefficient. Random sampling is a method to reduce the size of a dataset, yet it can introduce unwanted errors. We describe a method for…

Human-Computer Interaction · Computer Science 2017-09-14 Yan Zheng, Yi Ou, Alexander Lex, Jeff M. Phillips

Spatially Consistent Representation Learning

Self-supervised learning has been widely used to obtain transferrable representations from unlabeled images. Especially, recent contrastive learning methods have shown impressive performances on downstream image classification tasks. While…

Computer Vision and Pattern Recognition · Computer Science 2021-04-29 Byungseok Roh, Wuhyun Shin, Ildoo Kim, Sungwoong Kim

CPU Simulation with Ranked Set Sampling and Repeated Subsampling

Computer system simulation studies routinely rely on executing a limited number of short application regions, since full end-to-end simulation is prohibitively time-consuming. To preserve representativeness, existing methods employ either…

Hardware Architecture · Computer Science 2026-03-25 Magnus Ekman

Uniform Sampling for Matrix Approximation

Random sampling has become a critical tool in solving massive matrix problems. For linear regression, a small, manageable set of data rows can be randomly selected to approximate a tall, skinny data matrix, improving processing time…

Data Structures and Algorithms · Computer Science 2014-08-22 Michael B. Cohen, Yin Tat Lee, Cameron Musco, Christopher Musco, Richard Peng, Aaron Sidford

Approximating Spectral Clustering via Sampling: a Review

Spectral clustering refers to a family of unsupervised learning algorithms that compute a spectral embedding of the original data based on the eigenvectors of a similarity graph. This non-linear transformation of the data is both the key of…

Machine Learning · Computer Science 2019-01-30 Nicolas Tremblay, Andreas Loukas

Scalable and Robust Community Detection with Randomized Sketching

This article explores and analyzes the unsupervised clustering of large partially observed graphs. We propose a scalable and provable randomized framework for clustering graphs generated from the stochastic block model. The clustering is…

Social and Information Networks · Computer Science 2022-12-06 Mostafa Rahmani, Andre Beckus, Adel Karimian, George Atia

Intelligent n-Means Spatial Sampling

Well-spread samples are desirable in many disciplines because they improve estimation when target variables exhibit spatial structure. This paper introduces an integrated methodological framework for spreading samples over the population's…

Methodology · Statistics 2025-10-29 Bardia Panahbehagh, Mehdi Mohebbi, Amir Mohammad HosseiniNasab

Clustering to Reduce Spatial Data Set Size

Traditionally it had been a problem that researchers did not have access to enough spatial data to answer pressing research questions or build compelling visualizations. Today, however, the problem is often that we have too much data.…

Machine Learning · Computer Science 2018-03-23 Geoff Boeing

generalRSS: Sampling and Inference for Balanced and Unbalanced Ranked Set Sampling in R

Ranked set sampling (RSS) is a stratified sampling method that improves efficiency over simple random sampling (SRS) by utilizing auxiliary information for ranking and stratification. While balanced RSS (BRSS) assumes equal allocation…

Methodology · Statistics 2025-09-03 Chul Moon, Soohyun Ahn

Cluster Representatives Selection in Non-Metric Spaces for Nearest Prototype Classification

The nearest prototype classification is a less computationally intensive replacement for the $k$-NN method, especially when large datasets are considered. In metric spaces, centroids are often used as prototypes to represent whole clusters.…

Machine Learning · Computer Science 2021-07-06 Jaroslav Hlaváč, Martin Kopp, Jan Kohout

A Fast and Accurate Similarity-constrained Subspace Clustering Framework for Unsupervised Hyperspectral Image Classification

Accurate land cover segmentation of spectral images is challenging and has drawn widespread attention in remote sensing due to its inherent complexity. Although significant efforts have been made for developing a variety of methods, most of…

Image and Video Processing · Electrical Eng. & Systems 2021-11-30 Carlos Hinojosa, Esteban Vera, Henry Arguello

Soft Random Sampling: A Theoretical and Empirical Analysis

Soft random sampling (SRS) is a simple yet effective approach for efficient training of large-scale deep neural networks when dealing with massive data. SRS selects a subset uniformly at random with replacement from the full data set in…

Machine Learning · Computer Science 2023-11-27 Xiaodong Cui, Ashish Mittal, Songtao Lu, Wei Zhang, George Saon, Brian Kingsbury

RULLS: Randomized Union of Locally Linear Subspaces for Feature Engineering

Feature engineering plays an important role in the success of a machine learning model. Most of the effort in training a model goes into data preparation and choosing the right representation. In this paper, we propose a robust feature…

Machine Learning · Computer Science 2018-04-27 Namita Lokare, Jorge Silva, Ilknur Kaynar Kabul

Scalable Spectral Clustering Using Random Binning Features

Spectral clustering is one of the most effective clustering approaches that capture hidden cluster structures in the data. However, it does not scale well to large-scale problems due to its quadratic complexity in constructing similarity…

Machine Learning · Computer Science 2019-11-26 Lingfei Wu, Pin-Yu Chen, Ian En-Hsu Yen, Fangli Xu, Yinglong Xia, Charu Aggarwal

Revealing spatial variability structures of geostatistical functional data via Dynamic Clustering

In several environmental applications data are functions of time, essentially continuous, observed and recorded discretely, and spatially correlated. Most of the methods for analyzing such data are extensions of spatial statistical tools…

Methodology · Statistics 2011-06-28 Elvira Romano, Antonio Balzanella, Rosanna Verde

Sample-Cluster-Select: A new framework to obtain diverse approximate solutions of combinatorial optimization problems

When solving real-world problems, practitioners often hesitate to implement solutions obtained from mathematical models, especially for important decisions. This hesitation stems from practitioners' lack of trust in optimization models and…

Optimization and Control · Mathematics 2025-07-01 Susumu Hashimoto, Takeaki Uno

Consistent Bayesian Spatial Domain Partitioning Using Predictive Spanning Tree Methods

Bayesian model-based spatial clustering methods are widely used for their flexibility in estimating latent clusters with an unknown number of clusters while accounting for spatial proximity. Many existing methods are designed for clustering…

Methodology · Statistics 2025-08-13 Kun Huang, Huiyan Sang

StruClus: Structural Clustering of Large-Scale Graph Databases

We present a structural clustering algorithm for large-scale datasets of small labeled graphs, utilizing a frequent subgraph sampling strategy. A set of representatives provides an intuitive description of each cluster, supports the…

Databases · Computer Science 2016-10-03 Till Schäfer, Petra Mutzel