Related papers: CRUSH: fast and scalable data reduction for imagin…
The use of brain images as markers for diseases or behavioral differences is challenged by the small effects size and the ensuing lack of power, an issue that has incited researchers to rely more systematically on large cohorts. Coupled…
We investigate the problem of scanning and prediction ("scandiction", for short) of multidimensional data arrays. This problem arises in several aspects of image and video processing, such as predictive coding, for example, where an image…
Spectral clustering is one of the most prominent clustering approaches. The distance-based similarity is the most widely used method for spectral clustering. However, people have already noticed that this is not suitable for multi-scale…
We present a structural clustering algorithm for large-scale datasets of small labeled graphs, utilizing a frequent subgraph sampling strategy. A set of representatives provides an intuitive description of each cluster, supports the…
Seismic noise with an amplitude higher than that of the sought signal is a challenge for detection. Several techniques have been developed to suppress the ambient noise and to reduce the detection threshold in order to find signals with the…
Clustering is a common technique for statistical data analysis, which is used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics. Clustering is the process of grouping similar…
In recent years, the distinctive advancement of handling huge data promotes the evolution of ubiquitous computing and analysis technologies. With the constantly upward system burden and computational complexity, adaptive coding has been a…
Real-world data is typically a noisy manifestation of a core pattern (schema), and the purpose of data mining algorithms is to uncover that pattern, thereby splitting (i.e. decomposing) the data into schema and noise. We introduce SCHENO, a…
In this paper, we propose a novel score-base generative model for unconditional raw audio synthesis. Our proposal builds upon the latest developments on diffusion process modeling with stochastic differential equations, which already…
Contrast enhancement is an important preprocessing technique for improving the performance of downstream tasks in image processing and computer vision. Among the existing approaches based on nonlinear histogram transformations, contrast…
Clustering analysis identifies samples as groups based on either their mutual closeness or homogeneity. In order to detect clusters in arbitrary shapes, a novel and generic solution based on boundary erosion is proposed. The clusters are…
Die-stacked DRAM has been proposed for use as a large, high-bandwidth, last-level cache with hundreds or thousands of megabytes of capacity. Not all workloads (or phases) can productively utilize this much cache space, however.…
Kriging or Gaussian Process Regression is applied in many fields as a non-linear regression model as well as a surrogate model in the field of evolutionary computation. However, the computational and space complexity of Kriging, that is…
Clustering algorithms are often used to find subpopulations in exploratory data analysis workflows. Not only the clusters themselves, but also their shape can represent meaningful subpopulations. In this paper, we present FLASC, an…
Detecting arbitrarily shaped clusters in high-dimensional noisy data is challenging for current clustering methods. We introduce SHADE (Structure-preserving High-dimensional Analysis with Density-based Exploration), the first deep…
Clustering is a fundamental data mining tool that aims to divide data into groups of similar items. Generally, intuition about clustering reflects the ideal case -- exact data sets endowed with flawless dissimilarity between individual…
Modern data science relies on data analytic pipelines to organize interdependent computational steps. Such analytic pipelines often involve different algorithms across multiple steps, each with its own hyperparameters. To achieve the best…
Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm which has the high-performance rate for dataset where clusters have the constant density of data points. One of the significant attributes…
Investigation of the underlying physics or biology from empirical data requires a quantifiable notion of similarity - when do two observed data sets indicate nearly identical generating processes, and when they do not. The discriminating…
Convolution and cross-correlation are the basis of filtering and pattern or template matching in multimedia signal processing. We propose two throughput scaling options for any one-dimensional convolution kernel in programmable processors…