Related papers: Histograms and Wavelets on Probabilistic Data

Building Wavelet Histograms on Large Data in MapReduce

MapReduce is becoming the de facto framework for storing and processing massive data, due to its excellent scalability, reliability, and elasticity. In many MapReduce applications, obtaining a compact accurate summary of data is essential.…

Databases · Computer Science 2011-11-01 Jeffrey Jestes, Ke Yi, Feifei Li

Analyzing Large-Scale, Distributed and Uncertain Data

The exponential growth of data in current times and the demand to gain information and knowledge from the data present new challenges for database researchers. Known database systems and algorithms are no longer capable of effectively…

Databases · Computer Science 2017-12-06 Yaron Gonen

How far will you walk to find your shortcut: Space Efficient Synopsis Construction Algorithms

In this paper we consider the wavelet synopsis construction problem without the restriction that we only choose a subset of coefficients of the original data. We provide the first near optimal algorithm. We arrive at the above algorithm by…

Data Structures and Algorithms · Computer Science 2009-09-29 Sudipto Guha

Decision-Making with Complex Data Structures using Probabilistic Programming

Existing decision-theoretic reasoning frameworks such as decision networks use simple data structures and processes. However, decisions are often made based on complex data structures, such as social networks and protein sequences, and rich…

Artificial Intelligence · Computer Science 2014-07-14 Brian E. Ruttenberg, Avi Pfeffer

Optimal Data-Based Binning for Histograms

Histograms are convenient non-parametric density estimators, which continue to be used ubiquitously. Summary quantities estimated from histogram-based probability density models depend on the choice of the number of bins. We introduce a…

Data Analysis, Statistics and Probability · Physics 2013-09-17 Kevin H. Knuth

Prioritized Data Compression using Wavelets

The volume of data and the velocity with which it is being generated by computational experiments on high performance computing (HPC) systems is quickly outpacing our ability to effectively store this information in its full fidelity.…

Computation · Statistics 2014-07-14 Henry Scharf, Ryan Elmore, Kenny Gruchalla

Wavelet-Based Density Estimation for Persistent Homology

Persistent homology is a central methodology in topological data analysis that has been successfully implemented in many fields and is becoming increasingly popular and relevant. The output of persistent homology is a persistence diagram --…

Statistics Theory · Mathematics 2024-04-24 Konstantin Häberle, Barbara Bravi, Anthea Monod

High-Dimensional Probability Estimation with Deep Density Models

One of the fundamental problems in machine learning is the estimation of a probability distribution from data. Many techniques have been proposed to study the structure of data, most often building around the assumption that observations…

Machine Learning · Statistics 2013-02-22 Oren Rippel, Ryan Prescott Adams

Managing large-scale scientific hypotheses as uncertain and probabilistic data

In view of the paradigm shift that makes science ever more data-driven, in this thesis we propose a synthesis method for encoding and managing large-scale deterministic scientific hypotheses as uncertain and probabilistic data. In the form…

Databases · Computer Science 2015-02-13 Bernardo Gonçalves

Towards Privacy-Preserving Relational Data Synthesis via Probabilistic Relational Models

Probabilistic relational models provide a well-established formalism to combine first-order logic and probabilistic models, thereby allowing to represent relationships between objects in a relational domain. At the same time, the field of…

Artificial Intelligence · Computer Science 2024-10-03 Malte Luttermann, Ralf Möller, Mattis Hartwig

Histogram Arithmetic under Uncertainty of Probability Density Function

In this article we propose a method of performing arithmetic operations on variables with unknown distribution. The approach to the evaluation results of arithmetic operations can select probability intervals of the algebraic equations…

Numerical Analysis · Computer Science 2015-12-11 V. N. Petrushin, E. V. Nikulchev, D. A. Korolev

Complexity of Possibly-gapped Histogram and Analysis of Histogram (ANOHT)

Without unrealistic continuity and smoothness assumptions on a distributional density of one dimensional dataset, constructing an authentic possibly-gapped histogram becomes rather complex. The candidate ensemble is described via a…

Methodology · Statistics 2017-11-15 Fushing Hsieh, Tania Roy

Resolving Histogram Binning Dilemmas with Binless and Binfull Algorithms

The histogram is an analysis tool in widespread use within many sciences, with high energy physics as a prime example. However, there exists an inherent bias in the choice of binning for the histogram, with different choices potentially…

Data Analysis, Statistics and Probability · Physics 2014-05-21 Abram Krislock, Nathan Krislock

Automatic topography of high-dimensional data sets by non-parametric Density Peak clustering

Data analysis in high-dimensional spaces aims at obtaining a synthetic description of a data set, revealing its main structure and its salient features. We here introduce an approach providing this description in the form of a topography of…

Machine Learning · Statistics 2021-03-02 Maria d'Errico, Elena Facco, Alessandro Laio, Alex Rodriguez

On Constrained Open-World Probabilistic Databases

Increasing amounts of available data have led to a heightened need for representing large-scale probabilistic knowledge bases. One approach is to use a probabilistic database, a model with strong assumptions that allow for efficiently…

Artificial Intelligence · Computer Science 2019-04-04 Tal Friedman, Guy Van den Broeck

Probabilistic Search for Structured Data via Probabilistic Programming and Nonparametric Bayes

Databases are widespread, yet extracting relevant data can be difficult. Without substantial domain knowledge, multivariate search queries often return sparse or uninformative results. This paper introduces an approach for searching…

Artificial Intelligence · Computer Science 2017-04-05 Feras Saad, Leonardo Casarsa, Vikash Mansinghka

Probabilistic Answer Set Programming with Discrete and Continuous Random Variables

Probabilistic Answer Set Programming under the credal semantics (PASP) extends Answer Set Programming with probabilistic facts that represent uncertain information. The probabilistic facts are discrete with Bernoulli distributions. However,…

Artificial Intelligence · Computer Science 2025-02-19 Damiano Azzolini, Fabrizio Riguzzi

Composite likelihood methods for histogram-valued random variables

Symbolic data analysis has been proposed as a technique for summarising large and complex datasets into a much smaller and tractable number of distributions -- such as random rectangles or histograms -- each describing a portion of the…

Computation · Statistics 2020-03-23 Thomas Whitaker, Boris Beranger, Scott A. Sisson

Making massive probabilistic databases practical

Existence of incomplete and imprecise data has moved the database paradigm from deterministic to probabilistic information. Probabilistic databases contain tuples that may or may not exist with some probability. As a result, the number…

Databases · Computer Science 2013-07-04 Andrei Todor, Alin Dobra, Tamer Kahveci, Christopher Dudley

HBIC: A Biclustering Algorithm for Heterogeneous Datasets

Biclustering is an unsupervised machine-learning approach aiming to cluster rows and columns simultaneously in a data matrix. Several biclustering algorithms have been proposed for handling numeric datasets. However, real-world data mining…

Machine Learning · Computer Science 2024-08-26 Adán José-García, Julie Jacques, Clément Chauvet, Vincent Sobanski, Clarisse Dhaenens