Related papers: Histograms and Wavelets on Probabilistic Data

200 papers

MapReduce is becoming the de facto framework for storing and processing massive data, due to its excellent scalability, reliability, and elasticity. In many MapReduce applications, obtaining a compact accurate summary of data is essential.…

Databases · Computer Science 2011-11-01 Jeffrey Jestes, Ke Yi, Feifei Li

The exponential growth of data in current times and the demand to gain information and knowledge from the data present new challenges for database researchers. Known database systems and algorithms are no longer capable of effectively…

Databases · Computer Science 2017-12-06 Yaron Gonen

In this paper we consider the wavelet synopsis construction problem without the restriction that we only choose a subset of coefficients of the original data. We provide the first near optimal algorithm. We arrive at the above algorithm by…

Data Structures and Algorithms · Computer Science 2009-09-29 Sudipto Guha

Existing decision-theoretic reasoning frameworks such as decision networks use simple data structures and processes. However, decisions are often made based on complex data structures, such as social networks and protein sequences, and rich…

Artificial Intelligence · Computer Science 2014-07-14 Brian E. Ruttenberg, Avi Pfeffer

Histograms are convenient non-parametric density estimators, which continue to be used ubiquitously. Summary quantities estimated from histogram-based probability density models depend on the choice of the number of bins. We introduce a…

Data Analysis, Statistics and Probability · Physics 2013-09-17 Kevin H. Knuth

The volume of data and the velocity with which it is being generated by computational experiments on high performance computing (HPC) systems is quickly outpacing our ability to effectively store this information in its full fidelity.…

Computation · Statistics 2014-07-14 Henry Scharf, Ryan Elmore, Kenny Gruchalla

Persistent homology is a central methodology in topological data analysis that has been successfully implemented in many fields and is becoming increasingly popular and relevant. The output of persistent homology is a persistence diagram --…

Statistics Theory · Mathematics 2024-04-24 Konstantin Häberle, Barbara Bravi, Anthea Monod

One of the fundamental problems in machine learning is the estimation of a probability distribution from data. Many techniques have been proposed to study the structure of data, most often building around the assumption that observations…

Machine Learning · Statistics 2013-02-22 Oren Rippel, Ryan Prescott Adams

In view of the paradigm shift that makes science ever more data-driven, in this thesis we propose a synthesis method for encoding and managing large-scale deterministic scientific hypotheses as uncertain and probabilistic data. In the form…

Databases · Computer Science 2015-02-13 Bernardo Gonçalves

Probabilistic relational models provide a well-established formalism to combine first-order logic and probabilistic models, thereby allowing to represent relationships between objects in a relational domain. At the same time, the field of…

Artificial Intelligence · Computer Science 2024-10-03 Malte Luttermann, Ralf Möller, Mattis Hartwig

In this article we propose a method of performing arithmetic operations on variables with unknown distribution. The approach to the evaluation results of arithmetic operations can select probability intervals of the algebraic equations…

Numerical Analysis · Computer Science 2015-12-11 V. N. Petrushin, E. V. Nikulchev, D. A. Korolev

Without unrealistic continuity and smoothness assumptions on a distributional density of one dimensional dataset, constructing an authentic possibly-gapped histogram becomes rather complex. The candidate ensemble is described via a…

Methodology · Statistics 2017-11-15 Fushing Hsieh, Tania Roy

The histogram is an analysis tool in widespread use within many sciences, with high energy physics as a prime example. However, there exists an inherent bias in the choice of binning for the histogram, with different choices potentially…

Data Analysis, Statistics and Probability · Physics 2014-05-21 Abram Krislock, Nathan Krislock

Data analysis in high-dimensional spaces aims at obtaining a synthetic description of a data set, revealing its main structure and its salient features. We here introduce an approach providing this description in the form of a topography of…

Machine Learning · Statistics 2021-03-02 Maria d'Errico, Elena Facco, Alessandro Laio, Alex Rodriguez

Increasing amounts of available data have led to a heightened need for representing large-scale probabilistic knowledge bases. One approach is to use a probabilistic database, a model with strong assumptions that allow for efficiently…

Artificial Intelligence · Computer Science 2019-04-04 Tal Friedman, Guy Van den Broeck

Databases are widespread, yet extracting relevant data can be difficult. Without substantial domain knowledge, multivariate search queries often return sparse or uninformative results. This paper introduces an approach for searching…

Artificial Intelligence · Computer Science 2017-04-05 Feras Saad, Leonardo Casarsa, Vikash Mansinghka

Probabilistic Answer Set Programming under the credal semantics (PASP) extends Answer Set Programming with probabilistic facts that represent uncertain information. The probabilistic facts are discrete with Bernoulli distributions. However,…

Artificial Intelligence · Computer Science 2025-02-19 Damiano Azzolini, Fabrizio Riguzzi

Symbolic data analysis has been proposed as a technique for summarising large and complex datasets into a much smaller and tractable number of distributions -- such as random rectangles or histograms -- each describing a portion of the…

Computation · Statistics 2020-03-23 Thomas Whitaker, Boris Beranger, Scott A. Sisson

Existence of incomplete and imprecise data has moved the database paradigm from deterministic to probabilistic information. Probabilistic databases contain tuples that may or may not exist with some probability. As a result, the number…

Databases · Computer Science 2013-07-04 Andrei Todor, Alin Dobra, Tamer Kahveci, Christopher Dudley

Biclustering is an unsupervised machine-learning approach aiming to cluster rows and columns simultaneously in a data matrix. Several biclustering algorithms have been proposed for handling numeric datasets. However, real-world data mining…

Machine Learning · Computer Science 2024-08-26 Adán José-García, Julie Jacques, Clément Chauvet, Vincent Sobanski, Clarisse Dhaenens