Related papers: Making massive probabilistic databases practical
Probabilistic databases (PDBs) model uncertainty in data. The current standard is to view PDBs as finite probability spaces over relational database instances. Since many attributes in typical databases have infinite domains, such as…
We study the complexity of evaluating queries on probabilistic databases under bag semantics. We focus on self-join free conjunctive queries, and probabilistic databases where occurrences of different facts are independent, which is the…
Probabilistic databases (PDBs) are probability spaces over database instances. They provide a framework for handling uncertainty in databases, as occurs due to data integration, noisy data, data from unreliable sources or randomized…
Queries with aggregation and arithmetic operations, as well as incomplete data, are common in real-world database, but we lack a good understanding of how they should interact. On the one hand, systems based on SQL provide ad-hoc rules for…
Increasing amounts of available data have led to a heightened need for representing large-scale probabilistic knowledge bases. One approach is to use a probabilistic database, a model with strong assumptions that allow for efficiently…
Probabilistic databases (PDBs) introduce uncertainty into relational databases by specifying probabilities for several possible instances. Traditionally, they are finite probability spaces over database instances. Such finite PDBs…
A probabilistic database with attribute-level uncertainty consists of relations where cells of some attributes may hold probability distributions rather than deterministic content. Such databases arise, implicitly or explicitly, in the…
In this work, we study the problem of computing a tuple's expected multiplicity over probabilistic databases with bag semantics (where each tuple is associated with a multiplicity) exactly and approximately. We consider bag-TIDBs where we…
As the information available to lay users through autonomous data sources continues to increase, mediators become important to ensure that the wealth of information available is tapped effectively. A key challenge that these information…
In this paper, given a user's query set and budget, we aim to use the limited budget to help users assemble a set of datasets that can enrich a base dataset by introducing the maximum number of distinct tuples (i.e., maximizing…
This paper discusses a method for implementing a probabilistic inference system based on an extended relational data model. This model provides a unified approach for a variety of applications such as dynamic programming, solving sparse…
Probabilistic databases (PDBs) are used to model uncertainty in data in a quantitative way. In the standard formal framework, PDBs are finite probability spaces over relational database instances. It has been argued convincingly that this…
Probabilistic databases (PDBs) model uncertainty in data in a quantitative way. In the established formal framework, probabilistic (relational) databases are finite probability spaces over relational database instances. This finiteness can…
We study here fundamental issues involved in top-k query evaluation in probabilistic databases. We consider simple probabilistic databases in which probabilities are associated with individual tuples, and general probabilistic databases in…
The validation of any database mining methodology goes through an evaluation process where benchmarks availability is essential. In this paper, we aim to randomly generate relational database benchmarks that allow to check probabilistic…
Query evaluation in tuple-independent probabilistic databases is the problem of computing the probability of an answer to a query given independent probabilities of the individual tuples in a database instance. There are two main approaches…
Statistical models of real world data typically involve continuous probability distributions such as normal, Laplace, or exponential distributions. Such distributions are supported by many probabilistic modelling formalisms, including…
Top-$k$ queries allow end-users to focus on the most important (top-$k$) answers amongst those which satisfy the query. In traditional databases, a user defined score function assigns a score value to each tuple and a top-$k$ query returns…
The dramatic growth in the number of application domains that naturally generate probabilistic, uncertain data has resulted in a need for efficiently supporting complex querying and decision-making over such data. In this paper, we present…
Functional dependencies -- traditional, approximate and conditional are of critical importance in relational databases, as they inform us about the relationships between attributes. They are useful in schema normalization, data…