Related papers: Approximate Query Processing via Tuple Bubbles
Because linked data is distributed across the web, methods that process federated queries in a distributed fashion are more attractive to users and have gained popularity. In distributed processing of federated…
We present EntropyDB, an interactive data exploration system that uses a probabilistic approach to generate a small, query-able summary of a dataset. Departing from traditional summarization techniques, we use the Principle of Maximum…
When query evaluation produces too many tuples, a new approach in query answering is to retrieve a diverse subset of them. The standard approach for measuring the diversity of a set of tuples is to use a distance function between tuples,…
Many machine learning tasks such as clustering, classification, and dataset search benefit from embedding data points in a space where distances reflect notions of relative similarity as perceived by humans. A common way to construct such…
Query tractability has been traditionally defined as a function of input database and query sizes, or of both input and output sizes, where the query result is represented as a bag of tuples. In this report, we introduce a framework that…
Many new database application domains such as experimental sciences and medicine are characterized by large sequences as their main form of data. Using approximate representation can significantly reduce the required storage and search…
As more and more organizations rely on data-driven decision making, large-scale analytics become increasingly important. However, an analyst is often stuck waiting for an exact result. As such, organizations turn to Cloud providers that…
Flexible database querying offers users an alternative to classic querying. Formal Concept Analysis (FCA) makes it possible to produce approximate answers in addition to those returned by a classic Database Management System (DBMS). Some…
Traditional database queries follow a simple model: they define constraints that each tuple in the result must satisfy. This model is computationally efficient, as the database system can evaluate the query conditions on each tuple…
In this paper, we introduce a novel approach to computing the contribution of input tuples to the result of the query, quantified by the Banzhaf and Shapley values. In contrast to prior algorithmic work that focuses on…
One of the most fundamental tasks in data science is to assist a user with unknown preferences in finding high-utility tuples within a large database. To accurately elicit the unknown user preferences, a widely-adopted way is by asking the…
A package query returns a package - a multiset of tuples - that maximizes or minimizes a linear objective function subject to linear constraints, thereby enabling in-database decision support. Prior work has established the equivalence of…
Compared to traditional exact computing, approximate computing focuses on reaching a satisfactory solution quickly rather than on unnecessary accuracy. Approximate bisimilarity is the approximate one corresponding to…
The Group-By query is an important kind of query, which is common and widely used in data warehouses, data analytics, and data visualization. Approximate query processing is an effective way to increase the querying efficiency on big data.…
Approximate Bayesian Computation (ABC) methods are applicable to statistical models specified by generative processes with analytically intractable likelihoods. These methods try to approximate the posterior density of a model parameter by…
Stream processing is usually done either on a tuple-by-tuple basis or in micro-batches. There are many applications where tuples over a predefined duration/window must be processed within certain deadlines. Processing such queries using…
Sample-based approximate query processing (AQP) suffers from many pitfalls such as the inability to answer very selective queries and unreliable confidence intervals when sample sizes are small. Recent research presented an intriguing…
Representing complex shapes with simple primitives in high accuracy is important for a variety of applications in computer graphics and geometry processing. Existing solutions may produce suboptimal samples or are complex to implement. We…
Among the paradigms for parallel and distributed computing, the one popularized by Linda and based on tuple spaces is one of the least used, despite being intuitive, easy to understand, and easy to use. A tuple space is a…
This paper describes a class of probabilistic approximation algorithms based on bucket elimination which offer adjustable levels of accuracy and efficiency. We analyze the approximation for several tasks: finding the most probable…