Related papers: PANDA: Query Evaluation in Submodular Width
Recent works on bounding the output size of a conjunctive query with functional dependencies and degree constraints have shown a deep connection between fundamental questions in information theory and database theory. We prove analogous…
Database theory is exciting because it studies highly general and practically useful abstractions. Conjunctive query (CQ) evaluation is a prime example: it simultaneously generalizes graph pattern matching, constraint satisfaction, and…
The submodular width is a complexity measure of conjunctive queries (CQs), which assigns a nonnegative real number, subw(Q), to each CQ Q. An existing algorithm, called PAND, performs CQ evaluation in polynomial time where the exponent is…
PANDA is a powerful generic algorithm for answering conjunctive queries (CQs) and disjunctive datalog rules (DDRs) given input degree constraints. In the special case where degree constraints are cardinality constraints and the query is…
Estimating the output size of a query is a fundamental yet longstanding problem in database query processing. Traditional cardinality estimators used by database systems can routinely underestimate the true output size by orders of…
Cardinality estimation and conjunctive query evaluation are two of the most fundamental problems in database query processing. Recent work proposed, studied, and implemented a robust and practical information-theoretic cardinality…
Estimating the cardinality of the output of a query is a fundamental problem in database query processing. In this article, we overview a recently published contribution that casts the cardinality estimation problem as linear optimization…
Gene regulatory network reconstruction is a fundamental problem in computational biology. We recently developed an algorithm, called PANDA (Passing Attributes Between Networks for Data Assimilation), that integrates multiple sources of…
In this paper, we consider contention resolution algorithms that are augmented with predictions about the network. We begin by studying the natural setup in which the algorithm is provided a distribution defined over the possible network…
Accurately determining a change in protein binding affinity upon mutations is important for the discovery and design of novel therapeutics and to assist mutagenesis studies. Determination of change in binding affinity upon mutations…
Deriving formulations for computing and estimating tight worst-case size increases for conjunctive queries with various constraints has been at the core of theoretical database research. If the problem has no constraints or only one…
Information inequalities appear in many database applications such as query output size bounds, query containment, and implication between data dependencies. Recently Khamis et al. proposed to study the algorithmic aspects of information…
We consider the linear discriminant analysis problem in the high-dimensional settings. In this work, we propose PANDA(PivotAl liNear Discriminant Analysis), a tuning-insensitive method in the sense that it requires very little effort to…
Learned index structures aim to accelerate queries by training machine learning models to approximate the rank function associated with a database attribute. While effective in practice, their theoretical limitations are not fully…
Most research on data discovery has so far focused on improving individual discovery operators such as join, correlation, or union discovery. However, in practice, a combination of these techniques and their corresponding indexes may be…
We consider the problem of estimating the support size of a distribution $D$. Our investigations are pursued through the lens of distribution testing and seek to understand the power of conditional sampling (denoted as COND), wherein one is…
Upper and lower bounds are obtained for the joint entropy of a collection of random variables in terms of an arbitrary collection of subset joint entropies. These inequalities generalize Shannon's chain rule for entropy as well as…
Recent work has demonstrated the catastrophic effects of poor cardinality estimates on query processing time. In particular, underestimating query cardinality can result in overly optimistic query plans which take orders of magnitude longer…
We propose a new method for estimating the number of answers OUT of a small join query Q in a large database D, and for uniform sampling over joins. Our method is the first to satisfy all the following statements. - Support arbitrary Q,…
We introduce new combinatorial quantities for concept classes, and prove lower and upper bounds for learning complexity in several models of query learning in terms of various combinatorial quantities. Our approach is flexible and powerful…