Related papers: Measuring Approximate Functional Dependencies: a C…
Given a database and a target attribute of interest, how can we tell whether there exists a functional, or approximately functional dependence of the target on any set of other attributes in the data? How can we reliably, without bias to…
Approximate functional dependencies (AFDs) relax exact functional dependencies by tolerating a bounded degree of violation, making them suited for data quality auditing. Threshold-based discovery returns all dependencies above a…
Functional dependencies (FDs) specify the intended data semantics while violations of FDs indicate deviation from these semantics. In this paper, we study a data cleaning problem in which the FDs may not be completely correct, e.g., due to…
The concept of matching dependencies (mds) was recently proposed for specifying matching rules for object identification. Similar to the functional dependencies (with conditions), mds can also be applied to various data quality…
Multiple hypothesis testing is a fundamental problem in high dimensional inference, with wide applications in many scientific fields. In genome-wide association studies, tens of thousands of tests are performed simultaneously to find if any…
Order dependencies (ODs) capture relationships between ordered domains of attributes. Approximate ODs (AODs) capture such relationships even when there exist exceptions in the data. During automated discovery of ODs, validation is the…
In real life, data are often of poor quality as a result, for instance, of uncertainty, mismeasurements, missing values or bad inputs. This issue hampers an implicit yet crucial operation of every database management system: equality…
A possible world of an incomplete database table is obtained by imputing values from the attributes (infinite) domain to the place of NULLs. A table satisfies a possible key or possible functional dependency constraint if there…
Differential dependencies (DDs) capture the relationships between data columns of relations. They are more general than functional dependencies (FDs), and the difference is that DDs are defined on the distances between values of two…
Functional dependencies -- traditional, approximate and conditional -- are of critical importance in relational databases, as they inform us about the relationships between attributes. They are useful in schema normalization, data…
Measuring the strength of dependence of random variables is an important problem in statistical practice. In this paper, we propose a new function-valued measure of dependence of two random variables. It allows one to study and visualize…
Usually, density functional models are considered approximations to density functional theory. However, there is no systematic connection between the two, and this can make us doubt that any linkage exists. This attitude can be further reinforced by…
Time-dependent density functional theory continues to draw a large number of users in a wide range of fields exploring myriad applications involving electronic spectra and dynamics. Although in principle exact, the predictivity of the…
We take a different look at the problem of testing the independence of two metric-space-valued random variables using the distance correlation. Instead of testing if the distance correlation vanishes exactly, we are interested in the…
Learning about density functional approximations (DFAs), or approximations for the exchange-correlation functional, can be intimidating. Density Functional Theory is now one of the primary simulation tools for the practicing chemist or…
Two families of dependence measures between random variables are introduced. They are based on the Rényi divergence of order $\alpha$ and the relative $\alpha$-entropy, respectively, and both dependence measures reduce to Shannon's mutual…
We study the problem of discovering functional dependencies (FD) from a noisy dataset. We focus on FDs that correspond to statistical dependencies in a dataset and draw connections between FD discovery and structure learning in…
Multiple testing has been a popular topic in statistical research. Although a vast amount of work has been done, controlling the false discoveries remains a challenging task when the corresponding test statistics are dependent. Various methods have…
An increasing number of generative music models can be conditioned on an audio prompt that serves as musical context for which the model is to create an accompaniment (often further specified using a text prompt). Evaluation of how well…
Partial dependence curves (FPD), introduced by Friedman, are an important model interpretation tool, but are often not accessible to business analysts and scientists who typically lack the skills to choose, tune, and assess machine learning…