Related papers: Discovering Multi-Table Functional Dependencies Wi…
The automatic discovery of functional dependencies (FDs) has been widely studied as one of the hardest problems in data profiling. Existing approaches have focused on making the FD computation efficient while inspecting single relations at a…
Functional dependencies (FDs) are fundamental integrity constraints in relational databases, but discovering them under incremental updates remains challenging. While static algorithms are inefficient due to full re-execution, incremental…
Functional dependencies (FDs) are basic constraints in relational databases and are used for many data management tasks. Most FD discovery algorithms find all valid dependencies, but this causes two problems. First, the computational cost…
Functional and inclusion dependencies are the most widely used classes of data dependencies in data profiling due to their ability to identify relationships in data such as primary and foreign keys. These relationships are equally important…
We study the problem of discovering functional dependencies (FD) from a noisy dataset. We focus on FDs that correspond to statistical dependencies in a dataset and draw connections between FD discovery and structure learning in…
We study the problem of discovering joinable datasets at scale. We approach the problem from a learning perspective relying on profiles. These are succinct representations that capture the underlying characteristics of the schemata and data…
We study the problem of discovering joinable datasets at scale. That is, how to automatically discover pairs of attributes in a massive collection of independent, heterogeneous datasets that can be joined. Exact (e.g., based on distinct…
In this work, we present an adjoint-based method for discovering the underlying governing partial differential equations (PDEs) given data. The idea is to consider a parameterized PDE in a general form and formulate a PDE-constrained…
For decades, the join operator over fast data streams has always drawn much attention from the database community, due to its wide spectrum of real-world applications, such as online clustering, intrusion detection, sensor data monitoring,…
Multi-column dependencies in relational databases come associated with two different computational tasks. The detection problem is to decide whether a dependency of a certain type and size holds in a given database, the discovery problem…
In the current paper, we propose to fuse together stored data (tables) and their functional dependencies (FDs) inside a DBMS. We aim to make FDs first-class citizens: objects which can be queried and used to query data. Our idea is to allow…
Differential dependencies (DDs) capture the relationships between data columns of relations. They are more general than functional dependencies (FDs), and the difference is that DDs are defined on the distances between values of two…
Missing and incorrect values often cause serious consequences. To deal with these data quality problems, a commonly employed class of tools is dependency rules, such as Functional Dependencies (FDs), Conditional Functional Dependencies…
Relational data augmentation is a powerful technique for enhancing data analytics and improving machine learning models by incorporating columns from external datasets. However, it is challenging to efficiently discover relevant external…
This work is motivated by two key facts. First, it is highly desirable to be able to learn and perform knowledge discovery and analytics (LKD) tasks without the need to access raw-data tables. This may be due to organizations finding it…
We address the problem of efficiently evaluating target functional dependencies (fds) in the Data Exchange (DE) process. Target fds naturally occur in many DE scenarios, including the ones in Life Sciences in which multiple source relations…
Discovering which tables in large, heterogeneous repositories can be joined and by what transformations is a central challenge in data integration and data discovery. Traditional join discovery methods are largely designed for equi-joins,…
The concept of matching dependencies (mds) is recently proposed for specifying matching rules for object identification. Similar to the functional dependencies (with conditions), mds can also be applied to various data quality…
Large Language Models (LLMs) are being increasingly used within data systems to process large datasets with text fields. A broad class of such tasks involves a semantic join: joining two tables based on a natural language predicate per pair…
A central challenge in analyzing multivariate interactions within complex systems is to decompose how multiple inputs jointly determine an output. Existing approaches generally operate on observed probability distributions and can conflate…