Related papers: DataJoint: A Simpler Relational Data Model
Operational rigor determines whether human-agent collaboration succeeds or fails. Scientific data pipelines need the equivalent of DevOps -- SciOps -- yet common approaches fragment provenance across disconnected systems without…
Recent advances in large language models (LLMs) have greatly improved Text-to-SQL performance for single-table queries. But, it remains challenging in multi-table databases due to complex schema and relational operations. Existing methods…
Scientific research relies on well-structured, standardized data; however, much of it is stored in formats such as free-text lab notebooks, non-standardized spreadsheets, or data repositories. This lack of structure challenges…
We address the problem of learning a distributed representation of entities in a relational database using a low-dimensional embedding. Low-dimensional embeddings aim to encapsulate a concise vector representation for an underlying dataset…
Entity matching is a critical challenge in data integration and cleaning, central to tasks like fuzzy joins and deduplication. Traditional approaches have focused on overcoming fuzzy term representations through methods such as edit…
This paper introduces RG (Relational Genetic) model, a revised relational model to represent graph-structured data in RDBMS while preserving its topology, for efficiently and effectively extracting data in different formats from disparate…
Data modeling is a process of developing a model to design and develop a data system that supports an organization s various business processes. A conceptual data model represents a technology-independent specification of structure of data…
Many web databases can be seen as providing partial and overlapping information about entities in the world. To answer queries effectively, we need to integrate the information about the individual entities that are fragmented over multiple…
Relation Extraction is an important task in Information Extraction which deals with identifying semantic relations between entity mentions. Traditionally, relation extraction is carried out after entity extraction in a "pipeline" fashion,…
Discovering which tables in large, heterogeneous repositories can be joined and by what transformations is a central challenge in data integration and data discovery. Traditional join discovery methods are largely designed for equi-joins,…
The database community lacks a unified relational query language for subset selection and optimisation queries, limiting both user expression and query optimiser reasoning about such problems. Decades of research (latterly under the rubric…
Since Chen's Entity-Relationship (ER) model, conceptual modeling has been playing a fundamental role in relational data design. In this paper we consider an extended ER (EER) model enriched with cardinality constraints, disjointness…
Data-driven analysis is important in virtually every modern organization. Yet, most data is underutilized because it remains locked in silos inside of organizations; large organizations have thousands of databases, and billions of files…
Without any doubt, the relational paradigm has been a huge success. At the same time, we believe that the time is ripe to rethink how database systems could look like if we designed them from scratch. Would we really end up with the same…
Joint entity and relation extraction is a process that identifies entity pairs and their relations using a single model. We focus on the problem of joint extraction in distantly-labeled data, whose labels are generated by aligning entity…
Although researchers have devoted considerable attention to helping database users formulate queries, many users still find it challenging to specify queries that involve joining tables. To help users construct join queries for exploring…
Due to the usefulness in data enrichment for data analysis tasks, joinable table discovery has become an important operation in data lake management. Existing approaches target equi-joins, the most common way of combining tables for…
Entity Linking involves detecting and linking entity mentions in natural language texts to a knowledge graph. Traditional methods use a two-step process with separate models for entity recognition and disambiguation, which can be…
Many real world systems need to operate on heterogeneous information networks that consist of numerous interacting components of different types. Examples include systems that perform data analysis on biological information networks; social…
Large language models (LLMs) have shown promise in table Question Answering (Table QA). However, extending these capabilities to multi-table QA remains challenging due to unreliable schema linking across complex tables. Existing methods…