Related papers: Improving Schema Matching with Linked Data

Linking Datasets on Organizations Using Half A Billion Open-Collaborated Records

Scholars studying organizations often work with multiple datasets lacking shared identifiers or covariates. In such situations, researchers usually use approximate string ("fuzzy") matching methods to combine datasets. String matching,…

Social and Information Networks · Computer Science 2025-09-24 Brian Libgober, Connor T. Jerzak

Guided Model Merging for Hybrid Data Learning: Leveraging Centralized Data to Refine Decentralized Models

Current network training paradigms primarily focus on either centralized or decentralized data regimes. However, in practice, data availability often exhibits a hybrid nature, where both regimes coexist. This hybrid setting presents new…

Machine Learning · Computer Science 2025-12-01 Junyi Zhu, Ruicong Yao, Taha Ceritli, Savas Ozkan, Matthew B. Blaschko, Eunchung Noh, Jeongwon Min, Cho Jung Min, Mete Ozay

Leveraging Schema Labels to Enhance Dataset Search

A search engine's ability to retrieve desirable datasets is important for data sharing and reuse. Existing dataset search engines typically rely on matching queries to dataset descriptions. However, a user may not have enough prior…

Information Retrieval · Computer Science 2020-01-29 Zhiyu Chen, Haiyan Jia, Jeff Heflin, Brian D. Davison

Linked Data Integration with Conflicts

Linked Data have emerged as a successful publication format and one of its main strengths is its fitness for integration of data from multiple sources. This gives them a great potential both for semantic applications and the enterprise…

Databases · Computer Science 2014-10-30 Jan Michelfeit, Tomáš Knap, Martin Nečaský

Matching Table Metadata with Business Glossaries Using Large Language Models

Enterprises often own large collections of structured data in the form of large databases or an enterprise data lake. Such data collections come with limited metadata and strict access policies that could limit access to the data contents…

Information Retrieval · Computer Science 2023-09-22 Elita Lobo, Oktie Hassanzadeh, Nhan Pham, Nandana Mihindukulasooriya, Dharmashankar Subramanian, Horst Samulowitz

Cleaning Noisy and Heterogeneous Metadata for Record Linking Across Scholarly Big Datasets

Automatically extracted metadata from scholarly documents in PDF formats is usually noisy and heterogeneous, often containing incomplete fields and erroneous values. One common way of cleaning metadata is to use a bibliographic reference…

Digital Libraries · Computer Science 2019-06-21 Athar Sefid, Jian Wu, Allen C. Ge, Jing Zhao, Lu Liu, Cornelia Caragea, Prasenjit Mitra, C. Lee Giles

Contextual Graph Embeddings: Accounting for Data Characteristics in Heterogeneous Data Integration

As organizations continue to access diverse datasets, the demand for effective data integration has increased. Key tasks in this process, such as schema matching and entity resolution, are essential but often require significant effort.…

Databases · Computer Science 2025-11-13 Yuka Haruki, Shigeru Ishikura, Kazuya Demachi, Teruaki Hayashi

A Lightweight Algorithm to Uncover Deep Relationships in Data Tables

Many data we collect today are in tabular form, with rows as records and columns as attributes associated with each record. Understanding the structural relationship in tabular data can greatly facilitate the data science process.…

Data Structures and Algorithms · Computer Science 2020-09-09 Jin Cao, Yibo Zhao, Linjun Zhang, Jason Li

Enabling Smart Data: Noise filtering in Big Data classification

In any knowledge discovery process the value of extracted knowledge is directly related to the quality of the data used. Big Data problems, generated by massive growth in the scale of data observed in recent years, also follow the same…

Databases · Computer Science 2017-07-31 Diego García-Gil, Julián Luengo, Salvador García, Francisco Herrera

A Multi-Agent System for Semantic Mapping of Relational Data to Knowledge Graphs

Enterprises often maintain multiple databases for storing critical business data in siloed systems, resulting in inefficiencies and challenges with data interoperability. A key to overcoming these challenges lies in integrating disparate…

Databases · Computer Science 2025-11-11 Milena Trajanoska, Riste Stojanov, Dimitar Trajanov

COMPARE: Accelerating Groupwise Comparison in Relational Databases for Data Analytics

Data analysis often involves comparing subsets of data across many dimensions for finding unusual trends and patterns. While the comparison between subsets of data can be expressed using SQL, they tend to be complex to write, and suffer…

Databases · Computer Science 2021-07-28 Tarique Siddiqui, Surajit Chaudhuri, Vivek Narasayya

A Graph Representation of Semi-structured Data for Web Question Answering

The abundant semi-structured data on the Web, such as HTML-based tables and lists, provide commercial search engines a rich information source for question answering (QA). Different from plain text passages in Web documents, Web tables and…

Computation and Language · Computer Science 2020-10-15 Xingyao Zhang, Linjun Shou, Jian Pei, Ming Gong, Lijie Wen, Daxin Jiang

Combining Structured Corporate Data and Document Content to Improve Expertise Finding

In this paper, we present an algorithm for automatically building expertise evidence for finding experts within an organization by combining structured corporate information with different content. We also describe our test data collection…

Information Retrieval · Computer Science 2007-05-23 Alistair McLean, Mingfang Wu, Anne-Marie Vercoustre

Plugging Schema Graph into Multi-Table QA: A Human-Guided Framework for Reducing LLM Reliance

Large language models (LLMs) have shown promise in table Question Answering (Table QA). However, extending these capabilities to multi-table QA remains challenging due to unreliable schema linking across complex tables. Existing methods…

Artificial Intelligence · Computer Science 2025-11-25 Xixi Wang, Miguel Costa, Jordanka Kovaceva, Shuai Wang, Francisco C. Pereira

Leveraging Data Recasting to Enhance Tabular Reasoning

Creating challenging tabular inference data is essential for learning complex reasoning. Prior work has mostly relied on two data generation strategies. The first is human annotation, which yields linguistically diverse data but is…

Computation and Language · Computer Science 2022-11-24 Aashna Jena, Vivek Gupta, Manish Shrivastava, Julian Martin Eisenschlos

Hybrid LLM/Rule-based Approaches to Business Insights Generation from Structured Data

In the field of business data analysis, the ability to extract actionable insights from vast and varied datasets is essential for informed decision-making and maintaining a competitive edge. Traditional rule-based systems, while reliable,…

Computation and Language · Computer Science 2024-04-25 Aliaksei Vertsel, Mikhail Rumiantsau

Structured Linked Data as a Memory Layer for Agent-Orchestrated Retrieval

Retrieval-Augmented Generation (RAG) systems typically treat documents as flat text, ignoring the structured metadata and linked relationships that knowledge graphs provide. In this paper, we investigate whether structured linked data,…

Information Retrieval · Computer Science 2026-03-12 Andrea Volpini, Elie Raad, Beatrice Gamba, David Riccitelli

Termite: A System for Tunneling Through Heterogeneous Data

Data-driven analysis is important in virtually every modern organization. Yet, most data is underutilized because it remains locked in silos inside of organizations; large organizations have thousands of databases, and billions of files…

Databases · Computer Science 2019-03-13 Raul Castro Fernandez, Samuel Madden

StaTIX - Statistical Type Inference on Linked Data

Large knowledge bases typically contain data adhering to various schemas with incomplete and/or noisy type information. This seriously complicates further integration and post-processing efforts, as type information is crucial in correctly…

Applications · Statistics 2019-02-19 Artem Lutov, Soheil Roshankish, Mourad Khayati, Philippe Cudré-Mauroux

A Seamless Integration of Association Rule Mining with Database Systems

The need for Knowledge and Data Discovery Management Systems (KDDMS) that support ad hoc data mining queries has been long recognized. A significant amount of research has gone into building tightly coupled systems that integrate…

Databases · Computer Science 2007-05-23 Raj P. Gopalan, Tariq Nuruddin, Yudho Giri Sucahyo