Related papers: Text Data Integration

Data Shapes and Data Transformations

Nowadays, information management systems deal with data originating from different sources including relational databases, NoSQL data stores, and Web data formats, varying not only in terms of data formats, but also in the underlying data…

Databases · Computer Science 2012-11-08 Michael Hausenblas , Boris Villazon-Terrazas , Richard Cyganiak

Graph integration of structured, semistructured and unstructured data for data journalism

Nowadays, journalism is facilitated by the existence of large amounts of digital data sources, including many Open Data ones. Such data sources are extremely heterogeneous, ranging from highly structured (relational databases),…

Databases · Computer Science 2020-11-02 Oana Balalau , Catarina Conceição , Helena Galhardas , Ioana Manolescu , Tayeb Merabti , Jingmao You , Youssr Youssef

A Case for Computing on Unstructured Data

Unstructured data, such as text, images, audio, and video, comprises the vast majority of the world's information, yet it remains poorly supported by traditional data systems that rely on structured formats for computation. We argue for a…

Databases · Computer Science 2025-09-19 Mushtari Sadia , Amrita Roy Chowdhury , Ang Chen

Graph integration of structured, semistructured and unstructured data for data journalism

Digital data is a gold mine for modern journalism. However, datasets which interest journalists are extremely heterogeneous, ranging from highly structured (relational databases), semi-structured (JSON, XML, HTML), graphs (e.g., RDF), and…

Databases · Computer Science 2020-12-17 Angelos-Christos Anadiotis , Oana Balalau , Catarina Conceicao , Helena Galhardas , Mhd Yamen Haddad , Ioana Manolescu , Tayeb Merabti , Jingmao You

Information Integration using the Typed Graph Model

Schema and data integration have been a challenge for more than 40 years. While data warehouse technologies are quite a success story, there is still a lack of information integration methods, especially if the data sources are based on…

Databases · Computer Science 2021-07-21 Fritz Laux , Malcolm Crowe

A Primer on the Data Cleaning Pipeline

The availability of both structured and unstructured databases, such as electronic health data, social media data, patent data, and surveys that are often updated in real time, among others, has grown rapidly over the past decade. With this…

Databases · Computer Science 2023-07-26 Rebecca C. Steorts

A Survey of Heterogeneous Information Network Analysis

Most real systems consist of a large number of interacting, multi-typed components, while most contemporary researches model them as homogeneous networks, without distinguishing different types of objects and links in the networks.…

Social and Information Networks · Computer Science 2015-11-17 Chuan Shi , Yitong Li , Jiawei Zhang , Yizhou Sun , Philip S. Yu

Dataspace architecture and manage its components class projection

Big Data technology is described. Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured. There is constructed dataspace architecture. Dataspace has focused solely - and…

Databases · Computer Science 2019-05-07 Nataliya Shakhovska , Yurii Bolubash

A Model and Survey of Distributed Data-Intensive Systems

Data is a precious resource in today's society, and is generated at an unprecedented and constantly growing pace. The need to store, analyze, and make data promptly available to a multitude of users introduces formidable challenges in…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-08 Alessandro Margara , Gianpaolo Cugola , Nicolò Felicioni , Stefano Cilloni

Data Transformation Strategies to Remove Heterogeneity

Data heterogeneity is a prevalent issue, stemming from various conflicting factors, making its utilization complex. This uncertainty, particularly resulting from disparities in data formats, frequently necessitates the involvement of…

Machine Learning · Computer Science 2025-07-18 Sangbong Yoo , Jaeyoung Lee , Chanyoung Yoon , Geonyeong Son , Hyein Hong , Seongbum Seo , Soobin Yim , Chanyoung Jung , Jungsoo Park , Misuk Kim , Yun Jang

Learning from data with structured missingness

Missing data are an unavoidable complication in many machine learning tasks. When data are 'missing at random' there exist a range of tools and techniques to deal with the issue. However, as machine learning studies become more ambitious,…

Machine Learning · Statistics 2023-04-05 Robin Mitra , Sarah F. McGough , Tapabrata Chakraborti , Chris Holmes , Ryan Copping , Niels Hagenbuch , Stefanie Biedermann , Jack Noonan , Brieuc Lehmann , Aditi Shenvi , Xuan Vinh Doan , David Leslie , Ginestra Bianconi , Ruben Sanchez-Garcia , Alisha Davies , Maxine Mackintosh , Eleni-Rosalina Andrinopoulou , Anahid Basiri , Chris Harbron , Ben D. MacArthur

Stratified Data Integration

We propose a novel approach to the problem of semantic heterogeneity where data are organized into a set of stratified and independent representation layers, namely: conceptual (where a set of unique alinguistic identifiers are connected…

Databases · Computer Science 2021-05-21 Fausto Giunchiglia , Alessio Zamboni , Mayukh Bagchi , Simone Bocca

Termite: A System for Tunneling Through Heterogeneous Data

Data-driven analysis is important in virtually every modern organization. Yet, most data is underutilized because it remains locked in silos inside of organizations; large organizations have thousands of databases, and billions of files…

Databases · Computer Science 2019-03-13 Raul Castro Fernandez , Samuel Madden

Warehousing Web Data

In a data warehousing process, mastering the data preparation phase allows substantial gains in terms of time and performance when performing multidimensional analysis or using data mining algorithms. Furthermore, a data warehouse can…

Databases · Computer Science 2007-05-23 Jérôme Darmont , Omar Boussaïd , Fadila Bentayeb

Process Mining for Unstructured Data: Challenges and Research Directions

The application of process mining for unstructured data might significantly elevate novel insights into disciplines where unstructured data is a common data format. To efficiently analyze unstructured data by process mining and to convey…

Databases · Computer Science 2024-10-01 Agnes Koschmider , Milda Aleknonytė-Resch , Frederik Fonger , Christian Imenkamp , Arvid Lepsien , Kaan Apaydin , Maximilian Harms , Dominik Janssen , Dominic Langhammer , Tobias Ziolkowski , Yorck Zisgen

Fusing restricted information

Information fusion deals with the integration and merging of data and information from multiple (heterogeneous) sources. In many cases, the information that needs to be fused has security classification. The result of the fusion process is…

Cryptography and Security · Computer Science 2017-06-20 Magnus Jändel , Pontus Svenson , Ronnie Johansson

Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities

New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include a myriad of properties describing genome, epigenome, transcriptome, microbiome,…

Quantitative Methods · Quantitative Biology 2018-10-22 Marinka Zitnik , Francis Nguyen , Bo Wang , Jure Leskovec , Anna Goldenberg , Michael M. Hoffman

Unstructured and structured data: Can we have the best of both worlds with large language models?

This paper presents an opinion on the potential of using large language models to query on both unstructured and structured data. It also outlines some research challenges related to the topic of building question-answering systems for both…

Databases · Computer Science 2023-07-07 Wang-Chiew Tan

Linked Data Integration with Conflicts

Linked Data have emerged as a successful publication format and one of its main strengths is its fitness for integration of data from multiple sources. This gives them a great potential both for semantic applications and the enterprise…

Databases · Computer Science 2014-10-30 Jan Michelfeit , Tomáš Knap , Martin Nečaský

Towards an Integrated Platform for Big Data Analysis

The amount of data in the world is expanding rapidly. Every day, huge amounts of data are created by scientific experiments, companies, and end users' activities. These large data sets have been labeled as "Big Data", and their storage,…

Databases · Computer Science 2020-04-29 Mahdi Bohlouli , Frank Schulz , Lefteris Angelis , David Pahor , Ivona Brandic , David Atlan , Rosemary Tate