Related papers: Schema Extraction on Semi-structured Data

Towards the Automated Extraction and Refactoring of NoSQL Schemas from Application Code

In this paper, we present a static code analysis strategy to extract logical schemas from NoSQL applications. Our solution is based on a model-driven reverse engineering process composed of a chain of platform-independent model…

Databases · Computer Science 2026-01-21 Carlos J. Fernandez-Candel, Anthony Cleve, Jesus J. Garcia-Molina

Incremental extraction of a NoSQL database model using an MDA-based process

In recent years, the need to use NoSQL systems to store and exploit big data has been steadily increasing. Most of these systems are characterized by the property "schema less" which means absence of the data model when creating a database.…

Information Retrieval · Computer Science 2019-11-13 Amal Ait Brahim, Rabah Tighilt Ferhat, Gilles Zurfluh

SkiQL: A Unified Schema Query Language

Most NoSQL systems are schema-on-read: data can be stored without first having to declare a Schema that imposes a structure. This schemaless feature offers flexibility to evolve data-intensive applications when data frequently change.…

Databases · Computer Science 2025-10-24 Carlos Javier Fernández Candel, Jesús Joaquín García Molina, Diego Sevilla Ruiz

An Empirical Study on the Design and Evolution of NoSQL Database Schemas

We study how software engineers design and evolve their domain model when building applications against NoSQL data stores. Specifically, we target Java projects that use object-NoSQL mappers to interface with schema-free NoSQL data stores.…

Databases · Computer Science 2020-03-03 Stefanie Scherzinger, Sebastian Sidortschuck

Semi-structured data extraction and modelling: the WIA Project

Over the last decades, the amount of data of all kinds available electronically has increased dramatically. Data are accessible through a range of interfaces including Web browsers, database query languages, application-specific interfaces,…

Software Engineering · Computer Science 2013-10-01 Gianluca Colombo, Ettore Colombo, Andrea Bonomi, Alessandro Mosca, Simone Bassis

Extractive Schema Linking for Text-to-SQL

Text-to-SQL is emerging as a practical interface for real world databases. The dominant paradigm for Text-to-SQL is cross-database or schema-independent, supporting application schemas unseen during training. The schema of a database…

Databases · Computer Science 2025-01-30 Michael Glass, Mustafa Eyceoz, Dharmashankar Subramanian, Gaetano Rossiello, Long Vu, Alfio Gliozzo

NoSQL Schema Design for Time-Dependent Workloads

In this paper, we propose a schema optimization method for time-dependent workloads for NoSQL databases. In our proposed method, we migrate schema according to changing workloads, and the estimated cost of execution and migration are…

Databases · Computer Science 2023-03-30 Yusuke Wakuta, Michael Mior, Teruyoshi Zenmyo, Yuya Sasaki, Makoto Onizuka

Performance Evaluation of Structured and Semi-Structured Bioinformatics Tools: A Comparative Study

There is a wide range of available biological databases developed by bioinformatics experts, employing different methods to extract biological data. In this paper, we investigate and evaluate the performance of some of these methods in…

Databases · Computer Science 2022-02-08 Raja A. Moftah, Abdelsalam M. Maatuk, Richard White

Designing a Visual Tool for Property Graph Schema Extraction and Refinement: An Expert Study

The design space of visual tools that aim to help people create schemas for property graphs is explored. Interviews are conducted with experts in the domain of property graphs and data management in general. Through this collaboration, we…

Databases · Computer Science 2022-01-12 Nimo Beeren

Managing Schema Evolution in NoSQL Data Stores

NoSQL data stores are commonly schema-less, providing no means for globally defining or managing the schema. While this offers great flexibility in early stages of application development, developers soon can experience the heavy burden of…

Databases · Computer Science 2013-08-05 Stefanie Scherzinger, Meike Klettke, Uta Störl

A Taxonomy of Schema Changes for NoSQL Databases

Schema evolution is a crucial aspect in database management. The proposed taxonomies of schema changes have neglected the set of operations that involves relationships between entity types: aggregation and references, as well as the…

Databases · Computer Science 2022-05-25 Alberto Hernández Chillón, Meike Klettke, Diego Sevilla Ruiz, Jesús García Molina

Schema-Driven Information Extraction from Heterogeneous Tables

In this paper, we explore the question of whether large language models can support cost-efficient information extraction from tables. We introduce schema-driven information extraction, a new task that transforms tabular data into…

Computation and Language · Computer Science 2024-11-22 Fan Bai, Junmo Kang, Gabriel Stanovsky, Dayne Freitag, Mark Dredze, Alan Ritter

Relational Databases Ingestion into a NoSQL Data Warehouse

The digital transformation of companies has led to the evolution of databases towards Big Data. Our work is part of this context and concerns more particularly the mechanisms to extract datasets stored in a Data Lake and to store the data…

Databases · Computer Science 2022-03-15 Fatma Abdelhedi, Rym Jemmali, Gilles Zurfluh

A Unified Metamodel for NoSQL and Relational Databases

The Database field is undergoing significant changes. Although relational systems are still predominant, the interest in NoSQL systems is continuously increasing. In this scenario, polyglot persistence is envisioned as the database…

Databases · Computer Science 2025-10-24 Carlos J. Fernández Candel, Diego Sevilla Ruiz, Jesús J. García-Molina

Rethinking Schema Linking: A Context-Aware Bidirectional Retrieval Approach for Text-to-SQL

Schema linking -- the process of aligning natural language questions with database schema elements -- is a critical yet underexplored component of Text-to-SQL systems. While recent methods have focused primarily on improving SQL generation,…

Computation and Language · Computer Science 2026-01-28 Md Mahadi Hasan Nahid, Davood Rafiei, Weiwei Zhang, Yong Zhang

JSONoid: Monoid-based Enrichment for Configurable and Scalable Data-Driven Schema Discovery

Schema discovery is an important aspect to working with data in formats such as JSON. Unlike relational databases, JSON data sets often do not have associated structural information. Consumers of such datasets are often left to browse…

Databases · Computer Science 2023-07-07 Michael J. Mior

Schema-Based Query Optimisation for Graph Databases

Recursive graph queries are increasingly popular for extracting information from interconnected data found in various domains such as social networks, life sciences, and business analytics. Graph data often come with schema information that…

Databases · Computer Science 2025-02-13 Chandan Sharma, Pierre Genevès, Nils Gesbert, Nabil Layaïda

Bivariate, Cluster and Suitability Analysis of NoSQL Solutions for Different Application Areas

Big data systems development is full of challenges in view of the variety of application areas and domains that this technology promises to serve. Typically, fundamental design decisions involved in big data systems design include choosing…

Databases · Computer Science 2019-11-27 Samiya Khan, Xiufeng Liu, Syed Arshad Ali, Mansaf Alam

StruClus: Structural Clustering of Large-Scale Graph Databases

We present a structural clustering algorithm for large-scale datasets of small labeled graphs, utilizing a frequent subgraph sampling strategy. A set of representatives provides an intuitive description of each cluster, supports the…

Databases · Computer Science 2016-10-03 Till Schäfer, Petra Mutzel

Challenges for Efficient Query Evaluation on Structured Probabilistic Data

Query answering over probabilistic data is an important task but is generally intractable. However, a new approach for this problem has recently been proposed, based on structural decompositions of input databases, following, e.g., tree…

Databases · Computer Science 2019-08-28 Antoine Amarilli, Silviu Maniu, Mikaël Monet