Related papers: TQL: Towards Type-Driven Data Discovery
Efficient and effective data discovery is critical for many modern applications in machine learning and data science. One major bottleneck to the development of a general-purpose data discovery tool is the absence of an expressive formal…
This paper presents \tdl, a typed feature-based representation language and inference system. Type definitions in \tdl\ consist of type and feature constraints over the boolean connectives. \tdl\ supports open- and closed-world reasoning…
Our aim is to investigate ontology-based data access over temporal data with validity time and ontologies capable of temporal conceptual modelling. To this end, we design a temporal description logic, TQL, that extends the standard ontology…
We propose the vision of a functional data model (FDM) and an associated functional query language (FQL). Our proposal has far-reaching consequences: we show a path to come up with a modern QL that solves (almost if not) all problems of SQL…
Organizations can benefit from the use of practices, techniques, and tools from the area of business process management. Through the focus on processes, they create process models that require management, including support for versioning,…
Large Language Models (LLMs) have spurred progress in text-to-SQL, the task of generating SQL queries from natural language questions based on a given database schema. Despite the declarative nature of SQL, it continues to be a complex…
Database systems have to cater to the growing demands of the information age. The growth of the new age information retrieval powerhouses like search engines has thrown a challenge to the data management community to come up with novel…
The Web of Linked Data is composed of tons of RDF documents interlinked to each other forming a huge repository of distributed semantic data. Effectively querying this distributed data source is an important open problem in the Semantic Web…
With the growing abundance of repositories containing tabular data, discovering relevant tables for in-depth analysis remains a challenging task. Existing table discovery methods primarily retrieve desired tables based on a query table or…
Table learning, which lies at the intersection of machine learning and modern database systems, has recently attracted growing attention. However, existing table learning frameworks typically require explicit data export and extensive…
The amount of large-scale scientific computing software is dramatically increasing. In this work, we designed a new language, named feature query language (FQL), to collect and extract software features from a quick static code analysis. We…
The popularity of data science as a discipline and its importance in the emerging economy and industrial progress dictate that machine learning be democratized for the masses. This also means that the current practice of workforce training…
We describe a meta-querying system for databases containing queries in addition to ordinary data. In the context of such databases, a meta-query is a query about queries. Representing stored queries in XML, and using the standard XML…
Analytics on structured data is a mature field with many successful methods. However, most real world data exists in unstructured form, such as images and conversations. We investigate the potential of Large Language Models (LLMs) to enable…
Natural Language Processing (NLP) technologies have revolutionized the way we interact with information systems, with a significant focus on converting natural language queries into formal query languages such as SQL. However, less emphasis…
Big data management aims to establish data hubs that support data in multiple models and types in an all-around way. Thus, the multi-model database system is a promising architecture for building such a multi-model data store. For an…
Data discovery in data lakes with ever increasing datasets has long been recognized as a big challenge in the realm of data management, especially for semantic search of and hierarchical global catalog generation of tables. While large…
Performance-critical industrial applications, including large-scale program, network, and distributed system analyses, rely on fixed-point computations. The introduction of recursive common table expressions (CTEs) using the WITH RECURSIVE…
Current approaches to data discovery match keywords between metadata and queries. This matching requires researchers to know the exact wording that other researchers previously used, creating a challenging process that could lead to missing…
Large Language Models (LLMs) trained on large volumes of data excel at various natural language tasks, but they cannot handle tasks requiring knowledge that has not been trained on previously. One solution is to use a retriever that fetches…