Related papers: COMPARE: Accelerating Groupwise Comparison in Rela…
In this paper, we motivated the need for relational database systems to support subset query processing. We defined new operators in relational algebra, and new constructs in SQL for expressing subset queries. We also illustrated the…
A common approach to data analysis involves understanding and manipulating succinct representations of data. In earlier work, we put forward a succinct representation system for relational data called factorised databases and reported on…
Aggregation in relational databases is accomplished through hashing and sorting interval data, which is computationally expensive and scales poorly as the data volumes grow. In this paper, we show how quantitative interval and time-series…
Databases employ indexes to filter out irrelevant records, which reduces scan overhead and speeds up query execution. However, this optimization is only available to queries that filter on the indexed attribute. To extend these speedups to…
Modern Internet applications often produce a large volume of user activity records. Data analysts are interested in cohort analysis, or finding unusual user behavioral trends, in these large tables of activity records. In a traditional…
There exists a wide set of techniques to perform keyword-based search over relational databases but all of them match the keywords in the users' queries to elements of the databases to be queried as first step. The matching process is a…
The use of large-scale machine learning methods is becoming ubiquitous in many applications ranging from business intelligence to self-driving cars. These methods require a complex computation pipeline consisting of various types of…
The database community lacks a unified relational query language for subset selection and optimisation queries, limiting both user expression and query optimiser reasoning about such problems. Decades of research (latterly under the rubric…
Factorised databases are relational databases that use compact factorised representations at the physical layer to reduce data redundancy and boost query performance. This paper introduces FDB, an in-memory query engine for…
The need for modern data analytics to combine relational, procedural, and map-reduce-style functional processing is widely recognized. State-of-the-art systems like Spark have added SQL front-ends and relational query optimization, which…
We study a class of graph analytics SQL queries, which we call relationship queries. Relationship queries are a wide superset of fixed-length graph reachability queries and of tree pattern queries. Intuitively, it discovers target entities…
With today's public data sets containing billions of data items, more and more companies are looking to integrate external data with their traditional enterprise data to improve business intelligence analysis. These distributed data sources…
This paper introduces U-relations, a succinct and purely relational representation system for uncertain databases. U-relations support attribute-level uncertainty using vertical partitioning. If we consider positive relational algebra…
During the last two decades, it has been increasingly acknowledged that the engineering of information systems usually requires a huge effort in integrating master data and business processes. This has led to a plethora of proposals, both…
We allow database user to script a parallel relational database engine with a procedural language. Procedural language code is executed as a user defined relational query operator called transducer. Transducer is tightly integrated with…
Database system is an indispensable part of software projects. It plays an important role in data organization and storage. Its performance and efficiency are directly related to the performance of software. Nowadays, we have many general…
This paper explores a new opportunity to improve the performance of transaction processing at the application side by merging structurely similar statements or transactions. Concretely, we re-write transactions to 1) merge similar…
Comparing research papers is a conventional method to demonstrate progress in experimental research. We present COMPARE, a taxonomy and a dataset of comparison discussions in peer reviews of research papers in the domain of experimental…
Modern analytical workloads increasingly combine relational data with array-valued attributes. While columnar database systems efficiently process such workloads, their ability to optimize queries that interleave relational operators with…
Most enterprise data is distributed in multiple relational databases with expert-designed schema. Using traditional single-table machine learning techniques over such data not only incur a computational penalty for converting to a 'flat'…