Related papers: Optimizing Relational Queries over Array-Valued Da…
Data processing systems roughly group into families such as relational, array, graph, and key-value. Many data processing tasks exceed the capabilities of any one family, require data stored across families, or run faster when partitioned…
Hybrid complex analytics workloads typically include (i) data management tasks (joins, selections, etc. ), easily expressed using relational algebra (RA)-based languages, and (ii) complex analytics tasks (regressions, matrix decompositions,…
Analytical queries often require a mixture of relational and linear algebra operations applied to the same data. This poses a challenge to analytic systems that must bridge the gap between relations and matrices. Previous work has mainly…
Modern Internet applications often produce a large volume of user activity records. Data analysts are interested in cohort analysis, or finding unusual user behavioral trends, in these large tables of activity records. In a traditional…
In modern data analytics, analysts frequently face the challenge of searching for desirable entities by evaluating, for each entity, a collection of its feature relations to derive key analytical properties. This search is challenging…
Analytics tasks manipulate structured data with variants of relational algebra (RA) and quantitative data with variants of linear algebra (LA). The two computational models have overlapping expressiveness, motivating a common programming…
Recent advances with in-memory columnar database techniques have increased the performance of analytical queries on very large databases and data warehouses. At the same time, advances in artificial intelligence (AI) algorithms have…
Text analytical tasks like word embedding, phrase mining, and topic modeling, are placing increasing demands as well as challenges to existing database management systems. In this paper, we provide a novel algebraic approach based on…
The database community lacks a unified relational query language for subset selection and optimisation queries, limiting both user expression and query optimiser reasoning about such problems. Decades of research (latterly under the rubric…
Relational data stored in RDBMS is foundational to many real-world applications across domains such as e-commerce, finance, and sociality. While deep neural networks (DNNs) have achieved strong performance on tabular data with a single…
Data analysis often involves comparing subsets of data across many dimensions for finding unusual trends and patterns. While the comparison between subsets of data can be expressed using SQL, they tend to be complex to write, and suffer…
Aggregation in relational databases is accomplished through hashing and sorting interval data, which is computationally expensive and scales poorly as the data volumes grow. In this paper, we show how quantitative interval and time-series…
Analytical database systems are typically designed to use a column-first data layout to access only the desired fields. On the other hand, storing data row-first works great for accessing, inserting, or updating entire rows. Transforming…
As the complexity of modern workloads and hardware increasingly outpaces human research and engineering capacity, existing methods for database performance optimization struggle to keep pace. To address this gap, a new class of techniques,…
The need for Knowledge and Data Discovery Management Systems (KDDMS) that support ad hoc data mining queries has been long recognized. A significant amount of research has gone into building tightly coupled systems that integrate…
Recent advances in query optimization have shifted from traditional rule-based and cost-based techniques towards machine learning-driven approaches. Among these, reinforcement learning (RL) has attracted significant attention due to its…
Linear algebraic expressions are the essence of many computationally intensive problems, including scientific simulations and machine learning applications. However, translating high-level formulations of these expressions to efficient…
The enormous quantity of data produced every day together with advances in data analytics has led to a proliferation of data management and analysis systems. Typically, these systems are built around highly specialized monolithic operators…
The use of large-scale machine learning methods is becoming ubiquitous in many applications ranging from business intelligence to self-driving cars. These methods require a complex computation pipeline consisting of various types of…
Machine learning algorithms are commonly specified in linear algebra (LA). LA expressions can be rewritten into more efficient forms, by taking advantage of input properties such as sparsity, as well as program properties such as common…