Related papers: Cohort Query Processing
Cohort analysis is a pervasive activity in web analytics. One divides users into groups according to specific criteria and tracks their behavior over time. Despite its extensive use, academic circles do not discuss cohort analysis to…
Modern analytical workloads increasingly combine relational data with array-valued attributes. While columnar database systems efficiently process such workloads, their ability to optimize queries that interleave relational operators with…
This work presents an abstract model for the computations performed by analytic column stores or columnar query processors. The model is based on circuits whose wires carry columns rather than scalar values, and whose nodes apply operators…
In history research, cohort analysis seeks to identify social structures and figure mobilities by studying the group-based behavior of historical figures. Prior works mainly employ automatic data mining approaches, lacking effective visual…
Cohort studies are of significant importance in the field of healthcare analysis. However, existing methods typically involve manual, labor-intensive, and expert-driven pattern definitions or rely on simplistic clustering techniques that…
Data analysis often involves comparing subsets of data across many dimensions for finding unusual trends and patterns. While the comparison between subsets of data can be expressed using SQL, they tend to be complex to write, and suffer…
A crucial step in cohort studies is to extract the required cohort from one or more study datasets. This step is time-consuming, especially when a researcher is presented with a dataset that they have not previously worked with. When the…
We revisit column-oriented storage and query processing techniques in the context of contemporary graph database management systems (GDBMSs). Similar to column-oriented RDBMSs, GDBMSs support read-heavy analytical workloads that however…
Clinical cohort definition is crucial for patient recruitment and observational studies, yet translating inclusion/exclusion criteria into SQL queries remains challenging and manual. We present an automated system utilizing large language…
Formulating efficient SQL queries requires several cycles of tuning and execution, particularly for inexperienced users. We examine methods that can accelerate and improve this interaction by providing insights about SQL queries prior to…
Read-optimized columnar databases use differential updates to handle writes by maintaining a separate write-optimized delta partition which is periodically merged with the read-optimized and compressed main partition. This merge process…
Grading SQL queries can be a time-consuming, tedious and challenging task, especially as the number of student submissions increases. Several systems have been introduced in an attempt to mitigate these challenges, but those systems have…
Databases employ indexes to filter out irrelevant records, which reduces scan overhead and speeds up query execution. However, this optimization is only available to queries that filter on the indexed attribute. To extend these speedups to…
The need for accurate SQL progress estimation in the context of decision support administration has led to a number of techniques proposed for this task. Unfortunately, no single one of these progress estimators behaves robustly across the…
Recursive queries and recursive derived tables constitute an important part of the SQL standard. Their efficient processing is important for many real-life applications that rely on graph or hierarchy traversal. Position-enabled…
Over the past 40 years, database management systems (DBMSs) have evolved to provide a sophisticated variety of data management capabilities. At the same time, tools for managing queries over the data have remained relatively primitive. One…
Crowd-sourcing is a powerful solution for finding correct answers to expensive and unanswered queries in databases, including those with uncertain and incomplete data. Attempts to use crowd-sourcing to exploit human abilities to process…
In this paper, we motivated the need for relational database systems to support subset query processing. We defined new operators in relational algebra, and new constructs in SQL for expressing subset queries. We also illustrated the…
CoWrangler is a data-wrangling recommender system designed to streamline data processing tasks. Recognizing that data processing is often time-consuming and complex for novice users, we aim to simplify the decision-making process regarding…
Column-oriented database systems have been a real game changer for the industry in recent years. Highly tuned and performant systems have evolved that provide users with the possibility of answering ad hoc queries over large datasets in an…