Related papers: Finding a Second Wind: Speeding Up Graph Traversal…
In this paper, we discuss a novel technique for processing correlated subqueries in SQL. The core idea is to isolate the non-correlated part of the predicate and use it to reduce the number of evaluations of the correlated part. We begin by…
In column-oriented query processing, a materialization strategy determines when lightweight positions (row IDs) are translated into tuples. It is an important part of column-store architecture, since it defines the class of supported query…
We revisit column-oriented storage and query processing techniques in the context of contemporary graph database management systems (GDBMSs). Similar to column-oriented RDBMSs, GDBMSs support read-heavy analytical workloads that however…
Users of MapReduce often run into performance problems when they scale up their workloads. Many of the problems they encounter can be overcome by applying techniques learned from over three decades of research on parallel DBMSs. However,…
Analytical database systems are typically designed to use a column-first data layout to access only the desired fields. On the other hand, storing data row-first works great for accessing, inserting, or updating entire rows. Transforming…
Column-oriented database systems have been a real game changer for the industry in recent years. Highly tuned and performant systems have evolved that provide users with the possibility of answering ad hoc queries over large datasets in an…
Database administrators construct secondary indexes on data tables to accelerate query processing in relational database management systems (RDBMSs). These indexes are built on top of the most frequently queried columns according to the…
Data compression is widely used in contemporary column-oriented DBMSes to lower space usage and to speed up query processing. Pioneering systems have introduced compression to tackle the disk bandwidth bottleneck by trading CPU processing…
Modern Internet applications often produce a large volume of user activity records. Data analysts are interested in cohort analysis, or finding unusual user behavioral trends, in these large tables of activity records. In a traditional…
Efficient multi-core parallel processing of recursive join queries is critical for achieving good performance in graph database management systems (GDBMSs). Prior work adopts two broad approaches. First is the state of the art morsel-driven…
In data lakes, information on the same subject is often fragmented across multiple tables. Table union search aims to find the top-k tables that can be unioned with a query table to extend it with more rows, without relying on metadata or…
The evaluation of Datalog rules over large Knowledge Graphs (KGs) is essential for many applications. In this paper, we present a new method of materializing Datalog inferences, which combines a column-based memory layout with novel…
Memristive in-memory sorting has been proposed recently to improve hardware sorting efficiency. Using iterative in-memory min computations, data movements between memory and external processing units can be eliminated for improved latency…
Differential computation (DC) is a highly general incremental computation/view maintenance technique that can maintain the output of an arbitrary and possibly recursive dataflow computation upon changes to its base inputs. As such, it is a…
SQL-on-Hadoop systems, query optimization, data distribution over multiple nodes and parallelization techniques are few of the areas under extreme research these days. Big names like Amazon, Google, Microsoft and many more are working on…
In-memory columnar databases have become mainstream over the last decade and have vastly improved the fast processing of large volumes of data through multi-core parallelism and in-memory compression thereby eliminating the usual…
Hash based search has, proven excellence on large data warehouses stored in column store. Data distribution has significant impact on hash based search. To reduce impact of data distribution, we have proposed Memory Managed Hash (MMH)…
Column-oriented indexes-such as projection or bitmap indexes-are compressed by run-length encoding to reduce storage and increase speed. Sorting the tables improves compression. On realistic data sets, permuting the columns in the right…
The performance of main memory column stores highly depends on the scan and lookup operations on the base column layouts. Existing column-stores adopt a homogeneous column layout, leading to sub-optimal performance on real workloads since…
Recursive query processing has experienced a recent resurgence, as a result of its use in many modern application domains, including data integration, graph analytics, security, program analysis, networking and decision making. Due to the…