Related papers: Partial Partial Aggregates

Few-Round Distributed Principal Component Analysis: Closing the Statistical Efficiency Gap by Consensus

Distributed algorithms and theories are called for in this era of big data. Under weaker local signal-to-noise ratios, we improve upon the celebrated one-round distributed principal component analysis (PCA) algorithm designed in the spirit…

Methodology · Statistics 2025-07-01 ZeYu Li, Xinsheng Zhang, Wang Zhou

Efficient sorting, duplicate removal, grouping, and aggregation

Database query processing requires algorithms for duplicate removal, grouping, and aggregation. Three algorithms exist: in-stream aggregation is most efficient by far but requires sorted input; sort-based aggregation relies on external…

Databases · Computer Science 2022-09-27 Thanh Do, Goetz Graefe, Jeffrey Naughton

Global Hash Tables Strike Back! An Analysis of Parallel GROUP BY Aggregation

Efficiently computing group aggregations (i.e., GROUP BY) on modern architectures is critical for analytic database systems. Hash-based approaches in today's engines predominantly use a partitioned approach, in which incoming data is…

Databases · Computer Science 2025-09-08 Daniel Xue, Ryan Marcus

Combining Aggregation and Sampling (Nearly) Optimally for Approximate Query Processing

Sample-based approximate query processing (AQP) suffers from many pitfalls such as the inability to answer very selective queries and unreliable confidence intervals when sample sizes are small. Recent research presented an intriguing…

Databases · Computer Science 2021-03-31 Xi Liang, Stavros Sintos, Zechao Shang, Sanjay Krishnan

Memory-Efficient Group-by Aggregates over Multi-Way Joins

Aggregate computation in relational databases has long been done using the standard unary aggregation and binary join operators. These implement the classical model of computing joins between relations two at a time, materializing the…

Databases · Computer Science 2019-06-18 Konstantinos Xirogiannopoulos, Amol Deshpande

Stream Aggregation Through Order Sampling

This paper introduces a new single-pass reservoir weighted-sampling stream aggregation algorithm, Priority-Based Aggregation (PBA). While order sampling is a powerful and efficient method for weighted sampling from a stream of uniquely…

Data Structures and Algorithms · Computer Science 2017-11-02 Nick Duffield, Yunhong Xu, Liangzhen Xia, Nesreen Ahmed, Minlan Yu

Distributed Multi-task APA over Adaptive Networks Based on Partial Diffusion

Distributed multi-task adaptive strategies are useful to estimate multiple parameter vectors simultaneously in a collaborative manner. Existing distributed multi-task strategies use a diffusion mode of cooperation in which during…

Systems and Control · Computer Science 2015-10-01 Vinay Chakravarthi Gogineni, Mrityunjoy Chakraborty

Internal Partial Combinatory Algebras and their Slices

A partial combinatory algebra (PCA) is a set equipped with a partial binary operation that models a notion of computability. This paper studies a generalization of PCAs, introduced by W. Stekelenburg, where a PCA is not a set but an object…

Category Theory · Mathematics 2019-10-23 Jetze Zoethout

Parallel aggregation is a ubiquitous operation in data analytics that is expressed as GROUP BY in SQL, reduce in Hadoop, or segment in TensorFlow. Parallel aggregation starts with an optional local pre-aggregation step and then repartitions…

Databases · Computer Science 2018-11-30 Feilong Liu, Ario Salmasi, Spyros Blanas, Anastasios Sidiropoulos

Aggregating Funnels for Faster Fetch&Add and Queues

Many concurrent algorithms require processes to perform fetch-and-add operations on a single memory location, which can be a hot spot of contention. We present a novel algorithm called Aggregating Funnels that reduces this contention by…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-04 Younghun Roh, Yuanhao Wei, Eric Ruppert, Panagiota Fatourou, Siddhartha Jayanti, Julian Shun

Improved Distributed Principal Component Analysis

We study the distributed computing setting in which there are multiple servers, each holding a set of points, who wish to compute functions on the union of their point sets. A key task in this setting is Principal Component Analysis (PCA),…

Machine Learning · Computer Science 2014-12-24 Maria-Florina Balcan, Vandana Kanchanapally, Yingyu Liang, David Woodruff

Computation-Aware Data Aggregation

Data aggregation is a fundamental primitive in distributed computing wherein a network computes a function of every node's input. However, while compute time is non-negligible in modern systems, standard models of distributed computing do…

Data Structures and Algorithms · Computer Science 2019-11-14 Bernhard Haeupler, D Ellis Hershkowitz, Anson Kahng, Ariel D. Procaccia

In-Order Sliding-Window Aggregation in Worst-Case Constant Time

Sliding-window aggregation is a widely-used approach for extracting insights from the most recent portion of a data stream. The aggregations of interest can usually be expressed as binary operators that are associative but not necessarily…

Databases · Computer Science 2020-09-30 Kanat Tangwongsan, Martin Hirzel, Scott Schneider

K-Join: Combining Vertex Covers for Parallel Joins

Significant research effort has been devoted to improving the performance of join processing in the massively parallel computation model, where the goal is to evaluate a query with the minimum possible data transfer between machines.…

Databases · Computer Science 2026-03-12 Simon Frisk, Austen Fan, Paraschos Koutris

Accelerating Big-Data Sorting Through Programmable Switches

Sorting is a fundamental problem that has been studied extensively. Sorting plays an important role in the area of databases, as many queries can be served much faster if the relations are first sorted. One of the most…

Databases · Computer Science 2021-03-29 Yamit Barshatz-Schneor, Roy Friedman

Distributed Principal Subspace Analysis for Partitioned Big Data: Algorithms, Analysis, and Implementation

Principal Subspace Analysis (PSA) -- and its sibling, Principal Component Analysis (PCA) -- is one of the most popular approaches for dimensionality reduction in signal processing and machine learning. But centralized PSA/PCA solutions are…

Machine Learning · Computer Science 2021-11-25 Arpita Gang, Bingqing Xiang, Waheed U. Bajwa

On the Impact of Partial Sums on Interconnect Bandwidth and Memory Accesses in a DNN Accelerator

Dedicated accelerators are being designed to address the huge resource requirement of the deep neural network (DNN) applications. The power, performance and area (PPA) constraints limit the number of MACs available in these accelerators.…

Hardware Architecture · Computer Science 2021-02-25 Mahesh Chandra

DPG: A Cache-Efficient Accelerator for Sorting and for Join Operators

We present a new algorithm for fast record retrieval, distribute-probe-gather, or DPG. DPG has important applications both in sorting and in joins. Current main memory sorting algorithms split their work into three phases: extraction of…

Databases · Computer Science 2007-05-23 Gene Cooperman, Xiaoqin Ma, Viet Ha Nguyen

Exploring Key Point Analysis with Pairwise Generation and Graph Partitioning

Key Point Analysis (KPA), the summarization of multiple arguments into a concise collection of key points, continues to be a significant and unresolved issue within the field of argument mining. Existing models adapt a two-stage pipeline of…

Computation and Language · Computer Science 2024-04-18 Xiao Li, Yong Jiang, Shen Huang, Pengjun Xie, Gong Cheng, Fei Huang

Push vs. Pull-Based Loop Fusion in Query Engines

Database query engines use pull-based or push-based approaches to avoid the materialization of data across query operators. In this paper, we study these two types of query engines in depth and present the limitations and advantages of each…

Databases · Computer Science 2016-10-31 Amir Shaikhha, Mohammad Dashti, Christoph Koch