
Partial Partial Aggregates

Databases 2026-03-31 v1

Abstract

We introduce partial partial aggregates (PPA), a query optimization technique for distributed engines that pushes only the local compute phase of an aggregate operation through joins. A query that aggregates after a join involves two logical operations, each requiring a network shuffle. Pushing a full aggregate (COMPUTE → DISTRIBUTE → MERGE) below the join introduces a third shuffle. In the specific case where the join key is included in the grouping key and the join is FK-PK, the full pushed aggregate can eliminate the top-level aggregate entirely, making it the preferred choice. In all other key configurations, the top aggregate must remain, and the extra shuffle is wasteful. A PPA pushes only COMPUTE, achieving data reduction before the join without the extra shuffle. The technique relies on the distributive property of aggregates and requires accurate NDV estimation for cost-based decisions.
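The idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the table names (`orders_partitions`, `customers`), the helper functions, and the toy data are all hypothetical. It assumes an inner FK-PK join in which every order matches exactly one customer, and a SUM aggregate, which is distributive. The join key (`customer_id`) is not part of the grouping key (`region`), so the top-level aggregate has to stay. The sketch pre-aggregates each partition locally by join key, with no shuffle, and checks that the final result is unchanged while fewer rows reach the join.

```python
from collections import defaultdict

def local_compute(partition):
    """COMPUTE phase only: pre-aggregate one partition by join key, with no shuffle."""
    partial = defaultdict(int)
    for customer_id, amount in partition:
        partial[customer_id] += amount
    return list(partial.items())

def join_and_aggregate(partitions, customers):
    """Join each row to the dimension table, then run the top-level SUM by region.

    Returns the per-region totals and the number of rows that entered the join.
    """
    totals = defaultdict(int)
    rows_joined = 0
    for partition in partitions:
        for customer_id, amount in partition:
            region = customers[customer_id]  # FK-PK lookup (assumes a match exists)
            totals[region] += amount
            rows_joined += 1
    return dict(totals), rows_joined

# Hypothetical fact table, split across two workers: (customer_id, amount) rows.
orders_partitions = [
    [(1, 10), (1, 20), (2, 5), (1, 7)],
    [(2, 3), (3, 8), (3, 2), (2, 1)],
]
# Hypothetical dimension table: customer_id (primary key) -> region.
customers = {1: "EU", 2: "US", 3: "EU"}

# Baseline plan: join every row, then aggregate.
baseline, baseline_rows = join_and_aggregate(orders_partitions, customers)

# PPA plan: local COMPUTE below the join, top-level aggregate kept above it.
reduced = [local_compute(p) for p in orders_partitions]
ppa, ppa_rows = join_and_aggregate(reduced, customers)

print(baseline, baseline_rows)
print(ppa, ppa_rows)
```

In this toy data the local phase halves the join input because each partition holds only two distinct join keys. If every row had a distinct key, the local phase would reduce nothing, which is why the decision depends on NDV estimates.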

Cite

@article{arxiv.2603.26698,
  title  = {Partial Partial Aggregates},
  author = {Claude Brisson},
  journal= {arXiv preprint arXiv:2603.26698},
  year   = {2026}
}

Comments

9 pages, no figures
