Algorithms for Massive Data -- Lecture Notes

Data Structures and Algorithms 2026-02-27 v15

Abstract

These are the lecture notes for the course CM0622 - Algorithms for Massive Data, Ca' Foscari University of Venice. The goal of this course is to introduce algorithmic techniques for dealing with massive data: data so large that it does not fit in the computer's memory. There are two main approaches to dealing with massive data: (lossless) compressed data structures and (lossy) data sketches. These notes cover both topics: compressed suffix arrays, probabilistic filters, sketching under various metrics, Locality Sensitive Hashing, nearest neighbour search, and algorithms on streams.

Cite

@article{arxiv.2301.00754,
  title  = {Algorithms for Massive Data -- Lecture Notes},
  author = {Nicola Prezza},
  journal= {arXiv preprint arXiv:2301.00754},
  year   = {2026}
}

Comments

Added Chapter 1 on compressed data structures. Fixed a few mistakes (Bloom filter analysis) and typos. Restructured chapters. Added frequency estimation and second-order moment. Bug fixes; added SQL sketches; reorganized material on frequency estimation.