Algorithms for Massive Data -- Lecture Notes
Abstract
These are the lecture notes for the course CM0622 (Algorithms for Massive Data) at Ca' Foscari University of Venice. The goal of this course is to introduce algorithmic techniques for dealing with massive data, that is, data so large that it does not fit in the computer's memory. There are two main approaches to dealing with massive data: (lossless) compressed data structures and (lossy) data sketches. These notes cover both topics: compressed suffix arrays, probabilistic filters, sketching under various metrics, Locality Sensitive Hashing, nearest neighbour search, and algorithms on streams.
Cite
@article{arxiv.2301.00754,
title = {Algorithms for Massive Data -- Lecture Notes},
author = {Nicola Prezza},
journal= {arXiv preprint arXiv:2301.00754},
year = {2026}
}
Comments
Added Chapter 1 on compressed data structures. Fixed a few mistakes (Bloom filter analysis) and typos. Restructured chapters. Added frequency estimation and second-order moment. Bug fixes; added SQL sketches; reorganized material on frequency estimation.