Graham Cormode — Scifaro

Power Transform Revisited: Numerically Stable, and Federated

Power transforms are popular parametric methods for making data more Gaussian-like, and are widely used as preprocessing steps in statistical analysis and machine learning. However, we find that direct implementations of power transforms…

Machine Learning · Computer Science 2026-04-16 Xuefeng Xu, Graham Cormode

FedPS: Federated data Preprocessing via aggregated Statistics

Federated Learning (FL) enables multiple parties to collaboratively train machine learning models without sharing raw data. However, before training, data must be preprocessed to address missing values, inconsistent formats, and…

Machine Learning · Computer Science 2026-02-12 Xuefeng Xu, Graham Cormode

Streaming Algorithms for Bin Packing and Vector Scheduling

Problems involving the efficient arrangement of simple objects, as captured by bin packing and makespan scheduling, are fundamental tasks in combinatorial optimization. These are well understood in the traditional online and offline cases,…

Data Structures and Algorithms · Computer Science 2026-01-27 Graham Cormode, Pavel Veselý

A Tight Lower Bound for Comparison-Based Quantile Summaries

Quantiles, such as the median or percentiles, provide concise and useful information about the distribution of a collection of items, drawn from a totally ordered universe. We study data structures, called quantile summaries, which keep…

Data Structures and Algorithms · Computer Science 2026-01-27 Graham Cormode, Pavel Veselý

GEM+: Scalable State-of-the-Art Private Synthetic Data with Generator Networks

State-of-the-art differentially private synthetic tabular data has been defined by adaptive 'select-measure-generate' frameworks, exemplified by methods like AIM. These approaches iteratively measure low-order noisy marginals and fit…

Machine Learning · Computer Science 2025-11-14 Samuel Maddock, Shripad Gade, Graham Cormode, Will Bullock

Federated Computation of ROC and PR Curves

Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves are fundamental tools for evaluating machine learning classifiers, offering detailed insights into the trade-offs between true positive rate vs. false positive rate…

Machine Learning · Computer Science 2025-10-07 Xuefeng Xu, Graham Cormode

Private Federated Multiclass Post-hoc Calibration

Calibrating machine learning models so that predicted probabilities better reflect the true outcome frequencies is crucial for reliable decision-making across many applications. In Federated Learning (FL), the goal is to train a global…

Machine Learning · Computer Science 2025-10-03 Samuel Maddock, Graham Cormode, Carsten Maple

Synthetic Tabular Data: Methods, Attacks and Defenses

Synthetic data is often positioned as a solution to replace sensitive fixed-size datasets with a source of unlimited matching data, freed from privacy concerns. There has been much progress in synthetic data generation over the last decade,…

Machine Learning · Computer Science 2025-06-09 Graham Cormode, Samuel Maddock, Enayat Ullah, Shripad Gade

Leveraging Vertical Public-Private Split for Improved Synthetic Data Generation

Differentially Private Synthetic Data Generation (DP-SDG) is a key enabler of private and secure tabular-data sharing, producing artificial data that carries through the underlying statistical properties of the input data. This typically…

Machine Learning · Computer Science 2025-04-16 Samuel Maddock, Shripad Gade, Graham Cormode, Will Bullock

PAPAYA Federated Analytics Stack: Engineering Privacy, Scalability and Practicality

Cross-device Federated Analytics (FA) is a distributed computation paradigm designed to answer analytics queries about and derive insights from data held locally on users' devices. On-device computations combined with other privacy and…

Machine Learning · Computer Science 2025-03-28 Harish Srinivas, Graham Cormode, Mehrdad Honarkhah, Samuel Lurye, Jonathan Hehir, Lunwen He, George Hong, Ahmed Magdy, Dzmitry Huba, Kaikai Wang, Shen Guo, Shoubhik Bhattacharya

Distributed, communication-efficient, and differentially private estimation of KL divergence

A key task in managing distributed, sensitive data is to measure the extent to which a distribution changes. Understanding this drift can effectively support a variety of federated learning and analytics tasks. However, in many practical…

Machine Learning · Computer Science 2024-12-02 Mary Scott, Sayan Biswas, Graham Cormode, Carsten Maple

Towards Robust Federated Analytics via Differentially Private Measurements of Statistical Heterogeneity

Statistical heterogeneity is a measure of how skewed the samples of a dataset are. It is a common problem in the study of differential privacy that the usage of a statistically heterogeneous dataset results in a significant loss of…

Machine Learning · Computer Science 2024-12-02 Mary Scott, Graham Cormode, Carsten Maple

Privacy-preserving Fuzzy Name Matching for Sharing Financial Intelligence

Financial institutions rely on data for many operations, including a need to drive efficiency, enhance services and prevent financial crime. Data sharing across an organisation or between institutions can facilitate rapid, evidence-based…

Cryptography and Security · Computer Science 2024-11-11 Harsh Kasyap, Ugur Ilker Atmaca, Carsten Maple, Graham Cormode, Jiancong He

FLAIM: AIM-based Synthetic Data Generation in the Federated Setting

Preserving individual privacy while enabling collaborative data sharing is crucial for organizations. Synthetic data generation is one solution, producing artificial data that mirrors the statistical properties of private data. While…

Cryptography and Security · Computer Science 2024-09-06 Samuel Maddock, Graham Cormode, Carsten Maple

Streaming Zero-Knowledge Proofs

Streaming interactive proofs (SIPs) enable a space-bounded algorithm with one-pass access to a massive stream of data to verify a computation that requires large space, by communicating with a powerful but untrusted prover. This work…

Computational Complexity · Computer Science 2024-05-28 Graham Cormode, Marcel Dall'Agnol, Tom Gur, Chris Hickey

Verifiable Differential Privacy

Differential Privacy (DP) is often presented as a strong privacy-enhancing technology with broad applicability and advocated as a de-facto standard for releasing aggregate statistics on sensitive data. However, in many embodiments, DP…

Cryptography and Security · Computer Science 2024-02-13 Ari Biswas, Graham Cormode

Federated Experiment Design under Distributed Differential Privacy

Experiment design has a rich history dating back over a century and has found many critical applications across various fields since then. The use and collection of users' data in experiments often involve sensitive personal information, so…

Cryptography and Security · Computer Science 2023-11-09 Wei-Ning Chen, Graham Cormode, Akash Bharadwaj, Peter Romov, Ayfer Özgür

Relative Error Streaming Quantiles

Estimating ranks, quantiles, and distributions over streaming data is a central task in data analysis and monitoring. Given a stream of $n$ items from a data universe equipped with a total order, the task is to compute a sketch (data…

Data Structures and Algorithms · Computer Science 2023-08-25 Graham Cormode, Zohar Karnin, Edo Liberty, Justin Thaler, Pavel Veselý

PrivLava: Synthesizing Relational Data with Foreign Keys under Differential Privacy

Answering database queries while preserving privacy is an important problem that has attracted considerable research attention in recent years. A canonical approach to this problem is to use synthetic data. That is, we replace the input…

Databases · Computer Science 2023-04-11 Kuntai Cai, Xiaokui Xiao, Graham Cormode

Sketch-Flip-Merge: Mergeable Sketches for Private Distinct Counting

Data sketching is a critical tool for distinct counting, enabling multisets to be represented by compact summaries that admit fast cardinality estimates. Because sketches may be merged to summarize multiset unions, they are a basic building…

Data Structures and Algorithms · Computer Science 2023-02-07 Jonathan Hehir, Daniel Ting, Graham Cormode