Related papers: Optimal Data Selection: An Online Distributed View

The Power of Randomization: Distributed Submodular Maximization on Massive Datasets

A wide variety of problems in machine learning, including exemplar clustering, document summarization, and sensor placement, can be cast as constrained submodular maximization problems. Unfortunately, the resulting submodular optimization…

Machine Learning · Computer Science 2015-04-23 Rafael da Ponte Barbosa, Alina Ene, Huy L. Nguyen, Justin Ward

Distributed Maximization of Submodular plus Diversity Functions for Multi-label Feature Selection on Huge Datasets

There are many problems in machine learning and data mining which are equivalent to selecting a non-redundant, high "quality" set of objects. Recommender systems, feature selection, and data summarization are among many applications of…

Machine Learning · Computer Science 2019-04-19 Mehrdad Ghadiri, Mark Schmidt

Online Learning of Assignments that Maximize Submodular Functions

Which ads should we display in sponsored search in order to maximize our revenue? How should we dynamically rank information sources to maximize value of information? These applications exhibit strong diminishing returns: Selection of…

Machine Learning · Computer Science 2009-08-07 Daniel Golovin, Andreas Krause, Matthew Streeter

Structured Robust Submodular Maximization: Offline and Online Algorithms

Constrained submodular function maximization has been used in subset selection problems such as selection of most informative sensor locations. While these models have been quite popular, the solutions…

Data Structures and Algorithms · Computer Science 2020-10-15 Alfredo Torrico, Mohit Singh, Sebastian Pokutta, Nika Haghtalab, Joseph Naor, Nima Anari

Online data assimilation in distributionally robust optimization

This paper considers a class of real-time decision making problems to minimize the expected value of a function that depends on a random variable $\xi$ under an unknown distribution $\mathbb{P}$. In this process, samples of $\xi$ are…

Optimization and Control · Mathematics 2020-09-08 Dan Li, Sonia Martinez

On Distributed Larger-Than-Memory Subset Selection With Pairwise Submodular Functions

Modern datasets span billions of samples, making training on all available data infeasible. Selecting a high quality subset helps in reducing training costs and enhancing model quality. Submodularity, a discrete analogue of convexity, is…

Machine Learning · Computer Science 2025-04-04 Maximilian Böther, Abraham Sebastian, Pranjal Awasthi, Ana Klimovic, Srikumar Ramalingam

Distributed Online Big Data Classification Using Context Information

Distributed, online data mining systems have emerged as a result of applications requiring analysis of large amounts of correlated and high-dimensional data produced by multiple distributed data sources. We propose a distributed online data…

Machine Learning · Computer Science 2013-07-03 Cem Tekin, Mihaela van der Schaar

Online and Streaming Algorithms for Constrained $k$-Submodular Maximization

Constrained $k$-submodular maximization is a general framework that captures many discrete optimization problems such as ad allocation, influence maximization, personalized recommendation, and many others. In many of these applications,…

Data Structures and Algorithms · Computer Science 2023-05-26 Fabian Spaeh, Alina Ene, Huy L. Nguyen

Deletion-Robust Submodular Maximization at Scale

Can we efficiently extract useful information from a large user-generated dataset while protecting the privacy of the users and/or ensuring fairness in representation? We cast this problem as an instance of a deletion-robust submodular…

Machine Learning · Computer Science 2017-11-22 Ehsan Kazemi, Morteza Zadimoghaddam, Amin Karbasi

Quickest Online Selection of an Increasing Subsequence of Specified Size

Given a sequence of independent random variables with a common continuous distribution, we consider the online decision problem where one seeks to minimize the expected value of the time that is needed to complete the selection of a…

Probability · Mathematics 2016-09-05 Alessandro Arlotto, Elchanan Mossel, J. Michael Steele

Fair and Representative Subset Selection from Data Streams

We study the problem of extracting a small subset of representative items from a large data stream. In many data mining and machine learning applications such as social network analysis and recommender systems, this problem can be…

Data Structures and Algorithms · Computer Science 2021-02-15 Yanhao Wang, Francesco Fabbri, Michael Mathioudakis

Online Submodular Maximization with Preemption

Submodular function maximization has been studied extensively in recent years under various constraints and models. The problem plays a major role in various disciplines. We study a natural online variant of this problem in which elements…

Data Structures and Algorithms · Computer Science 2015-01-26 Niv Buchbinder, Moran Feldman, Roy Schwartz

Online Submodular Maximization under a Matroid Constraint with Application to Learning Assignments

Which ads should we display in sponsored search in order to maximize our revenue? How should we dynamically rank information sources to maximize the value of the ranking? These applications exhibit strong diminishing returns: Redundancy…

Machine Learning · Computer Science 2014-07-07 Daniel Golovin, Andreas Krause, Matthew Streeter

An Optimal Algorithm for Online Unconstrained Submodular Maximization

We consider a basic problem at the interface of two fundamental fields: submodular optimization and online learning. In the online unconstrained submodular maximization (online USM) problem, there is a universe $[n]=\{1,2,...,n\}$ and a…

Machine Learning · Computer Science 2018-06-12 Tim Roughgarden, Joshua R. Wang

Ordered Submodularity and its Applications to Diversifying Recommendations

A fundamental task underlying many important optimization problems, from influence maximization to sensor placement to content recommendation, is to select the optimal group of $k$ items from a larger set. Submodularity has been very…

Data Structures and Algorithms · Computer Science 2022-03-02 Jon Kleinberg, Emily Ryu, Éva Tardos

Coresets remembered and items forgotten: submodular maximization with deletions

In recent years we have witnessed an increase in the development of methods for submodular optimization, which have been motivated by the wide applicability of submodular functions in real-world data-science problems. In this paper, we…

Data Structures and Algorithms · Computer Science 2022-09-15 Guangyi Zhang, Nikolaj Tatti, Aristides Gionis

Data assimilation and online optimization with performance guarantees

This paper considers a class of real-time stochastic optimization problems dependent on an unknown probability distribution. In the considered scenario, data is streaming frequently while trying to reach a decision. Thus, we aim to devise a…

Optimization and Control · Mathematics 2020-09-08 Dan Li, Sonia Martinez

Distributionally robust optimization through the lens of submodularity

Distributionally robust optimization is used to tackle decision making problems under uncertainty where the distribution of the uncertain data is ambiguous. Many ambiguity sets have been proposed for continuous uncertainty that build on…

Optimization and Control · Mathematics 2025-05-28 Karthik Natarajan, Divya Padmanabhan, Arjun Ramachandra

Decentralized Online Big Data Classification - a Bandit Framework

Distributed, online data mining systems have emerged as a result of applications requiring analysis of large amounts of correlated and high-dimensional data produced by multiple distributed data sources. We propose a distributed online data…

Machine Learning · Computer Science 2013-08-27 Cem Tekin, Mihaela van der Schaar

Nearly Optimal Subdata Selection

When, in terms of the number of data points, the size of a dataset exceeds available computing resources, or when labeling is expensive, an attractive solution consists of selecting only some of the data points (subdata) for further…

Methodology · Statistics 2026-04-28 Min Yang, Wei Zheng, John Stufken, Ming-Chung Chang, Ting Tian, Xueqin Wang