
Related papers: Optimal Data Selection: An Online Distributed View

200 papers

A wide variety of problems in machine learning, including exemplar clustering, document summarization, and sensor placement, can be cast as constrained submodular maximization problems. Unfortunately, the resulting submodular optimization…

Machine Learning · Computer Science 2015-04-23 Rafael da Ponte Barbosa, Alina Ene, Huy L. Nguyen, Justin Ward

There are many problems in machine learning and data mining which are equivalent to selecting a non-redundant, high "quality" set of objects. Recommender systems, feature selection, and data summarization are among many applications of…

Machine Learning · Computer Science 2019-04-19 Mehrdad Ghadiri, Mark Schmidt

Which ads should we display in sponsored search in order to maximize our revenue? How should we dynamically rank information sources to maximize value of information? These applications exhibit strong diminishing returns: Selection of…

Machine Learning · Computer Science 2009-08-07 Daniel Golovin, Andreas Krause, Matthew Streeter

Constrained submodular function maximization has been used in subset selection problems such as selection of most informative sensor locations. While these models have been quite popular, the solutions…

Data Structures and Algorithms · Computer Science 2020-10-15 Alfredo Torrico, Mohit Singh, Sebastian Pokutta, Nika Haghtalab, Joseph Naor, Nima Anari

This paper considers a class of real-time decision making problems to minimize the expected value of a function that depends on a random variable $\xi$ under an unknown distribution $\mathbb{P}$. In this process, samples of $\xi$ are…

Optimization and Control · Mathematics 2020-09-08 Dan Li, Sonia Martinez

Modern datasets span billions of samples, making training on all available data infeasible. Selecting a high quality subset helps in reducing training costs and enhancing model quality. Submodularity, a discrete analogue of convexity, is…

Machine Learning · Computer Science 2025-04-04 Maximilian Böther, Abraham Sebastian, Pranjal Awasthi, Ana Klimovic, Srikumar Ramalingam

Distributed, online data mining systems have emerged as a result of applications requiring analysis of large amounts of correlated and high-dimensional data produced by multiple distributed data sources. We propose a distributed online data…

Machine Learning · Computer Science 2013-07-03 Cem Tekin, Mihaela van der Schaar

Constrained $k$-submodular maximization is a general framework that captures many discrete optimization problems such as ad allocation, influence maximization, personalized recommendation, and many others. In many of these applications,…

Data Structures and Algorithms · Computer Science 2023-05-26 Fabian Spaeh, Alina Ene, Huy L. Nguyen

Can we efficiently extract useful information from a large user-generated dataset while protecting the privacy of the users and/or ensuring fairness in representation? We cast this problem as an instance of a deletion-robust submodular…

Machine Learning · Computer Science 2017-11-22 Ehsan Kazemi, Morteza Zadimoghaddam, Amin Karbasi

Given a sequence of independent random variables with a common continuous distribution, we consider the online decision problem where one seeks to minimize the expected value of the time that is needed to complete the selection of a…

Probability · Mathematics 2016-09-05 Alessandro Arlotto, Elchanan Mossel, J. Michael Steele

We study the problem of extracting a small subset of representative items from a large data stream. In many data mining and machine learning applications such as social network analysis and recommender systems, this problem can be…

Data Structures and Algorithms · Computer Science 2021-02-15 Yanhao Wang, Francesco Fabbri, Michael Mathioudakis

Submodular function maximization has been studied extensively in recent years under various constraints and models. The problem plays a major role in various disciplines. We study a natural online variant of this problem in which elements…

Data Structures and Algorithms · Computer Science 2015-01-26 Niv Buchbinder, Moran Feldman, Roy Schwartz

Which ads should we display in sponsored search in order to maximize our revenue? How should we dynamically rank information sources to maximize the value of the ranking? These applications exhibit strong diminishing returns: Redundancy…

Machine Learning · Computer Science 2014-07-07 Daniel Golovin, Andreas Krause, Matthew Streeter

We consider a basic problem at the interface of two fundamental fields: submodular optimization and online learning. In the online unconstrained submodular maximization (online USM) problem, there is a universe $[n]=\{1,2,\ldots,n\}$ and a…

Machine Learning · Computer Science 2018-06-12 Tim Roughgarden, Joshua R. Wang

A fundamental task underlying many important optimization problems, from influence maximization to sensor placement to content recommendation, is to select the optimal group of $k$ items from a larger set. Submodularity has been very…

Data Structures and Algorithms · Computer Science 2022-03-02 Jon Kleinberg, Emily Ryu, Éva Tardos

In recent years we have witnessed an increase in the development of methods for submodular optimization, which have been motivated by the wide applicability of submodular functions in real-world data-science problems. In this paper, we…

Data Structures and Algorithms · Computer Science 2022-09-15 Guangyi Zhang, Nikolaj Tatti, Aristides Gionis

This paper considers a class of real-time stochastic optimization problems dependent on an unknown probability distribution. In the considered scenario, data is streaming frequently while trying to reach a decision. Thus, we aim to devise a…

Optimization and Control · Mathematics 2020-09-08 Dan Li, Sonia Martinez

Distributionally robust optimization is used to tackle decision making problems under uncertainty where the distribution of the uncertain data is ambiguous. Many ambiguity sets have been proposed for continuous uncertainty that build on…

Optimization and Control · Mathematics 2025-05-28 Karthik Natarajan, Divya Padmanabhan, Arjun Ramachandra

Distributed, online data mining systems have emerged as a result of applications requiring analysis of large amounts of correlated and high-dimensional data produced by multiple distributed data sources. We propose a distributed online data…

Machine Learning · Computer Science 2013-08-27 Cem Tekin, Mihaela van der Schaar

When, in terms of the number of data points, the size of a dataset exceeds available computing resources, or when labeling is expensive, an attractive solution consists of selecting only some of the data points (subdata) for further…

Methodology · Statistics 2026-04-28 Min Yang, Wei Zheng, John Stufken, Ming-Chung Chang, Ting Tian, Xueqin Wang