Related papers: Optimal Data Selection: An Online Distributed View
A wide variety of problems in machine learning, including exemplar clustering, document summarization, and sensor placement, can be cast as constrained submodular maximization problems. Unfortunately, the resulting submodular optimization…
There are many problems in machine learning and data mining which are equivalent to selecting a non-redundant, high "quality" set of objects. Recommender systems, feature selection, and data summarization are among many applications of…
Which ads should we display in sponsored search in order to maximize our revenue? How should we dynamically rank information sources to maximize value of information? These applications exhibit strong diminishing returns: Selection of…
Constrained submodular function maximization has been used in subset selection problems such as selection of most informative sensor locations. While these models have been quite popular, the solutions Constrained submodular function…
This paper considers a class of real-time decision making problems to minimize the expected value of a function that depends on a random variable $\xi$ under an unknown distribution $\mathbb{P}$. In this process, samples of $\xi$ are…
Modern datasets span billions of samples, making training on all available data infeasible. Selecting a high quality subset helps in reducing training costs and enhancing model quality. Submodularity, a discrete analogue of convexity, is…
Distributed, online data mining systems have emerged as a result of applications requiring analysis of large amounts of correlated and high-dimensional data produced by multiple distributed data sources. We propose a distributed online data…
Constrained $k$-submodular maximization is a general framework that captures many discrete optimization problems such as ad allocation, influence maximization, personalized recommendation, and many others. In many of these applications,…
Can we efficiently extract useful information from a large user-generated dataset while protecting the privacy of the users and/or ensuring fairness in representation. We cast this problem as an instance of a deletion-robust submodular…
Given a sequence of independent random variables with a common continuous distribution, we consider the online decision problem where one seeks to minimize the expected value of the time that is needed to complete the selection of a…
We study the problem of extracting a small subset of representative items from a large data stream. In many data mining and machine learning applications such as social network analysis and recommender systems, this problem can be…
Submodular function maximization has been studied extensively in recent years under various constraints and models. The problem plays a major role in various disciplines. We study a natural online variant of this problem in which elements…
Which ads should we display in sponsored search in order to maximize our revenue? How should we dynamically rank information sources to maximize the value of the ranking? These applications exhibit strong diminishing returns: Redundancy…
We consider a basic problem at the interface of two fundamental fields: submodular optimization and online learning. In the online unconstrained submodular maximization (online USM) problem, there is a universe $[n]=\{1,2,...,n\}$ and a…
A fundamental task underlying many important optimization problems, from influence maximization to sensor placement to content recommendation, is to select the optimal group of $k$ items from a larger set. Submodularity has been very…
In recent years we have witnessed an increase on the development of methods for submodular optimization, which have been motivated by the wide applicability of submodular functions in real-world data-science problems. In this paper, we…
This paper considers a class of real-time stochastic optimization problems dependent on an unknown probability distribution. In the considered scenario, data is streaming frequently while trying to reach a decision. Thus, we aim to devise a…
Distributionally robust optimization is used to tackle decision making problems under uncertainty where the distribution of the uncertain data is ambiguous. Many ambiguity sets have been proposed for continuous uncertainty that build on…
Distributed, online data mining systems have emerged as a result of applications requiring analysis of large amounts of correlated and high-dimensional data produced by multiple distributed data sources. We propose a distributed online data…
When, in terms of the number of data points, the size of a dataset exceeds available computing resources, or when labeling is expensive, an attractive solution consists of selecting only some of the data points (subdata) for further…