Related papers: Optimizing Bivariate Partial Information Decomposi…
Makkeh, Theis, and Vicente found in [8] that the Cone Programming model is the most robust way to compute the Bertschinger et al. partial information decomposition (BROJA PID) measure [1]. We developed production-quality, robust software that…
Bivariate partial information decompositions (PIDs) characterize how the information in a "message" random variable is decomposed between two "constituent" random variables in terms of unique, redundant and synergistic information…
We obtain a new lower bound on the information-based complexity of first-order minimization of smooth and convex functions. We show that the bound matches the worst-case performance of the recently introduced Optimized Gradient Method,…
Numerous modern optimization and machine learning algorithms rely on subgradient information being trustworthy and hence, they may fail to converge when such information is corrupted. In this paper, we consider the setting where subgradient…
We present a subgradient method for minimizing non-smooth, non-Lipschitz convex optimization problems. The only structure assumed is that a strictly feasible point is known. We extend the work of Renegar [5] by taking a different…
To date, no "information-theoretic" frameworks for reasoning about generalization error have been shown to establish minimax rates for gradient descent in the setting of stochastic convex optimization. In this work, we consider the prospect…
Bilevel optimization is a hierarchical framework where an upper-level optimization problem is constrained by a lower-level problem, commonly used in machine learning applications such as hyperparameter optimization. Existing bilevel…
In many real-world problems, optimization decisions have to be made with limited information. The decision maker may have no a priori or a posteriori data about the often nonconvex objective function except at a limited number of points…
This paper develops two parameter-free methods for solving convex and strongly convex hybrid composite optimization problems, namely, a composite subgradient type method and a proximal bundle type method. Functional complexity bounds for…
We present a new method that includes three key components of distributed optimization and federated learning: variance reduction of stochastic gradients, partial participation, and compressed communication. We prove that the new method has…
Motivated by learning problems including max-norm regularized matrix completion and clustering, robust PCA and sparse inverse covariance selection, we propose a novel optimization algorithm for minimizing a convex objective which decomposes…
Recently, there has been an increasing interest in designing distributed convex optimization algorithms under the setting where the data matrix is partitioned on features. Algorithms under this setting sometimes have many advantages over…
In this note we propose a new variant of the hybrid variance-reduced proximal gradient method in [7] to solve a common stochastic composite nonconvex optimization problem under standard assumptions. We simply replace the independent…
Polynomial optimization problems represent a wide class of optimization problems, with a large number of real-world applications. Current approaches for polynomial optimization, such as the sum of squares (SOS) method, rely on large-scale…
We analyze the complexity of biased stochastic gradient methods (SGD), where individual updates are corrupted by deterministic, i.e., biased, error terms. We derive convergence results for smooth (non-convex) functions and give improved rates…
Multivariate information decompositions hold promise to yield insight into complex systems, and stand out for their ability to identify synergistic phenomena. However, the adoption of these approaches has been hindered by there being…
In information theory, some optimization problems result in convex optimization problems on strictly convex functionals of probability densities. In this note, we study these problems and show conditions of minimizers and the uniqueness of…
We consider the task of decentralized minimization of the sum of smooth strongly convex functions stored across the nodes of a network. For this problem, lower bounds on the number of gradient computations and the number of communication…
We revisit first-order optimization under local information constraints such as local privacy, gradient quantization, and computational constraints limiting access to a few coordinates of the gradient. In this setting, the optimization…
This paper presents a novel stochastic optimisation methodology to perform empirical Bayesian inference in semi-blind image deconvolution problems. Given a blurred image and a parametric class of possible operators, the proposed…