Related papers: Large-Scale Clustering Based on Data Compression
Dantzig-Wolfe decomposition (DWD) is a classical algorithm for solving large-scale linear programs whose constraint matrix involves a set of independent blocks coupled with a set of linking rows. The algorithm decomposes such a model into a…
The analysis of large datasets is often complicated by the presence of missing entries, mainly because most of the current machine learning algorithms are designed to work with full data. The main focus of this work is to introduce a…
We address a large-scale and nonconvex optimization problem, involving an aggregative term. This term can be interpreted as the sum of the contributions of N agents to some common good, with N large. We investigate a relaxation of this…
Clustering techniques are very attractive for extracting and identifying patterns in datasets. However, their application to very large spatial datasets presents numerous challenges such as high-dimensionality data, heterogeneity, and high…
Dantzig-Wolfe (DW) decomposition is a well-known technique in mixed-integer programming (MIP) for decomposing and convexifying constraints to obtain potentially strong dual bounds. We investigate cutting planes that can be derived using the…
This paper discusses a deterministic clustering approach to capacitated resource allocation problems. In particular, the Deterministic Annealing (DA) algorithm from the data-compression literature, which bears a distinct analogy to the…
The unit commitment problem is a short-term planning problem in the energy industry. Dantzig-Wolfe decomposition is a popular approach to solve the problem. This paper focuses on primal heuristics used with Dantzig-Wolfe decomposition. We…
We propose a clustering-based iterative algorithm to solve certain optimization problems in machine learning, where we start the algorithm by aggregating the original data, solving the problem on aggregated data, and then in subsequent…
In random allocation rules, typically first an optimal fractional point is calculated via solving a linear program. The calculated point represents a fractional assignment of objects or more generally packages of objects to agents. In order…
Clustering is a fundamental task in data mining and machine learning, particularly for analyzing large-scale data. In this paper, we introduce Clust-Splitter, an efficient algorithm based on nonsmooth optimization, designed to solve the…
We propose an algorithm to solve quasi-variational inequality problems, based on the Dantzig-Wolfe decomposition paradigm. Our approach solves in the subproblems variational inequalities, which is a simpler problem, while restricting…
Clustering large, mixed data is a central problem in data mining. Many approaches adopt the idea of k-means, and hence are sensitive to initialisation, detect only spherical clusters, and require a priori the unknown number of clusters. We…
Clustering, or grouping, dataset elements based on similarity can be used not only to classify a dataset into a few categories, but also to approximate it by a relatively large number of representative elements. In the latter scenario,…
This paper considers a canonical clustering problem where one receives unlabeled samples drawn from a balanced mixture of two elliptical distributions and aims for a classifier to estimate the labels. Many popular methods including PCA and…
This paper is about how to partition decision variables while decomposing a large-scale optimization problem for the best performance of distributed solution methods. Solving a large-scale optimization problem sequen- tially can be…
Clustering is a NP-hard problem. Thus, no optimal algorithm exists, heuristics are applied to cluster the data. Heuristics can be very resource-intensive, if not applied properly. For substantially large data sets computational efficiencies…
State-of-the-art subspace clustering methods are based on self-expressive model, which represents each data point as a linear combination of other data points. By enforcing such representation to be sparse, sparse subspace clustering is…
We address the problem of un-supervised soft-clustering called micro-clustering. The aim of the problem is to enumerate all groups composed of records strongly related to each other, while standard clustering methods separate records at…
Currently, data-driven discovery in biological sciences resides in finding segmentation strategies in multivariate data that produce sensible descriptions of the data. Clustering is but one of several approaches and sometimes falls short…
In this paper, we propose a data based transformation for infinite-dimensional Gaussian processes and derive its limit theorem. For a classification problem, this transformation induces complete separation among the associated Gaussian…