Related papers: Probabilistic Re-aggregation Algorithm [First Draf…

Bayesian Spatial Field Reconstruction with Unknown Distortions in Sensor Networks

Spatial regression of random fields based on potentially biased sensing information is proposed in this paper. One major concern in such applications is that since it is not known a-priori what the accuracy of the collected data from each…

Signal Processing · Electrical Eng. & Systems 2020-09-04 Qikun Xiang, Ido Nevat, Gareth W. Peters

Scalable Algorithms for Aggregating Disparate Forecasts of Probability

In this paper, computational aspects of the panel aggregation problem are addressed. Motivated primarily by applications of risk assessment, an algorithm is developed for aggregating large corpora of internally incoherent probability…

Artificial Intelligence · Computer Science 2007-07-13 Joel B. Predd, Sanjeev R. Kulkarni, Daniel N. Osherson, H. Vincent Poor

A Survey of Distributed Data Aggregation Algorithms

Distributed data aggregation is an important task, allowing the decentralized determination of meaningful global properties, that can then be used to direct the execution of other applications. The resulting values result from the…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-10-05 Paulo Jesus, Carlos Baquero, Paulo Sérgio Almeida

Collaborative Prediction: To Join or To Disjoin Datasets

With the recent rise of generative Artificial Intelligence (AI), the need of selecting high-quality dataset to improve machine learning models has garnered increasing attention. However, some part of this topic remains underexplored, even…

Machine Learning · Statistics 2025-06-16 Kyung Rok Kim, Yansong Wang, Xiaocheng Li, Guanting Chen

Learning of Optimal Forecast Aggregation in Partial Evidence Environments

We consider the forecast aggregation problem in repeated settings, where the forecasts are done on a binary event. At each period multiple experts provide forecasts about an event. The goal of the aggregator is to aggregate those forecasts…

Machine Learning · Computer Science 2018-02-21 Yakov Babichenko, Dan Garber

An Aggregate and Iterative Disaggregate Algorithm with Proven Optimality in Machine Learning

We propose a clustering-based iterative algorithm to solve certain optimization problems in machine learning, where we start the algorithm by aggregating the original data, solving the problem on aggregated data, and then in subsequent…

Machine Learning · Statistics 2017-01-23 Young Woong Park, Diego Klabjan

New reconstruction and data processing methods for regression and interpolation analysis of multidimensional big data

The problems of computational data processing involving regression, interpolation, reconstruction and imputation for multidimensional big datasets are becoming more important these days, because of the availability of data and their widely…

Methodology · Statistics 2017-03-22 Yuri K. Shestopaloff, Alexander Y. Shestopaloff

Neural network ensembles: Evaluation of aggregation algorithms

Ensembles of artificial neural networks show improved generalization capabilities that outperform those of single networks. However, for aggregation to be effective, the individual networks must be as accurate and diverse as possible. An…

Artificial Intelligence · Computer Science 2007-05-23 P. M. Granitto, P. F. Verdes, H. A. Ceccatto

Distributed, scalable and gossip-free consensus optimization with application to data analysis

Distributed algorithms for solving additive or consensus optimization problems commonly rely on first-order or proximal splitting methods. These algorithms generally come with restrictive assumptions and at best enjoy a linear convergence…

Optimization and Control · Mathematics 2017-05-11 Sina Khoshfetrat Pakazad, Christian A. Naesseth, Fredrik Lindsten, Anders Hansson

Faster optimal univariate microaggregation

Microaggregation is a method to coarsen a dataset, by optimally clustering data points in groups of at least $k$ points, thereby providing a $k$-anonymity type disclosure guarantee for each point in the dataset. Previous algorithms for…

Data Structures and Algorithms · Computer Science 2024-01-05 Felix I. Stamm, Michael T. Schaub

Distributed clustering in partially overlapping feature spaces

We introduce and address a novel distributed clustering problem where each participant has a private dataset containing only a subset of all available features, and some features are included in multiple datasets. This scenario occurs in…

Data Structures and Algorithms · Computer Science 2025-10-14 Alessio Maritan, Luca Schenato

A formalization of re-identification in terms of compatible probabilities

Re-identification algorithms are used in data privacy to measure disclosure risk. They model the situation in which an adversary attacks a published database by means of linking the information of this adversary with the database. In this…

Cryptography and Security · Computer Science 2013-01-23 Vicenç Torra, Klara Stokes

Greedy Subspace Clustering

We consider the problem of subspace clustering: given points that lie on or near the union of many low-dimensional linear subspaces, recover the subspaces. To this end, one first identifies sets of points close to the same subspace and uses…

Machine Learning · Statistics 2014-11-03 Dohyung Park, Constantine Caramanis, Sujay Sanghavi

Generalized Robust Bayesian Committee Machine for Large-scale Gaussian Process Regression

In order to scale standard Gaussian process (GP) regression to large-scale datasets, aggregation models employ factorized training process and then combine predictions from distributed experts. The state-of-the-art aggregation models,…

Machine Learning · Statistics 2018-06-05 Haitao Liu, Jianfei Cai, Yi Wang, Yew-Soon Ong

Sample-and-Accumulate Algorithms for Belief Updating in Bayes Networks

Belief updating in Bayes nets, a well known computationally hard problem, has recently been approximated by several deterministic algorithms, and by various randomized approximation algorithms. Deterministic algorithms usually provide…

Artificial Intelligence · Computer Science 2013-02-18 Eugene Santos, Solomon Eyal Shimony, Edward Williams

Subspace Segmentation by Successive Approximations: A Method for Low-Rank and High-Rank Data with Missing Entries

We propose a method to reconstruct and cluster incomplete high-dimensional data lying in a union of low-dimensional subspaces. Exploring the sparse representation model, we jointly estimate the missing data while imposing the intrinsic…

Computer Vision and Pattern Recognition · Computer Science 2017-09-06 João Carvalho, Manuel Marques, João P. Costeira

Bicriteria Polygon Aggregation with Arbitrary Shapes

We study the problem of aggregating polygons by covering them with disjoint representative regions, thereby inducing a clustering of the polygons. Our objective is to minimize a weighted sum of the total area and the total perimeter of the…

Computational Geometry · Computer Science 2025-07-17 Lotte Blank, David Eppstein, Jan-Henrik Haunert, Herman Haverkort, Benedikt Kolbe, Philip Mayer, Petra Mutzel, Alexander Naumann, Jonas Sauer

Spatial Aggregation with Respect to a Population Distribution

Spatial aggregation with respect to a population distribution involves estimating aggregate quantities for a population based on an observation of individuals in a subpopulation. In this context, a geostatistical workflow must account for…

Methodology · Statistics 2022-07-15 John Paige, Geir-Arne Fuglstad, Andrea Riebler, Jon Wakefield

Recovering individual-level spatial inference from aggregated binary data

Binary regression models are commonly used in disciplines such as epidemiology and ecology to determine how spatial covariates influence individuals. In many studies, binary data are shared in a spatially aggregated form to protect privacy.…

Methodology · Statistics 2021-05-10 Nelson B. Walker, Trevor J. Hefley, Anne E. Ballmann, Robin E. Russell, Daniel P. Walsh

Probabilistic Partitive Partitioning (PPP)

Clustering is a NP-hard problem. Thus, no optimal algorithm exists, heuristics are applied to cluster the data. Heuristics can be very resource-intensive, if not applied properly. For substantially large data sets computational efficiencies…

Databases · Computer Science 2020-03-11 Mujahid Sultan