Related papers: On Robust Aggregation for Distributed Data

200 papers

The divide and conquer method is a common strategy for handling massive data. In this article, we study the divide and conquer method for cubic-rate estimators under the massive data framework. We develop a general theory for establishing…

Statistics Theory · Mathematics 2017-04-06 Chengchun Shi, Wenbin Lu, Rui Song

This paper considers distributed statistical inference for general symmetric statistics, which encompass U-statistics and M-estimators, in the context of massive data where the data can be stored at multiple platforms in different…

Statistics Theory · Mathematics 2018-05-30 Song Xi Chen, Liuhua Peng

Many modern datasets are collected automatically and are thus easily contaminated by outliers. This has led to renewed interest in robust estimation, including new notions of robustness such as robustness to adversarial contamination of the…

Statistics Theory · Mathematics 2023-05-05 Pierre Alquier, Mathieu Gerber

In this paper, we present a new approach to distributed clustering for spatial datasets, based on an innovative and efficient aggregation technique. This distributed approach consists of two phases: 1) local clustering phase, where each…

Databases · Computer Science 2018-02-05 Malika Bendechache, Nhien-An Le-Khac, M-Tahar Kechadi

In distributed, or privacy-preserving learning, we are often given a set of probabilistic models estimated from different local repositories, and asked to combine them into a single model that gives efficient statistical estimation. A…

Machine Learning · Statistics 2017-03-01 Jun Han, Qiang Liu

Distributed data naturally arise in scenarios involving multiple sources of observations, each stored at a different location. Directly pooling all the data together is often prohibited due to limited bandwidth and storage, or due to…

Methodology · Statistics 2021-07-07 Jiyu Luo, Qiang Sun, Wenxin Zhou

In this paper, we develop connections between two seemingly disparate, but central, models in robust statistics: Huber's epsilon-contamination model and the heavy-tailed noise model. We provide conditions under which this connection…

Machine Learning · Statistics 2019-07-03 Adarsh Prasad, Sivaraman Balakrishnan, Pradeep Ravikumar

In multicenter research, individual-level data are often protected against sharing across sites. To overcome the barrier of data sharing, many distributed algorithms, which only require sharing aggregated information, have been developed.…

Methodology · Statistics 2021-03-25 Rui Duan, Yang Ning, Yong Chen

Real-world network applications must cope with failing nodes, malicious attacks, or nodes that somehow face corrupted data, all classified as outliers. One enabling application is the geographic localization of the network nodes. However,…

Optimization and Control · Mathematics 2016-10-31 Cláudia Soares, João Gomes

Nowadays, huge amounts of data are naturally collected at distributed sites for a variety of reasons, and moving these data through the network to extract useful knowledge is almost infeasible for either technical or policy reasons.…

Databases · Computer Science 2017-03-30 Lamine M. Aouad, Nhien-An Le-Khac, Tahar Kechadi

Imputation and propensity score weighting are two popular techniques for handling missing data. We address these problems using the regularized M-estimation techniques in the reproducing kernel Hilbert space. Specifically, we first use the…

Methodology · Statistics 2021-07-16 Hengfang Wang, Jae Kwang Kim

Today's data pose unprecedented challenges to statisticians. They may be incomplete, corrupted or exposed to some unknown source of contamination. We need new methods and theories to grapple with these challenges. Robust estimation is one of…

Statistics Theory · Mathematics 2017-01-17 Mengjie Chen, Chao Gao, Zhao Ren

We address general-shaped clustering problems under very weak parametric assumptions with a two-step hybrid robust clustering algorithm based on trimmed k-means and hierarchical agglomeration. The algorithm has low computational complexity…

Methodology · Statistics 2022-01-19 Luca Insolia, Domenico Perrotta

Distributed aggregation allows the derivation of a given global aggregate property from many individual local values in nodes of an interconnected network system. Simple aggregates such as minima/maxima, counts, sums and averages have been…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-04-09 Miguel Borges, Paulo Jesus, Carlos Baquero, Paulo Sérgio Almeida

Efficient extraction of useful knowledge from these data is still a challenge, mainly when the data is distributed, heterogeneous and of different quality depending on its corresponding local infrastructure. To reduce the overhead cost,…

Databases · Computer Science 2017-04-17 Nhien-An Le-Khac, M-Tahar Kechadi

Clustering techniques are very attractive for extracting and identifying patterns in datasets. However, their application to very large spatial datasets presents numerous challenges such as high-dimensionality data, heterogeneity, and high…

Databases · Computer Science 2018-02-27 Malika Bendechache, Nhien-An Le-Khac, M-Tahar Kechadi

Algorithmic robust statistics has traditionally focused on the contamination model where a small fraction of the samples are arbitrarily corrupted. We consider a recent contamination model that combines two kinds of corruptions: (i) small…

Data Structures and Algorithms · Computer Science 2024-10-23 Thanasis Pittas, Ankit Pensia

The proliferation of science and technology has led to the prevalence of voluminous data sets that are distributed across multiple machines. It is an established fact that conventional statistical methodologies may be unfeasible in the…

Statistics Theory · Mathematics 2023-10-24 Lu Yan, Jiang Hu

In this work, we propose a non-parametric and robust change detection algorithm to detect multiple change points in time series data under contamination. The contamination model is sufficiently general in that the most common model used…

Methodology · Statistics 2022-06-24 Sujay Bhatt, Guanhua Fang, Ping Li

An efficient method for obtaining low-density hyperplane separators in the unsupervised context is proposed. Low density separators can be used to obtain a partition of a set of data based on their allocations to the different sides of the…

Machine Learning · Statistics 2021-08-10 David P. Hofmeyr