Related papers: Distributed Parameter Map-Reduce

Analyzing Large-Scale, Distributed and Uncertain Data

The exponential growth of data in current times and the demand to gain information and knowledge from the data present new challenges for database researchers. Known database systems and algorithms are no longer capable of effectively…

Databases · Computer Science 2017-12-06 Yaron Gonen

Distributed linear regression by averaging

Distributed statistical learning problems arise commonly when dealing with large datasets. In this setup, datasets are partitioned over machines, which compute locally, and communicate short messages. Communication is often the bottleneck.…

Statistics Theory · Mathematics 2022-10-25 Edgar Dobriban , Yue Sheng

Convergence of Distributed Stochastic Variance Reduced Methods without Sampling Extra Data

Stochastic variance reduced methods have gained a lot of interest recently for empirical risk minimization due to their appealing run time complexity. When the data size is large and disjointly stored on different machines, it becomes…

Machine Learning · Computer Science 2020-08-26 Shicong Cen , Huishuai Zhang , Yuejie Chi , Wei Chen , Tie-Yan Liu

A Stochastic Large-scale Machine Learning Algorithm for Distributed Features and Observations

As the size of modern data sets exceeds the disk and memory capacities of a single computer, machine learning practitioners have resorted to parallel and distributed computing. Given that optimization is one of the pillars of machine…

Machine Learning · Statistics 2019-12-10 Biyi Fang , Diego Klabjan

Distributed Parameter Estimation via Pseudo-likelihood

Estimating statistical models within sensor networks requires distributed algorithms, in which both data and computation are distributed across the nodes of the network. We propose a general approach for distributed learning based on…

Machine Learning · Computer Science 2012-07-03 Qiang Liu , Alexander Ihler

Towards a decentralized algorithm for mapping network and computational resources for distributed data-flow computations

Several high-throughput distributed data-processing applications require multi-hop processing of streams of data. These applications include continual processing on data streams originating from a network of sensors, composing a multimedia…

Distributed, Parallel, and Cluster Computing · Computer Science 2009-03-26 Shah Asaduzzaman , Muthucumaru Maheswaran

Distribution-Free One-Pass Learning

In many large-scale machine learning applications, data are accumulated with time, and thus, an appropriate model should be able to update in an online paradigm. Moreover, as the whole data volume is unknown when constructing the model, it…

Machine Learning · Computer Science 2020-07-07 Peng Zhao , Zhi-Hua Zhou

Distributed Multi-Task Relationship Learning

Multi-task learning aims to learn multiple tasks jointly by exploiting their relatedness to improve the generalization performance for each task. Traditionally, to perform multi-task learning, one needs to centralize data from all the tasks…

Machine Learning · Computer Science 2017-06-21 Sulin Liu , Sinno Jialin Pan , Qirong Ho

An efficient distributed learning algorithm based on effective local functional approximations

Scalable machine learning over big data is an important problem that is receiving a lot of attention in recent years. On popular distributed environments such as Hadoop running on a cluster of commodity machines, communication costs are…

Machine Learning · Computer Science 2015-03-18 Dhruv Mahajan , Nikunj Agrawal , S. Sathiya Keerthi , S. Sundararajan , Leon Bottou

Optimizing MapReduce for Highly Distributed Environments

MapReduce, the popular programming paradigm for large-scale data processing, has traditionally been deployed over tightly-coupled clusters where the data is already locally available. The assumption that the data and compute resources are…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-07-31 Benjamin Heintz , Abhishek Chandra , Ramesh K. Sitaraman

Logistic regression models for aggregated data

Logistic regression models are a popular and effective method to predict the probability of categorical response data. However, inference for these models can become computationally prohibitive for large datasets. Here we adapt ideas from…

Methodology · Statistics 2020-08-25 Tom Whitaker , Boris Beranger , Scott A. Sisson

A multi-stage deep learning based algorithm for multiscale model reduction

In this work, we propose a multi-stage training strategy for the development of deep learning algorithms applied to problems with multiscale features. Each stage of the proposed strategy shares an (almost) identical network structure and…

Numerical Analysis · Mathematics 2020-09-25 Eric Chung , Wing Tat Leung , Sai-Mang Pun , Zecheng Zhang

Feature selection in high-dimensional dataset using MapReduce

This paper describes a distributed MapReduce implementation of the minimum Redundancy Maximum Relevance algorithm, a popular feature selection method in bioinformatics and network inference problems. The proposed approach handles both…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-09-08 Claudio Reggiani , Yann-Aël Le Borgne , Gianluca Bontempi

A Provably Accurate Randomized Sampling Algorithm for Logistic Regression

In statistics and machine learning, logistic regression is a widely-used supervised learning technique primarily employed for binary classification tasks. When the number of observations greatly exceeds the number of predictor variables, we…

Machine Learning · Statistics 2024-04-02 Agniva Chowdhury , Pradeep Ramuhalli

Learning from Conditional Distributions via Dual Embeddings

Many machine learning tasks, such as learning with invariance and policy evaluation in reinforcement learning, can be characterized as problems of learning from conditional distributions. In such problems, each sample $x$ itself is…

Machine Learning · Computer Science 2017-01-03 Bo Dai , Niao He , Yunpeng Pan , Byron Boots , Le Song

MapReduce Meets Fine-Grained Complexity: MapReduce Algorithms for APSP, Matrix Multiplication, 3-SUM, and Beyond

Distributed processing frameworks, such as MapReduce, Hadoop, and Spark are popular systems for processing large amounts of data. The design of efficient algorithms in these frameworks is a challenging problem, as the systems both require…

Data Structures and Algorithms · Computer Science 2019-05-07 MohammadTaghi Hajiaghayi , Silvio Lattanzi , Saeed Seddighin , Cliff Stein

Distributed Graph Algorithms with Predictions

We initiate the study of deterministic distributed graph algorithms with predictions in synchronous message passing systems. The process at each node in the graph is given a prediction, which is some extra information about the problem…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-01 Joan Boyar , Faith Ellen , Kim S. Larsen

Distributed Logistic Regression for Massive Data with Rare Events

Large-scale rare events data are commonly encountered in practice. To tackle the massive rare events data, we propose a novel distributed estimation method for logistic regression in a distributed system. For a distributed framework, we…

Methodology · Statistics 2023-04-06 Xuetong Li , Xuening Zhu , Hansheng Wang

Complexity Measures for Map-Reduce, and Comparison to Parallel Computing

The programming paradigm Map-Reduce and its main open-source implementation, Hadoop, have had an enormous impact on large scale data processing. Our goal in this expository writeup is two-fold: first, we want to present some complexity…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-11-29 Ashish Goel , Kamesh Munagala

Dynamic Parameter Allocation in Parameter Servers

To keep up with increasing dataset sizes and model complexity, distributed training has become a necessity for large machine learning tasks. Parameter servers ease the implementation of distributed parameter management---a key concern in…

Machine Learning · Computer Science 2020-07-06 Alexander Renz-Wieland , Rainer Gemulla , Steffen Zeuch , Volker Markl