English
Related papers

Related papers: Distributed Parameter Map-Reduce

200 papers

The exponential growth of data in current times and the demand to gain information and knowledge from the data present new challenges for database researchers. Known database systems and algorithms are no longer capable of effectively…

Databases · Computer Science 2017-12-06 Yaron Gonen

Distributed statistical learning problems arise commonly when dealing with large datasets. In this setup, datasets are partitioned over machines, which compute locally, and communicate short messages. Communication is often the bottleneck.…

Statistics Theory · Mathematics 2022-10-25 Edgar Dobriban , Yue Sheng

Stochastic variance reduced methods have gained a lot of interest recently for empirical risk minimization due to its appealing run time complexity. When the data size is large and disjointly stored on different machines, it becomes…

Machine Learning · Computer Science 2020-08-26 Shicong Cen , Huishuai Zhang , Yuejie Chi , Wei Chen , Tie-Yan Liu

As the size of modern data sets exceeds the disk and memory capacities of a single computer, machine learning practitioners have resorted to parallel and distributed computing. Given that optimization is one of the pillars of machine…

Machine Learning · Statistics 2019-12-10 Biyi Fang , Diego Klabjan

Estimating statistical models within sensor networks requires distributed algorithms, in which both data and computation are distributed across the nodes of the network. We propose a general approach for distributed learning based on…

Machine Learning · Computer Science 2012-07-03 Qiang Liu , Alexander Ihler

Several high-throughput distributed data-processing applications require multi-hop processing of streams of data. These applications include continual processing on data streams originating from a network of sensors, composing a multimedia…

Distributed, Parallel, and Cluster Computing · Computer Science 2009-03-26 Shah Asaduzzaman , Muthucumaru Maheswaran

In many large-scale machine learning applications, data are accumulated with time, and thus, an appropriate model should be able to update in an online paradigm. Moreover, as the whole data volume is unknown when constructing the model, it…

Machine Learning · Computer Science 2020-07-07 Peng Zhao , Zhi-Hua Zhou

Multi-task learning aims to learn multiple tasks jointly by exploiting their relatedness to improve the generalization performance for each task. Traditionally, to perform multi-task learning, one needs to centralize data from all the tasks…

Machine Learning · Computer Science 2017-06-21 Sulin Liu , Sinno Jialin Pan , Qirong Ho

Scalable machine learning over big data is an important problem that is receiving a lot of attention in recent years. On popular distributed environments such as Hadoop running on a cluster of commodity machines, communication costs are…

Machine Learning · Computer Science 2015-03-18 Dhruv Mahajan , Nikunj Agrawal , S. Sathiya Keerthi , S. Sundararajan , Leon Bottou

MapReduce, the popular programming paradigm for large-scale data processing, has traditionally been deployed over tightly-coupled clusters where the data is already locally available. The assumption that the data and compute resources are…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-07-31 Benjamin Heintz , Abhishek Chandra , Ramesh K. Sitaraman

Logistic regression models are a popular and effective method to predict the probability of categorical response data. However inference for these models can become computationally prohibitive for large datasets. Here we adapt ideas from…

Methodology · Statistics 2020-08-25 Tom Whitaker , Boris Beranger , Scott A. Sisson

In this work, we propose a multi-stage training strategy for the development of deep learning algorithms applied to problems with multiscale features. Each stage of the pro-posed strategy shares an (almost) identical network structure and…

Numerical Analysis · Mathematics 2020-09-25 Eric Chung , Wing Tat Leung , Sai-Mang Pun , Zecheng Zhang

This paper describes a distributed MapReduce implementation of the minimum Redundancy Maximum Relevance algorithm, a popular feature selection method in bioinformatics and network inference problems. The proposed approach handles both…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-09-08 Claudio Reggiani , Yann-Aël Le Borgne , Gianluca Bontempi

In statistics and machine learning, logistic regression is a widely-used supervised learning technique primarily employed for binary classification tasks. When the number of observations greatly exceeds the number of predictor variables, we…

Machine Learning · Statistics 2024-04-02 Agniva Chowdhury , Pradeep Ramuhalli

Many machine learning tasks, such as learning with invariance and policy evaluation in reinforcement learning, can be characterized as problems of learning from conditional distributions. In such problems, each sample $x$ itself is…

Machine Learning · Computer Science 2017-01-03 Bo Dai , Niao He , Yunpeng Pan , Byron Boots , Le Song

Distributed processing frameworks, such as MapReduce, Hadoop, and Spark are popular systems for processing large amounts of data. The design of efficient algorithms in these frameworks is a challenging problem, as the systems both require…

Data Structures and Algorithms · Computer Science 2019-05-07 MohammadTaghi Hajiaghayi , Silvio Lattanzi , Saeed Seddighin , Cliff Stein

We initiate the study of deterministic distributed graph algorithms with predictions in synchronous message passing systems. The process at each node in the graph is given a prediction, which is some extra information about the problem…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-01 Joan Boyar , Faith Ellen , Kim S. Larsen

Large-scale rare events data are commonly encountered in practice. To tackle the massive rare events data, we propose a novel distributed estimation method for logistic regression in a distributed system. For a distributed framework, we…

Methodology · Statistics 2023-04-06 Xuetong Li , Xuening Zhu , Hansheng Wang

The programming paradigm Map-Reduce and its main open-source implementation, Hadoop, have had an enormous impact on large scale data processing. Our goal in this expository writeup is two-fold: first, we want to present some complexity…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-11-29 Ashish Goel , Kamesh Munagala

To keep up with increasing dataset sizes and model complexity, distributed training has become a necessity for large machine learning tasks. Parameter servers ease the implementation of distributed parameter management---a key concern in…

Machine Learning · Computer Science 2020-07-06 Alexander Renz-Wieland , Rainer Gemulla , Steffen Zeuch , Volker Markl
‹ Prev 1 2 3 10 Next ›