Related papers: Instructional Level Parallelism

A Survey on Parallel Reasoning

With the increasing capabilities of Large Language Models (LLMs), parallel reasoning has emerged as a new inference paradigm that enhances reasoning robustness by concurrently exploring multiple lines of thought before converging on a final…

Computation and Language · Computer Science 2025-10-15 Ziqi Wang , Boye Niu , Zipeng Gao , Zhi Zheng , Tong Xu , Linghui Meng , Zhongli Li , Jing Liu , Yilong Chen , Chen Zhu , Hua Wu , Haifeng Wang , Enhong Chen

Benchmarking Machine Learning: How Fast Can Your Algorithms Go?

This paper is focused on evaluating the effect of some different techniques in machine learning speed-up, including vector caches, parallel execution, and so on. The following content will include some review of the previous approaches and…

Machine Learning · Computer Science 2021-01-12 Zeyu Ning , Hugues Nelson Iradukunda , Qingquan Zhang , Ting Zhu

A Survey and Empirical Evaluation of Parallel Deep Learning Frameworks

The field of deep learning has witnessed a remarkable shift towards extremely compute- and memory-intensive neural networks. These newer larger models have enabled researchers to advance state-of-the-art tools across a variety of fields.…

Machine Learning · Computer Science 2022-07-04 Daniel Nichols , Siddharth Singh , Shu-Huai Lin , Abhinav Bhatele

Exploring Parallelism in Learning Belief Networks

It has been shown that a class of probabilistic domain models cannot be learned correctly by several existing algorithms which employ a single-link look ahead search. When a multi-link look ahead search is used, the computational complexity…

Artificial Intelligence · Computer Science 2013-02-08 TongSheng Chu , Yang Xiang

Model Parallelism on Distributed Infrastructure: A Literature Review from Theory to LLM Case-Studies

Neural networks have become a cornerstone of machine learning. As the trend for these to get more and more complex continues, so does the underlying hardware and software infrastructure for training and deployment. In this survey we answer…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-03-07 Felix Brakel , Uraz Odyurt , Ana-Lucia Varbanescu

Parallel Online Learning

In this work we study parallelization of online learning, a core primitive in machine learning. In a parallel environment all known approaches for parallel online learning lead to delayed updates, where the model is updated using…

Machine Learning · Computer Science 2011-03-23 Daniel Hsu , Nikos Karampatziakis , John Langford , Alex Smola

To Parallelize or Not to Parallelize, Speed Up Issue

Running parallel applications requires special and expensive processing resources to obtain the required results within a reasonable time. Before parallelizing serial applications, some analysis is recommended to be carried out to decide…

Software Engineering · Computer Science 2011-03-30 Alaa Ismail Elnashar

Efficient Pipeline Planning for Expedited Distributed DNN Training

To train modern large DNN models, pipeline parallelism has recently emerged, which distributes the model across GPUs and enables different devices to process different microbatches in pipeline. Earlier pipeline designs allow multiple…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-08-23 Ziyue Luo , Xiaodong Yi , Guoping Long , Shiqing Fan , Chuan Wu , Jun Yang , Wei Lin

Faster Multi-GPU Training with PPLL: A Pipeline Parallelism Framework Leveraging Local Learning

Currently, training large-scale deep learning models is typically achieved through parallel training across multiple GPUs. However, due to the inherent communication overhead and synchronization delays in traditional model parallelism…

Computer Vision and Pattern Recognition · Computer Science 2024-11-21 Xiuyuan Guo , Chengqi Xu , Guinan Guo , Feiyu Zhu , Changpeng Cai , Peizhe Wang , Xiaoming Wei , Junhao Su , Jialin Gao

Massively Parallel Video Networks

We introduce a class of causal video understanding models that aims to improve efficiency of video processing by maximising throughput, minimising latency, and reducing the number of clock cycles. Leveraging operation pipelining and…

Computer Vision and Pattern Recognition · Computer Science 2018-09-06 Joao Carreira , Viorica Patraucean , Laurent Mazare , Andrew Zisserman , Simon Osindero

Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks

The past few years have witnessed growth in the computational requirements for training deep convolutional neural networks. Current approaches parallelize training onto multiple devices by applying a single parallelization strategy (e.g.,…

Machine Learning · Computer Science 2018-06-12 Zhihao Jia , Sina Lin , Charles R. Qi , Alex Aiken

Exploiting Parallelism Opportunities with Deep Learning Frameworks

State-of-the-art machine learning frameworks support a wide variety of design features to enable a flexible machine learning programming interface and to ease the programmability burden on machine learning developers. Identifying and using…

Machine Learning · Computer Science 2020-07-01 Yu Emma Wang , Carole-Jean Wu , Xiaodong Wang , Kim Hazelwood , David Brooks

Comparison of methods for the calculation of the real dilogarithm regarding instruction-level parallelism

We compare different methods for the computation of the real dilogarithm regarding their ability for using instruction-level parallelism when executed on appropriate CPUs. As a result we present an instruction-level-aware method and compare…

High Energy Physics - Phenomenology · Physics 2022-01-06 Alexander Voigt

The Impossibility of Parallelizing Boosting

The aim of boosting is to convert a sequence of weak learners into a strong learner. At their heart, these methods are fully sequential. In this paper, we investigate the possibility of parallelizing boosting. Our main contribution is a…

Machine Learning · Computer Science 2023-08-22 Amin Karbasi , Kasper Green Larsen

Parallelizing Probabilistic Inference: Some Early Explorations

We report on an experimental investigation into opportunities for parallelism in beliefnet inference. Specifically, we report on a study performed of the available parallelism, on hypercube style machines, of a set of randomly generated…

Artificial Intelligence · Computer Science 2013-03-25 Bruce D'Ambrosio , Tony Fountain , Zhaoyu Li

Cimple: Instruction and Memory Level Parallelism

Modern out-of-order processors have increased capacity to exploit instruction level parallelism (ILP) and memory level parallelism (MLP), e.g., by using wide superscalar pipelines and vector execution units, as well as deep buffers for…

Programming Languages · Computer Science 2018-07-05 Vladimir Kiriansky , Haoran Xu , Martin Rinard , Saman Amarasinghe

Parallel training of linear models without compromising convergence

In this paper we analyze, evaluate, and improve the performance of training generalized linear models on modern CPUs. We start with a state-of-the-art asynchronous parallel training algorithm, identify system-level performance bottlenecks,…

Machine Learning · Computer Science 2018-12-20 Nikolas Ioannou , Celestine Dünner , Kornilios Kourtis , Thomas Parnell

Tools for Analyzing Parallel I/O

Parallel application I/O performance often does not meet user expectations. Additionally, slight access pattern modifications may lead to significant changes in performance due to complex interactions between hardware and software. These…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-07-19 Julian M. Kunkel , Eugen Betke , Matt Bryson , Philip Carns , Rosemary Francis , Wolfgang Frings , Roland Laifer , Sandra Mendez

BitPipe: Bidirectional Interleaved Pipeline Parallelism for Accelerating Large Models Training

With the increasing scale of models, the need for efficient distributed training has become increasingly urgent. Recently, many synchronous pipeline parallelism approaches have been proposed to improve training throughput. However, these…

Machine Learning · Computer Science 2024-10-28 Houming Wu , Ling Chen , Wenjie Yu

Zero Bubble Pipeline Parallelism

Pipeline parallelism is one of the key components for large-scale distributed training, yet its efficiency suffers from pipeline bubbles which were deemed inevitable. In this work, we introduce a scheduling strategy that, to our knowledge,…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-22 Penghui Qi , Xinyi Wan , Guangxing Huang , Min Lin