Related papers: Instruction-Level Parallelism

200 papers

With the increasing capabilities of Large Language Models (LLMs), parallel reasoning has emerged as a new inference paradigm that enhances reasoning robustness by concurrently exploring multiple lines of thought before converging on a final…

Computation and Language · Computer Science 2025-10-15 Ziqi Wang, Boye Niu, Zipeng Gao, Zhi Zheng, Tong Xu, Linghui Meng, Zhongli Li, Jing Liu, Yilong Chen, Chen Zhu, Hua Wu, Haifeng Wang, Enhong Chen

This paper is focused on evaluating the effect of some different techniques in machine learning speed-up, including vector caches, parallel execution, and so on. The following content will include some review of the previous approaches and…

Machine Learning · Computer Science 2021-01-12 Zeyu Ning, Hugues Nelson Iradukunda, Qingquan Zhang, Ting Zhu

The field of deep learning has witnessed a remarkable shift towards extremely compute- and memory-intensive neural networks. These newer larger models have enabled researchers to advance state-of-the-art tools across a variety of fields.…

Machine Learning · Computer Science 2022-07-04 Daniel Nichols, Siddharth Singh, Shu-Huai Lin, Abhinav Bhatele

It has been shown that a class of probabilistic domain models cannot be learned correctly by several existing algorithms which employ a single-link look ahead search. When a multi-link look ahead search is used, the computational complexity…

Artificial Intelligence · Computer Science 2013-02-08 TongSheng Chu, Yang Xiang

Neural networks have become a cornerstone of machine learning. As the trend for these to get more and more complex continues, so does the underlying hardware and software infrastructure for training and deployment. In this survey we answer…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-03-07 Felix Brakel, Uraz Odyurt, Ana-Lucia Varbanescu

In this work we study parallelization of online learning, a core primitive in machine learning. In a parallel environment all known approaches for parallel online learning lead to delayed updates, where the model is updated using…

Machine Learning · Computer Science 2011-03-23 Daniel Hsu, Nikos Karampatziakis, John Langford, Alex Smola

Running parallel applications requires special and expensive processing resources to obtain the required results within a reasonable time. Before parallelizing serial applications, some analysis is recommended to be carried out to decide…

Software Engineering · Computer Science 2011-03-30 Alaa Ismail Elnashar

To train modern large DNN models, pipeline parallelism has recently emerged, which distributes the model across GPUs and enables different devices to process different microbatches in pipeline. Earlier pipeline designs allow multiple…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-08-23 Ziyue Luo, Xiaodong Yi, Guoping Long, Shiqing Fan, Chuan Wu, Jun Yang, Wei Lin

Currently, training large-scale deep learning models is typically achieved through parallel training across multiple GPUs. However, due to the inherent communication overhead and synchronization delays in traditional model parallelism…

Computer Vision and Pattern Recognition · Computer Science 2024-11-21 Xiuyuan Guo, Chengqi Xu, Guinan Guo, Feiyu Zhu, Changpeng Cai, Peizhe Wang, Xiaoming Wei, Junhao Su, Jialin Gao

We introduce a class of causal video understanding models that aims to improve efficiency of video processing by maximising throughput, minimising latency, and reducing the number of clock cycles. Leveraging operation pipelining and…

Computer Vision and Pattern Recognition · Computer Science 2018-09-06 Joao Carreira, Viorica Patraucean, Laurent Mazare, Andrew Zisserman, Simon Osindero

The past few years have witnessed growth in the computational requirements for training deep convolutional neural networks. Current approaches parallelize training onto multiple devices by applying a single parallelization strategy (e.g.,…

Machine Learning · Computer Science 2018-06-12 Zhihao Jia, Sina Lin, Charles R. Qi, Alex Aiken

State-of-the-art machine learning frameworks support a wide variety of design features to enable a flexible machine learning programming interface and to ease the programmability burden on machine learning developers. Identifying and using…

Machine Learning · Computer Science 2020-07-01 Yu Emma Wang, Carole-Jean Wu, Xiaodong Wang, Kim Hazelwood, David Brooks

We compare different methods for the computation of the real dilogarithm regarding their ability for using instruction-level parallelism when executed on appropriate CPUs. As a result we present an instruction-level-aware method and compare…

High Energy Physics - Phenomenology · Physics 2022-01-06 Alexander Voigt

The aim of boosting is to convert a sequence of weak learners into a strong learner. At their heart, these methods are fully sequential. In this paper, we investigate the possibility of parallelizing boosting. Our main contribution is a…

Machine Learning · Computer Science 2023-08-22 Amin Karbasi, Kasper Green Larsen

We report on an experimental investigation into opportunities for parallelism in beliefnet inference. Specifically, we report on a study performed of the available parallelism, on hypercube style machines, of a set of randomly generated…

Artificial Intelligence · Computer Science 2013-03-25 Bruce D'Ambrosio, Tony Fountain, Zhaoyu Li

Modern out-of-order processors have increased capacity to exploit instruction level parallelism (ILP) and memory level parallelism (MLP), e.g., by using wide superscalar pipelines and vector execution units, as well as deep buffers for…

Programming Languages · Computer Science 2018-07-05 Vladimir Kiriansky, Haoran Xu, Martin Rinard, Saman Amarasinghe

In this paper we analyze, evaluate, and improve the performance of training generalized linear models on modern CPUs. We start with a state-of-the-art asynchronous parallel training algorithm, identify system-level performance bottlenecks,…

Machine Learning · Computer Science 2018-12-20 Nikolas Ioannou, Celestine Dünner, Kornilios Kourtis, Thomas Parnell

Parallel application I/O performance often does not meet user expectations. Additionally, slight access pattern modifications may lead to significant changes in performance due to complex interactions between hardware and software. These…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-07-19 Julian M. Kunkel, Eugen Betke, Matt Bryson, Philip Carns, Rosemary Francis, Wolfgang Frings, Roland Laifer, Sandra Mendez

With the increasing scale of models, the need for efficient distributed training has become increasingly urgent. Recently, many synchronous pipeline parallelism approaches have been proposed to improve training throughput. However, these…

Machine Learning · Computer Science 2024-10-28 Houming Wu, Ling Chen, Wenjie Yu

Pipeline parallelism is one of the key components for large-scale distributed training, yet its efficiency suffers from pipeline bubbles which were deemed inevitable. In this work, we introduce a scheduling strategy that, to our knowledge,…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-22 Penghui Qi, Xinyi Wan, Guangxing Huang, Min Lin