Related papers: Inner Loop Optimizations in Mapping Single Threade…

200 papers

Application Specific Instruction-set Processor (ASIP) is one of the popular processor design techniques for embedded systems which allows customizability in processor design without overly hindering design flexibility. Multi-pipeline ASIPs…

Programming Languages · Computer Science 2014-02-05 Rajitha Navarathna, Swarnalatha Radhakrishnan, Roshan Ragel

In this paper we review the main ideas of several other papers on optimization techniques used by compilers. Here we focus on the loop unrolling technique and its effect on power consumption, energy usage and also its impact…

Programming Languages · Computer Science 2013-08-13 Meisam Booshehri, Abbas Malekpour, Peter Luksch

Over the past few years, there has been an increased interest in including FPGAs in data centers and high-performance computing clusters along with GPUs and other accelerators. As a result, it has become increasingly important to have a…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-09-14 Mostafa Eghbali Zarch, Reece Neff, Michela Becchi

The rapidly increasing number of cores available in multicore processors does not necessarily lead directly to a commensurate increase in performance: programs written in conventional languages, such as C, need careful restructuring,…

Programming Languages · Computer Science 2015-01-28 Esraa Alwan, John Fitch, Julian Padget

This study investigates computationally efficient inner-loop algorithms for estimating static/dynamic BLP models. It provides the following ideas for reducing the number of inner-loop iterations: (1). Add a term relating to the outside…

Econometrics · Economics 2025-04-25 Takeshi Fukasawa

The time required for training the neural networks increases with size, complexity, and depth. Training model parameters by backpropagation inherently creates feedback loops. These loops hinder efficient pipelining and scheduling of the…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-12-30 Nanda K. Unnikrishnan, Keshab K. Parhi

OpenMP is the de facto API for parallel programming in HPC applications. These programs are often computed in data centers, where energy consumption is a major issue. Whereas previous work has focused almost entirely on performance, we here…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-09-12 Henrik Valter, Axel Karlsson, Miquel Pericàs

In light of continued advances in loop scheduling, this work revisits the OpenMP loop scheduling by outlining the current state of the art in loop scheduling and presenting evidence that the existing OpenMP schedules are insufficient for…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-09-11 Florina M. Ciorba, Christian Iwainsky, Patrick Buder

We propose a method for performing software pipelining on quantum for-loop programs, exploiting parallelism in and across iterations. We redefine concepts that are useful in program optimization, including array aliasing, instruction…

Quantum Physics · Physics 2020-12-25 Jingzhe Guo, Mingsheng Ying

Applications' performance is influenced by the mapping of processes to computing nodes, the frequency and volume of exchanges among processing elements, the network capacity, and the routing protocol. A poor mapping of application processes…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-11 Jonas H. Müller Korndörfer, Mario Bielert, Laércio L. Pilla, Florina M. Ciorba

We investigate the utility of augmenting a microprocessor with a single execution pipeline by adding a second copy of the execution pipeline in parallel with the existing one. The resulting dual-hardware-threaded microprocessor has two…

Hardware Architecture · Computer Science 2023-05-30 Madhav P. Desai

Performance optimization is the art of continuously seeking a harmonious mapping between the application domain and hardware. Recent years have witnessed a surge of deep learning (DL) applications in industry. Conventional wisdom for…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-27 Guoping Long, Jun Yang, Wei Lin

Owing to the increasing complexity of network applications and internet traffic, network processors, as a subset of embedded processors, have to process more computation-intensive tasks. By scaling down the feature size and emersion of chip…

Hardware Architecture · Computer Science 2012-04-13 Mehdi Alipour, Hojjat Taghdisi

In modern data centers, energy usage represents one of the major factors affecting operational costs. Power capping is a technique that limits the power consumption of individual systems, which allows reducing the overall power demand at…

Performance · Computer Science 2017-09-05 Stefano Conoci, Pierangelo Di Sanzo, Bruno Ciciani, Francesco Quaglia

Token generation speed is critical to power the next wave of AI inference applications. GPUs significantly underperform during token generation due to synchronization overheads at kernel boundaries, utilizing only 21% of their peak memory…

We introduce a mapping framework for deep learning inference that takes advantage of predictable neural network behavior to plan both computation and communication ahead of time. The framework generates a unified stream of instructions and…

Hardware Architecture · Computer Science 2025-09-05 Md Rownak Hossain Chowdhury, Mostafizur Rahman

Simultaneous multithreading processors improve throughput over single-threaded processors thanks to sharing internal core resources among instructions from distinct threads. However, resource sharing introduces inter-thread interference…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-10-20 Marta Navarro, Josué Feliu, Salvador Petit, María E. Gómez, Julio Sahuquillo

An interior-point algorithm framework is proposed, analyzed, and tested for solving nonlinearly constrained continuous optimization problems. The main setting of interest is when the objective and constraint functions may be nonlinear…

Optimization and Control · Mathematics 2024-08-30 Frank E. Curtis, Xin Jiang, Qi Wang

Optimizing programs requires deep expertise. On one hand, it is a tedious task, because it requires a lot of tests to find out the best combination of optimizations to apply with their best factors. On the other hand, this task is critical,…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-12 Asma Balamane, Zina Taklit

The multi-pumping resource sharing technique can overcome the limitations commonly found in single-clocked FPGA designs by allowing hardware components to operate at a higher clock frequency than the surrounding system. However, this…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-10-11 Carl-Johannes Johnsen, Tiziano De Matteis, Tal Ben-Nun, Johannes de Fine Licht, Torsten Hoefler