English
Related papers

Related papers: A Soft SIMD Based Energy Efficient Computing Micro…

200 papers

The ever-increasing quest for data-level parallelism and variable precision in ubiquitous multimedia and Deep Neural Network (DNN) applications has motivated the use of Single Instruction, Multiple Data (SIMD) architectures. To alleviate…

Hardware Architecture · Computer Science 2020-11-03 Zahra Ebrahimi , Salim Ullah , Akash Kumar

In-memory computing (IMC) with single instruction multiple data (SIMD) setup enables memory to perform operations on the stored data in parallel to achieve high throughput and energy saving. To instruct a SIMD IMC hardware to compute a…

Emerging Technologies · Computer Science 2024-12-04 Xingyue Qian , Chenyang Lv , Zhezhi He , Weikang Qian

Processing-in-memory (PIM) is a promising computing paradigm to tackle the "memory wall" challenge. However, PIM system-level benefits over traditional von Neumann architecture can be reduced when the memory array cannot fully store all the…

Hardware Architecture · Computer Science 2025-03-03 Peilin Chen , Xiaoxuan Yang

We describe a modified SIMD architecture suitable for single-chip integration of a large number of processing elements, such as 1,000 or more. Important differences from traditional SIMD designs are: a) The size of the memory per processing…

Astrophysics · Physics 2007-05-23 Junichiro Makino

Many quantum computing platforms are based on a two-dimensional physical layout. Here we explore a concept called looped pipelines which permits one to obtain many of the advantages of a 3D lattice while operating a strictly 2D device. The…

Quantum Physics · Physics 2025-01-23 Zhenyu Cai , Adam Siegel , Simon Benjamin

Computing-in-memory (CIM) has attracted significant attentions in recent years due to its massive parallelism and low power consumption. However, current CIM designs suffer from large area overhead of small CIM macros and bad programmablity…

Hardware Architecture · Computer Science 2022-05-04 Shu-Hung Kuo , Tian-Sheuan Chang

Processing-using-DRAM has been proposed for a limited set of basic operations (i.e., logic operations, addition). However, in order to enable full adoption of processing-using-DRAM, it is necessary to provide support for more complex…

Both IP lookup and packet classification in IP routers can be implemented by some form of tree traversal. SRAM-based Pipelining can improve the throughput dramatically. However, previous pipelining schemes result in unbalanced memory…

Networking and Internet Architecture · Computer Science 2011-07-28 Weirong Jiang , Hoang Le , Viktor K. Prasanna

Calculating interactions or correlations between pairs of particles is typically the most time-consuming task in particle simulation or correlation analysis. Straightforward implementations using a double loop over particle pairs have…

Computational Physics · Physics 2015-06-16 Szilárd Páll , Berk Hess

Stacked intelligent metasurfaces (SIMs), which integrate multiple programmable metasurface layers, have recently emerged as a promising technology for advanced wave-domain signal processing. SIMs benefit from flexible spatial…

Recent advancements in quantization and mixed-precision approaches offers substantial opportunities to improve the speed and energy efficiency of Neural Networks (NN). Research has shown that individual parameters with varying low…

Hardware Architecture · Computer Science 2024-08-14 Giorgos Armeniakos , Alexis Maras , Sotirios Xydis , Dimitrios Soudris

Large-number arithmetic, widely used in scientific computing and cryptography, has seen limited adoption of single instruction, multiple data (SIMD) parallelism on modern CPUs due to the inherent dependencies in traditional algorithms. We…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-27 Subhrajit Das , Abhishek Bichhawat , Yuvraj Patel

Hardware/Software (HW/SW) co-designed processors provide a promising solution to the power and complexity problems of the modern microprocessors by keeping their hardware simple. Moreover, they employ several runtime optimizations to…

Hardware Architecture · Computer Science 2021-03-01 Rakesh Kumar , Alejandro Martinez , Antonio Gonzalez

Pipeline Parallelism (PP) serves as a crucial technique for training Large Language Models (LLMs), owing to its capability to alleviate memory pressure from model states with relatively low communication overhead. However, in long-context…

Machine Learning · Computer Science 2025-04-22 Zhouyang Li , Yuliang Liu , Wei Zhang , Tailing Yuan , Bin Chen , Chengru Song , Di Zhang

Processing-using-DRAM has been proposed for a limited set of basic operations (i.e., logic operations, addition). However, in order to enable the full adoption of processing-using-DRAM, it is necessary to provide support for more complex…

Fast combinational multipliers with large bit widths can occupy significant silicon area, which also drives up power consumption. Area can be reduced through resource sharing (i.e., folding) at the expense of lower throughput, which is…

Hardware Architecture · Computer Science 2025-09-03 Ahmad Houraniah , H. Fatih Ugurdag , C. Emre Dedeagac

This paper proposes a mobile pipeline computing concept in a Device-to-Device (D2D) communication setup and studies related issues, where D2D is likely based on millimeter-wave (mmWave) in the 5G mobile communication. The proposed…

Networking and Internet Architecture · Computer Science 2020-06-17 Terry N. Guo , Hawzhin Mohammed , Syed R. Hasan

Current Artificial Intelligence (AI) computation systems face challenges, primarily from the memory-wall issue, limiting overall system-level performance, especially for Edge devices with constrained battery budgets, such as smartphones,…

Hardware Architecture · Computer Science 2024-10-15 Lucas Huijbregts , Liu Hsiao-Hsuan , Paul Detterer , Said Hamdioui , Amirreza Yousefzadeh , Rajendra Bishnoi

Software Defined Vehicles face an increasing computational gap as advanced algorithms and frequent software updates demand more processing power while onboard hardware remains static throughout a vehicle's 10+ year lifespan. This mismatch…

Software Engineering · Computer Science 2026-04-30 Falk Dettinger , Matthias Weiß , Baran Can Gül , Sruthi Mangala Suresh , Nasser Jazdi , Michael Weyrich

Large symmetric eigenvalue problems are commonly observed in many disciplines such as Chemistry and Physics, and several libraries including cuSOLVERMp, MAGMA and ELPA support computing large eigenvalue decomposition on multi-GPU or…

Mathematical Software · Computer Science 2025-11-21 Hansheng Wang , Ruiyi Zhan , Dajun Huang , Xingchen Liu , Qiao Li , Hancong Duan , Dingwen Tao , Guangming Tan , Shaoshuai Zhang
‹ Prev 1 2 3 10 Next ›