English
Related papers

Related papers: HyperParallel: A Supernode-Affinity AI Framework

200 papers

The rapid evolution of Large Language Models (LLMs) towards long-context reasoning and sparse architectures has pushed memory requirements far beyond the capacity of individual device HBM. While emerging supernode architectures offer…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-04 Fangxin Liu , Qinghua Zhang , Hanjing Shen , Zhibo Liang , Li Jiang , Haibing Guan , Chong Bao , Xuefeng Jin

As the artificial intelligence community advances into the era of large models with billions of parameters, distributed training and inference have become essential. While various parallelism strategies-data, model, sequence, and…

Machine Learning · Computer Science 2025-03-13 Ruifeng She , Bowen Pang , Kai Li , Zehua Liu , Tao Zhong

Current AI training infrastructure is dominated by single instruction multiple data (SIMD) and systolic array architectures, such as Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs), that excel at accelerating parallel…

Neural and Evolutionary Computing · Computer Science 2023-11-09 Jan Finkbeiner , Thomas Gmeinder , Mark Pupilli , Alexander Titterton , Emre Neftci

Modern Mixture-of-Experts (MoE) models increasingly rely on large-scale AI accelerator clusters for efficient training. Ascend NPUs expose heterogeneous on-chip compute resources, including matrix-oriented AIC units and vector-oriented AIV…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-25 Zewen Jin , Congkun Ai , Guangpeng Zhang , Hanbo Zhang , Haoran Wang , Shihan Xiao , Da Lei , Xuefeng Jin , Teng Su , Cheng Li

Brain-inspired computing aims to mimic cognitive functions like associative memory, the ability to recall complete patterns from partial cues. Memristor technology offers promising hardware for such neuromorphic systems due to its potential…

Machine Learning · Computer Science 2025-05-20 Chengping He , Mingrui Jiang , Keyi Shan , Szu-Hao Yang , Zefan Li , Shengbo Wang , Giacomo Pedretti , Jim Ignowski , Can Li

New types of machine learning hardware in development and entering the market hold the promise of revolutionizing deep learning in a manner as profound as GPUs. However, existing software frameworks and training algorithms for deep learning…

The employment of high-performance servers and GPU accelerators for training deep neural network models have greatly accelerated recent advances in deep learning (DL). DL frameworks, such as TensorFlow, MXNet, and Caffe2, have emerged to…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-06-11 Soojeong Kim , Gyeong-In Yu , Hojin Park , Sungwoo Cho , Eunji Jeong , Hyeonmin Ha , Sanha Lee , Joo Seong Jeong , Byung-Gon Chun

Neural networks have become dominant computational workloads across cloud and edge platforms, but their rapid growth in model size and deployment diversity has exposed hardware bottlenecks increasingly dominated by memory movement,…

Systems and Control · Electrical Eng. & Systems 2026-01-16 Bin Xu , Ayan Banerjee , Sandeep Gupta

The emergence of Superchips represents a significant advancement in next-generation AI hardware. These Superchips employ a tightly coupled heterogeneous architecture that integrates GPU and CPU on the same package, which offers…

Machine Learning · Computer Science 2025-09-26 Xinyu Lian , Masahiro Tanaka , Olatunji Ruwase , Minjia Zhang

The rapid advancements in machine learning techniques have led to significant achievements in various real-world robotic tasks. These tasks heavily rely on fast and energy-efficient inference of deep neural network (DNN) models when…

Robotics · Computer Science 2024-05-30 Zekai Sun , Xiuxian Guan , Junming Wang , Haoze Song , Yuhao Qing , Tianxiang Shen , Dong Huang , Fangming Liu , Heming Cui

Supercomputers are equipped with an increasingly large number of cores to use computational power as a way of solving problems that are otherwise intractable. Unfortunately, getting serial algorithms to run in parallel to take advantage of…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-12-31 Faisal N. Abu-Khzam , Khuzaima Daudjee , Amer E. Mouawad , Naomi Nishimura

We demonstrate an FPGA implementation of a parallel and reconfigurable architecture for sparse neural networks, capable of on-chip training and inference. The network connectivity uses pre-determined, structured sparsity to significantly…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-04-29 Sourya Dey , Diandian Chen , Zongyang Li , Souvik Kundu , Kuan-Wen Huang , Keith M. Chugg , Peter A. Beerel

The rapid growth in machine learning models, especially in natural language processing and computer vision, has led to challenges when running these models on hardware with limited resources. This paper introduces Superpipeline, a new…

Machine Learning · Computer Science 2024-10-14 Reza Abbasi , Sernam Lim

Neuro-symbolic AI is gaining traction in domains such as large language models, scientific discovery, and autonomous systems due to its ability to combine perception with structured reasoning. However, its deployment is often constrained by…

Hardware Architecture · Computer Science 2026-04-20 Weilun Wang , Zirui Wang , Wantong Li

Recent advances in Neural Architecture Search (NAS) such as one-shot NAS offer the ability to extract specialized hardware-aware sub-network configurations from a task-specific super-network. While considerable effort has been employed…

Machine Learning · Computer Science 2022-05-24 Daniel Cummings , Anthony Sarah , Sharath Nittur Sridhar , Maciej Szankin , Juan Pablo Munoz , Sairam Sundaresan

As deep learning becomes more expensive, both in terms of time and compute, inefficiencies in machine learning (ML) training prevent practical usage of state-of-the-art models for most users. The newest model architectures are simply too…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-07-15 Kabir Nagrecha

Neural networks have proven to be extremely powerful tools for modern artificial intelligence applications, but computational and storage complexity remain limiting factors. This paper presents two compatible contributions towards reducing…

Machine Learning · Computer Science 2024-10-30 Sourya Dey , Kuan-Wen Huang , Peter A. Beerel , Keith M. Chugg

The hardware-software co-optimization of neural network architectures is becoming a major stream of research especially due to the emergence of commercial neuromorphic chips such as the IBM Truenorth and Intel Loihi. Development of specific…

Neural and Evolutionary Computing · Computer Science 2019-06-24 Roshan Gopalakrishnan , Yansong Chua , Ashish Jith Sreejith Kumar

The increasing scale and complexity of large language models (LLMs) pose significant inference latency challenges, primarily due to their autoregressive decoding paradigm characterized by the sequential nature of next-token prediction. By…

Computation and Language · Computer Science 2025-08-15 Keyu Chen , Zhifeng Shen , Daohai Yu , Haoqian Wu , Wei Wen , Jianfeng He , Ruizhi Qiao , Xing Sun

Recent hardware acceleration advances have enabled powerful specialized accelerators for finite element computations, spiking neural network inference, and sparse tensor operations. However, existing approaches face fundamental limitations:…

Hardware Architecture · Computer Science 2026-01-09 Chuanzhen Wang , Leo Zhang , Eric Liu
‹ Prev 1 2 3 10 Next ›