Computer Science

elasticAI.explorer: Towards a Unified End-to-End Framework for Hardware-Aware Neural Architecture Search

Neural Architecture Search (NAS) has become an important approach for automatically designing neural networks under task-specific and hardware-specific constraints. However, many existing NAS frameworks tightly couple search space…

Hardware Architecture · Computer Science 2026-05-29 Natalie Maman, Florian Hettstedt, Andreas Erbslöh, Gregor Schiele

Precomputed 1D-CNNs for Atrial Fibrillation Detection on Tiny Smart Sensor Systems

1D-CNNs play a crucial role for time-series analysis on tiny smart sensor systems, e.g. for biosignal analysis, predictive maintenance, or structural health monitoring. LUT-based precomputation has emerged as an interesting optimization…

Hardware Architecture · Computer Science 2026-05-29 Lukas Einhaus, Natalie Maman, Julian Hoever, Andreas Erbslöh, Gregor Schiele

Demystifying VEINS: A Reality Check Against Living Lab Experiments

Safety applications in vehicle-to-everything communications and Cooperative Intelligent Transport Systems rely on reliable and timely message exchange, which in turn depends on accurate modeling of wireless signal propagation. Simulation…

Performance · Computer Science 2026-05-29 Antonio Solida, Giovanni Gambigliani Zoccoli, Gaetano Orazio Cauchi, Filip Valgimigli, Salvatore Iandolo, Martin Klapez, Maurizio Casoni, Mirco Marchetti, Carlo Augusto Grazia

Design-Oriented Modeling of TSV Substrate Noise Coupling to Ring VCOs

Through-silicon vias (TSVs) enable dense vertical interconnects in 3D-IC and chiplet systems, but their metal-oxide-silicon structure introduces significant parasitic coupling paths that can degrade the spectral purity of sensitive RF…

Hardware Architecture · Computer Science 2026-05-29 Ilias Exouzidis, Alberto Garcia-Ortiz, George Floros, Georgios Panagopoulos

From Roofline to Ruggedness: Decomposing and Smoothing the GEMM Performance Landscape

Adjacent GEMM problems that differ by a single 128-element step in N can show 30% different throughput on the same GPU. This pervasive performance ruggedness - invisible to roofline analysis and peak-FLOPs intuition, yet dominant for every…

Performance · Computer Science 2026-05-29 Aditya Chatterjee

Constant Depth Threshold Circuits For Exhaustive Epistasis Detection

The development of large-scale neuromorphic hardware has made practical implementations of threshold gate-based circuits a near-term possibility. The complexity advantages regarding traditional computing classes, as evidenced in the…

Hardware Architecture · Computer Science 2026-05-29 André Ribeiro, Aleksandar Ilic, Leonel Sousa

Rotary GPU: Exploring Local Execution Paths for Large Mixture-of-Experts Models Under Limited GPU Memory

Large language models have achieved remarkable capabilities through scaling, and this paper does not challenge that. It instead investigates a different question: once large models already exist, can they become more accessible to…

Performance · Computer Science 2026-05-29 Myeong Jun Jo

Space-Control: Process-Level Isolation for Sharing CXL-based Disaggregated Memory

Memory disaggregation via CXL enables multi-host resource sharing. However, existing CXL sharing mechanisms enforce coarse-grained, host-level permissions only, leaving isolation to the operating system. Today, virtual memory enables…

Hardware Architecture · Computer Science 2026-05-29 Kaustav Goswami, Sean Peisert, Venkatesh Akella, Jason Lowe-Power

Range, Not Precision: Block-Floating-Point Half-Precision FFT and SAR Imaging on Apple Silicon

Half precision (FP16) promises to double FFT throughput on GPUs, but the prevailing view is that its 10-bit mantissa makes it unsuitable for radar-grade signal processing. We show this framing is wrong on Apple Silicon: the binding…

Performance · Computer Science 2026-05-28 Mohamed Amine Bergach

Nonvolatile Charge-Domain Attention with HZO Ferroelectric Capacitors: A Simulation-Based Device-to-System Evaluation

Transformer decoding is constrained by both attention compute and KV-cache movement. This paper presents the Ferroelectric Charge-Domain Compute Cell (FCDC), a hafnium-zirconium-oxide (HZO) memcapacitor with an access device that stores…

Hardware Architecture · Computer Science 2026-05-28 Faris Abouagour

FT-Pilot: Automated Fault-Tolerant RTL Rewriting via Vulnerability-Guided LLMs

As integrated circuit technologies continue to scale toward advanced process nodes, the continual reduction in node capacitance and supply voltage has made digital systems increasingly vulnerable to soft errors. Although traditional…

Hardware Architecture · Computer Science 2026-05-28 Weixing Liu, Zizhen Liu, Jing Ye, Naixing Wang, Cheng Liu, Huawei Li, Xiaowei Li

CLIPGen: A Chiplet Link IP Modeling and Generation Framework for 2.5D Architecture Exploration

Advanced 2.5D Systems-in-Package (SiPs) compose a growing portion of high-performance systems. While the packaging and interconnect choices play a large role in the overall system design, system architects still lack a suitable framework…

Hardware Architecture · Computer Science 2026-05-28 Zhengping Zhu, Austin Rovinski

CXL-ClusterSim: Modeling CXL-based Disaggregated Memory Cluster for Pooling and Sharing using gem5 and SST

Large-scale AI training and inference require hundreds of gigabytes to terabytes of DRAM with high peak to average utilization ratios, resulting in overprovisioning. In cloud computing, DRAM constitutes a significant share of the cost. Yet,…

Hardware Architecture · Computer Science 2026-05-28 Kaustav Goswami, Maryam Babaie, Hoa Nguyen, Venkatesh Akella, Jason Lowe-Power

AssertLLM2: A Comprehensive LLM Benchmark for Assertion Generation from Design Specifications

Assertion-based verification (ABV) is a cornerstone of modern hardware design, yet manually translating design intent into formal SystemVerilog Assertions (SVAs) remains labor-intensive and error-prone. While Large Language Models (LLMs)…

Hardware Architecture · Computer Science 2026-05-28 Yuchao Wu, Wenji Fang, Jing Wang, Wenkai Li, Ziyan Guo, Zhiyao Xie

When NPUs Are Not Always Faster: A Stage-Level Analysis of Mobile LLM Inference

Deploying large language models (LLMs) on mobile devices increasingly relies on heterogeneous execution, yet no prior study has systematically characterized NPU effectiveness at the operator and pipeline level. We present the first…

Hardware Architecture · Computer Science 2026-05-28 Pu Li, Jiawen Qi, Qinyu Chen

A complete discussion on fully reconfigurable, digital, scalable, graph and sparsity-aware near-memory accelerator for graph neural networks

Graph neural networks (GNNs) have gained significant interest for applications such as citation network analysis and drug discovery due to their ability to apply machine learning techniques on graph-structured data. GNNs typically employ a…

Hardware Architecture · Computer Science 2026-05-28 Siddhartha Raman Sundara Raman, Lizy John, Jaydeep P. Kulkarni

ROA-Based Subharmonic Injection Locking for Oscillator-Based Ising Machines

This paper introduces on-chip integrated rotary traveling wave oscillators (RTWOs) organized into rotary oscillator array (ROA) bricks as an external perturbation to induce subharmonic injection locking (SHIL) in oscillator-based Ising…

Hardware Architecture · Computer Science 2026-05-28 Nicholas Sica, Baris Taskin

A comprehensive study on ILP acceleration accounting for sparsity, area, energy, data movement using near-memory architecture

Integer Linear Programming (ILP) is widely used for solving real-world optimization problems, including network routing, map routing, and traffic scheduling. However, ILP algorithms are sparse and branch-intensive, making them inefficient…

Hardware Architecture · Computer Science 2026-05-28 Siddhartha Raman Sundara Raman, Lizy K John, Jaydeep P. Kulkarni

Attributing the System's Overall Effect to its Components

In a computer system, multiple indispensable components - such as the CPU, memory, and others - work together with other essential components to produce an overall effect, which can only be measured on an independently running system. Since the…

Performance · Computer Science 2026-05-27 Chenxi Wang, Lei Wang, Wanling Gao, Fanda Fan, Guoxin Kang, Hongxiao Li, Yuchen Su, Jianfeng Zhan

Cassandra: Enabling Reasoning LLMs at Edge via Self-Speculative Decoding

Speculative decoding has emerged as a promising lossless approach for accelerating Large Language Models (LLMs). As reasoning LLMs increasingly suffer from decode-stage overhead and approximation-based methods degrade accuracy, lossless…

Hardware Architecture · Computer Science 2026-05-27 Soongyu Choi, Yuntae Kim, Muyoung Son, Joo-Young Kim