Related papers: Hardware Acceleration of Explainable Machine Learn…

Hardware Acceleration of Explainable Artificial Intelligence

Machine learning (ML) is successful in achieving human-level artificial intelligence in various fields. However, it lacks the ability to explain an outcome due to its black-box nature. While recent efforts on explainable AI (XAI) has…

Machine Learning · Computer Science · 2023-05-09 · Zhixin Pan, Prabhat Mishra

Flex-TPU: A Flexible TPU with Runtime Reconfigurable Dataflow Architecture

Tensor processing units (TPUs) are one of the most well-known machine learning (ML) accelerators utilized at large scale in data centers as well as in tiny ML applications. TPUs offer several improvements and advantages over conventional ML…

Hardware Architecture · Computer Science · 2024-07-12 · Mohammed Elbtity, Peyton Chandarana, Ramtin Zand

ML For Hardware Design Interpretability: Challenges and Opportunities

The increasing size and complexity of machine learning (ML) models have driven the growing need for custom hardware accelerators capable of efficiently supporting ML workloads. However, the design of such accelerators remains a…

Machine Learning · Computer Science · 2025-04-15 · Raymond Baartmans, Andrew Ensinger, Victor Agostinelli, Lizhong Chen

Towards Explainable Artificial Intelligence

In recent years, machine learning (ML) has become a key enabling technology for the sciences and industry. Especially through improvements in methodology, the availability of large databases and increased computational power, today's ML…

Artificial Intelligence · Computer Science · 2019-09-27 · Wojciech Samek, Klaus-Robert Müller

Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights

Machine learning (ML) models are widely used in many important domains. For efficiently processing these computational- and memory-intensive applications, tensors of these over-parameterized models are compressed by leveraging sparsity,…

Hardware Architecture · Computer Science · 2021-08-11 · Shail Dave, Riyadh Baghdadi, Tony Nowatzki, Sasikanth Avancha, Aviral Shrivastava, Baoxin Li

Pushing Tensor Accelerators Beyond MatMul in a User-Schedulable Language

Tensor accelerators now represent a growing share of compute resources in modern CPUs and GPUs. However, they are hard to program, leading developers to use vendor-provided kernel libraries that support tensor accelerators. As a result, the…

Programming Languages · Computer Science · 2026-02-12 · Yihong Zhang, Derek Gerstmann, Andrew Adams, Maaz Bin Safeer Ahmad

A Learned Performance Model for Tensor Processing Units

Accurate hardware performance models are critical to efficient code generation. They can be used by compilers to make heuristic decisions, by superoptimizers as a minimization objective, or by autotuners to find an optimal configuration for…

Performance · Computer Science · 2021-03-19 · Samuel J. Kaufman, Phitchaya Mangpo Phothilimthana, Yanqi Zhou, Charith Mendis, Sudip Roy, Amit Sabne, Mike Burrows

Investigating the Duality of Interpretability and Explainability in Machine Learning

The rapid evolution of machine learning (ML) has led to the widespread adoption of complex "black box" models, such as deep neural networks and ensemble methods. These models exhibit exceptional predictive performance, making them…

Machine Learning · Computer Science · 2025-03-28 · Moncef Garouani, Josiane Mothe, Ayah Barhrhouj, Julien Aligon

Tensor networks for interpretable and efficient quantum-inspired machine learning

It is a critical challenge to simultaneously gain high interpretability and efficiency with the current schemes of deep machine learning (ML). Tensor network (TN), which is a well-established mathematical tool originating from quantum…

Quantum Physics · Physics · 2023-11-21 · Shi-Ju Ran, Gang Su

Counterfactual Explanations for Machine Learning on Multivariate Time Series Data

Applying machine learning (ML) on multivariate time series data has growing popularity in many application domains, including in computer system management. For example, recent high performance computing (HPC) research proposes a variety of…

Machine Learning · Computer Science · 2021-08-20 · Emre Ates, Burak Aksar, Vitus J. Leung, Ayse K. Coskun

Exploration of TPUs for AI Applications

Tensor Processing Units (TPUs) are specialized hardware accelerators for deep learning developed by Google. This paper aims to explore TPUs in cloud and edge computing focusing on its applications in AI. We provide an overview of TPUs,…

Hardware Architecture · Computer Science · 2023-11-15 · Diego Sanmartín Carrión, Vera Prohaska

Accelerating MRI Reconstruction on TPUs

The advanced magnetic resonance (MR) image reconstructions such as the compressed sensing and subspace-based imaging are considered as large-scale, iterative, optimization problems. Given the large number of reconstructions required by the…

Computational Engineering, Finance, and Science · Computer Science · 2020-06-26 · Tianjian Lu, Thibault Marin, Yue Zhuo, Yi-Fan Chen, Chao Ma

Opportunities in Machine Learning for Particle Accelerators

Machine learning (ML) is a subfield of artificial intelligence. The term applies broadly to a collection of computational algorithms and techniques that train systems from raw data rather than a priori models. ML techniques are now…

Accelerator Physics · Physics · 2018-11-09 · Auralee Edelen, Christopher Mayes, Daniel Bowring, Daniel Ratner, Andreas Adelmann, Rasmus Ischebeck, Jochem Snuverink, Ilya Agapov, Raimund Kammering, Jonathan Edelen, Ivan Bazarov, Gianluca Valentino, Jorg Wenninger

TPU-Gen: LLM-Driven Custom Tensor Processing Unit Generator

The increasing complexity and scale of Deep Neural Networks (DNNs) necessitate specialized tensor accelerators, such as Tensor Processing Units (TPUs), to meet various computational and energy efficiency requirements. Nevertheless,…

Hardware Architecture · Computer Science · 2025-03-11 · Deepak Vungarala, Mohammed E. Elbtity, Sumiya Syed, Sakila Alam, Kartik Pandit, Arnob Ghosh, Ramtin Zand, Shaahin Angizi

Multiresolution Tensor Learning for Efficient and Interpretable Spatial Analysis

Efficient and interpretable spatial analysis is crucial in many fields such as geology, sports, and climate science. Tensor latent factor models can describe higher-order correlations for spatial data. However, they are computationally…

Machine Learning · Computer Science · 2020-08-18 · Jung Yeon Park, Kenneth Theo Carr, Stephan Zheng, Yisong Yue, Rose Yu

Machine Learning Accelerators in 2.5D Chiplet Platforms with Silicon Photonics

Domain-specific machine learning (ML) accelerators such as Google's TPU and Apple's Neural Engine now dominate CPUs and GPUs for energy-efficient ML processing. However, the evolution of electronic accelerators is facing fundamental limits…

Hardware Architecture · Computer Science · 2023-01-31 · Febin Sunny, Ebadollah Taheri, Mahdi Nikdast, Sudeep Pasricha

Large Scale Distributed Linear Algebra With Tensor Processing Units

We have repurposed Google Tensor Processing Units (TPUs), application-specific chips developed for machine learning, into large-scale dense linear algebra supercomputers. The TPUs' fast inter-core interconnects (ICI)s, physically…

Computational Physics · Physics · 2022-09-14 · Adam G. M. Lewis, Jackson Beall, Martin Ganahl, Markus Hauru, Shrestha Basu Mallick, Guifre Vidal

Enhancing Manufacturing Quality Prediction Models through the Integration of Explainability Methods

This research presents a method that utilizes explainability techniques to amplify the performance of machine learning (ML) models in forecasting the quality of milling processes, as demonstrated in this paper through a manufacturing use…

Artificial Intelligence · Computer Science · 2024-03-28 · Dennis Gross, Helge Spieker, Arnaud Gotlieb, Ricardo Knoblauch

Memory-Guided Unified Hardware Accelerator for Mixed-Precision Scientific Computing

Recent hardware acceleration advances have enabled powerful specialized accelerators for finite element computations, spiking neural network inference, and sparse tensor operations. However, existing approaches face fundamental limitations:…

Hardware Architecture · Computer Science · 2026-01-09 · Chuanzhen Wang, Leo Zhang, Eric Liu

From Principles to Practice: A Systematic Study of LLM Serving on Multi-core NPUs

With the widespread adoption of Large Language Models (LLMs), the demand for high-performance LLM inference services continues to grow. To meet this demand, a growing number of AI accelerators have been proposed, such as Google TPU, Huawei…

Hardware Architecture · Computer Science · 2025-10-08 · Tianhao Zhu, Dahu Feng, Erhu Feng, Yubin Xia