Related papers: PASTA: A Modular Program Analysis Tool Framework f…

PASTA: A Scalable Framework for Multi-Policy AI Compliance Evaluation

AI compliance is becoming increasingly critical as AI systems grow more powerful and pervasive. Yet the rapid expansion of AI policies creates substantial burdens for resource-constrained practitioners lacking policy expertise. Existing…

Human-Computer Interaction · Computer Science 2026-03-26 Yu Yang, Ig-Jae Kim, Dongwook Yoon

PASTA: A Parallel Sparse Tensor Algorithm Benchmark Suite

Tensor methods have gained increasing attention from various applications, including machine learning, quantum chemistry, healthcare analytics, social network analysis, data mining, and signal processing, to name a few. Sparse tensors and…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-02-12 Jiajia Li, Yuchen Ma, Xiaolong Wu, Ang Li, Kevin Barker

PASTA: Towards Flexible and Efficient HDR Imaging Via Progressively Aggregated Spatio-Temporal Alignment

Leveraging Transformer attention has led to great advancements in HDR deghosting. However, the intricate nature of self-attention introduces practical challenges, as existing state-of-the-art methods often demand high-end GPUs or exhibit…

Computer Vision and Pattern Recognition · Computer Science 2024-04-10 Xiaoning Liu, Ao Li, Zongwei Wu, Yapeng Du, Le Zhang, Yulun Zhang, Radu Timofte, Ce Zhu

PASTA: Controllable Part-Aware Shape Generation with Autoregressive Transformers

The increased demand for tools that automate the 3D content creation process led to tremendous progress in deep generative models that can generate diverse 3D objects of high fidelity. In this paper, we present PASTA, an autoregressive…

Computer Vision and Pattern Recognition · Computer Science 2024-07-19 Songlin Li, Despoina Paschalidou, Leonidas Guibas

Mira: A Framework for Static Performance Analysis

The performance model of an application can provide understanding about its runtime behavior on particular hardware. Such information can be analyzed by developers for performance tuning. However, model building and analyzing is…

Performance · Computer Science 2017-05-23 Kewen Meng, Boyana Norris

TAPA: A Scalable Task-Parallel Dataflow Programming Framework for Modern FPGAs with Co-Optimization of HLS and Physical Design

In this paper, we propose TAPA, an end-to-end framework that compiles a C++ task-parallel dataflow program into a high-frequency FPGA accelerator. Compared to existing solutions, TAPA has two major advantages. First, TAPA provides a set of…

Hardware Architecture · Computer Science 2024-10-18 Licheng Guo, Yuze Chi, Jason Lau, Linghao Song, Xingyu Tian, Moazin Khatti, Weikang Qiao, Jie Wang, Ecenur Ustun, Zhenman Fang, Zhiru Zhang, Jason Cong

Next Generation Computational Tools for the Modeling and Design of Particle Accelerators at Exascale

Particle accelerators are among the largest, most complex devices. To meet the challenges of increasing energy, intensity, accuracy, compactness, complexity and efficiency, increasingly sophisticated computational tools are required for…

Accelerator Physics · Physics 2023-01-13 Axel Huebl, Remi Lehe, Chad E. Mitchell, Ji Qiang, Robert D. Ryne, Ryan T. Sandberg, Jean-Luc Vay

Parameter-Efficient Tuning with Special Token Adaptation

Parameter-efficient tuning aims at updating only a small subset of parameters when adapting a pretrained model to downstream tasks. In this work, we introduce PASTA, in which we only modify the special token representations (e.g., [SEP] and…

Computation and Language · Computer Science 2023-02-15 Xiaocong Yang, James Y. Huang, Wenxuan Zhou, Muhao Chen

GPA: A GPU Performance Advisor Based on Instruction Sampling

Developing efficient GPU kernels can be difficult because of the complexity of GPU architectures and programming models. Existing performance tools only provide coarse-grained suggestions at the kernel level, if any. In this paper, we…

Performance · Computer Science 2020-11-25 Keren Zhou, Xiaozhu Meng, Ryuichi Sai, John Mellor-Crummey

An Algorithm-Hardware Co-Optimized Framework for Accelerating N:M Sparse Transformers

The Transformer has been an indispensable staple in deep learning. However, for real-life applications, it is very challenging to deploy efficient Transformers due to immense parameters and operations of models. To relieve this burden,…

Hardware Architecture · Computer Science 2022-11-01 Chao Fang, Aojun Zhou, Zhongfeng Wang

CFU Playground: Full-Stack Open-Source Framework for Tiny Machine Learning (tinyML) Acceleration on FPGAs

Need for the efficient processing of neural networks has given rise to the development of hardware accelerators. The increased adoption of specialized hardware has highlighted the need for more agile design flows for hardware-software…

Machine Learning · Computer Science 2023-05-22 Shvetank Prakash, Tim Callahan, Joseph Bushagour, Colby Banbury, Alan V. Green, Pete Warden, Tim Ansell, Vijay Janapa Reddi

Dato: A Task-Based Programming Model for Dataflow Accelerators

Recent deep learning workloads increasingly push computational demand beyond what current memory systems can sustain, with many kernels stalling on data movement rather than computation. While modern dataflow accelerators incorporate…

Programming Languages · Computer Science 2025-09-09 Shihan Fang, Hongzheng Chen, Niansong Zhang, Jiajie Li, Han Meng, Adrian Liu, Zhiru Zhang

exa-AMD: An Exascale-Ready Framework for Accelerating the Discovery and Design of Functional Materials

We present exa-AMD, an open-source, high-performance framework designed for accelerated materials discovery on modern supercomputers. exa-AMD overcomes key computational bottlenecks in large-scale structure prediction through task-based…

Materials Science · Physics 2025-12-11 Weiyi Xia, Maxim Moraru, Ying Wai Li, Cai-Zhuang Wang

A Synthetic Data-Driven Radiology Foundation Model for Pan-tumor Clinical Diagnosis

AI-assisted imaging made substantial advances in tumor diagnosis and management. However, a major barrier to developing robust oncology foundation models is the scarcity of large-scale, high-quality annotated datasets, which are limited by…

Image and Video Processing · Electrical Eng. & Systems 2026-02-16 Wenhui Lei, Hanyu Chen, Zitian Zhang, Luyang Luo, Qiong Xiao, Yannian Gu, Peng Gao, Yankai Jiang, Ci Wang, Guangtao Wu, Tongjia Xu, Yingjie Zhang, Pranav Rajpurkar, Xiaofan Zhang, Shaoting Zhang, Zhenning Wang

Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs

In human-written articles, we often leverage the subtleties of text style, such as bold and italics, to guide the attention of readers. These textual emphases are vital for the readers to grasp the conveyed information. When interacting…

Computation and Language · Computer Science 2024-10-02 Qingru Zhang, Chandan Singh, Liyuan Liu, Xiaodong Liu, Bin Yu, Jianfeng Gao, Tuo Zhao

TATAA: Programmable Mixed-Precision Transformer Acceleration with a Transformable Arithmetic Architecture

Modern transformer-based deep neural networks present unique technical challenges for effective acceleration in real-world applications. Apart from the vast amount of linear operations needed due to their sizes, modern transformer models…

Hardware Architecture · Computer Science 2024-11-07 Jiajun Wu, Mo Song, Jingmin Zhao, Yizhao Gao, Jia Li, Hayden Kwok-Hay So

A Full-Stack Search Technique for Domain Optimized Deep Learning Accelerators

The rapidly-changing deep learning landscape presents a unique opportunity for building inference accelerators optimized for specific datacenter-scale workloads. We propose Full-stack Accelerator Search Technique (FAST), a hardware…

Machine Learning · Computer Science 2022-02-02 Dan Zhang, Safeen Huda, Ebrahim Songhori, Kartik Prabhu, Quoc Le, Anna Goldie, Azalia Mirhoseini

MAPA: Multi-Accelerator Pattern Allocation Policy for Multi-Tenant GPU Servers

Multi-accelerator servers are increasingly being deployed in shared multi-tenant environments (such as in cloud data centers) in order to meet the demands of large-scale compute-intensive workloads. In addition, these accelerators are…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-10-08 Kiran Ranganath, Joshua D. Suetterlein, Joseph B. Manzano, Shuaiwen Leon Song, Daniel Wong

PASTA: Vision Transformer Patch Aggregation for Weakly Supervised Target and Anomaly Segmentation

Detecting unseen anomalies in unstructured environments presents a critical challenge for industrial and agricultural applications such as material recycling and weeding. Existing perception systems frequently fail to satisfy the strict…

Computer Vision and Pattern Recognition · Computer Science 2026-04-14 Melanie Neubauer, Elmar Rueckert, Christian Rauch

Versatile Cross-platform Compilation Toolchain for Schrödinger-style Quantum Circuit Simulation

While existing quantum hardware resources have limited availability and reliability, there is a growing demand for exploring and verifying quantum algorithms. Efficient classical simulators for high-performance quantum simulation are…

Quantum Physics · Physics 2025-03-26 Yuncheng Lu, Shuang Liang, Hongxiang Fan, Ce Guo, Wayne Luk, Paul H. J. Kelly