Related papers: Unification-based Pointer Analysis without Oversha…

Enhancing Semantic Understanding in Pointer Analysis using Large Language Models

Pointer analysis has been studied for over four decades. However, existing frameworks continue to suffer from the propagation of incorrect facts. A major limitation stems from their insufficient semantic understanding of code, resulting in…

Software Engineering · Computer Science 2025-09-01 Baijun Cheng, Kailong Wang, Ling Shi, Haoyu Wang, Yao Guo, Ding Li, Xiangqun Chen

Boosting Pointer Analysis With LLM-Enhanced Allocation Function Detection

Pointer analysis is foundational for many static analysis tasks, yet its effectiveness is often hindered by imprecise modeling of heap allocations, particularly in C/C++ programs where custom allocation functions (CAFs) are pervasive.…

Software Engineering · Computer Science 2025-12-01 Baijun Cheng, Kailong Wang, Ling Shi, Haoyu Wang, Peng Di, Ding Li, Xiangqun Chen, Yao Guo

Correlating Effectiveness of Pointer Analysis Techniques with Patterns in Embedded System Code

A pointer analysis maps the pointers in a program to the memory locations they point to. In this work, we study the effectiveness of the three flavors of pointer analysis namely flow sensitive, flow insensitive, and context sensitive…

Software Engineering · Computer Science 2022-08-12 Komal Pathade

HashAttention: Semantic Sparsity for Faster Inference

Leveraging long contexts is crucial for advanced AI systems, but attention computation poses a scalability challenge. While scaled dot-product attention (SDPA) exhibits token sparsity, i.e. only a few pivotal tokens significantly contribute…

Machine Learning · Computer Science 2025-06-05 Aditya Desai, Shuo Yang, Alejandro Cuadron, Matei Zaharia, Joseph E. Gonzalez, Ion Stoica

Demand-Driven Pointer Analysis with Strong Updates via Value-Flow Refinement

We present a new demand-driven flow- and context-sensitive pointer analysis with strong updates for C programs, called SUPA, that enables computing points-to information via value-flow refinement, in environments with small time and memory…

Programming Languages · Computer Science 2017-01-23 Yulei Sui, Jingling Xue

SEMA: a Scalable and Efficient Mamba like Attention via Token Localization and Averaging

Attention is the critical component of a transformer. Yet the quadratic computational complexity of vanilla full attention in the input size and the inability of its linear attention variant to focus have been challenges for computer vision…

Computer Vision and Pattern Recognition · Computer Science 2025-06-11 Nhat Thanh Tran, Fanghui Xue, Shuai Zhang, Jiancheng Lyu, Yunling Zheng, Yingyong Qi, Jack Xin

ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching

The Transformer architecture has significantly advanced natural language processing (NLP) and has been foundational in developing large language models (LLMs) such as LLaMA and OPT, which have come to dominate a broad range of NLP tasks.…

Artificial Intelligence · Computer Science 2024-03-27 Youpeng Zhao, Di Wu, Jun Wang

Efficient, VRAM-Constrained xLM Inference on Clients

To usher in the next round of client AI innovation, there is an urgent need to enable efficient, lossless inference of high-accuracy large language models (LLMs) and vision language models (VLMs), jointly referred to as xLMs, on client…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-30 Aditya Ukarande, Deep Shekhar, Marc Blackstein, Ram Rangan

Beyond Memorization: Testing LLM Reasoning on Unseen Theory of Computation Tasks

Large language models (LLMs) have demonstrated strong performance on formal language tasks, yet whether this reflects genuine symbolic reasoning or pattern matching on familiar constructions remains unclear. We introduce a benchmark for…

Computation and Language · Computer Science 2026-01-21 Shlok Shelat, Jay Raval, Souvik Roy, Manas Gaur

Visualizing token importance for black-box language models

We consider the problem of auditing black-box large language models (LLMs) to ensure they behave reliably when deployed in production settings, particularly in high-stakes domains such as legal, medical, and regulatory compliance. Existing…

Computation and Language · Computer Science 2025-12-15 Paulius Rauba, Qiyao Wei, Mihaela van der Schaar

Progressive Sparse Attention: Algorithm and System Co-design for Efficient Attention in LLM Serving

Processing long contexts has become a critical capability for modern large language models (LLMs). However, serving long-context LLMs comes with significant inference costs due to the high memory overhead of the key-value (KV) cache.…

Machine Learning · Computer Science 2025-03-04 Qihui Zhou, Peiqi Yin, Pengfei Zuo, James Cheng

Block Sparse Flash Attention

Modern large language models increasingly require long contexts for reasoning and multi-document tasks, but attention's quadratic complexity creates a severe computational bottleneck. We present Block-Sparse FlashAttention (BSFA), a drop-in…

Machine Learning · Computer Science 2025-12-09 Daniel Ohayon, Itay Lamprecht, Itay Hubara, Israel Cohen, Daniel Soudry, Noam Elata

Scaling Attention via Feature Sparsity

Scaling Transformers to ultra-long contexts is bottlenecked by the $O(n^2 d)$ cost of self-attention. Existing methods reduce this cost along the sequence axis through local windows, kernel approximations, or token-level sparsity, but these…

Machine Learning · Computer Science 2026-03-31 Yan Xie, Tiansheng Wen, Tangda Huang, Bo Chen, Chenyu You, Stefanie Jegelka, Yifei Wang

Selective Attention: Enhancing Transformer through Principled Context Control

The attention mechanism within the transformer architecture enables the model to weigh and combine tokens based on their relevance to the query. While self-attention has enjoyed major success, it notably treats all queries $q$ in the same…

Machine Learning · Computer Science 2024-11-21 Xuechen Zhang, Xiangyu Chang, Mingchen Li, Amit Roy-Chowdhury, Jiasi Chen, Samet Oymak

DFI: An Interprocedural Value-Flow Analysis Framework that Scales to Large Codebases

Context- and flow-sensitive value-flow information is an important building block for many static analysis tools. Unfortunately, current approaches to compute value-flows do not scale to large codebases, due to high memory and runtime…

Programming Languages · Computer Science 2022-09-07 Min-Yih Hsu, Felicitas Hetzelt, Michael Franz

Precise Null Pointer Analysis Through Global Value Numbering

Precise analysis of pointer information plays an important role in many static analysis techniques and tools today. The precision, however, must be balanced against the scalability of the analysis. This paper focusses on improving the…

Programming Languages · Computer Science 2017-02-21 Ankush Das, Akash Lal

Online Pseudo-average Shifting Attention (PASA) for Robust Low-precision LLM Inference: Algorithms and Numerical Analysis

Attention calculation is extremely time-consuming for long-sequence inference tasks, such as text or image/video generation, in large models. To accelerate this process, we developed a low-precision, mathematically-equivalent algorithm…

Machine Learning · Computer Science 2025-03-05 Long Cheng, Qichen Liao, Fan Wu, Junlin Mu, Tengfei Han, Zhe Qiu, Lianqiang Li, Tianyi Liu, Fangzheng Miao, Keming Gao, Liang Wang, Zhen Zhang, Qiande Yin

Improving bit-vector representation of points-to sets using class hierarchy

Points-to analysis is the problem of approximating run-time values of pointers statically or at compile-time. Points-to sets are used to store the approximated values of pointers during points-to analysis. Memory usage and running time…

Programming Languages · Computer Science 2015-03-19 Hamid A. Toussi, Ahmed Khademzadeh

SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention

Large language models (LLMs) now support extremely long context windows, but the quadratic complexity of vanilla attention results in significantly long Time-to-First-Token (TTFT) latency. Existing approaches to address this complexity…

Computation and Language · Computer Science 2025-09-04 Qianchao Zhu, Jiangfei Duan, Chang Chen, Siran Liu, Guanyu Feng, Xin Lv, Xiao Chuanfu, Dahua Lin, Chao Yang

Points-to Analysis Using MDE: A Multi-level Deduplication Engine for Repetitive Data and Operations

Precise pointer analysis is a foundational component of many client analyses and optimizations. Scaling flow- and context-sensitive pointer analysis has been a long-standing challenge, suffering from combinatorial growth in both memory…

Programming Languages · Computer Science 2026-04-14 Anamitra Ghorui, Aditi Raste, Uday P. Khedker