Related papers: Unification-based Pointer Analysis without Oversha…
Pointer analysis has been studied for over four decades. However, existing frameworks continue to suffer from the propagation of incorrect facts. A major limitation stems from their insufficient semantic understanding of code, resulting in…
Pointer analysis is foundational for many static analysis tasks, yet its effectiveness is often hindered by imprecise modeling of heap allocations, particularly in C/C++ programs where custom allocation functions (CAFs) are pervasive.…
A pointer analysis maps the pointers in a program to the memory locations they point to. In this work, we study the effectiveness of the three flavors of pointer analysis namely flow sensitive, flow insensitive, and context sensitive…
Leveraging long contexts is crucial for advanced AI systems, but attention computation poses a scalability challenge. While scaled dot-product attention (SDPA) exhibits token sparsity, i.e. only a few pivotal tokens significantly contribute…
We present a new demand-driven flow- and context-sensitive pointer analysis with strong updates for C programs, called SUPA, that enables computing points-to information via value-flow refinement, in environments with small time and memory…
Attention is the critical component of a transformer. Yet the quadratic computational complexity of vanilla full attention in the input size and the inability of its linear attention variant to focus have been challenges for computer vision…
The Transformer architecture has significantly advanced natural language processing (NLP) and has been foundational in developing large language models (LLMs) such as LLaMA and OPT, which have come to dominate a broad range of NLP tasks.…
To usher in the next round of client AI innovation, there is an urgent need to enable efficient, lossless inference of high-accuracy large language models (LLMs) and vision language models (VLMs), jointly referred to as xLMs, on client…
Large language models (LLMs) have demonstrated strong performance on formal language tasks, yet whether this reflects genuine symbolic reasoning or pattern matching on familiar constructions remains unclear. We introduce a benchmark for…
We consider the problem of auditing black-box large language models (LLMs) to ensure they behave reliably when deployed in production settings, particularly in high-stakes domains such as legal, medical, and regulatory compliance. Existing…
Processing long contexts has become a critical capability for modern large language models (LLMs). However, serving long-context LLMs comes with significant inference costs due to the high memory overhead of the key-value (KV) cache.…
Modern large language models increasingly require long contexts for reasoning and multi-document tasks, but attention's quadratic complexity creates a severe computational bottleneck. We present Block-Sparse FlashAttention (BSFA), a drop-in…
Scaling Transformers to ultra-long contexts is bottlenecked by the $O(n^2 d)$ cost of self-attention. Existing methods reduce this cost along the sequence axis through local windows, kernel approximations, or token-level sparsity, but these…
The attention mechanism within the transformer architecture enables the model to weigh and combine tokens based on their relevance to the query. While self-attention has enjoyed major success, it notably treats all queries $q$ in the same…
Context- and flow-sensitive value-flow information is an important building block for many static analysis tools. Unfortunately, current approaches to compute value-flows do not scale to large codebases, due to high memory and runtime…
Precise analysis of pointer information plays an important role in many static analysis techniques and tools today. The precision, however, must be balanced against the scalability of the analysis. This paper focusses on improving the…
Attention calculation is extremely time-consuming for long-sequence inference tasks, such as text or image/video generation, in large models. To accelerate this process, we developed a low-precision, mathematically-equivalent algorithm…
Points-to analysis is the problem of approximating run-time values of pointers statically or at compile-time. Points-to sets are used to store the approximated values of pointers during points-to analysis. Memory usage and running time…
Large language models (LLMs) now support extremely long context windows, but the quadratic complexity of vanilla attention results in significantly long Time-to-First-Token (TTFT) latency. Existing approaches to address this complexity…
Precise pointer analysis is a foundational component of many client analyses and optimizations. Scaling flow- and context-sensitive pointer analysis has been a long-standing challenge, suffering from combinatorial growth in both memory…