Related papers: Anchor-based Large Language Models

Anchor Attention, Small Cache: Code Generation with Large Language Models

The development of large language models (LLMs) has revolutionized automated code generation. However, their high demand of computation resources has hindered a broader deployment and raised environmental concerns. A common strategy for…

Software Engineering · Computer Science 2024-11-12 Xiangyu Zhang, Yu Zhou, Guang Yang, Harald C. Gall, Taolue Chen

AnchorAttention: Difference-Aware Sparse Attention with Stripe Granularity

Large Language Models (LLMs) with extended context lengths face significant computational challenges during the pre-filling phase, primarily due to the quadratic complexity of self-attention. Existing methods typically employ dynamic…

Machine Learning · Computer Science 2025-05-30 Yu Zhang, Dong Guo, Fang Wu, Guoliang Zhu, Dian Ding, Yiming Zhang

Answer-Centric or Reasoning-Driven? Uncovering the Latent Memory Anchor in LLMs

While Large Language Models (LLMs) demonstrate impressive reasoning capabilities, growing evidence suggests much of their success stems from memorized answer-reasoning patterns rather than genuine inference. In this work, we investigate a…

Computation and Language · Computer Science 2025-06-24 Yang Wu, Yifan Zhang, Yiwei Wang, Yujun Cai, Yurong Wu, Yuran Wang, Ning Xu, Jian Cheng

AnchorTP: Resilient LLM Inference with State-Preserving Elastic Tensor Parallelism

Large Language Model (LLM) inference services demand exceptionally high availability and low latency, yet multi-GPU Tensor Parallelism (TP) makes them vulnerable to single-GPU failures. We present AnchorTP, a state-preserving elastic TP…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-18 Wendong Xu, Chujie Chen, He Xiao, Kuan Li, Jing Xiong, Chen Zhang, Wenyong Zhou, Chaofan Tao, Yang Bai, Bei Yu, Ngai Wong

LLM in a flash: Efficient Large Language Model Inference with Limited Memory

Large language models (LLMs) are central to modern natural language processing, delivering exceptional performance in various tasks. However, their substantial computational and memory requirements present challenges, especially for devices…

Computation and Language · Computer Science 2024-08-01 Keivan Alizadeh, Iman Mirzadeh, Dmitry Belenko, Karen Khatamifard, Minsik Cho, Carlo C Del Mundo, Mohammad Rastegari, Mehrdad Farajtabar

Rule Encoding and Compliance in Large Language Models: An Information-Theoretic Analysis

The design of safety-critical agents based on large language models (LLMs) requires more than simple prompt engineering. This paper presents a comprehensive information-theoretic analysis of how rule encodings in system prompts influence…

Artificial Intelligence · Computer Science 2025-10-10 Joachim Diederich

Semantic Anchoring in Agentic Memory: Leveraging Linguistic Structures for Persistent Conversational Context

Large Language Models (LLMs) have demonstrated impressive fluency and task competence in conversational settings. However, their effectiveness in multi-session and long-term interactions is hindered by limited memory persistence. Typical…

Computation and Language · Computer Science 2025-08-19 Maitreyi Chatterjee, Devansh Agarwal

Augmenting Language Models with Long-Term Memory

Existing large language models (LLMs) can only afford fixed-size inputs due to the input length limit, preventing them from utilizing rich long-context information from past inputs. To address this, we propose a framework, Language Models…

Computation and Language · Computer Science 2023-06-13 Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei

From Anchors to Answers: A Novel Node Tokenizer for Integrating Graph Structure into Large Language Models

Enabling large language models (LLMs) to effectively process and reason with graph-structured data remains a significant challenge despite their remarkable success in natural language tasks. Current approaches either convert graph…

Artificial Intelligence · Computer Science 2025-09-03 Yanbiao Ji, Chang Liu, Xin Chen, Dan Luo, Mei Li, Yue Ding, Wenqing Lin, Hongtao Lu

Lossless Acceleration of Large Language Model via Adaptive N-gram Parallel Decoding

While Large Language Models (LLMs) have shown remarkable abilities, they are hindered by significant resource consumption and considerable latency due to autoregressive processing. In this study, we introduce Adaptive N-gram Parallel…

Computation and Language · Computer Science 2024-07-11 Jie Ou, Yueming Chen, Wenhong Tian

Enhancing Large Language Models' Machine Translation via Dynamic Focus Anchoring

Large language models have demonstrated exceptional performance across multiple crosslingual NLP tasks, including machine translation (MT). However, persistent challenges remain in addressing context-sensitive units (CSUs), such as…

Computation and Language · Computer Science 2025-05-30 Qiuyu Ding, Zhiqiang Cao, Hailong Cao, Tiejun Zhao

AnchorMem: Anchored Facts with Associative Contexts for Building Memory in Large Language Models

While large language models have achieved remarkable performance in complex tasks, they still need a memory system to utilize historical experience in long-term interactions. Existing memory methods (e.g., A-Mem, Mem0) place excessive…

Computation and Language · Computer Science 2026-04-21 Zhanyu Shen, Sijie Cheng, Zhicheng Guo, Weiqin Wang, Yile Wang, Hui Huang

AttnCache: Accelerating Self-Attention Inference for LLM Prefill via Attention Cache

Large Language Models (LLMs) are widely used in generative applications such as chatting, code generation, and reasoning. However, many real-world workloads such as classification, question answering, recommendation, and text embedding rely…

Computation and Language · Computer Science 2025-11-13 Dinghong Song, Yuan Feng, Yiwei Wang, Shangye Chen, Cyril Guyot, Filip Blagojevic, Hyeran Jeon, Pengfei Su, Dong Li

Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning

In-context learning (ICL) emerges as a promising capability of large language models (LLMs) by providing them with demonstration examples to perform diverse tasks. However, the underlying mechanism of how LLMs learn from the provided…

Computation and Language · Computer Science 2023-12-20 Lean Wang, Lei Li, Damai Dai, Deli Chen, Hao Zhou, Fandong Meng, Jie Zhou, Xu Sun

Anchor function: a type of benchmark functions for studying language models

Understanding transformer-based language models is becoming increasingly crucial, particularly as they play pivotal roles in advancing towards artificial general intelligence. However, language model research faces significant challenges,…

Computation and Language · Computer Science 2024-01-17 Zhongwang Zhang, Zhiwei Wang, Junjie Yao, Zhangchen Zhou, Xiaolong Li, Weinan E, Zhi-Qin John Xu

MNN-LLM: A Generic Inference Engine for Fast Large Language Model Deployment on Mobile Devices

Large language models (LLMs) have demonstrated exceptional performance across a variety of tasks. However, their substantial scale leads to significant computational resource consumption during inference, resulting in high costs.…

Machine Learning · Computer Science 2025-06-13 Zhaode Wang, Jingbang Yang, Xinyu Qian, Shiwen Xing, Xiaotang Jiang, Chengfei Lv, Shengyu Zhang

Anchored Diffusion Language Model

Diffusion Language Models (DLMs) promise parallel generation and bidirectional context, yet they underperform autoregressive (AR) models in both likelihood modeling and generated text quality. We identify that this performance gap arises…

Computation and Language · Computer Science 2025-05-27 Litu Rout, Constantine Caramanis, Sanjay Shakkottai

SCM: Enhancing Large Language Model with Self-Controlled Memory Framework

Large Language Models (LLMs) are constrained by their inability to process lengthy inputs, resulting in the loss of critical historical information. To address this limitation, in this paper, we propose the Self-Controlled Memory (SCM)…

Computation and Language · Computer Science 2025-03-19 Bing Wang, Xinnian Liang, Jian Yang, Hui Huang, Shuangzhi Wu, Peihao Wu, Lu Lu, Zejun Ma, Zhoujun Li

InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory

Large language models (LLMs) have emerged as a cornerstone in real-world applications with lengthy streaming inputs (e.g., LLM-driven agents). However, existing LLMs, pre-trained on sequences with a restricted maximum length, cannot process…

Computation and Language · Computer Science 2024-05-29 Chaojun Xiao, Pengle Zhang, Xu Han, Guangxuan Xiao, Yankai Lin, Zhengyan Zhang, Zhiyuan Liu, Maosong Sun

Challenges and Research Directions for Large Language Model Inference Hardware

Large Language Model (LLM) inference is hard. The autoregressive Decode phase of the underlying Transformer model makes LLM inference fundamentally different from training. Exacerbated by recent AI trends, the primary challenges are memory…

Hardware Architecture · Computer Science 2026-02-10 Xiaoyu Ma, David Patterson